IPA: Building an AI-Driven Kubernetes Autoscaler

Shafin Hasnat
7 min read · Feb 18, 2025

Before we begin

Generative AI is evolving rapidly, improving its reasoning capabilities and problem-solving skills. Logs generated by applications running in Kubernetes contain valuable insights that can be leveraged for intelligent scaling decisions. Instead of relying solely on predefined thresholds, why not use AI to analyze logs, reason through the data, and provide scaling recommendations?

This is where IPA (Intelligent Pod Autoscaler) comes in — a Kubernetes autoscaler powered by LLM-based AI. Unlike Kubernetes’ native HPA (Horizontal Pod Autoscaler) and VPA (Vertical Pod Autoscaler), which depend on manually set threshold values, IPA takes a more dynamic approach. Defining thresholds can be tricky, especially when there’s little to no understanding of incoming traffic patterns. IPA eliminates this guesswork by externally analyzing metrics and logs, then intelligently suggesting both horizontal and vertical scaling strategies for running pods.

Introduction

The Intelligent Pod Autoscaler (IPA) is a Kubernetes operator designed to transform how applications scale in containerized environments. By combining real-time metrics collection from Prometheus with the analytical power of Large Language Models (LLMs), IPA introduces a smarter, more adaptive approach to scaling.

Traditional autoscalers rely on static, threshold-based rules, often leading to inefficiencies — either over-provisioning resources or failing to scale in time to meet demand. IPA eliminates these limitations by leveraging AI to analyze complex metric patterns, predict workload trends, and intelligently adjust both horizontal and vertical scaling.

Seamlessly integrating into Kubernetes clusters, IPA collects cluster-wide and application-specific metrics, feeding them into an LLM that detects subtle correlations and trends. This enables dynamic scaling recommendations that proactively adjust to workload fluctuations — whether it’s ensuring stability during traffic spikes or optimizing costs during low-demand periods.

For DevOps teams, IPA is a game-changer, delivering greater operational efficiency, enhanced application performance, and smarter resource utilization — keeping your applications always right-sized, at the right time.

Architecture and workflow

The Intelligent Pod Autoscaler (IPA) operates as a Custom Resource Definition (CRD) in Kubernetes, enabling users to define IPA custom resources that specify deployment details for applications running in the cluster. It is possible to scale multiple target deployments with a single IPA custom resource.

IPA architecture

Key Components of IPA Architecture

  • IPA Controller: At the heart of IPA is the controller, which continuously collects application-specific and cluster-wide metrics from Prometheus. These metrics include CPU and memory utilization, network request rates, and overall cluster resource usage. Every minute, the controller triggers a reconciliation process, packaging this data into a POST request and sending it to the IPA Agent for analysis.
  • IPA Agent: The IPA Agent can be deployed as a shared service or as a dedicated instance. This component is responsible for the heavy lifting — leveraging the power of Gemini LLM to analyze real-time metrics and predict optimal scaling decisions. Based on its insights, the agent generates precise scaling recommendations, determining the ideal number of pods and the appropriate resource requests and limits. The IPA Controller interacts with the IPA Agent via its /llmagent endpoint to fetch these recommendations.
  • Feedback Loop: Once the IPA Agent generates scaling recommendations, the controller applies the changes, updating the Kubernetes deployment with the optimal pod count and fine-tuned resource allocations. This automated feedback loop ensures that applications stay efficiently scaled, adapting seamlessly to demand fluctuations.

IPA reconciliation flow:

Flow diagram of reconciliation process

Development

Kubernetes Operator

Developing a Kubernetes operator is a straightforward process using the Kubebuilder framework, which is written in Go. The Intelligent Pod Autoscaler (IPA) uses the API group ipa.shafinhasnat.me and API version v1alpha1.

Default API types for this version are defined in api/v1alpha1/ipa_types.go and serve as the foundation for IPA’s functionality, allowing users to configure deployment details and scaling behavior. Here are the default API types of the IPA operator:

type IPASpec struct {
	Metadata Metadata `json:"metadata"`
}

type Metadata struct {
	PrometheusUri string     `json:"prometheusUri"`
	LLMAgent      string     `json:"llmAgent"`
	IPAGroup      []IPAGroup `json:"ipaGroup"`
}

type IPAGroup struct {
	Deployment string `json:"deployment"`
	Namespace  string `json:"namespace"`
	Ingress    string `json:"ingress,omitempty"`
}

The Reconcile method in internal/controller/ipa_controller.go is responsible for executing the previously mentioned reconciliation process of the IPA operator. This method continuously monitors the cluster, collects metrics, and communicates with the IPA Agent to determine optimal scaling decisions.

To perform these tasks, Reconcile relies on functions defined in internal/agent/agent.go, which query the in-cluster Prometheus service over its API using a set of predefined PromQL expressions. These queries collect the target deployment's replica spec, memory and CPU usage rates, available node memory, and the incoming traffic rate of the configured ingress. Kubernetes events for the target deployment are also collected in this phase. The results are then aggregated into a string and sent to the IPA Agent as a POST request to the /llmagent path.
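A sketch of how such a Prometheus call can be assembled, assuming the standard Prometheus HTTP API (/api/v1/query for instant queries). The helper name `buildQueryURL` and the example PromQL expression are illustrative, not the operator's exact code:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildQueryURL assembles a Prometheus instant-query URL for a PromQL
// expression, the kind of call internal/agent/agent.go makes against the
// in-cluster Prometheus service.
func buildQueryURL(promURI, promql string) string {
	q := url.Values{}
	q.Set("query", promql) // URL-encodes the PromQL expression
	return promURI + "/api/v1/query?" + q.Encode()
}

func main() {
	// Example: per-namespace CPU usage rate over the last 5 minutes.
	promql := `sum(rate(container_cpu_usage_seconds_total{namespace="ipaapp"}[5m]))`
	fmt.Println(buildQueryURL("http://prometheus-server.default.svc", promql))
}
```

The response comes back as JSON, which the controller flattens into the metrics string sent to the agent.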

To run the IPA operator in a development environment, use the following command. Running it for the first time will install Go dependencies, deploy the CRD in the local Kubernetes cluster, and start the custom resource and controller. I used Minikube to provision the cluster during development.

make install run

Here are some other useful commands:

# Build and push controller
make docker-build docker-push IMG=shafinhasnat/ipa:<version>
# Deploy and undeploy controller
make deploy IMG=shafinhasnat/ipa:<version>
make undeploy
# Build CRD installer
make build-installer IMG=shafinhasnat/ipa:<version>

IPA Agent

The IPA agent is a key component of the Intelligent Pod Autoscaler. The IPA controller relies on it to analyze metrics and generate scaling recommendations. It is a Flask application that interacts with the Gemini API to provide these recommendations. The IPA controller makes a POST request with the metrics string in the body to the IPA agent. To function properly, the base URL of the IPA agent must be set in spec.metadata.llmAgent within the IPA custom resource manifest. The IPA agent logs its output to the llm.log file.

The IPA agent image shafinhasnat/ipaagent is available on Docker Hub and can be used to deploy a self-hosted instance.

docker run -d -e GEMINI_API_KEY=<GEMINI_API_KEY> -p 80:5000 shafinhasnat/ipaagent

A shared IPA agent is available at the base URL https://ipaagent.shafinhasnat.me. Please refer to the IPA agent repo for details.

API documentation: environment variables to run the IPA agent locally:

  • GEMINI_API_KEY (required)
  • DUMP_DATASET (optional)

Endpoints:

  • [GET] / — IPA agent log dashboard
  • [POST] /askllm — Run LLM metrics analysis with the Gemini API. Body: {"metrics": <str>}

Usage

IPA requires a target deployment to autoscale, so applications must be deployed as a Deployment in a namespace. It is recommended to use a namespace other than default.

Another dependency for IPA is Prometheus. The IPA controller executes queries via Prometheus API calls, so the Prometheus service name must be specified in spec.metadata.prometheusUri within the IPA custom resource manifest.

If the deployment is a web application, IPA requires NGINX Ingress Controller metrics to collect HTTP request counts. The Ingress resource name should be set in the ingress field of the corresponding spec.metadata.ipaGroup entry in the manifest.

To set up a test application, deploy a Deployment named testapp in the ipaapp namespace, expose it using a ClusterIP service named testappsvc, and connect the service with an Ingress resource named testappingress. Here are the kubectl commands to create the test application ecosystem.

kubectl create namespace ipaapp
kubectl create deployment testapp --image=shafinhasnat/cpuload --namespace=ipaapp --replicas=1 --port=8080
kubectl expose deployment testapp --name=testappsvc --namespace=ipaapp --port=8001 --target-port=8080 --type=ClusterIP
kubectl create ingress testappingress --namespace=ipaapp --class=nginx --rule="cpuload.shafinhasnat.me/*=testappsvc:8001"

Now, the application is accessible at the cpuload.shafinhasnat.me host.

Setting up IPA requires creating the IPA custom resource definition in the cluster first. To install the CRD:

kubectl apply -f https://raw.githubusercontent.com/shafinhasnat/ipa/refs/heads/main/dist/install.yaml

At this point, an up-and-running IPA agent is required, along with the target application to scale and Prometheus. In this case, we are using the shared IPA agent mentioned in the section above. The IPA custom resource manifest must be configured with the correct values to function properly. Below is the IPA custom resource manifest for the test application.

apiVersion: ipa.shafinhasnat.me/v1alpha1
kind: IPA
metadata:
  name: ipa
spec:
  metadata:
    prometheusUri: http://prometheus-server.default.svc
    llmAgent: https://ipaagent.shafinhasnat.me
    ipaGroup:
      - deployment: testapp
        namespace: ipaapp
        ingress: testappingress

It is possible to specify multiple deployments under the spec.metadata.ipaGroup path. Apply the manifest with the kubectl apply -f command, and IPA will start collecting metrics and taking action from that moment.

Observation

The reconciliation period for IPA is 1 minute. Every minute, the IPA controller collects metrics and makes a REST call to the specified IPA agent. Real-time logs can be seen on the IPA agent dashboard at the / path. Based on the response from the IPA agent, the number of pods in the testapp deployment will scale up and down, and the resource requests and limits will be adjusted according to the load on the application.

A script was run to generate load on the test application for 5 minutes. Based on the load, here is the observed replica and resource scaling summary over a 10-minute span.

The scaling recommendations below were given by Gemini gemini-1.5-flash. The results may vary based on the LLM model being used.

Conclusion

IPA is a hobby project aimed at leveraging AI in the DevOps ecosystem. The results can be improved by fine-tuning models with custom datasets. A similar approach can also be applied to cluster scaling.
