IPA: Building an AI-Driven Kubernetes Autoscaler

Before we begin
Generative AI is evolving rapidly, improving its reasoning capabilities and problem-solving skills. Logs generated by applications running in Kubernetes contain valuable insights that can be leveraged for intelligent scaling decisions. Instead of relying solely on predefined thresholds, why not use AI to analyze logs, reason through the data, and provide scaling recommendations?
This is where IPA (Intelligent Pod Autoscaler) comes in — a Kubernetes autoscaler powered by LLM-based AI. Unlike Kubernetes’ native HPA (Horizontal Pod Autoscaler) and VPA (Vertical Pod Autoscaler), which depend on manually set threshold values, IPA takes a more dynamic approach. Defining thresholds can be tricky, especially when there’s little to no understanding of incoming traffic patterns. IPA eliminates this guesswork by externally analyzing metrics and logs, then intelligently suggesting both horizontal and vertical scaling strategies for running pods.
Introduction
The Intelligent Pod Autoscaler (IPA) is a Kubernetes operator designed to transform how applications scale in containerized environments. By combining real-time metrics collection from Prometheus with the analytical power of Large Language Models (LLMs), IPA introduces a smarter, more adaptive approach to scaling.
Traditional autoscalers rely on static, threshold-based rules, often leading to inefficiencies — either over-provisioning resources or failing to scale in time to meet demand. IPA eliminates these limitations by leveraging AI to analyze complex metric patterns, predict workload trends, and intelligently adjust both horizontal and vertical scaling.
Seamlessly integrating into Kubernetes clusters, IPA collects cluster-wide and application-specific metrics, feeding them into an LLM that detects subtle correlations and trends. This enables dynamic scaling recommendations that proactively adjust to workload fluctuations — whether it’s ensuring stability during traffic spikes or optimizing costs during low-demand periods.
For DevOps teams, IPA is a game-changer, delivering greater operational efficiency, enhanced application performance, and smarter resource utilization — keeping your applications always right-sized, at the right time.
Architecture and workflow
The Intelligent Pod Autoscaler (IPA) operates as a Custom Resource Definition (CRD) in Kubernetes, enabling users to define IPA custom resources that specify deployment details for applications running in the cluster. It is possible to scale multiple target deployments with a single IPA custom resource.

Key Components of IPA Architecture
- IPA Controller: At the heart of IPA is the controller, which continuously collects application-specific and cluster-wide metrics from Prometheus. These metrics include CPU and memory utilization, network request rates, and overall cluster resource usage. Every minute, the controller triggers a reconciliation process, packaging this data into a POST request and sending it to the IPA Agent for analysis.
- IPA Agent: The IPA Agent can be deployed as a shared service or as a dedicated instance. This component is responsible for the heavy lifting — leveraging the power of the Gemini LLM to analyze real-time metrics and predict optimal scaling decisions. Based on its insights, the agent generates precise scaling recommendations, determining the ideal number of pods and the appropriate resource requests and limits. The IPA Controller interacts with the IPA Agent via its /llmagent endpoint to fetch these recommendations.
- Feedback Loop: Once the IPA Agent generates scaling recommendations, the controller applies the changes, updating the Kubernetes deployment with the optimal pod count and fine-tuned resource allocations. This automated feedback loop ensures that applications stay efficiently scaled, adapting seamlessly to demand fluctuations.
IPA reconciliation flow-

Development
Kubernetes Operator
Developing a Kubernetes operator is a straightforward process using the Kubebuilder framework, which is written in Go. The Intelligent Pod Autoscaler (IPA) uses the API group ipa.shafinhasnat.me and API version v1alpha1.
Default API types for this version are defined in api/v1alpha1/ipa_types.go and serve as the foundation for IPA’s functionality, allowing users to configure deployment details and scaling behavior. Here are the default API types of the IPA operator-
type IPASpec struct {
	Metadata Metadata `json:"metadata"`
}

type Metadata struct {
	PrometheusUri string     `json:"prometheusUri"`
	LLMAgent      string     `json:"llmAgent"`
	IPAGroup      []IPAGroup `json:"ipaGroup"`
}

type IPAGroup struct {
	Deployment string `json:"deployment"`
	Namespace  string `json:"namespace"`
	Ingress    string `json:"ingress,omitempty"`
}
The Reconcile method in internal/controller/ipa_controller.go is responsible for executing the previously mentioned reconciliation process of the IPA operator. This method continuously monitors the cluster, collects metrics, and communicates with the IPA Agent to determine optimal scaling decisions.
To perform these tasks, Reconcile relies on functions defined in internal/agent/agent.go, which send API calls with predefined PromQL queries to the in-cluster Prometheus service to collect metrics. These queries collect the defined deployment’s replica spec, memory and CPU usage rates, available node memory, and the incoming traffic rate of the defined ingress. Events of the defined deployment are also collected in this phase. The results are then aggregated into a string and sent to the IPA Agent on the /llmagent path as a POST request.
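As a rough sketch of this phase, with illustrative PromQL queries and an assumed aggregation format (the real queries and formatting live in internal/agent/agent.go):

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative PromQL of the kind the controller might run against the
// in-cluster Prometheus; these are not the project's actual queries.
var queries = map[string]string{
	"cpu_usage_rate": `sum(rate(container_cpu_usage_seconds_total{namespace="ipaapp",pod=~"testapp.*"}[5m]))`,
	"memory_usage":   `sum(container_memory_working_set_bytes{namespace="ipaapp",pod=~"testapp.*"})`,
	"ingress_rps":    `sum(rate(nginx_ingress_controller_requests{ingress="testappingress"}[5m]))`,
}

// aggregate flattens metric name/value pairs into a single string of the
// sort that gets shipped to the IPA Agent for analysis.
func aggregate(results map[string]float64) string {
	parts := make([]string, 0, len(results))
	for _, name := range []string{"cpu_usage_rate", "memory_usage", "ingress_rps"} {
		parts = append(parts, fmt.Sprintf("%s=%.2f", name, results[name]))
	}
	return strings.Join(parts, "; ")
}

func main() {
	// Fake query results standing in for Prometheus responses.
	results := map[string]float64{"cpu_usage_rate": 0.82, "memory_usage": 3.1e8, "ingress_rps": 120}
	fmt.Println(aggregate(results))
	// -> cpu_usage_rate=0.82; memory_usage=310000000.00; ingress_rps=120.00
}
```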
To run the IPA operator in a development environment, use the following command. Running it for the first time will install Go dependencies, deploy the CRD in the local Kubernetes cluster, and start the custom resource and controller. I used Minikube to provision the cluster during development.
make install run
Here are some useful commands-
# Build and push controller
make docker-build docker-push IMG=shafinhasnat/ipa:<version>
# Deploy and undeploy controller
make deploy IMG=shafinhasnat/ipa:<version>
make undeploy
# Build CRD installer
make build-installer IMG=shafinhasnat/ipa:<version>
IPA Agent
IPA agent is a key component of the Intelligent Pod Autoscaler. The IPA controller relies on the IPA agent to analyze metrics and generate scaling recommendations. It is a Flask application that interacts with the Gemini API to provide these recommendations. The IPA controller internally makes a POST request to the IPA agent with the metrics string in the body. To function properly, the base URL of the IPA agent must be set in spec.metadata.llmAgent within the IPA custom resource manifest. The IPA agent logs its output to the llm.log file.
The IPA agent image shafinhasnat/ipaagent is available on Docker Hub and can be used to deploy a self-hosted instance.
docker run -d -e GEMINI_API_KEY=<GEMINI_API_KEY> -p 80:5000 shafinhasnat/ipaagent
A shared IPA agent is available at the base URL https://ipaagent.shafinhasnat.me. Please refer to the IPA agent repo for details.
API documentation: Environment variables to run the IPA agent locally-
- GEMINI_API_KEY (required)
- DUMP_DATASET (optional)
Endpoints-
- [GET] / — IPA agent log dashboard
- [POST] /askllm — Run LLM metrics analysis with the Gemini API. Body: {"metrics": <str>}
Usage
IPA requires a target deployment to autoscale, so applications must be deployed as a Deployment in a namespace. It is recommended to use a namespace other than default.
Another dependency for IPA is Prometheus. The IPA controller executes queries via Prometheus API calls, so the Prometheus service name must be specified in spec.metadata.prometheusUri within the IPA custom resource manifest.
If the deployment is a web application, IPA requires NGINX Ingress Controller metrics to collect HTTP request counts. The Ingress resource name should be set in the ingress field of the corresponding spec.metadata.ipaGroup entry in the manifest.
To set up a test application, deploy a Deployment named testapp in the ipaapp namespace, expose it using a ClusterIP service named testappsvc, and connect the service with an Ingress resource named testappingress. Here are kubectl commands to create the test application ecosystem.
kubectl create namespace ipaapp
kubectl create deployment testapp --image=shafinhasnat/cpuload --namespace=ipaapp --replicas=1 --port=8080
kubectl expose deployment testapp --name=testappsvc --namespace=ipaapp --port=8001 --target-port=8080 --type=ClusterIP
kubectl create ingress testappingress --namespace=ipaapp --class=nginx --rule="cpuload.shafinhasnat.me/*=testappsvc:8001"
Now, the application is accessible from the cpuload.shafinhasnat.me host.
Setting up IPA requires creating the IPA custom resource definition in the cluster first. To set up the CRD-
kubectl apply -f https://raw.githubusercontent.com/shafinhasnat/ipa/refs/heads/main/dist/install.yaml
At this point, a running IPA agent is required, along with the target application to scale and Prometheus. In this case, we are using the shared IPA agent mentioned in the section above. The IPA custom resource manifest must be configured with the correct values to function properly. Below is the IPA custom resource manifest for the test application.
apiVersion: ipa.shafinhasnat.me/v1alpha1
kind: IPA
metadata:
  name: ipa
spec:
  metadata:
    prometheusUri: http://prometheus-server.default.svc
    llmAgent: https://ipaagent.shafinhasnat.me
    ipaGroup:
      - deployment: testapp
        namespace: ipaapp
        ingress: testappingress
It is possible to specify multiple deployments under the spec.metadata.ipaGroup path. Apply the manifest with the kubectl apply -f command, and IPA will start collecting metrics and taking action immediately.
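Since each entry in spec.metadata.ipaGroup targets one deployment, a single IPA resource can scale several deployments at once. A hypothetical manifest with two targets might look like this (the second deployment name, workerapp, is illustrative; its ingress is omitted since the field is optional):

```yaml
apiVersion: ipa.shafinhasnat.me/v1alpha1
kind: IPA
metadata:
  name: ipa
spec:
  metadata:
    prometheusUri: http://prometheus-server.default.svc
    llmAgent: https://ipaagent.shafinhasnat.me
    ipaGroup:
      - deployment: testapp
        namespace: ipaapp
        ingress: testappingress
      - deployment: workerapp    # hypothetical second target
        namespace: ipaapp        # no ingress: not a web-facing app
```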
Observation
The reconciliation period for IPA is 1 minute. Every minute, the IPA controller collects metrics and makes a REST call to the specified IPA agent. Real-time logs can be seen from the IPA agent dashboard on the / path. Based on the response from the IPA agent, the number of pods in the launched testapp deployment scales up and down, and the resource requests and limits are adjusted based on the load on the application.
A script was run to generate load on the test application for 5 minutes. Based on the load, here is the observed replica and resource scaling summary over a 10-minute span.
The scaling recommendations below were generated by the Gemini model gemini-1.5-flash. Results may vary based on the LLM model being used.

Conclusion
IPA is a hobby project aimed at leveraging AI in the DevOps ecosystem. The results can be improved by fine-tuning models with custom datasets. A similar approach can also be applied to cluster scaling.
Source code -