Kubectl-AI: Enhance Kubernetes Management with AI
kubectl-ai Introduction
kubectl-ai is a plugin and CLI tool that turns natural-language queries into Kubernetes commands. Instead of memorizing kubectl flags or YAML manifests, you simply ask questions like “show me all pods in the default namespace” or “scale the nginx deployment to 5 replicas,” and kubectl-ai figures out the exact kubectl commands to run. Under the hood it uses large language models (LLMs) to parse your intent, then executes the corresponding kubectl operations and returns both the raw output and an explanation in plain English.
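For example, a one-off request looks like this; the generated command and summary in the comments are illustrative and will vary with the model and your cluster:
kubectl-ai --quiet "show me all pods in the default namespace"
# kubectl-ai runs something along the lines of:
#   kubectl get pods --namespace default
# and then prints the result with a short plain-English summary.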
Key benefits:
- Natural-language interface: No more guessing flag names or resource kinds.
- Context awareness: In interactive mode, follow-on questions (“What about in the staging cluster?”) carry context.
- Extensible backends: Works with commercial LLMs (Gemini, OpenAI, Azure OpenAI, Grok) or fully offline local models (Ollama, llama.cpp).
- Scriptable & pipable: Use it in CI, scripts, or combine with shell pipes (echo … | kubectl-ai); see the sketch after this list.
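A minimal sketch of the scriptable style referenced above, assuming the queries are placeholders you would adapt:
# Pipe the question in instead of passing it as an argument
echo "list pods in the default namespace" | kubectl-ai
# In CI or scripts, --quiet runs a single non-interactive query whose answer you can capture
kubectl-ai --quiet "are all deployments in the default namespace fully available?" > status.txt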
Installation
You need a working kubectl binary configured against your cluster, plus one of these ways to install kubectl-ai itself:
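Before installing, it is worth confirming that kubectl itself is working; these are standard kubectl commands:
kubectl version --client          # confirms the kubectl binary is present
kubectl config current-context    # confirms kubectl points at the cluster you expect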
Quick Install (Linux & macOS)
curl -sSL https://raw.githubusercontent.com/GoogleCloudPlatform/kubectl-ai/main/install.sh | bash
This will download the latest release, install the kubectl-ai binary into ~/.local/bin (or similar), and add it to your PATH for the current session.
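If you prefer to review the script before running it (a good habit with curl-to-bash installs), download it first; the local filename here is arbitrary:
curl -sSL https://raw.githubusercontent.com/GoogleCloudPlatform/kubectl-ai/main/install.sh -o install-kubectl-ai.sh
less install-kubectl-ai.sh    # read what it will do
bash install-kubectl-ai.sh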
Manual Installation (Linux, macOS & Windows)
- Visit the releases page and download the archive matching your OS/arch (e.g. kubectl-ai_Darwin_arm64.tar.gz for Apple Silicon macOS).
- Unpack, make executable, and move into your PATH:
tar -zxvf kubectl-ai_Darwin_arm64.tar.gz
chmod +x kubectl-ai
sudo mv kubectl-ai /usr/local/bin/
- Verify installation:
kubectl-ai version
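If you would rather avoid sudo, any directory on your PATH works; for example, a per-user install into ~/.local/bin (the location mentioned above for the quick install):
mkdir -p ~/.local/bin
mv kubectl-ai ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"    # add this to your shell profile to make it permanent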
Usage
Once installed, you can run kubectl-ai in two main modes: using cloud-hosted LLMs (Gemini by default) or local models (Ollama/llama.cpp).
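Both modes use the same binary and flags; the built-in help output is the quickest way to check the flags and provider options your installed version supports:
kubectl-ai --help    # lists the available flags, including --llm-provider, --model and --quiet used below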
Using Gemini (Default)
- Obtain a key: Go to Google AI Studio, create a project, and copy your Gemini API key.
- Export the environment variable:
export GEMINI_API_KEY="your_gemini_api_key_here"
- Run kubectl-ai
  - Interactive shell:
kubectl-ai
You’ll get a kubectl-ai> prompt where you can ask a sequence of questions.
  - Single-query mode:
kubectl-ai --quiet "fetch logs for nginx app in hello namespace"
- Switch models (optional):
# Use a specific Gemini model
kubectl-ai --model gemini-2.5-pro-exp-03-25 "list services in default"
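The other hosted backends mentioned under Key benefits follow the same pattern: export that provider's API key, then point kubectl-ai at it with --llm-provider and --model. A sketch for OpenAI, where the provider value and model ID are assumptions to check against your version's --help output:
export OPENAI_API_KEY="your_openai_api_key_here"
kubectl-ai --llm-provider=openai --model=gpt-4.1 "list services in default"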
Using AI Models Running Locally (Ollama or llama.cpp)
You can run entirely offline by hosting your own LLM with either Ollama or llama.cpp. Below are detailed steps for each:
A. Ollama
- Install Ollama
# macOS via Homebrew
brew install ollama
# or Linux via their install script
curl -fsSL https://ollama.com/install.sh | sh
Alternatively, you can download the Ollama binary directly from their website:
- Visit the Ollama download page.
- Click on the Download for macOS button.
- Once the download is complete, locate the .zip file in your ~/Downloads folder.
- Double-click the .zip file to extract its contents. This should create Ollama.app.
- Move the Ollama.app to your Applications folder for easy access.
- Open the Applications folder and double-click on Ollama.app to launch it.
- Follow any on-screen instructions to complete the installation.
- Once installed, you can run the Ollama CLI commands in your terminal.
- Run the Ollama server
ollama serve
By default, this listens on http://localhost:11434 (a quick reachability check is shown after these steps).
- Pull a model
ollama pull gemma3:12b-it-qat
This downloads the gemma3:12b-it-qat model, a quantized, instruction-tuned Gemma 3 12B build that works well for local use with kubectl-ai.
- List models
ollama list
This command lists all the models you have downloaded; you should see gemma3:12b-it-qat among them.
- Configure kubectl-ai
export OLLAMA_HOST="http://localhost:11434"
kubectl-ai \
  --llm-provider=ollama \
  --model=gemma3:12b-it-qat \
  --enable-tool-use-shim
The --enable-tool-use-shim flag ensures the prompt includes the right instructions for invoking kubectl, which is important for the model to understand how to interact with Kubernetes.
- Use it
  - Interactive:
kubectl-ai
  - Single query:
kubectl-ai --quiet "describe deployment my-app"
- Remove a model
ollama rm gemma3:12b-it-qat
This deletes the model from your local machine.
- Stop the model and server
ollama stop gemma3:12b-it-qat
This unloads the running model; to shut down the Ollama server itself, stop the ollama serve process (Ctrl+C in its terminal) or quit the Ollama app.
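As referenced in the server step above, you can confirm Ollama is reachable before wiring up kubectl-ai; /api/tags is Ollama's model-listing endpoint:
curl http://localhost:11434/api/tags    # returns JSON describing the models you have pulled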
B. llama.cpp
- Install or build
# Homebrew (macOS/Linux)
brew install llama.cpp
The formula is automatically updated with new llama.cpp releases.
- Get a model
  - Download a GGUF (quantized) model from Hugging Face (e.g. alpaca-7b-q4_0.gguf).
  - Place it in ~/models/ or your chosen directory.
- Run the server
llama-server --model ~/models/alpaca-7b-q4_0.gguf
(The Homebrew formula installs the server binary as llama-server; older source builds used a separate ./server binary.) This starts an OpenAI-compatible API on http://127.0.0.1:8080 (a quick check is shown after these steps).
- Configure kubectl-ai
export LLAMA_CPP_SERVER="http://127.0.0.1:8080"
kubectl-ai \
  --llm-provider=llama.cpp \
  --model=alpaca-7b-q4_0.gguf \
  --enable-tool-use-shim
- Use it
  - Interactive:
kubectl-ai
  - Single query:
kubectl-ai --quiet "scale deployment nginx to 4 replicas"
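As noted in the server step above, you can confirm the llama.cpp server is responding before pointing kubectl-ai at it; these paths come from llama.cpp's OpenAI-compatible server and may differ slightly between releases:
curl http://127.0.0.1:8080/health       # basic liveness check
curl http://127.0.0.1:8080/v1/models    # lists the loaded model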
With these sections in place, you have everything needed to get started with kubectl-ai, from installation through cloud (Gemini) or local (Ollama/llama.cpp) usage.