Local models (Ollama)

Ollama lets you run language models entirely on your own machine — requests never leave your device and no external API key is required. Kodik automatically discovers a running Ollama server and offers all pulled models for selection.

If Ollama is not yet installed, Kodik will offer to install it from the UI. You can also install it manually.

OS        Command
Linux     curl -fsSL https://ollama.com/install.sh | sh
macOS     brew install ollama
Windows   winget install -e --id Ollama.Ollama

Kodik can start and stop the Ollama server from the built-in UI:

  • Start — Kodik runs ollama serve in the background. The process stays alive even after the terminal window closes.
  • Stop — Kodik sends the appropriate shutdown command for the current OS.

You can also start the server yourself with ollama serve in any terminal.

By default Kodik connects to http://localhost:11434. If you run Ollama on a different host or port — for example inside Docker or on a remote machine — enter the appropriate URL in the Ollama provider settings.
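
To check by hand that a server is reachable at a given URL, one option is to ask it for its model list. The sketch below assumes Ollama's /api/tags endpoint, which returns the pulled models. The function names are hypothetical and this is not Kodik's own discovery code.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "http://localhost:11434"

def tags_url(base_url: str = DEFAULT_BASE_URL) -> str:
    """Build the URL of the model-listing endpoint for a given server."""
    return base_url.rstrip("/") + "/api/tags"

def model_names(tags_response: dict) -> list[str]:
    """Extract model names from a decoded /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_models(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> list[str]:
    """Query the server; raises URLError if nothing is listening at base_url."""
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling list_models("http://192.168.1.20:11434") against a remote machine (an example address) should return the same names that appear in the model picker.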

The Ollama settings section lists your models with Pull and Delete buttons:

  • Pull — Kodik opens a terminal and runs ollama pull <model>. Watch the progress in the terminal window.
  • Delete — removes the model from disk via the Ollama API.
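
For reference, a deletion through the Ollama API can be sketched as follows. This assumes the DELETE /api/delete endpoint with a JSON body naming the model (older Ollama versions used the key "name" instead of "model"). The helper name is hypothetical, and the exact request Kodik sends is not documented here.

```python
import json
import urllib.request

def delete_request(base_url: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks Ollama to delete a model."""
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/delete",
        data=body,
        method="DELETE",
        headers={"Content-Type": "application/json"},
    )

# Sending it removes the model from disk, so it is left commented out:
# urllib.request.urlopen(delete_request("http://localhost:11434", "qwen3:8b"))
```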

If the model you need is not yet pulled, find its name on ollama.com/library and enter it in the pull field.

The Test model button sends a short test message to the selected model and shows the response and response time. This is useful for verifying your setup before starting real work.
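
The same kind of check can be reproduced outside the UI by timing a single short request. In this minimal sketch, send is a hypothetical callable that forwards a prompt to the selected model and returns its reply. How Kodik measures response time internally is an assumption.

```python
import time
from typing import Callable

def timed_test_message(send: Callable[[str], str],
                       prompt: str = "Say hello.") -> tuple[str, float]:
    """Send one short prompt and return (reply, elapsed seconds)."""
    start = time.perf_counter()
    reply = send(prompt)
    elapsed = time.perf_counter() - start
    return reply, elapsed
```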

Kodik automatically discovers the context window size of each pulled model via the Ollama API (/api/show). If it cannot determine the exact size, it falls back to a default of 32,768 tokens.
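
As an illustration of this lookup, the sketch below reads the context length from a decoded /api/show response. It assumes the response carries a model_info object whose key ends in ".context_length" (for example "llama.context_length"), as current Ollama versions report. The function name is hypothetical and Kodik's actual parsing may differ.

```python
DEFAULT_CONTEXT = 32768  # fallback when the size cannot be determined

def context_length(show_response: dict, default: int = DEFAULT_CONTEXT) -> int:
    """Return the model's context window size, or the default if it is absent."""
    model_info = show_response.get("model_info") or {}
    for key, value in model_info.items():
        # The key is prefixed with the architecture name, e.g. "llama." or "qwen3."
        if key.endswith(".context_length") and isinstance(value, int) and value > 0:
            return value
    return default
```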

For local requests Kodik probes for a workable context size: it first sends the request with the current context window size as num_ctx. If Ollama returns an out-of-memory error for that num_ctx, Kodik automatically reduces the value and retries. Both the successful and the failed sizes are remembered locally per model, so the next request starts from a known-good value.
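
The retry loop can be sketched roughly as below. Several details are assumptions made for illustration: the reduction step is a halving, the floor is 2048 tokens, the memory is a plain in-process dictionary, and OutOfMemory and send are hypothetical stand-ins for the real error and request function.

```python
from typing import Any, Callable

class OutOfMemory(Exception):
    """Hypothetical stand-in for an out-of-memory error reported by Ollama."""

class ContextProbe:
    def __init__(self, min_ctx: int = 2048):
        self.min_ctx = min_ctx
        self.good: dict[str, int] = {}  # model -> last num_ctx that worked
        self.bad: dict[str, int] = {}   # model -> smallest num_ctx that failed

    def start_size(self, model: str, wanted: int) -> int:
        """Pick the first size to try, skipping sizes already known to fail."""
        bad = self.bad.get(model)
        if bad is not None and wanted >= bad:
            fallback = self.good.get(model, max(self.min_ctx, bad // 2))
            return min(wanted, fallback)
        return wanted

    def run(self, model: str, wanted: int,
            send: Callable[[str, int], Any]) -> tuple[Any, int]:
        """Call send(model, num_ctx), shrinking num_ctx after each OOM error."""
        num_ctx = self.start_size(model, wanted)
        while True:
            try:
                result = send(model, num_ctx)
            except OutOfMemory:
                self.bad[model] = min(num_ctx, self.bad.get(model, num_ctx))
                if num_ctx <= self.min_ctx:
                    raise  # nothing smaller left to try
                num_ctx = max(self.min_ctx, num_ctx // 2)
                continue
            self.good[model] = num_ctx
            return result, num_ctx
```

With a model that only fits below roughly 10,000 tokens, a first call asking for 32768 would try 32768, 16384, then 8192. A second call would start directly at 8192.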

Models that advertise the thinking capability (for example the Qwen3 family) get a per-model Reasoning toggle in the chat model picker. On keeps Ollama’s default behavior — the model thinks before answering and the thought process is shown in a collapsible block. Off sends think: false, so the model answers directly without a thinking pass — faster and cheaper on tokens for simple tasks. Models without the capability never receive the flag.
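
The toggle logic described above might look like this when building a chat request. The sketch assumes the capability is reported as the string "thinking" in the capabilities list of an /api/show response, and that the flag is sent as the top-level think field of an /api/chat payload. The function names are hypothetical.

```python
def supports_thinking(show_response: dict) -> bool:
    """True if the model advertises the thinking capability."""
    return "thinking" in (show_response.get("capabilities") or [])

def chat_payload(model: str, messages: list[dict],
                 show_response: dict, reasoning_on: bool) -> dict:
    """Build a chat request body, adding think: false only when it applies."""
    payload: dict = {"model": model, "messages": messages}
    if supports_thinking(show_response) and not reasoning_on:
        payload["think"] = False  # Off: answer directly, no thinking pass
    # On, or no capability: leave the flag out and keep Ollama's default.
    return payload
```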

If your Ollama server requires a Bearer token — for example when using Ollama Cloud or a private corporate instance — enter it in the API Key field in the Ollama provider settings. Leave it empty for a standard local installation.
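
A Bearer token is normally passed in the standard Authorization header. The sketch below shows how a request to a protected instance could be built, with the header omitted when the key is empty. The function name and the example key are hypothetical.

```python
import urllib.request

def build_request(base_url: str, path: str, api_key: str = "") -> urllib.request.Request:
    """Build a request, attaching a Bearer token only when a key is set."""
    headers = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(base_url.rstrip("/") + path, headers=headers)
```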