Ollama

Codex App

Ollama 0.24 adds support for the Codex App, OpenAI's desktop experience for working on Codex threads in parallel, with built-in worktree support and Git functionality.

ollama launch codex-app
<img width="2088" height="1404" alt="CleanShot 2026-05-14 at 15 04 18@2x" src="https://github.com/user-attachments/assets/53bd7997-19fd-4809-b8f2-b6ed284369c9" />
Built-in browser

Codex can load local servers and sites in its built-in browser, so you can annotate the page directly to request changes.

<img width="1073" height="668" alt="codex-annotate copy" src="https://github.com/user-attachments/assets/c9b762b3-83f2-47f1-8f28-d9eebc1bf5e0" />
Review mode

Review code inside the app, leave comments, and iterate without leaving your workspace.

<img width="1137" height="696" alt="codex-comments copy 2" src="https://github.com/user-attachments/assets/56316d33-59ed-4f24-aaa7-a7c0310014c4" />
Choosing a model

For difficult coding and agentic tasks:

  • kimi-k2.6 (with vision support)
  • glm-5.1

For local use without an Ollama Cloud subscription:

  • nemotron-3-super
  • gemma4:31b
  • qwen3.6

Restore anytime

To restore the previous configuration of Codex App, run:

ollama launch codex-app --restore

What's Changed

  • Reworked the MLX sampler for improved generation quality on Apple Silicon

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.0...v0.24.0

What's Changed

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.2...v0.23.3

What's Changed

  • ollama launch no longer includes Claude Desktop, because the third-party integration is limited to Anthropic models.
  • Use ollama launch claude-desktop --restore to restore Claude Desktop to its normal state.
  • /api/show responses are now cached, reducing median latency by ~6.7x, which speeds up loading for integrations like VS Code.
  • Improved backup workflow when managing launch integrations
  • Cleaner image generation layout in the MLX runner
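
To see the effect of the /api/show cache on your own machine, you can time repeated requests for the same model. The following is a minimal sketch, not part of Ollama: it assumes a server on the default local address http://localhost:11434, and the helper names build_show_request and time_show are made up for illustration.

```python
import json
import time
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_HOST = "http://localhost:11434"


def build_show_request(model: str, host: str = OLLAMA_HOST) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/show."""
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/show",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def time_show(model: str, repeats: int = 5) -> list[float]:
    """Send the same /api/show request several times; return each latency in seconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        with urllib.request.urlopen(build_show_request(model)) as resp:
            resp.read()  # drain the body so the full response is timed
        latencies.append(time.perf_counter() - start)
    return latencies
```

With a server running and the model pulled, time_show("gemma4:31b") returns one latency per request. If caching behaves as described, later requests should tend to be faster than the first, though the exact ratio will depend on the machine and model.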

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.1...v0.23.2

Gemma 4 MTP (Multi-Token Prediction) for the MLX runner

Gemma 4 MTP speculative decoding is now supported on Macs. It can more than double generation speed for the Gemma 4 31B model on coding tasks.

ollama run gemma4:31b-coding-mtp-bf16
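
For readers new to speculative decoding, the general idea is that a cheap drafter proposes several tokens and the full model verifies them, keeping the longest prefix it agrees with. The sketch below illustrates that draft-and-verify loop with toy greedy "models" (plain functions from a token list to the next token). It is an assumption-laden illustration of the general technique, not the MLX runner's implementation; with MTP the draft tokens would typically come from the model's own extra prediction heads rather than a separate function.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to the next token (greedy).
Model = Callable[[List[int]], int]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify decoding; output matches target-only greedy decoding."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap drafter proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each proposed position. A real runner would score
        #    all positions in one batched forward pass; this loop only mimics the logic.
        for i, guess in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if guess != expected:
                # 3. First mismatch: keep the accepted prefix plus the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every drafted token was accepted.
            tokens.extend(proposal)
    return tokens
```

Because rejected drafts are replaced by the target's own choice, the result is the same sequence the target would produce alone. Any speedup comes from verifying several positions per target pass when the drafter guesses well.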

What's Changed

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.0...v0.23.1

Claude Desktop

Claude Desktop is now supported with Ollama Launch.

Claude Cowork and Claude Code are supported within the Claude Desktop App.

ollama launch claude-desktop

Claude Cowork
<img width="1272" height="872" alt="ca1" src="https://github.com/user-attachments/assets/1d550e3f-0272-4429-8cb2-06d32344cb77" />
Claude Code
<img width="1272" height="872" alt="ca2" src="https://github.com/user-attachments/assets/f2a5ed5f-3069-4975-bb22-ada82914a01c" />

Claude Code on the terminal can still be accessed through the CLI with:

ollama launch claude

Not supported yet
  • Web Search (coming soon)
  • Extensions

What's Changed

  • Launch Claude Desktop with ollama launch claude-desktop
  • The Ollama app now surfaces featured models from server-driven recommendations
  • Fixed OpenClaw gateway timeout on Windows by enforcing IPv4 loopback (thanks @UniquePratham)
  • Hardened Metal initialization to gracefully handle ggml kernel compilation failures
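
The release note does not explain the OpenClaw timeout, but a common cause of this class of bug is that localhost can resolve to the IPv6 address ::1 first while the server only listens on IPv4. Assuming that is the situation, pinning the address to 127.0.0.1 avoids the ambiguity. The helper below is a hypothetical sketch of that idea, not the actual fix; the function name and the port in the usage note are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def force_ipv4_loopback(url: str) -> str:
    """Rewrite a `localhost` URL to use 127.0.0.1, leaving other hosts untouched.

    Note: this sketch drops any user:password@ part of the URL.
    """
    parts = urlsplit(url)
    if parts.hostname != "localhost":
        return url
    netloc = "127.0.0.1"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, force_ipv4_loopback("http://localhost:8080/health") returns "http://127.0.0.1:8080/health".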

Full Changelog: https://github.com/ollama/ollama/compare/v0.22.1...v0.23.0

What's Changed

  • Updated the Gemma 4 renderer for thinking and tool calling improvements
  • Model recommendations now update without requiring an Ollama update
  • Aligned the desktop app's launch page with ollama launch integrations
  • Fixed the Poolside integration title in ollama launch

Full Changelog: https://github.com/ollama/ollama/compare/v0.22.0...v0.22.1

What's Changed

Kimi CLI

You can now install and run the Kimi CLI through Ollama.

ollama launch kimi --model kimi-k2.6:cloud

Kimi CLI with Kimi K2.6 excels at long-horizon agentic execution tasks through a multi-agent system.

  • MLX runner adds logprobs support for compatible models
  • Faster MLX sampling with fused top-P and top-K in a single sort pass, plus repeat penalties applied in the sampler
  • Improved MLX prompt tokenization by moving tokenization into request handler goroutines
  • Better MLX thread safety for array management
  • GLM4 MoE Lite performance improvement with a fused sigmoid router head
  • Fixed model picker showing stale model after switching chats in the macOS app
  • Fixed structured outputs for Gemma 4 when think=false

Full Changelog: https://github.com/ollama/ollama/compare/v0.21.0...v0.21.1

Hermes Agent

ollama launch hermes

Hermes learns with you, automatically creating skills to better serve your workflows. Great for research and engineering tasks.

<img width="1329" height="946" alt="image" src="https://github.com/user-attachments/assets/771d3383-95ed-4652-81e5-cf89514d25cc" />

What's Changed

  • Gemma 4 on MLX. Added support for running Gemma 4 via MLX on Apple Silicon, including a text-only MLX runtime for the model. The MLX backend also picked up mixed-precision quantization, better capability detection, and a batch of new op wrappers (Conv2d, Pad, activations, trig, masked SDPA, and RoPE-with-freqs).
  • Hermes and GitHub Copilot CLI in ollama launch. Added both integrations, which can now be configured in one command alongside the rest of the supported coding agents.
  • OpenCode moved to inline config. ollama launch opencode now writes its config inline rather than to a separate file, matching how other integrations are handled.
  • ollama launch no longer rewrites config when nothing changed. Pressing → on a configured multi-model integration, or passing --model with the current primary, used to trigger a confirmation prompt and rewrite both the editor's config file and config.json. Now it's a no-op when the resolved model list matches what's already saved.
  • Fixed ollama launch openclaw --yes so it correctly skips the channels configuration step, allowing non-interactive setups to complete cleanly.
  • Restored the Gemma 4 nothink renderer with the e2b-style prompt.
  • Fixed the Gemma 4 compiler error that was breaking Metal builds.
  • Fixed macOS cross-compiles so they no longer trigger generate, which was breaking cmake builds on some Xcode versions.
  • Quieted cgo builds by suppressing deprecated warnings during go build.
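
The "no-op when nothing changed" behavior for ollama launch amounts to comparing the resolved model list with the saved one before writing anything. A hypothetical sketch of that check follows. The resolution rule here (a requested model simply becomes the first, primary entry) and both function names are assumptions for illustration, not Ollama's actual logic.

```python
from typing import List, Optional


def resolve_models(saved: List[str], primary: Optional[str]) -> List[str]:
    """Hypothetical rule: the requested model becomes the primary (first) entry."""
    if primary is None:
        return list(saved)
    return [primary] + [m for m in saved if m != primary]


def needs_rewrite(saved: List[str], primary: Optional[str]) -> bool:
    """True only when the resolved list differs from what is already saved."""
    return resolve_models(saved, primary) != saved
```

Under this rule, passing the current primary again resolves to the same list, so no prompt or config write would be needed; passing a different model changes the list and would trigger a rewrite.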

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.7...v0.21.0

OpenClaw channel setup with ollama launch

<img width="1074" height="654" alt="image" src="https://github.com/user-attachments/assets/300ec082-c18a-4911-b3fe-82a7bc74a000" />

What's Changed

  • OpenClaw channel setup: connect WhatsApp, Telegram, Discord, and other messaging channels through ollama launch openclaw
  • Enabled flash attention for Gemma 4 on compatible GPUs
  • ollama launch opencode now detects curl-based OpenCode installs at ~/.opencode/bin
  • Fixed the /save command for models imported from safetensors
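
Detecting a curl-based OpenCode install can be thought of as a PATH lookup with a fallback to the well-known directory. The following is a rough sketch of that lookup order, not Ollama's implementation. It assumes the executable is named opencode; the function and its parameters are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_opencode(home: Optional[Path] = None,
                  search_path: Optional[str] = None) -> Optional[str]:
    """Return a path to the opencode executable, or None if it cannot be found."""
    # First choice: whatever is already on PATH (or on search_path, if given).
    on_path = shutil.which("opencode", path=search_path)
    if on_path:
        return on_path
    # Fallback: the directory named in the release note, ~/.opencode/bin.
    candidate = (home or Path.home()) / ".opencode" / "bin" / "opencode"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return str(candidate)
    return None
```

Calling find_opencode() with no arguments checks the real PATH and home directory; the optional parameters exist only to make the lookup easy to exercise in isolation.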

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.4...v0.20.5

<img width="3748" height="1290" alt="Gemma 4" src="https://github.com/user-attachments/assets/c4727579-47b1-4c7b-8aa2-28eda15b71f5" />

Gemma 4

Effective 2B (E2B)

ollama run gemma4:e2b

Effective 4B (E4B)

ollama run gemma4:e4b

26B (Mixture of Experts model with 4B active parameters)

ollama run gemma4:26b

31B (Dense)

ollama run gemma4:31b

What's Changed

Full Changelog: https://github.com/ollama/ollama/compare/v0.19.0...v0.20.0-rc0

<img width="480" alt="image" src="https://github.com/user-attachments/assets/1b5ca980-b9d5-490e-99b9-f0f7b9af2c32" />

Ollama is now powered by MLX on Apple Silicon in preview

Ollama on Apple silicon is now built on top of Apple’s machine learning framework, MLX, to take advantage of its unified memory architecture.

https://github.com/user-attachments/assets/600297b0-3167-46a5-8e3a-fefda3a51b84

Read more: https://ollama.com/blog/mlx

What's Changed

  • The Ollama app no longer incorrectly shows "model is out of date"
  • ollama launch pi now includes a web search plugin that uses Ollama's web search
  • Improved KV cache hit rate when using the Anthropic-compatible API
  • Fixed a tool call parsing issue with Qwen3.5 where tool calls were emitted inside the thinking output
  • The MLX runner now creates periodic snapshots during prompt processing
  • Fixed a KV cache snapshot memory leak in the MLX runner
  • Fixed an issue where flash attention was incorrectly enabled for grok models
  • Fixed qwen3-next:80b not loading in Ollama

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.3...v0.19.0
