v0.19.0
<img width="480" alt="image" src="https://github.com/user-attachments/assets/1b5ca980-b9d5-490e-99b9-f0f7b9af2c32" />
Ollama is now powered by MLX on Apple Silicon in preview
Ollama on Apple silicon is now built on top of Apple’s machine learning framework, MLX, to take advantage of its unified memory architecture.
https://github.com/user-attachments/assets/600297b0-3167-46a5-8e3a-fefda3a51b84
Read more: https://ollama.com/blog/mlx
What's Changed
- Ollama's app no longer incorrectly shows "model is out of date"
- `ollama launch pi` now includes web search plugin that uses Ollama's web search
- Improved KV cache hit rate when using the Anthropic-compatible API
- Fixed tool call parsing issue with Qwen3.5 where tool calls would be output in thinking
- MLX runner will now create periodic snapshots during prompt processing
- Fixed KV cache snapshot memory leak in MLX runner
- Fixed issue where flash attention would be incorrectly enabled for `grok` models
- Fixed `qwen3-next:80b` not loading in Ollama
New Contributors
- @amatas made their first contribution in https://github.com/ollama/ollama/pull/15022
Full Changelog: https://github.com/ollama/ollama/compare/v0.18.3...v0.19.0
Fetched May 26, 2026
