RepoWatch / GitHub signal

Local inference stack updates: llama.cpp b9521, Ollama v0.30.5

Published: 05/06/2026

Repo: ggml-org/llama.cpp

Core local LLM engines received updates worth reviewing for Hermes/OpenClaw agent deployments.

The updates directly affect local model inference performance and compatibility in agent tooling.

What changed

ggml-org/llama.cpp: Release b9521 (2026-06-05), with a CUDA commit enrolling mul_mat_vec_q_moe into pdl. Previous release: b9500.
ollama/ollama: Release v0.30.5 (2026-06-04), plus a commit for “launch: oh-my-pi”.
abetlen/llama-cpp-python: Release v0.3.26-hip-radeon (2026-06-05) and a docs fix for the Gemma 4 Colab notebook.
unslothai/unsloth: Commit enabling audio input for Gemma 4 GGUFs; default chat model set to Qwen.
huggingface/transformers: Release v5.10.2 (2026-06-04), with a Romanian docs commit.
Supporting: astral-sh/ruff 0.15.16 release, ruff-vscode 2026.48.0.

Why it matters

These repos form the backbone of local inference for open models. llama.cpp and Ollama power most local agent setups in Hermes and OpenClaw. The llama-cpp-python bindings and Unsloth’s Gemma 4 audio support expand multimodal options. Transformers and Ruff underpin model handling and dev tooling, respectively.

My read

A material cluster of activity in local inference optimisation. The releases show steady progress on performance (CUDA, HIP) and features (audio input, new defaults). Minor commits in other areas are noise. Relevant for any local model serving or agent work.

Bottom line

Worth a spike on llama.cpp b9521 and Ollama v0.30.5. Update now in local environments. Watch Unsloth for Gemma 4 multimodal experiments. Ignore the rest.
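
To decide which local environments actually need the update, installed versions can be compared against the releases named above. The following is a minimal sketch, not part of either project: the function names, the TARGETS table, and the idea of passing in version strings by hand are all assumptions made for illustration. It only parses the two tag styles seen in this digest (a llama.cpp build tag like b9521 and a three-part version like v0.30.5) and does not query any installed tool.

```python
import re

def parse_llama_cpp_build(tag: str) -> int:
    """Parse a llama.cpp release tag such as 'b9521' into its build number."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a llama.cpp build tag: {tag!r}")
    return int(match.group(1))

def parse_semver(tag: str) -> tuple:
    """Parse 'v0.30.5' or '0.30.5' into a tuple of ints, e.g. (0, 30, 5)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a three-part version: {tag!r}")
    return tuple(int(part) for part in match.groups())

# Hypothetical table: target releases taken from this digest, each paired
# with the parser that understands its tag format.
TARGETS = {
    "llama.cpp": ("b9521", parse_llama_cpp_build),
    "ollama": ("v0.30.5", parse_semver),
}

def outdated(installed: dict) -> list:
    """Return the names of tools whose installed version is older than the target."""
    behind = []
    for name, version in installed.items():
        target, parse = TARGETS[name]
        if parse(version) < parse(target):
            behind.append(name)
    return behind

if __name__ == "__main__":
    # Example input: version strings collected by hand from each environment.
    print(outdated({"llama.cpp": "b9500", "ollama": "v0.30.5"}))
```

Versions are compared as numbers rather than strings because a plain string comparison would rank 0.9.0 above 0.30.5.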