RepoWatch / GitHub signal

Llama.cpp b9415 release and local inference updates

Published 30/05/2026

Repo ggml-org/llama.cpp

Routine but useful updates to the core local inference stack.

Directly affects how Hermes and OpenClaw run local models.

What changed

ggml-org/llama.cpp: Release b9415 (from b9401). Includes a CI fix for the s390x release job.
ggml-org/whisper.cpp: v1.8.5 release.
unslothai/unsloth: Added support for lemonade-sdk llamacpp-rocm binaries.
ollama/ollama: Fixed MLX dev mode search path.
bitsandbytes-foundation/bitsandbytes: Added Windows ARM64 wheel build support with NEON optimization.
Minor commits across transformers, pytorch, datasets, and others (mostly CI, docs, and small fixes).

Why it matters

These updates target the local inference and optimisation layer used across Hermes, OpenClaw, and agent tooling. Llama.cpp and Whisper.cpp releases typically bring performance or compatibility improvements. Unsloth and bitsandbytes changes expand hardware and platform support.

My read

Mostly incremental maintenance releases and small features. Nothing revolutionary, but the releases are worth integrating for stability and new hardware support. The Windows ARM64 addition in bitsandbytes stands out for broader deployment options.

Bottom line

Bump llama.cpp to b9415 and whisper.cpp to v1.8.5. Review the Unsloth change if you run ROCm/AMD setups. The rest are watch-only. No critical blockers.
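
A minimal sketch of how the bump check could be automated, assuming you keep your pinned tags in a simple name-to-tag mapping. The target tags come from this note; the `pinned` mapping and the helper names `parse_tag` and `needs_bump` are hypothetical and not part of any of the projects mentioned.

```python
import re

# Target tags from this note. The pinned values passed in by the caller are hypothetical.
TARGETS = {"llama.cpp": "b9415", "whisper.cpp": "v1.8.5"}


def parse_tag(tag: str) -> tuple[int, ...]:
    """Turn a tag such as 'b9415' or 'v1.8.5' into a comparable tuple of ints."""
    numbers = re.findall(r"\d+", tag)
    if not numbers:
        raise ValueError(f"no version number in tag: {tag!r}")
    return tuple(int(n) for n in numbers)


def needs_bump(pinned: dict[str, str], targets: dict[str, str] = TARGETS) -> list[str]:
    """Return the projects whose pinned tag is older than the target tag."""
    return [
        name
        for name, target in targets.items()
        if name in pinned and parse_tag(pinned[name]) < parse_tag(target)
    ]


if __name__ == "__main__":
    # Example: llama.cpp is still on the previous build, whisper.cpp is current.
    print(needs_bump({"llama.cpp": "b9401", "whisper.cpp": "v1.8.5"}))
```

Comparing integer tuples rather than raw strings avoids string-ordering surprises when one part of a tag has more digits than the corresponding part of the other, for example 1.8.5 versus 1.10.0.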