RepoWatch / GitHub signal

llama.cpp b9733 and GGML v0.15.2 releases

Sync llama.cpp, GGML and the Python bindings to pick up the latest WebGPU and stability fixes.

This directly affects local model performance and compatibility in Hermes and OpenClaw.

What changed

  • ggml-org/llama.cpp: New release tag b9733 (2026-06-20), plus a ggml-webgpu commit that adds adapter toggles for F16 on Vulkan + NVIDIA.
  • ggml-org/ggml: Release v0.15.2, with the accompanying version-bump commit.
  • abetlen/llama-cpp-python: Updated to match the latest llama.cpp commit, f449e0553.
  • abetlen/ggml-python: New release v0.0.43-hip-radeon.
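llama.cpp release tags follow a `b<build number>` scheme, as with b9733 here. A minimal Python sketch for checking whether a local build is at or past b9733 follows, assuming the tag comes from something like `git describe --tags` in a llama.cpp checkout (which appends `-<commits>-g<hash>` when you are past a tag). The helper names `build_number` and `is_at_least` are hypothetical, not part of any llama.cpp tooling.

```python
import re

# llama.cpp release tags look like "b9733". `git describe --tags` on a checkout
# that is past a tag appends "-<commits>-g<hash>", e.g. "b9733-3-gabc1234",
# so only the leading "b<digits>" is used.
TAG_PREFIX = re.compile(r"b(\d+)")


def build_number(tag: str) -> int:
    """Extract the integer build number from a llama.cpp tag such as 'b9733'."""
    match = TAG_PREFIX.match(tag.strip())
    if match is None:
        raise ValueError(f"not a llama.cpp build tag: {tag!r}")
    return int(match.group(1))


def is_at_least(current_tag: str, required_tag: str = "b9733") -> bool:
    """True if current_tag is the same build as required_tag or a later one."""
    # Compare as integers: as strings, "b10000" would sort before "b9733".
    return build_number(current_tag) >= build_number(required_tag)


if __name__ == "__main__":
    # Hypothetical local tag for illustration; substitute your own checkout's value.
    local_tag = "b9650-12-g0123abc"
    status = "is at or past b9733" if is_at_least(local_tag) else "predates b9733"
    print(f"{local_tag} {status}")
```

Comparing numerically rather than lexically is the main point: build numbers grow past digit boundaries, so a plain string sort would misorder them.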

Why it matters

These changes target GPU acceleration (F16 support in ggml-webgpu on Vulkan + NVIDIA adapters) and keep dependencies in sync. That is critical for running efficient local LLMs in agent systems like Hermes/OpenClaw without relying on the cloud.

My read

The ggml-org projects are iterating quickly on hardware-specific optimisations, and the Python bindings are keeping pace. No major breaking changes are noted, but the updates are worth testing in our stacks. Related unsloth and ollama updates also signal active development in local inference.

Bottom line

Update now if you use llama.cpp or GGML-based inference. This is a strong signal for local tooling optimisation.
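Before and after updating, a quick check of which Python bindings are installed, and whether they meet a minimum version, can catch a stale environment. The sketch below uses only the standard library's `importlib.metadata`. The minimum versions in `MINIMUMS` are placeholders, not confirmed pins: the `ggml-python` distribution name and `0.0.43` are assumptions read off the v0.0.43-hip-radeon tag, and `0.0.0` stands in for whichever llama-cpp-python release bundles llama.cpp commit f449e0553.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Placeholder minimums, NOT confirmed pins. Replace each value with the release
# you want to require. The "ggml-python" distribution name and "0.0.43" are
# assumptions based on the v0.0.43-hip-radeon tag; "0.0.0" is a stand-in for
# whichever llama-cpp-python release bundles llama.cpp commit f449e0553.
MINIMUMS = {
    "llama-cpp-python": "0.0.0",
    "ggml-python": "0.0.43",
}


def parse(v: str) -> tuple[int, ...]:
    """Leading numeric parts of a version: '0.0.43+hip' -> (0, 0, 43)."""
    parts = []
    for piece in v.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits.group()))
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """Compare versions after zero-padding, so '0.15' equals '0.15.0'."""
    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(minimums: dict[str, str]) -> dict[str, str]:
    """Report 'ok', 'outdated (...)' or 'not installed' for each distribution."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        if at_least(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"outdated ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check(MINIMUMS).items():
        print(f"{name}: {status}")
```

The version parsing here is deliberately simplified and ignores pre-release ordering; for full PEP 440 semantics, `packaging.version.Version` is the more robust choice if that library is available in your environment.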