RepoWatch / GitHub signal
llama.cpp Adds IDs to Tool Call Responses
Local inference servers need boring API compatibility if they are going to sit underneath real agents.
llama.cpp is a core local-inference backend; server API details affect Hermes/OpenClaw-style agents that depend on reliable tool-call plumbing.
What changed
ggml-org/llama.cpp published release b9763. The release note is short but useful: the llama.cpp server now adds an id to tool-call response objects in the API.
That sits alongside the usual packaged binaries for macOS, Linux, Windows, Android and other supported targets. The watched branch has also moved on, but the release note is the material item here.
Release: https://github.com/ggml-org/llama.cpp/releases/tag/b9763
Why it matters
Tool calling is where local inference stops being a toy chat box and starts acting like agent infrastructure. IDs on tool-call responses are not glamorous, but they help clients match calls, responses and traces without guessing.
For OpenAI-compatible or OpenAI-adjacent local servers, these details matter. Agent runners, debug logs, replay systems and UI layers all benefit when a tool response has a stable identifier rather than relying on position, timing or a half-remembered convention.
My read
This is a compatibility and reliability update, not a headline feature. That is exactly why it is worth noticing.
If a Foundry/Hermes/OpenClaw deployment is using llama.cpp server mode underneath tool-using agents, this is a good candidate for a quick spike. Check whether the client expects or preserves tool-call IDs, then update on a test box before rolling it into anything production-facing.
Bottom line
Small API shape changes can remove a lot of agent glue code. Track b9763, test it against local tool-calling flows, and update if the IDs behave cleanly with the clients in use.