RepoWatch / GitHub signal
llama.cpp b9843 backs out a risky split-compute scheduler change
Do not treat every local-inference speed path as free performance; scheduler and copy semantics are runtime reliability issues.
llama.cpp sits underneath a lot of GGUF and local model experiments. If Hermes, OpenClaw or Foundry agent tooling leans on local inference, scheduler changes affect stability as much as speed.
What changed
ggml-org/llama.cpp published release b9843, built from commit 86b9470.
The notable change is a revert: the commit “sched : reintroduce less synchronizations during split compute” was backed out. In practical terms, the release rolls back part of the scheduler work that tried to reduce synchronisation around split compute and async tensor copies.
The revert touches the backend scheduler and the CUDA copy path. User input copies go back to the simpler synchronous route, and the async copy path is narrowed again rather than trying to be clever across CPU-to-CUDA boundaries.
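To make “copy semantics” concrete, here is a toy Python model of the general hazard with asynchronous copies of caller-owned input. It is not ggml or CUDA code. `ToyStream`, `copy_sync`, `copy_async` and `run` are hypothetical names. The model only assumes that queued work reads its source buffer when it runs, not when it is enqueued. It makes no claim about the exact bug behind this revert.

```python
from collections import deque


class ToyStream:
    """Hypothetical stand-in for a device stream: queued work runs only at synchronize()."""

    def __init__(self):
        self.pending = deque()

    def enqueue(self, op):
        self.pending.append(op)

    def synchronize(self):
        # Drain queued operations in FIFO order, like waiting on a stream.
        while self.pending:
            self.pending.popleft()()


def copy_sync(src, dst):
    # Synchronous route: the data is copied before the call returns.
    dst[:] = src


def copy_async(stream, src, dst):
    # Asynchronous route: the copy is deferred and reads src whenever the stream runs it.
    def do_copy():
        dst[:] = src

    stream.enqueue(do_copy)


def run(use_async):
    stream = ToyStream()
    host_input = [1, 2, 3]    # caller-owned input buffer
    device_input = [0, 0, 0]  # stand-in for device memory
    if use_async:
        copy_async(stream, host_input, device_input)
    else:
        copy_sync(host_input, device_input)
    host_input[:] = [9, 9, 9]  # caller reuses its buffer before the copy has completed
    stream.synchronize()
    return device_input


print(run(use_async=False))  # [1, 2, 3]
print(run(use_async=True))   # [9, 9, 9]
```

The synchronous path costs a wait, but it guarantees the device sees what the caller passed in. The async path only stays correct if something else orders the copy before the caller's next write, such as an event or fence, or simply not reusing the buffer. That ordering is the kind of bookkeeping that makes reduced-synchronisation changes delicate.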
Links:
- https://github.com/ggml-org/llama.cpp/releases/tag/b9843
- https://github.com/ggml-org/llama.cpp/commit/86b94708f22478f900b76ca02e316f4f3418faff
Why it matters
Local inference failures rarely announce themselves as “the scheduler is wrong”. They show up as odd hangs, corrupted-looking runs, backend-specific behaviour, or performance changes that only appear when you split work across devices.
That is why this sort of release matters more than the headline suggests. It is not a shiny model launch. It is the runtime maintainers choosing correctness and predictable copy semantics over keeping a recent optimisation in place.
For agent systems, that is the right instinct. A local model stack used by Hermes, OpenClaw or Foundry automation needs boring reliability before it needs another few percent of throughput.
My read
This is worth a spike, not an automatic production upgrade.
If you are already pinned to a recent llama.cpp build and are seeing split-compute or CUDA weirdness, b9843 is a sensible candidate to test. If your local inference path is stable, I would not blindly bump everything just because the tag moved.
The useful test is simple: run the same prompt/tool-use harness on your currently pinned build and on b9843, with the same model, backend and device-split settings, and sampling seed. Then compare latency, output stability, memory behaviour and error logs.
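A minimal sketch of the comparison step follows. It assumes you have already captured one record per prompt from each build, for example by scripting the same harness twice. `RunResult` and `compare_builds` are hypothetical names. How you measure latency, peak memory and error counts is left to your harness. Exact output matching is only meaningful with deterministic decoding (fixed seed, pinned sampling settings), and even then GPU kernels can introduce small differences. Treat mismatches as prompts to inspect rather than automatic failures.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RunResult:
    # One harness run of one prompt on one build (hypothetical record shape).
    prompt_id: str
    latency_s: float
    output: str
    peak_mem_mb: float
    errors: int


def compare_builds(baseline, candidate):
    """Compare per-prompt results from the pinned build (baseline) and b9843 (candidate)."""
    base = {r.prompt_id: r for r in baseline}
    cand = {r.prompt_id: r for r in candidate}
    shared = sorted(base.keys() & cand.keys())
    if not shared:
        raise ValueError("no prompts in common between the two runs")
    return {
        # Positive means the candidate build is slower at the median.
        "median_latency_delta_s": median(cand[p].latency_s for p in shared)
        - median(base[p].latency_s for p in shared),
        # Prompts whose text output differs between builds.
        "output_mismatches": [p for p in shared if base[p].output != cand[p].output],
        # Change in the worst peak memory seen across the shared prompts.
        "peak_mem_delta_mb": max(cand[p].peak_mem_mb for p in shared)
        - max(base[p].peak_mem_mb for p in shared),
        # Positive means the candidate logged more errors.
        "error_delta": sum(cand[p].errors for p in shared)
        - sum(base[p].errors for p in shared),
        # Prompts that only one build completed, which is itself a signal.
        "missing": sorted(base.keys() ^ cand.keys()),
    }


# Illustrative placeholder values, not measurements.
baseline = [
    RunResult("p1", 1.0, "a", 100.0, 0),
    RunResult("p2", 2.0, "b", 90.0, 0),
]
candidate = [
    RunResult("p1", 1.5, "a", 120.0, 0),
    RunResult("p2", 2.5, "c", 95.0, 1),
]
print(compare_builds(baseline, candidate))
```

Keying results by prompt keeps the comparison honest when one build crashes or skips a prompt. Those prompts land in `missing` instead of silently shifting the medians.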
Bottom line
b9843 is a maintenance release with operational signal. The important part is not the version number; it is the rollback of an optimisation path that could affect runtime correctness. For local agent infrastructure, that earns a test pass before it earns a production pin.