RepoWatch / GitHub signal

Relevant GitHub changes, minus the noise.

This is where GitHub updates land when they actually matter: releases, repo moves, tooling changes and commits with real operating implications.

04/07/2026 · huggingface/transformers

Transformers 5.13 shows the Hugging Face upgrade tax

Transformers 5.13 adds useful new model support, but the same watchlist run already shows downstream compatibility fixes landing around it.

Treat Transformers 5.13 as a staged stack upgrade, not a casual dependency bump.

github RepoWatch ai tools

03/07/2026 · unslothai/unsloth

Unsloth exposes an MLX-aware trainer API

Unsloth has added a public trainer path that knows about MLX, which makes Apple Silicon local fine-tuning experiments less of a private Studio-only trick.

Apple Silicon local training is still a spike, but Unsloth is turning more of it into a usable API surface.

github RepoWatch ai tools

02/07/2026 · ggml-org/llama.cpp

llama.cpp b9859 adds precompiled OpenCL kernels for Adreno

llama.cpp can now load a Qualcomm Adreno binary kernel library for selected OpenCL paths, making Snapdragon-class local inference a more practical test target.

This is not a blanket upgrade signal, but it is worth testing if local agent inference is moving onto Snapdragon/Adreno hardware.

github RepoWatch ai tools

01/07/2026 · ollama/ollama

Ollama v0.31.1 makes Gemma 4 faster on Apple Silicon

Ollama's latest release uses multi-token prediction to speed up Gemma 4 locally on Apple Silicon without extra configuration.

If local Mac-based agents are using Gemma 4 through Ollama, this is worth a measured test run rather than a blind bump.
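A rough way to make that test run measured rather than anecdotal: a non-streamed call to Ollama's /api/generate endpoint returns eval_count (generated tokens) and eval_duration (nanoseconds), so decode throughput can be compared on the same prompt before and after the bump. This is a sketch assuming the default local endpoint on port 11434; it is not an Ollama benchmarking tool.

```python
import json
import urllib.request

# Assumed default local Ollama endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Decode throughput from a non-streamed /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        raise ValueError("response has no usable eval_duration")
    return response["eval_count"] / (duration_ns / 1e9)


def measure(model: str, prompt: str, runs: int = 3) -> list:
    """Send the same prompt several times and return tokens/sec for each run."""
    results = []
    for _ in range(runs):
        body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
        req = urllib.request.Request(
            OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            results.append(tokens_per_second(json.load(resp)))
    return results
```

Run measure() with your Gemma 4 model tag (the tag name is whatever you pulled locally) on both versions and compare medians rather than single runs; the first run after a model load is usually the noisiest.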

github RepoWatch ai tools

30/06/2026 · ggml-org/llama.cpp

llama.cpp b9843 backs out a risky split-compute scheduler change

The latest llama.cpp release reverts async split-compute work, which is the kind of boring runtime correction that matters for local agent reliability.

Do not treat every local-inference speed path as free performance; scheduler and copy semantics are runtime reliability issues.

github RepoWatch ai tools

29/06/2026 · ggml-org/llama.cpp

The local inference bindings are keeping pace with llama.cpp

llama.cpp shipped b9837 while the Python bindings and adjacent ggml tooling moved with it.

The useful signal is not one dramatic release; it is the local inference stack staying aligned across C++, Python bindings and packaged runtimes.

github RepoWatch ai tools

28/06/2026 · ollama/ollama

Ollama fixes a brittle edge in streamed tool calls

Ollama now ignores braces inside JSON strings when deciding whether a streamed tool call is complete.

Tool calling breaks in stupid places; this one matters because code-shaped JSON arguments are normal agent traffic.
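The failure mode is easy to see with a naive brace counter: an argument such as {"code": "}}"} closes the object early because the braces inside the string are counted. A minimal string-aware completeness check looks like the sketch below; it is an illustration in Python, not Ollama's actual (Go) parser, and the function name is hypothetical.

```python
def json_object_complete(buffer: str) -> bool:
    """Return True once buffer holds one complete top-level JSON object.

    Braces inside string literals are ignored, and backslash escapes are
    honoured, so a quote like \" does not end the string early.
    """
    depth = 0
    in_string = False
    escaped = False
    started = False
    for ch in buffer:
        if in_string:
            if escaped:
                escaped = False      # this character was escaped; skip it
            elif ch == "\\":
                escaped = True       # next character is escaped
            elif ch == '"':
                in_string = False    # closing quote
            continue                 # nothing inside a string affects depth
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            started = True
        elif ch == "}":
            depth -= 1
            if started and depth == 0:
                return True
    return False
```

A streaming client would append each chunk to a buffer and only hand the tool call to JSON decoding once this returns True.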

github RepoWatch ai tools

26/06/2026 · Significant-Gravitas/AutoGPT

AutoGPT makes skills more portable

AutoGPT's platform beta now lets users upload and download AutoPilot skills from the library, which is a useful signal for where agent tooling is heading.

Agent skills are becoming assets people can move, inspect and reuse — not just memories trapped inside one runtime.

github RepoWatch ai tools

24/06/2026 · ggml-org/llama.cpp

Local inference stack tightens up around llama.cpp

llama.cpp, llama-cpp-python, Ollama and Unsloth all shipped practical fixes for running local models with fewer rough edges.

The useful work this week is not a new model toy; it is local inference getting less brittle.

github RepoWatch ai tools

23/06/2026 · ggml-org/llama.cpp

llama.cpp adds IDs to tool-call responses

llama.cpp release b9763 adds IDs to tool-call response objects, a small API change with real agent-integration consequences.

Local inference servers need boring API compatibility if they are going to sit underneath real agents.
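Why the ID matters: in the OpenAI-compatible chat format that llama.cpp's server exposes, an agent answers each tool call with a tool-role message that echoes the call's id, which is how results from parallel calls get matched back to their requests. A sketch of the client side, assuming that schema; the handler table and function name are hypothetical.

```python
import json


def tool_result_messages(assistant_message: dict, handlers: dict) -> list:
    """Run each requested tool and build the matching tool-role messages.

    Assumes the OpenAI-style chat schema: each entry in tool_calls carries an
    "id", and each result must echo it back as "tool_call_id".
    """
    messages = []
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        result = handlers[fn["name"]](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # without an id, parallel results are ambiguous
            "content": json.dumps(result),
        })
    return messages
```

The returned messages are appended to the conversation after the assistant message and sent back on the next request.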

github RepoWatch ai tools

22/06/2026 · abetlen/llama-cpp-python

llama-cpp-python fixes a subtle cached-prompt state bug

A new llama-cpp-python fix preserves recurrent and hybrid model state when the full prompt is already cached.

Cached prompts are only useful if the model state stays correct; this fix closes a quiet but nasty edge case.
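To see why the full-hit case is the awkward one, here is a hypothetical prefix-reuse planner (not llama-cpp-python's code, and not the fix itself). Reusing a cached prefix is simple until the whole prompt matches: there are then no new tokens to run, yet generation still needs logits for the last position. Assuming the common approach of re-evaluating the last token, that works for a transformer KV cache, which can be truncated by one position, but a recurrent or hybrid model's state has already absorbed that token and cannot simply be rewound.

```python
def plan_eval(cached: list, prompt: list) -> tuple:
    """Decide how much of a cached token sequence can be reused (hypothetical helper).

    Returns (keep, to_eval): keep that many cached tokens, then evaluate to_eval.
    """
    keep = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        keep += 1
    if prompt and keep == len(prompt):
        # Full-prompt hit: nothing new to evaluate, but fresh logits are still
        # needed, so this planner re-runs the last token. That requires
        # rewinding state by one position: fine for a KV cache, wrong for a
        # recurrent state unless a snapshot from before that token was kept.
        keep -= 1
    return keep, prompt[keep:]
```

The rewind in the full-hit branch is exactly the step that needs special handling for recurrent and hybrid models.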

github RepoWatch ai tools

21/06/2026 · ggml-org/llama.cpp

llama.cpp b9744 release and HIP/Radeon bumps in the Python bindings

A new release of the core C/C++ inference engine, plus HIP/Radeon version bumps in the Python bindings.

Core local inference stack advances with a fresh release and broader AMD hardware support.

github RepoWatch ai tools inference llama-cpp

20/06/2026 · ggml-org/llama.cpp

llama.cpp b9733 and ggml v0.15.2 releases

New releases and GPU adapter improvements across the core local inference stack.

Keep llama.cpp, ggml and the Python bindings in sync to pick up the latest WebGPU and stability fixes.

github RepoWatch ai tools inference

19/06/2026 · Significant-Gravitas/AutoGPT

AutoGPT platform beta v0.6.64 adds webhook triggers

New release brings webhook support and lifecycle management for copilot presets in the AutoGPT platform.

Webhook triggers open up event-driven agent workflows.

github RepoWatch ai agents tools

18/06/2026 · ggml-org/llama.cpp

llama.cpp b9693 release

A new llama.cpp release tag, including updates to SYCL op support.

Steady progress on SYCL and inference optimisations in the core local LLM engine.

github RepoWatch ai tools local-inference

17/06/2026 · ggml-org/llama.cpp

llama.cpp b9672 release

llama.cpp b9672 drops with UI SSE fixes; Ollama v0.30.9 follows immediately with the update.

Core local inference engine updated; bump your stacks.
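For context on what an SSE fix touches: llama.cpp's server streams completions as server-sent events, and clients reassemble them line by line. A minimal stdlib parser, assuming an OpenAI-style chat stream that ends with "data: [DONE]"; this illustrates the protocol and is not the web UI's code.

```python
import json
from typing import Iterable, Iterator


def sse_events(lines: Iterable) -> Iterator:
    """Yield the data payload of each server-sent event.

    Multiple data: lines in one event are joined with newlines, a blank line
    dispatches the event, and lines starting with ':' are comments. Other
    fields (event:, id:, retry:) are ignored in this sketch, and an event left
    unterminated at end of stream is discarded, as the SSE spec requires.
    """
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            continue
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # the spec strips a single leading space
            data.append(value)


def stream_text(lines: Iterable) -> str:
    """Concatenate delta content from an OpenAI-style chat completion stream."""
    parts = []
    for payload in sse_events(lines):
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```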

github RepoWatch ai tools local-inference

16/06/2026 · huggingface/transformers

Hugging Face Transformers v5.12.1

Patch release with continuous batching snapshot fixes.

Targeted fixes for inference stability.

github RepoWatch ai tools huggingface

15/06/2026 · ggml-org/llama.cpp

llama.cpp releases b9637

New release and SYCL optimisation from the leading C++ LLM inference library.

Routine release but important for keeping local inference current.

github RepoWatch ai tools inference

14/06/2026 · ggml-org/llama.cpp

llama.cpp and local inference tooling updates

New releases and commits in llama.cpp, unsloth, llama-cpp-python and related projects improving GPU support and stability.

Incremental but useful fixes for local model inference on NVIDIA and AMD hardware.

github RepoWatch ai tools inference

13/06/2026 · huggingface/transformers

Transformers 5.12.0 and local inference stack updates

Hugging Face Transformers v5.12.0 released alongside updates to Unsloth, llama.cpp, Ollama and ggml.

Key local inference frameworks received releases and fixes today.

github RepoWatch ai tools huggingface

12/06/2026 · ggml-org/llama.cpp

llama.cpp b9603 and ggml v0.15.0 updates

Key local inference libraries released new versions with CUDA and performance enhancements.

CUDA concat support and version bumps in the core inference stack.

github RepoWatch ai local-inference llama-cpp ggml

11/06/2026 · huggingface/transformers

Hugging Face Transformers v5.11.0 released

Core model framework update with data-parallel support and other improvements for training and inference.

Transformers 5.11.0 adds data-parallel capabilities that could help scale local inference setups.

github RepoWatch ai tools huggingface transformers

10/06/2026 · ggml-org/llama.cpp

llama.cpp b9585 release with pinned conversations webui support

ggml-org/llama.cpp ships release b9585 including webui pinned conversations and other local inference updates.

Pinned conversations improve long-session usability in the llama.cpp webui.

github RepoWatch ai local-inference tools

09/06/2026 · ollama/ollama

Ollama v0.30.7 and ggml v0.14.0 updates

Releases in Ollama and the ggml tensor library deliver incremental improvements to local inference tooling.

Core components of the local model stack receive updates that may affect deployment and performance.

github RepoWatch ai tools local-inference

08/06/2026 · ggml-org/llama.cpp

llama.cpp b9553 release with RDNA3.5 HIP support

New release b9553 plus gfx1152/gfx1153 RDNA3.5 additions in llama.cpp.

HIP RDNA3.5 support lands in latest llama.cpp.

github RepoWatch ai tools llama-cpp

06/06/2026 · ollama/ollama

Ollama v0.30.6 released

Ollama pushes v0.30.6 with model list alignment and related fixes.

Ollama refines local LLM deployment with better model handling.

github RepoWatch ai tools ollama local-inference

05/06/2026 · ggml-org/llama.cpp

Local inference stack updates: llama.cpp b9521, Ollama v0.30.5

Releases in llama.cpp, Ollama, llama-cpp-python plus supporting commits in Unsloth and Transformers.

Core local LLM engines received updates worth reviewing for Hermes/OpenClaw agent deployments.

github RepoWatch ai tools local-inference

03/06/2026 · ollama/ollama

Ollama ships v0.30.2 — patch fix for Laguna build breakage

Ollama’s latest patch release fixes a build breakage in the Laguna patch path, keeping local LLM serving stable for tooling that depends on it.

A small but real bugfix release to a core local-inference runtime.

github RepoWatch ai tools ollama

02/06/2026 · ggml-org/llama.cpp

llama.cpp b9468 release

A new build tag plus Hexagon backend optimisations for MUL_MAT and other ops in the core C++ LLM inference library.

llama.cpp continues its rapid iteration with a new release and targeted optimisations.

github RepoWatch ai tools local-inference

01/06/2026 · ggml-org/llama.cpp

llama.cpp b9444 release with SYCL Flash Attention quant support

ggml-org/llama.cpp releases b9444, including SYCL support for Q4_1, Q5_0 and Q5_1 in Flash Attention.

A targeted optimisation that extends Flash Attention support to more quant types on Intel GPUs via SYCL.
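For a sense of what these quant types are, the legacy ggml formats pack 32 weights per block with an fp16 scale and, for the _1 variants, an fp16 minimum. The sketch below encodes those block layouts as commonly described in ggml's block definitions (treat the exact byte counts as an assumption to verify against your ggml version) and derives bits per weight and raw weight size.

```python
# Assumed ggml legacy quant block layouts, 32 weights per block:
#   Q4_0: fp16 scale d + 16 bytes of 4-bit quants             -> x = d * (q - 8)
#   Q4_1: fp16 d + fp16 min m + 16 bytes of 4-bit quants       -> x = d * q + m
#   Q5_0: fp16 d + 4 bytes of high bits + 16 bytes of nibbles  -> x = d * (q - 16)
#   Q5_1: fp16 d + fp16 m + 4 bytes high bits + 16 bytes       -> x = d * q + m
BLOCK_WEIGHTS = 32
BLOCK_BYTES = {
    "Q4_0": 2 + 16,
    "Q4_1": 2 + 2 + 16,
    "Q5_0": 2 + 4 + 16,
    "Q5_1": 2 + 2 + 4 + 16,
}


def bits_per_weight(qtype: str) -> float:
    """Effective storage cost per weight, including per-block scale/min overhead."""
    return BLOCK_BYTES[qtype] * 8 / BLOCK_WEIGHTS


def weight_bytes(n_weights: int, qtype: str) -> int:
    """Raw bytes for n_weights (assumed to be a multiple of the block size)."""
    return n_weights // BLOCK_WEIGHTS * BLOCK_BYTES[qtype]


for q in BLOCK_BYTES:
    print(q, bits_per_weight(q))
```

The _1 variants trade an extra fp16 per block for an asymmetric range, which is why they cost 0.5 bits per weight more than their _0 siblings.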

github RepoWatch ai tools local-inference

31/05/2026 · ggml-org/llama.cpp

llama.cpp b9439 release with iGPU default change

ggml-org/llama.cpp releases b9439 including a change to use only one iGPU device by default.

Foundational local inference updates that affect device handling in agent tooling.

github RepoWatch ai tools local-inference

30/05/2026 · ggml-org/llama.cpp

llama.cpp b9415 release and local inference updates

New release for llama.cpp alongside updates to whisper.cpp, Unsloth, Ollama and bitsandbytes.

Routine but useful updates to core local inference stack.

github RepoWatch ai tools local-inference

29/05/2026 · ggml-org/llama.cpp

llama.cpp b9401 release

New release of the core local LLM inference engine with ggml sync.

Worth pulling in for the latest local inference performance work.

github RepoWatch ai tools local-inference

28/05/2026 · ggml-org/llama.cpp

llama.cpp b9371 release

llama.cpp releases b9371 with CI refactor.

Continued updates to the core local inference engine.

github RepoWatch ai tools local-inference

27/05/2026 · ggml-org/llama.cpp

llama.cpp b9354 release with MiniCPM5 tokenizer

A new tagged release and a tokenizer addition expand local model support in llama.cpp.

Core local inference engine gets new model compatibility.

github RepoWatch ai tools local-inference

26/05/2026 · ggml-org/llama.cpp

llama.cpp b9333 release with Gemma4 support

New release b9333 and Gemma4ForCausalLM architecture support in the core C++ inference engine.

llama.cpp and ggml ecosystem advancing with new release and Gemma4 compatibility.

github RepoWatch ai tools local-inference

23/05/2026 · ggml-org/llama.cpp

llama.cpp and tinygrad push new releases

Core local inference projects tagged new versions with targeted fixes across the stack.

Releases in the local inference layer are the practical signal for agent tooling reliability.

github RepoWatch ai tools

22/05/2026 · bitsandbytes-foundation/bitsandbytes

4-bit inference kernels are the useful bit of today's AI plumbing

bitsandbytes added new CUDA 4-bit GEMM inference kernels while the local inference stack kept sanding down runtime failures.

The material signal is faster, less fragile local inference plumbing rather than a shiny new agent feature.

github RepoWatch ai tools

21/05/2026 · huggingface/transformers

Model compatibility is still the boring edge of AI tooling

Transformers 5.9.0 and llama.cpp b9264 both point at the same practical issue: AI-builder tooling has to keep absorbing new model families fast.

The useful signal is not a single shiny feature; it is the compatibility layer moving quickly enough that new models can become operational options rather than integration chores.

github RepoWatch ai tools

20/05/2026 · ggml-org/llama.cpp

llama.cpp keeps sanding down the local-inference path on Mac

llama.cpp b9245 and a fresh Metal pad/copy optimisation are small but relevant improvements for local model runners.

The useful local-inference gains are often not glamorous model launches; they are low-level runtime fixes that make Mac-based agent work less sluggish.

github RepoWatch ai tools

19/05/2026 · unslothai/unsloth

Unsloth Studio is drifting into agent workbench territory

Unsloth v0.1.405-beta adds faster GGUF inference, cloud API providers, prompt caching, external backend connections and experimental MLX support.

Unsloth is no longer just a training/fine-tuning helper; it is becoming a practical local-plus-cloud model workbench.

github RepoWatch ai tools

18/05/2026 · pytorch/pytorch

PyTorch fixes a CUDA Inductor atan numerics edge case

A small PyTorch compiler fix matters if you rely on CUDA-compiled model code where tiny floating-point differences can get amplified downstream.

Compiler speed is useful, but compiler numerical behaviour is what keeps production ML from quietly drifting.
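A cheap guard around compiler or runtime upgrades is an eager-versus-compiled parity check on representative inputs. In PyTorch, torch.testing.assert_close is the usual tool; the stdlib-only sketch below mirrors its mixed tolerance test |actual - expected| <= atol + rtol * |expected|, with float32 defaults assumed to be rtol=1.3e-6 and atol=1e-5, and uses a float32-rounded atan as a stand-in for a compiled kernel.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def parity(reference, candidate, rtol=1.3e-6, atol=1e-5):
    """Return (all_close, max_abs_error) under |c - r| <= atol + rtol * |r|."""
    if len(reference) != len(candidate):
        raise ValueError("length mismatch")
    worst = 0.0
    ok = True
    for r, c in zip(reference, candidate):
        err = abs(c - r)
        worst = max(worst, err)
        if err > atol + rtol * abs(r):
            ok = False
    return ok, worst


# Stand-in experiment: float64 atan plays "eager", float32-rounded atan plays "compiled".
xs = [i / 10 for i in range(-50, 51)]
eager = [math.atan(x) for x in xs]
compiled = [to_float32(math.atan(x)) for x in xs]
print(parity(eager, compiled))
```

Recording max_abs_error per release, not just pass/fail, is what makes slow numerical drift visible before it crosses a tolerance.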

github RepoWatch ai tools

17/05/2026 · Multi-repo watch

llama.cpp, ggml, Instructor and Codex workflow notes are the only RepoWatch items worth a look today

Out of 106 repos checked, only four updates looked worth a spike: llama.cpp embedding normalisation, ggml v0.12.0, Instructor v2 cleanup, and a useful Codex workflow note from jxnl.

Most repo movement was noise. The only changes that look worth attention today touch embeddings, local inference foundations, structured outputs, or useful agent workflow practice.
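Why embedding normalisation is a retrieval-level concern rather than a cosmetic one: if vectors are stored unnormalised and compared with a raw dot product, vector magnitude leaks into the ranking. A minimal sketch of L2 normalisation follows; it is an illustration, not llama.cpp's implementation, whose normalisation mode should be checked for the version you run.

```python
import math


def l2_normalize(vec: list) -> list:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return list(vec) if norm == 0 else [v / norm for v in vec]


def dot(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list, b: list) -> float:
    """Cosine similarity: after L2 normalisation it is just the dot product."""
    return dot(l2_normalize(a), l2_normalize(b))


query = [1.0, 0.0]
long_off_topic = [10.0, 10.0]  # large magnitude, 45 degrees away from the query
short_on_topic = [1.0, 0.1]    # small magnitude, nearly parallel to the query
print(dot(query, long_off_topic), dot(query, short_on_topic))
print(cosine(query, long_off_topic), cosine(query, short_on_topic))
```

With raw dot products the long, off-topic vector wins; after normalisation the nearly parallel one does, which is the ranking a cosine-based retriever expects.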

AI disruption github tools agents RAG llama.cpp ggml instructor codex RepoWatch