Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama (CVE-2026-7482)

We discovered a critical vulnerability (CVE-2026-7482, CVSS 9.1) in Ollama that enables unauthenticated attackers to leak the entire Ollama process memory, potentially impacting 300,000 servers globally. The leaked memory contains user messages (prompts), system prompts, and environment variables.

Ollama is an open-source platform that lets you run LLMs directly on your own machine instead of relying on cloud services like OpenAI, Anthropic, or xAI. With Ollama, you can download, manage, and interact with models like Llama, Mistral, and others, all running locally on your hardware. With roughly 170,000 stars on GitHub, over 100 million downloads on Docker Hub, and wide adoption across enterprises, Ollama has become the standard for running open-source models locally.

There are two main ways to create model instances in Ollama. The first is the /api/pull API endpoint, which downloads an existing model from the Ollama registry and makes it available locally. You get a ready-made model (like llama3 or mistral) that you can use right away for inference. It's the simplest approach when you don't need customization.

The second is the /api/create API endpoint, which lets you build custom model instances by specifying configuration parameters like system prompts, quantization levels, and more. The base model can come from one of two sources: it can be pulled from a remote registry (via the from parameter) or built from previously uploaded model files.

In this research, we'll focus on the second option: how users create models from previously uploaded files. But how do users upload files in the first place?

Files are uploaded to the Ollama server through the /api/blobs/sha256:[sha256-digest] API endpoint. The [sha256-digest] part is exactly what you'd expect: a SHA-256 hash calculated from the file's content. The file content itself is sent in the HTTP body of the request.
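The upload step can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the server address http://localhost:11434 (Ollama's default port) and the POST method come from Ollama's public API documentation rather than from this article, and the helper names blob_digest and build_blob_upload are hypothetical. The code only builds the request; actually sending it requires a running server.

```python
import hashlib
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434"


def blob_digest(data: bytes) -> str:
    """Digest as it appears in the URL: 'sha256:' + hex SHA-256 of the file content."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_blob_upload(data: bytes, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a request that uploads `data` as a blob.

    The digest goes in the URL path and the raw file content goes in the HTTP body.
    Assumption: the endpoint accepts POST, per Ollama's public API docs.
    """
    url = f"{base_url}/api/blobs/{blob_digest(data)}"
    return urllib.request.Request(url, data=data, method="POST")


if __name__ == "__main__":
    payload = b"example model bytes"
    req = build_blob_upload(payload)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; that needs a running server.
```

Because the digest is derived purely from the content, a client can compute it locally before uploading, and the server can recompute it to check that the body matches the URL.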
After that, to create a model instance, the user calls /api/create and passes the uploaded files as parameters in the JSON request body. We'll break down the /api/create request later in this article.

The next section is a bit technical. We've kept it as simple as possible and included only what's necessary to understand the vulnerability, so please bear with us: we promise the vulnerability part is worth it.

GGUF is a file format used to store large language models in a way that makes them efficient to load and run locally. A GGUF file contains tensors, which are essentially multi-dimensional arrays of numbers that represent the model's learned parameters (weights). Think of tensors as the "brain" of the model: they store all the knowledge the model acquired during training.

The header of a GGUF file contains data that describes the file, such as the GGUF format version, the number of tensors it contains, and a set of key-value metadata. One metadata field worth mentioning is general.file_type, which tells you (shockingly) the file type of the GGUF. The file type determines how the numbers inside the tensors are stored. For this research, we only care about F16 (float-16) and F32 (float-32).

After the GGUF header comes a list of tensor objects. Each one stores the tensor's name, number of dimensions, data type (precision info), and an offset that points to where the actual tensor data lives later in the file.

Quantization is the process of reducing the precision of the numbers stored in tensors, making the model smaller and faster to run at the cost of some accuracy. In GGUF, the F32 file type stores each number using 4 bytes, while F16 uses only 2 bytes. Moving from F32 to F16 cuts memory usage in half (making the model run faster) but causes permanent data loss: some decimal precision is gone and can't be recovered. Going the other way, from F16 to F32, involves no data loss at all.
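The F32/F16 trade-off described above can be observed with Python's standard struct module, whose "f" and "e" format codes pack IEEE 754 single- and half-precision floats. This is an illustrative sketch, not Ollama's conversion code: the helper names (encode_f32, encode_f16 and so on) are hypothetical, and it assumes little-endian byte order, GGUF's default.

```python
import struct


def encode_f32(values: list[float]) -> bytes:
    """Pack values as little-endian F32 (IEEE 754 single precision): 4 bytes each."""
    return struct.pack(f"<{len(values)}f", *values)


def encode_f16(values: list[float]) -> bytes:
    """Pack values as little-endian F16 (IEEE 754 half precision): 2 bytes each.

    This is the lossy step: each value is rounded to the nearest
    representable half-precision number.
    """
    return struct.pack(f"<{len(values)}e", *values)


def decode_f32(data: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(data) // 4}f", data))


def decode_f16(data: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(data) // 2}e", data))


if __name__ == "__main__":
    # Start from values exactly representable in F32, as in an F32 tensor.
    weights = decode_f32(encode_f32([0.1234567, -1.5, 3.14159]))

    print(len(encode_f32(weights)), len(encode_f16(weights)))  # 12 6

    # F32 -> F16: half the bytes, but precision is permanently lost.
    halved = decode_f16(encode_f16(weights))
    print(halved == weights)  # False

    # F16 -> F32: every F16 value fits exactly in F32, so nothing changes.
    widened = decode_f32(encode_f32(halved))
    print(widened == halved)  # True
```

A value like -1.5 survives the downcast unchanged because it is exactly representable in half precision, while 3.14159 is rounded to 3.140625.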

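To make the GGUF header and tensor-list layout described earlier more concrete, here is a minimal, read-only header parser in Python. It follows the public GGUF specification (the "GGUF" magic, a uint32 version, uint64 tensor and metadata counts, length-prefixed strings) rather than Ollama's own Go implementation, and the names Reader, parse_gguf_header and gguf_string are hypothetical. Per the specification, each tensor entry also stores the size of every dimension, so the sketch reads those as well. It deliberately performs no validation of dimensions or offsets against the file's actual size.

```python
import struct
from dataclasses import dataclass

GGUF_MAGIC = b"GGUF"

# Fixed-size GGUF metadata value type ids -> struct format (little-endian).
SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
TYPE_STRING, TYPE_ARRAY = 8, 9

# ggml tensor data type ids for the two formats discussed here.
GGML_TYPE_NAMES = {0: "F32", 1: "F16"}


@dataclass
class TensorInfo:
    name: str
    dims: list[int]
    ggml_type: int
    offset: int  # start of this tensor's data, relative to the data section


class Reader:
    """Sequential little-endian reader over an in-memory GGUF header."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def unpack(self, fmt: str):
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def string(self) -> str:
        # GGUF string: uint64 byte length, then UTF-8 bytes (no terminator).
        length = self.unpack("<Q")
        raw = self.data[self.pos:self.pos + length]
        if len(raw) != length:
            raise ValueError("truncated string")
        self.pos += length
        return raw.decode("utf-8")

    def value(self, vtype: int):
        if vtype in SCALAR_FORMATS:
            return self.unpack(SCALAR_FORMATS[vtype])
        if vtype == TYPE_STRING:
            return self.string()
        if vtype == TYPE_ARRAY:
            # Array: uint32 element type, uint64 count, then the elements.
            elem_type = self.unpack("<I")
            count = self.unpack("<Q")
            return [self.value(elem_type) for _ in range(count)]
        raise ValueError(f"unknown metadata value type {vtype}")


def parse_gguf_header(data: bytes):
    """Return (version, metadata dict, list of TensorInfo) from a GGUF header."""
    r = Reader(data)
    if r.unpack("<4s") != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = r.unpack("<I")
    tensor_count = r.unpack("<Q")
    kv_count = r.unpack("<Q")
    metadata = {}
    for _ in range(kv_count):
        key = r.string()
        metadata[key] = r.value(r.unpack("<I"))
    tensors = []
    for _ in range(tensor_count):
        name = r.string()
        n_dims = r.unpack("<I")
        dims = [r.unpack("<Q") for _ in range(n_dims)]
        ggml_type = r.unpack("<I")
        offset = r.unpack("<Q")
        tensors.append(TensorInfo(name, dims, ggml_type, offset))
    return version, metadata, tensors


def gguf_string(s: str) -> bytes:
    """Encode a GGUF string (used here only to build a test header)."""
    raw = s.encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw


if __name__ == "__main__":
    blob = (
        GGUF_MAGIC + struct.pack("<IQQ", 3, 1, 1)  # version 3, 1 tensor, 1 key
        + gguf_string("general.file_type") + struct.pack("<II", 4, 1)  # uint32 value 1
        + gguf_string("token_embd.weight") + struct.pack("<I", 2)  # 2 dimensions
        + struct.pack("<QQ", 4, 8) + struct.pack("<IQ", 1, 0)  # 4 x 8, F16, offset 0
    )
    version, metadata, tensors = parse_gguf_header(blob)
    print(version, metadata)  # 3 {'general.file_type': 1}
    t = tensors[0]
    print(t.name, t.dims, GGML_TYPE_NAMES[t.ggml_type], t.offset)  # token_embd.weight [4, 8] F16 0
```

The parser trusts every count, length and offset it reads. Whether a tensor's declared shape and offset actually fit inside the file is a separate check that a real loader has to perform.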