Running LLMs Natively on a Raspberry Pi with llama.cpp
For developers and hackers who enjoy squeezing maximum potential out of compact machines, running a large language model locally with llama.cpp
on a Raspberry Pi is a rewarding challenge. This guide walks you through compiling llama.cpp from source, downloading a quantized model, and running inference - all on the Pi itself.
Prerequisites
Hardware
- Raspberry Pi 4, 5, or newer
- 4GB RAM minimum (8GB+ recommended)
- Heatsink or fan recommended for cooling
Software
- 64-bit Raspberry Pi OS
- Git
- CMake (v3.16+)
- GCC or Clang
- Python 3 (optional, for Python bindings)
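Before you start, it's worth confirming that you're actually running a 64-bit OS and have enough headroom. A quick check (exact output will vary by Pi model and OS image):
uname -m                 # should print aarch64 on 64-bit Raspberry Pi OS
free -h                  # total RAM: 4GB minimum, 8GB+ recommended
vcgencmd measure_temp    # SoC temperature; worth watching during long builds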
Step-by-Step Guide
Install Required Tools
sudo apt update && sudo apt upgrade -y
sudo apt install -y git build-essential cmake python3-pip libcurl4-openssl-dev
Clone and Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release -j$(nproc)
This step takes some time, since we're compiling the llama.cpp binaries from source on the Pi itself; the -j$(nproc) flag uses all available cores to speed up the build.
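Once the build finishes, the binaries land in build/bin. A quick sanity check that the build succeeded (output will vary with the llama.cpp version you checked out):
ls build/bin/                     # should list llama-cli along with other tools such as llama-server
./build/bin/llama-cli --version   # prints build info if the binary runs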
Download a Quantized Model
mkdir -p models && cd models
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_0.gguf
cd ..
We're using the TinyLlama-1.1B-Chat GGUF build from https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF for testing; the Q4_0 quantization is small enough to fit comfortably in the Pi's RAM.
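Before moving on, confirm the download completed; the Q4_0 file should be roughly 600-700 MB (exact size depends on the quantization):
ls -lh models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf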
Run Inference
./build/bin/llama-cli -m ./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf -p "Hello, Raspberry Pi!"
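The defaults work, but on a Pi you'll usually want to pin the thread count and cap the number of generated tokens. A sketch with a few common flags (flag names are current as of recent llama.cpp builds; run ./build/bin/llama-cli --help to confirm on yours):
# -t: threads (match your Pi's core count), -n: max tokens to generate, -c: context size
./build/bin/llama-cli -m ./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -p "Explain what a Raspberry Pi is in one sentence." \
  -t 4 -n 128 -c 2048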
Optional: Python Bindings
Note: the Python bindings live in a separate project, llama-cpp-python, rather than in the main llama.cpp repository.
# the bindings vendor llama.cpp as a git submodule, so clone recursively
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
python3 -m pip install .
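Building the bindings compiles llama.cpp again under the hood, so expect another lengthy compile. If you'd rather skip the clone, the project is also published on PyPI; and since recent Raspberry Pi OS releases block pip installs into the system Python, a virtual environment is the safer route either way. A sketch of that path:
python3 -m venv ~/llama-venv             # create an isolated environment
source ~/llama-venv/bin/activate         # activate it
pip install llama-cpp-python             # pulls the package from PyPI and builds it from source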
Use in Python:
from llama_cpp import Llama

# load the quantized model; adjust model_path if your layout differs
llm = Llama(model_path="./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf")
# calling the model returns a completion dict; pull out the generated text
output = llm("Hello from Python!", max_tokens=64)
print(output["choices"][0]["text"])
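Since TinyLlama is a chat-tuned model, you'll usually get better answers via the chat API, which applies the chat template stored in the GGUF metadata (assuming a reasonably recent llama-cpp-python; older versions may need an explicit chat_format argument). A minimal sketch reusing the same llm handle:
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant running on a Raspberry Pi."},
        {"role": "user", "content": "Say hello in one short sentence."},
    ],
    max_tokens=64,
)
print(response["choices"][0]["message"]["content"])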
Conclusion
Running llama.cpp natively on a Raspberry Pi is a geeky thrill. It teaches you about compiler optimizations, quantized models, and pushing hardware to the edge—literally. Bonus points if you run it headless over SSH.
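If you do go headless, the same build also produces llama-server, which exposes an HTTP API you can reach from any machine on your network (flags current as of recent llama.cpp builds; check --help on yours):
# serve the model on port 8080, reachable from other machines on the LAN
./build/bin/llama-server -m ./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf --host 0.0.0.0 --port 8080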