Make Your Machine Talk: Piper TTS Offline
Want to make your computer talk — using real, natural-sounding voices without needing the cloud?
In this tutorial, we’ll set up Piper TTS on a local system and give it a voice. It’s fast, offline, and perfect for voice assistants, robotics, Raspberry Pi projects, or just impressing your friends.
What You’ll Need
- Python 3 installed (recommended: Python 3.8 or newer)
- Linux environment (x86_64 or ARM64), or Windows with WSL
- Internet connection (just for setup)
- A speaker or headphones
- Terminal access
I have tested this on Raspberry Pi OS 64-bit (Bookworm), Ubuntu 22.04, and Windows 11 using WSL. It should work on most modern 64-bit Linux environments.
Install Piper Using pip
Piper can now be installed easily via pip, but to avoid issues with system-managed Python environments, it’s best to use a virtual environment.
python3 -m venv .venv
source .venv/bin/activate
pip install piper-tts
Add a Voice
Piper uses ONNX-based voice models and needs both the .onnx model and its matching .json config file.
You can browse the full list of supported voices here:
For this example, let’s download the English voice “Amy”.
# Download the model file
wget "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx?download=true" \
-O en_US-amy-medium.onnx
# Download the corresponding config file
wget "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx.json?download=true" \
-O en_US-amy-medium.onnx.json
Place both files in the same folder. Piper uses the .json file to determine things like speaking speed, noise level, and phoneme mappings.
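Other voices can be fetched the same way. The sketch below wraps the two wget commands from this section in shell functions. It assumes every voice in the rhasspy/piper-voices repository follows the same folder layout as Amy (language family, locale, name, quality), and piper_voice_url and piper_download_voice are hypothetical helpers, not part of Piper.

```shell
# Hypothetical helpers (not part of Piper).
# ASSUMPTION: every voice uses the same layout as the Amy example:
#   <family>/<locale>/<name>/<quality>/<locale>-<name>-<quality>.<ext>
PIPER_VOICES_BASE="https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0"

# Print the download URL for one file of a voice.
#   $1 = voice id, e.g. en_US-amy-medium
#   $2 = extension, "onnx" or "onnx.json"
piper_voice_url() {
    local voice="$1" ext="$2"
    local locale="${voice%%-*}"    # en_US
    local rest="${voice#*-}"       # amy-medium
    local name="${rest%-*}"        # amy
    local quality="${rest##*-}"    # medium
    local family="${locale%%_*}"   # en
    printf '%s/%s/%s/%s/%s/%s.%s?download=true\n' \
        "$PIPER_VOICES_BASE" "$family" "$locale" "$name" "$quality" "$voice" "$ext"
}

# Download the model and its config side by side into the current folder.
piper_download_voice() {
    local ext
    for ext in onnx onnx.json; do
        wget "$(piper_voice_url "$1" "$ext")" -O "$1.$ext" || return 1
    done
}
```

For example, piper_download_voice en_US-amy-medium fetches the same two files as the commands above.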
Try It Out
Time for the magic. Let’s speak some text. You can generate a WAV file or stream it directly to your speakers.
Option 1: Save to file and play
echo "Hello from your machine!" | piper -m en_US-amy-medium.onnx --output_file hello.wav
aplay hello.wav # or use your system's audio player
Option 2: Stream directly to aplay
echo "This sentence is spoken first. This sentence is synthesized while the first sentence is spoken." \
| piper -m en_US-amy-medium.onnx --output-raw \
| aplay -r 22050 -f S16_LE -t raw -
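This streams raw audio directly to your speakers in real time. Because the stream has no WAV header, aplay must be told the format: Piper writes 16-bit little-endian mono samples (S16_LE), and the rate must match the voice model. Medium-quality voices such as Amy use 22050 Hz; each voice lists its rate under "sample_rate" in its .onnx.json file.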
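If you switch voices often, you can read the rate from the config instead of hard-coding 22050. This is a minimal sketch: piper_sample_rate is a hypothetical helper, and it assumes the rate is stored under "audio" → "sample_rate" in the .onnx.json file and that python3 (already needed for Piper) is on your PATH.

```shell
# Hypothetical helper: print the sample rate stored in a Piper voice config.
# ASSUMPTION: the config keeps it under "audio" -> "sample_rate".
piper_sample_rate() {
    python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["audio"]["sample_rate"])' "$1"
}

# Usage (kept as comments so nothing plays when this file is sourced):
#   rate=$(piper_sample_rate en_US-amy-medium.onnx.json)
#   echo "Hello" | piper -m en_US-amy-medium.onnx --output-raw \
#     | aplay -r "$rate" -f S16_LE -t raw -
```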
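On WSL, play the file through Windows, for example with powershell.exe -c "(New-Object Media.SoundPlayer 'hello.wav').PlaySync()" or with Windows Media Player (wmplayer). From inside WSL, Windows programs are called with their .exe suffix, and the double quotes stop bash from interpreting the parentheses.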
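To use one command on both plain Linux and WSL, a small wrapper can pick the player. This is a sketch under stated assumptions: play_wav is a hypothetical name, WSL is detected by the word "microsoft" in /proc/version, and the file path is converted with wslpath so the Windows side can open it. It has not been tried on every WSL setup.

```shell
# Hypothetical helper: play a WAV file with aplay on Linux, or with Windows'
# Media.SoundPlayer when running under WSL.
# ASSUMPTIONS: WSL is detected via "microsoft" in /proc/version, and the path
# contains no single quotes (they would break the PowerShell string).
play_wav() {
    if grep -qsi microsoft /proc/version; then
        local winpath
        # Turn /home/me/hello.wav into a Windows-visible \\wsl...\ path.
        winpath=$(wslpath -w "$(realpath "$1")") || return 1
        powershell.exe -c "(New-Object Media.SoundPlayer '$winpath').PlaySync()"
    else
        aplay "$1"
    fi
}
```

Usage: play_wav hello.wav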
Use a GPU (Optional)
If you’d like to use a GPU, install the onnxruntime-gpu package inside your virtual environment. Then run Piper with the --cuda flag.
# installs the onnxruntime-gpu package
.venv/bin/pip3 install onnxruntime-gpu
# runs piper using CUDA
echo "Hello from your machine!" | piper -m en_US-amy-medium.onnx --cuda | aplay
You’ll need a working CUDA setup (the CUDA and cuDNN libraries that your onnxruntime-gpu version expects), such as the one that ships with NVIDIA’s PyTorch containers.
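Before reaching for --cuda, you can check whether the onnxruntime in your virtual environment was built with CUDA support. The sketch below (onnx_cuda_available is a hypothetical name) calls onnxruntime.get_available_providers() and looks for CUDAExecutionProvider. A "yes" only means the GPU build is installed; if the CUDA or cuDNN libraries are missing, ONNX Runtime may still fall back to the CPU at run time.

```shell
# Hypothetical helper: print "yes" if ONNX Runtime in the active environment
# lists the CUDA execution provider, otherwise "no" (also "no" when
# onnxruntime cannot be imported at all).
onnx_cuda_available() {
    python3 - <<'EOF'
try:
    import onnxruntime as ort
    print("yes" if "CUDAExecutionProvider" in ort.get_available_providers() else "no")
except ImportError:
    print("no")
EOF
}
```

Run it with the virtual environment activated so python3 resolves to the same interpreter Piper uses.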
Fine-Tune the Voice (Optional)
Too fast or too slow? Edit the en_US-amy-medium.onnx.json config file and tweak this value:
"phoneme_duration_scale": 1.2
Or for more fine-tuning:
"inference": {
"length_scale": 1.2,
"noise_scale": 0.5,
"noise_w": 0.6
}
A higher length_scale gives slower speech; try values from 1.2 to 1.5.
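Rather than editing the JSON by hand, you can script the change. This sketch assumes python3 is available and that length_scale lives in the "inference" section as shown above; piper_set_length_scale is a hypothetical helper that keeps a .bak copy before rewriting the file.

```shell
# Hypothetical helper: set "inference" -> "length_scale" in a Piper config.
#   $1 = config file, e.g. en_US-amy-medium.onnx.json
#   $2 = new value, e.g. 1.2 (higher = slower speech)
# A backup is written to "$1.bak" first; other settings are left untouched.
piper_set_length_scale() {
    local config="$1" value="$2"
    cp "$config" "$config.bak" || return 1
    python3 - "$config" "$value" <<'EOF'
import json, sys
path, value = sys.argv[1], float(sys.argv[2])
with open(path, encoding="utf-8") as f:
    cfg = json.load(f)
cfg.setdefault("inference", {})["length_scale"] = value
with open(path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
EOF
}
```

For example: piper_set_length_scale en_US-amy-medium.onnx.json 1.3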
Wrap Up
You now have a fast, local, private TTS engine — no internet, no API keys. Just raw, talking hardware.
Next steps?
- Use it in your scripts or Python apps
- Hook it into a voice assistant
- Clone your own voice (stay tuned for Tutorial #2!)
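For the first of those next steps, a tiny shell function is often enough. The say function below is a hypothetical helper that reuses the Option 2 pipeline; it assumes the Amy model sits in the current folder (set PIPER_VOICE to use another) and a 22050 Hz voice (set PIPER_RATE to whatever sample_rate the voice's .onnx.json lists).

```shell
# Hypothetical "say" helper: speak all arguments through Piper and aplay.
# ASSUMPTIONS: model path defaults to en_US-amy-medium.onnx in the current
# folder, and the voice outputs 16-bit mono audio at PIPER_RATE Hz.
say() {
    local voice="${PIPER_VOICE:-en_US-amy-medium.onnx}"
    local rate="${PIPER_RATE:-22050}"
    printf '%s\n' "$*" \
        | piper -m "$voice" --output-raw \
        | aplay -r "$rate" -f S16_LE -t raw -
}
```

Usage: say "Thank you!"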
If this worked, give your machine a high-five — or better yet, have it say “Thank you!”