Make Your Machine Talk: Piper TTS Offline
Want to make your computer talk — using real, natural-sounding voices without needing the cloud?
In this tutorial, we’ll set up Piper TTS on a local system and give it a voice. It’s fast, offline, and perfect for voice assistants, robotics, Raspberry Pi projects, or just impressing your friends.
What You’ll Need
- Python 3 installed (recommended: Python 3.8 or newer)
- Linux environment (x86_64 or ARM64), or Windows with WSL
- Internet connection (just for setup)
- A speaker or headphones
- Terminal access
I have tested this on Raspberry Pi OS 64-bit (Bookworm), Ubuntu 22.04, and Windows 11 using WSL; it should work on most modern 64-bit Linux environments.
Install Piper Using pip
Piper can now be installed easily via pip, but to avoid issues with system-managed Python environments, it’s best to use a virtual environment.
python3 -m venv .venv
source .venv/bin/activate
pip install piper-tts
Add a Voice
Piper uses ONNX-based voice models and needs both the .onnx model and its matching .json config file.
You can browse the full list of supported voices in the rhasspy/piper-voices repository on Hugging Face.
For this example, let's download the English voice "Amy".
# Download the model file
wget "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx?download=true" \
-O en_US-amy-medium.onnx
# Download the corresponding config file
wget "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx.json?download=true" \
-O en_US-amy-medium.onnx.json
Place both files in the same folder. Piper uses the .json file to determine things like speaking speed, noise level, and phoneme mappings.
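If you're curious what's in that config, you can inspect it with a few lines of Python. The field names below (audio.sample_rate, inference) follow the layout seen in typical piper-voices configs; treat them as assumptions and check your own file.

```python
import json

def voice_info(config_path):
    """Summarize a Piper voice config: sample rate and inference settings.

    Field names are based on the layout commonly seen in piper-voices
    configs; your file may differ slightly.
    """
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)
    return {
        "sample_rate": cfg.get("audio", {}).get("sample_rate"),
        "inference": cfg.get("inference", {}),
    }

# Demo with a minimal stand-in config file:
example = {"audio": {"sample_rate": 22050},
           "inference": {"length_scale": 1.0}}
with open("example.onnx.json", "w", encoding="utf-8") as f:
    json.dump(example, f)
print(voice_info("example.onnx.json"))
```

Run it against en_US-amy-medium.onnx.json to see the actual values your voice ships with.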
Try It Out
Time for the magic. Let’s speak some text. You can generate a WAV file or stream it directly to your speakers.
Option 1: Save to file and play
echo "Hello from your machine!" | piper -m en_US-amy-medium.onnx --output_file hello.wav
aplay hello.wav # or use your system's audio player
Option 2: Stream directly to aplay
echo "This sentence is spoken first. This sentence is synthesized while the first sentence is spoken." \
| piper -m en_US-amy-medium.onnx --output-raw \
| aplay -r 22050 -f S16_LE -t raw -
This streams raw audio directly to your speakers in real-time. Make sure the sample rate and format match your voice model.
On WSL, use wmplayer hello.wav or powershell -c "(New-Object Media.SoundPlayer 'hello.wav').PlaySync()".
Use a GPU (Optional)
If you'd like to use a GPU, install the onnxruntime-gpu package inside your virtual environment, then run Piper with the --cuda flag.
# installs the onnxruntime-gpu package
.venv/bin/pip3 install onnxruntime-gpu
# runs piper using CUDA
echo "Hello from your machine!" | piper -m en_US-amy-medium.onnx --cuda --output-raw | aplay -r 22050 -f S16_LE -t raw -
You’ll need a working CUDA environment, such as what comes with NVIDIA's PyTorch containers.
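Before reaching for --cuda, you can ask onnxruntime whether it actually sees a CUDA device. This sketch uses onnxruntime's get_available_providers(), and falls back gracefully when the package isn't installed.

```python
# Check whether onnxruntime can use CUDA before passing --cuda to piper.
try:
    import onnxruntime as ort
    providers = ort.get_available_providers()
except ImportError:
    providers = []  # onnxruntime not installed in this environment
print("CUDAExecutionProvider available:", "CUDAExecutionProvider" in providers)
```

If this prints False, Piper will still run, just on the CPU.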
Fine-Tune the Voice (Optional)
Too fast or too slow? Edit the en_US-amy-medium.onnx.json config file and tweak the inference settings:
"inference": {
"length_scale": 1.2,
"noise_scale": 0.5,
"noise_w": 0.6
}
A higher length_scale means slower speech; try values between 1.2 and 1.5.
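If you'd rather script the tweak than hand-edit JSON, a small helper can rewrite the value in place. This is a sketch that assumes the inference.length_scale layout shown above.

```python
import json

def set_speaking_rate(config_path, length_scale):
    """Rewrite inference.length_scale in a Piper voice config.

    Assumes the "inference" block layout shown above (an assumption
    about the config format, not a documented API).
    """
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)
    cfg.setdefault("inference", {})["length_scale"] = length_scale
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)
    return cfg["inference"]["length_scale"]

# Demo on a throwaway config file:
with open("demo.onnx.json", "w", encoding="utf-8") as f:
    json.dump({"inference": {"length_scale": 1.0}}, f)
print(set_speaking_rate("demo.onnx.json", 1.3))
```

Point it at your real en_US-amy-medium.onnx.json once you've picked a value you like.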
Wrap Up
You now have a fast, local, private TTS engine — no internet, no API keys. Just raw, talking hardware.
Next steps?
- Use it in your scripts or Python apps
- Hook it into a voice assistant
- Clone your own voice (stay tuned for Tutorial #2!)
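As a starting point for using Piper from your own scripts, here's a minimal wrapper around the same CLI invocation used earlier. It assumes piper is on your PATH (e.g. with the venv activated); speak() pipes text to piper's stdin just like the echo pipelines above.

```python
import subprocess

def piper_cmd(model="en_US-amy-medium.onnx", wav_path="hello.wav"):
    # Same invocation as the shell example earlier in this post.
    return ["piper", "-m", model, "--output_file", wav_path]

def speak(text, **kwargs):
    # Pipe text to piper's stdin, mirroring `echo "..." | piper ...`.
    # Requires the piper executable on PATH.
    subprocess.run(piper_cmd(**kwargs), input=text.encode("utf-8"), check=True)

print(piper_cmd())
```

Call speak("Hello from Python!") and then play hello.wav with your usual audio player.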
If this worked, give your machine a high-five — or better yet, have it say “Thank you!”