Make Your Machine Talk: Piper TTS Offline

Want to make your computer talk — using real, natural-sounding voices without needing the cloud?

In this tutorial, we’ll set up Piper TTS on a local system and give it a voice. It’s fast, offline, and perfect for voice assistants, robotics, Raspberry Pi projects, or just impressing your friends.


What You’ll Need

  • Python 3 installed (recommended: Python 3.8 or newer)
  • Linux environment (x86_64 or ARM64), or Windows with WSL
  • Internet connection (just for setup)
  • A speaker or headphones
  • Terminal access

I have tested this on Raspberry Pi OS 64-bit (Bookworm), Ubuntu 22.04, and Windows 11 with WSL; it should work on most modern 64-bit Linux environments.

Install Piper Using pip

Piper can now be installed easily via pip, but to avoid issues with system-managed Python environments, it’s best to use a virtual environment.

python3 -m venv .venv
source .venv/bin/activate
pip install piper-tts

Add a Voice

Piper uses ONNX-based voice models and needs both the .onnx model and its matching .json config file.

You can browse the full list of supported voices in the rhasspy/piper-voices repository on Hugging Face.

For this example, let's download the English voice "Amy".

# Download the model file
wget "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx?download=true" \
     -O en_US-amy-medium.onnx

# Download the corresponding config file
wget "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx.json?download=true" \
     -O en_US-amy-medium.onnx.json

Place both files in the same folder. Piper uses the .json file to determine things like speaking speed, noise level, and phoneme mappings.
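If you're curious what the config actually contains, a few lines of Python will show you. The snippet below parses a trimmed-down example (the field names match real Piper configs, but the values here are just illustrative defaults; real files also include espeak settings and a large phoneme_id_map):

```python
import json

# A trimmed example of a Piper voice config.
example_config = """
{
  "audio": {"sample_rate": 22050},
  "inference": {"noise_scale": 0.667, "length_scale": 1.0, "noise_w": 0.8}
}
"""

config = json.loads(example_config)
print("Sample rate:", config["audio"]["sample_rate"])
print("Speed (length_scale):", config["inference"]["length_scale"])
```

The sample rate matters later: when you stream raw audio, the player must be told the same rate the model was trained at.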


Try It Out

Time for the magic. Let’s speak some text. You can generate a WAV file or stream it directly to your speakers.

Option 1: Save to file and play

echo "Hello from your machine!" | piper -m en_US-amy-medium.onnx --output_file hello.wav
aplay hello.wav  # or use your system's audio player
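The same call works from a script. Here's a minimal sketch using Python's subprocess module — it assumes `piper` is on your PATH and the model file is in the working directory, both of which the steps above set up:

```python
import subprocess

def build_piper_cmd(model, output_file):
    # Assemble the same command line as the shell example above.
    return ["piper", "-m", model, "--output_file", output_file]

def say(text, model="en_US-amy-medium.onnx", output_file="hello.wav"):
    # Piper reads the text to speak from stdin.
    subprocess.run(build_piper_cmd(model, output_file),
                   input=text.encode("utf-8"), check=True)
```

Call say("Hello from your machine!") and then play hello.wav exactly as before.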

Option 2: Stream directly to aplay

echo "This sentence is spoken first. This sentence is synthesized while the first sentence is spoken." \
  | piper -m en_US-amy-medium.onnx --output-raw \
  | aplay -r 22050 -f S16_LE -t raw -

This streams raw audio directly to your speakers in real-time. Make sure the sample rate and format match your voice model.
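Rather than hard-coding 22050, you can read the rate straight from the .json config. A sketch of the same pipeline in Python (assumes the config sits next to the model, as described above):

```python
import json
import subprocess

def aplay_args(config):
    # Build the aplay invocation from the model's declared sample rate.
    return ["aplay", "-r", str(config["audio"]["sample_rate"]),
            "-f", "S16_LE", "-t", "raw", "-"]

def stream(text, model="en_US-amy-medium.onnx"):
    # piper --output-raw | aplay, wired together with two pipes.
    with open(model + ".json") as f:
        config = json.load(f)
    piper = subprocess.Popen(["piper", "-m", model, "--output-raw"],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    aplay = subprocess.Popen(aplay_args(config), stdin=piper.stdout)
    piper.stdin.write(text.encode("utf-8"))
    piper.stdin.close()
    aplay.wait()

# Example: a 22.05 kHz model produces these aplay arguments.
print(aplay_args({"audio": {"sample_rate": 22050}}))
```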

On WSL there is usually no aplay, so play the file with a Windows tool instead, e.g. wmplayer.exe hello.wav or powershell.exe -c "(New-Object Media.SoundPlayer 'hello.wav').PlaySync()".

Use a GPU (Optional)

If you'd like to use a GPU, install the onnxruntime-gpu package inside your virtual environment. Then run Piper with the --cuda flag.

# installs the onnxruntime-gpu package
.venv/bin/pip3 install onnxruntime-gpu

# runs piper using CUDA
echo "Hello from your machine!" | piper -m en_US-amy-medium.onnx --cuda --output-raw \
  | aplay -r 22050 -f S16_LE -t raw -

You'll need a working CUDA environment, such as the one provided by NVIDIA's PyTorch containers.
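Before reaching for --cuda, it's worth confirming that onnxruntime can actually see your GPU. A quick check (CUDAExecutionProvider is onnxruntime's own identifier for its CUDA backend):

```python
def has_cuda(providers):
    # onnxruntime advertises CUDAExecutionProvider when its GPU build can use CUDA.
    return "CUDAExecutionProvider" in providers

try:
    import onnxruntime as ort
    print("CUDA available:", has_cuda(ort.get_available_providers()))
except ImportError:
    print("onnxruntime is not installed in this environment")
```

If this prints False even with onnxruntime-gpu installed, the CUDA libraries themselves are likely missing or mismatched.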

Fine-Tune the Voice (Optional)

Too fast or too slow? Edit the en_US-amy-medium.onnx.json config file and tweak the values in its inference section:

"inference": {
  "length_scale": 1.2,
  "noise_scale": 0.5,
  "noise_w": 0.6
}

Higher length_scale = slower speech. Try 1.2 to 1.5.
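You can also script the tweak instead of editing by hand. A small sketch that operates on the parsed JSON (back up the file before writing changes back):

```python
def set_speed(config, length_scale):
    # Higher length_scale = slower speech; 1.0 is the model's trained speed.
    config["inference"]["length_scale"] = length_scale
    return config

# Slow the example config down by 20%.
config = {"inference": {"length_scale": 1.0, "noise_scale": 0.5, "noise_w": 0.6}}
print(set_speed(config, 1.2)["inference"]["length_scale"])  # 1.2
```

To persist the change, load the real .onnx.json with json.load, apply set_speed, and write it back with json.dump.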

Wrap Up

You now have a fast, local, private TTS engine — no internet, no API keys. Just raw, talking hardware.

Next steps?

  • Use it in your scripts or Python apps
  • Hook it into a voice assistant
  • Clone your own voice (stay tuned for Tutorial #2!)

If this worked, give your machine a high-five — or better yet, have it say “Thank you!”
