Make Your Machine Talk: Piper TTS Offline

Want to make your computer talk — using real, natural-sounding voices without needing the cloud?

In this tutorial, we’ll set up Piper TTS on a local system and give it a voice. It’s fast, offline, and perfect for voice assistants, robotics, Raspberry Pi projects, or just impressing your friends.

What You’ll Need

Python 3 installed (recommended: Python 3.8 or newer)
Linux environment (x86_64 or ARM64), or Windows with WSL
Internet connection (just for setup)
A speaker or headphones
Terminal access

I have tested this on Raspberry Pi OS 64-bit (Bookworm), Ubuntu 22.04, and Windows 11 using WSL. It should work on most modern 64-bit Linux environments.

Install Piper Using pip

Piper can now be installed easily via pip, but to avoid issues with system-managed Python environments, it’s best to use a virtual environment.

python3 -m venv .venv
source .venv/bin/activate
pip install piper-tts
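To confirm the install worked, a short check like the one below can help. It is only a sketch: the check_setup helper is a hypothetical name, and it assumes the pip package puts a piper command on your PATH, as the commands later in this tutorial expect.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # the version this tutorial recommends


def check_setup():
    """Return a small report about the current environment (hypothetical helper)."""
    return {
        # True if the running interpreter is at least Python 3.8
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # True if this interpreter is running inside a virtual environment
        "in_venv": sys.prefix != sys.base_prefix,
        # Full path to the piper command, or None if it is not on PATH
        "piper_path": shutil.which("piper"),
    }


report = check_setup()
for name, value in report.items():
    print(f"{name}: {value}")
```

Run it with the virtual environment activated. If piper_path prints None, the environment is probably not active or the install did not finish.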

Add a Voice

Piper uses ONNX-based voice models and needs both the .onnx model and its matching .json config file.

You can browse the full list of supported voices in the rhasspy/piper-voices repository on Hugging Face, which also hosts the files downloaded below.

For this example, let’s download the English voice “Amy”.

# Download the model file
wget "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx?download=true" \
     -O en_US-amy-medium.onnx

# Download the corresponding config file
wget "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx.json?download=true" \
     -O en_US-amy-medium.onnx.json

Place both files in the same folder. Piper uses the .json file to determine things like speaking speed, noise level, and phoneme mappings.
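If you want to see what the config actually contains, the sketch below checks that both files are present and prints a few settings. The helper names are hypothetical, and the key layout (an "audio" block with "sample_rate", plus the "inference" block shown later in this tutorial) is an assumption about the voice config, so missing keys simply come back as None.

```python
import json
from pathlib import Path


def load_voice_config(model_path):
    """Load the JSON config that sits next to a Piper .onnx model."""
    model = Path(model_path)
    config = Path(str(model) + ".json")  # e.g. en_US-amy-medium.onnx.json
    missing = [str(p) for p in (model, config) if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing voice file(s): " + ", ".join(missing))
    with config.open(encoding="utf-8") as f:
        return json.load(f)


def summarize(config):
    """Pick out a few settings; keys that are absent come back as None."""
    audio = config.get("audio", {})          # assumed key layout
    inference = config.get("inference", {})  # block shown in this tutorial
    return {
        "sample_rate": audio.get("sample_rate"),
        "length_scale": inference.get("length_scale"),
        "noise_scale": inference.get("noise_scale"),
        "noise_w": inference.get("noise_w"),
    }
```

For example, print(summarize(load_voice_config("en_US-amy-medium.onnx"))) shows the settings for the Amy voice once both files are downloaded.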

Try It Out

Time for the magic. Let’s speak some text. You can generate a WAV file or stream it directly to your speakers.

Option 1: Save to file and play

echo "Hello from your machine!" | piper -m en_US-amy-medium.onnx --output_file hello.wav
aplay hello.wav  # or use your system's audio player
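The same step can be driven from a Python script by calling the piper command as a subprocess. This is a minimal sketch, not an official Piper API: build_piper_command and synthesize_to_file are hypothetical helpers that just reuse the flags from the shell command above.

```python
import subprocess


def build_piper_command(model_path, output_path):
    """Mirror the shell command above as an argument list."""
    return ["piper", "-m", str(model_path), "--output_file", str(output_path)]


def synthesize_to_file(text, model_path, output_path):
    """Feed text to piper on stdin and wait for the WAV file to be written."""
    command = build_piper_command(model_path, output_path)
    # check=True raises CalledProcessError if piper exits with an error
    subprocess.run(command, input=text.encode("utf-8"), check=True)
    return output_path
```

Calling synthesize_to_file("Hello from your machine!", "en_US-amy-medium.onnx", "hello.wav") should produce the same hello.wav as the shell version, provided piper is on your PATH.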

Option 2: Stream directly to `aplay`

echo "This sentence is spoken first. This sentence is synthesized while the first sentence is spoken." \
  | piper -m en_US-amy-medium.onnx --output-raw \
  | aplay -r 22050 -f S16_LE -t raw -

This streams raw audio directly to your speakers in real time. The -r 22050 and -f S16_LE arguments must match what the voice model produces; the sample rate is listed in the voice’s .json config file.
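To avoid hard-coding the sample rate, a script can read it from the voice config and build the aplay arguments from that. The sketch below assumes the config stores the rate under "audio" → "sample_rate" and that the raw output is 16-bit little-endian, as in the command above; aplay_command and speak are hypothetical helper names.

```python
import json
import subprocess


def aplay_command(sample_rate):
    """Build the aplay arguments for raw 16-bit little-endian audio on stdin."""
    return ["aplay", "-r", str(sample_rate), "-f", "S16_LE", "-t", "raw", "-"]


def speak(text, model_path):
    """Pipe piper's raw output into aplay, like the shell pipeline above."""
    with open(f"{model_path}.json", encoding="utf-8") as f:
        sample_rate = json.load(f)["audio"]["sample_rate"]  # assumed key layout

    piper = subprocess.Popen(
        ["piper", "-m", str(model_path), "--output-raw"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    player = subprocess.Popen(aplay_command(sample_rate), stdin=piper.stdout)
    piper.stdout.close()  # the player now owns the read end of the pipe

    piper.stdin.write(text.encode("utf-8"))
    piper.stdin.close()   # end of input tells piper to finish
    player.wait()
    piper.wait()
```

speak("This sentence is spoken first.", "en_US-amy-medium.onnx") would then play through the default ALSA device without an intermediate file.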

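On WSL, play hello.wav through Windows instead: wmplayer.exe hello.wav, or powershell.exe -c "(New-Object Media.SoundPlayer 'hello.wav').PlaySync()". Windows programs need the .exe suffix when launched from WSL, and the double quotes stop bash from interpreting the parentheses.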

Use a GPU (Optional)

If you’d like to use a GPU, install the onnxruntime-gpu package inside your virtual environment. Then run Piper with the --cuda flag.

# installs the onnxruntime-gpu package
.venv/bin/pip3 install onnxruntime-gpu

# runs piper using CUDA
echo "Hello from your machine!" | piper -m en_US-amy-medium.onnx --cuda | aplay

You’ll need a working CUDA environment, such as what comes with NVIDIA’s PyTorch containers.
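Before reaching for --cuda, it can be worth checking whether ONNX Runtime actually sees the GPU. The sketch below uses onnxruntime.get_available_providers() and looks for the CUDA execution provider; cuda_provider_available is a hypothetical helper name, and a True result only means the provider is registered, not that every driver detail is correct.

```python
def cuda_provider_available():
    """Return True if the installed ONNX Runtime lists the CUDA provider."""
    try:
        import onnxruntime
    except (ImportError, OSError):
        # onnxruntime is missing or failed to load its native libraries
        return False
    return "CUDAExecutionProvider" in onnxruntime.get_available_providers()


print("CUDA provider available:", cuda_provider_available())
```

If this prints False inside the virtual environment, Piper will most likely not be able to use the GPU either, so checking the onnxruntime-gpu install and the CUDA setup is a reasonable next step.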

Fine-Tune the Voice (Optional)

Too fast or too slow? Edit the en_US-amy-medium.onnx.json config file and tweak this value:

"phoneme_duration_scale": 1.2

Or for more fine-tuning:

"inference": {
  "length_scale": 1.2,
  "noise_scale": 0.5,
  "noise_w": 0.6
}

A higher length_scale means slower speech. Try values between 1.2 and 1.5.
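The config file is large, so editing it by hand is easy to get wrong. As an alternative, here is a small sketch that changes length_scale programmatically and keeps a backup copy first. set_length_scale is a hypothetical helper, and it assumes the value lives in the "inference" block shown above.

```python
import json
import shutil


def set_length_scale(config_path, length_scale, backup=True):
    """Update inference.length_scale in a voice config and return that block."""
    if length_scale <= 0:
        raise ValueError("length_scale must be positive")
    if backup:
        # Keep the original next to the edited file, e.g. voice.onnx.json.bak
        shutil.copyfile(config_path, f"{config_path}.bak")

    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    # Create the block if it is missing, then set the new value
    config.setdefault("inference", {})["length_scale"] = length_scale

    with open(config_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any non-ASCII characters readable
        json.dump(config, f, ensure_ascii=False, indent=2)
    return config["inference"]
```

For example, set_length_scale("en_US-amy-medium.onnx.json", 1.3) slows the voice down a little; copying the .bak file back restores the original settings.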

Wrap Up

You now have a fast, local, private TTS engine — no internet, no API keys. Just raw, talking hardware.

Next steps?

Use it in your scripts or Python apps
Hook it into a voice assistant
Clone your own voice (stay tuned for Tutorial #2!)

If this worked, give your machine a high-five — or better yet, have it say “Thank you!”