How to Create a Perfect Text-to-Speech Synthesizer in Python in 10 Minutes with Coqui TTS

Hello everyone. In this article, we will discover a really interesting open-source TTS (text-to-speech) project from coqui-ai that is easy to use and free for commercial use. We will create a simple text-to-speech synthesizer in less than 10 minutes that sounds really human-like. Let’s get started!

You can find the complete video below.

There are several interesting text-to-speech packages that you can use in your own projects. We have chosen to explore Coqui in this series of two articles, and we may explore more packages later.

Let’s start by exploring the package, which you can find here: https://github.com/coqui-ai/TTS

You can install it with pip or directly from source. We will install it simply by typing:

pip install TTS

The module is now installed. We can start playing with it from the command line, for example:

tts --text "Hello from a machine" --out_path ./audio.wav

This writes the synthesized speech to audio.wav, which we can play back and use.

The next step is to create a Python project and integrate the TTS module into it, so you can reuse it in any Python project you have. This takes a few steps.
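
If you want to drive the same command from a script before moving on to the Python API, something like the following could work. This is only a minimal sketch: the helper names build_tts_command and run_tts_cli are made up for illustration, and the only flags used are the --text and --out_path options shown in the command above.

```python
import shutil
import subprocess


def build_tts_command(text, out_path):
    """Return the argument list for the `tts` command shown above."""
    # Passing a list (not a single string) avoids shell quoting problems.
    return ["tts", "--text", text, "--out_path", out_path]


def run_tts_cli(text, out_path):
    """Run the `tts` command-line tool; assumes `pip install TTS` was done."""
    if shutil.which("tts") is None:
        raise RuntimeError("the 'tts' command was not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_tts_command(text, out_path), check=True)
```

Calling run_tts_cli("Hello from a machine", "./audio.wav") should then behave like typing the command by hand.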

First import the modules:

# import all the modules that we will need to use
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer

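Then download one of the pre-trained models (we can also train our own model, but that takes time and resources):

# Path to the model catalogue (.models.json) inside the installed TTS package
path = "/path/to/pip/site-packages/TTS/.models.json"

model_manager = ModelManager(path)

# Download the pre-trained Tacotron2-DDC model (English, LJSpeech dataset)
model_path, config_path, model_item = model_manager.download_model("tts_models/en/ljspeech/tacotron2-DDC")

# Download the default vocoder listed for that model
voc_path, voc_config_path, _ = model_manager.download_model(model_item["default_vocoder"])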

Replace the path with the location where pip installs packages on your computer (the site-packages directory). You can find it by typing:

python -m site
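
As an alternative to copying the path by hand, the location can probably be worked out from Python itself. The sketch below is an assumption-laden helper, not part of Coqui TTS: find_models_json is a hypothetical name, and it assumes .models.json sits directly inside the installed TTS package directory, as the path above suggests.

```python
import importlib.util
from pathlib import Path


def find_models_json(package="TTS", filename=".models.json"):
    """Return the path to `filename` inside an installed package, or None.

    Assumption: the file lives at the top level of the package directory.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package is not installed (or has no file location)
    candidate = Path(spec.origin).parent / filename
    return candidate if candidate.exists() else None
```

If the helper returns a path, it can be passed to ModelManager as str(find_models_json()); if it returns None, fall back to the manual lookup above.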

Then comes the last part. We just instantiate the synthesizer and use it to read a given text that can come from request parameters or wherever you may need it to be:

syn = Synthesizer(
    tts_checkpoint=model_path,
    tts_config_path=config_path,
    vocoder_checkpoint=voc_path,
    vocoder_config=voc_config_path
)

text = "Hello from a machine"

# Generate the waveform for the text, then write it to a WAV file
outputs = syn.tts(text)
syn.save_wav(outputs, "audio-1.wav")

And here we are. You have a fully working text-to-speech synthesizer ready to be used.
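
Since the text may come from request parameters or other outside input, it can help to wrap the two calls in one small function. The following is a sketch under stated assumptions: synthesize_to_file is a hypothetical helper, and it only relies on the tts and save_wav methods used above. The whitespace cleanup and empty-text check are additions for illustration, not something the library requires.

```python
from pathlib import Path


def synthesize_to_file(syn, text, out_path):
    """Synthesize `text` with `syn` and save it as a WAV file at `out_path`.

    `syn` only needs `tts(text)` and `save_wav(wav, path)` methods,
    like the Synthesizer instance used in this article.
    """
    # Collapse runs of spaces and newlines into single spaces.
    cleaned = " ".join(text.split())
    if not cleaned:
        raise ValueError("text must contain at least one non-space character")

    out_file = Path(out_path)
    # Create the output directory if it does not exist yet.
    out_file.parent.mkdir(parents=True, exist_ok=True)

    wav = syn.tts(cleaned)
    syn.save_wav(wav, str(out_file))
    return out_file
```

With the synthesizer from this article, the call would look like synthesize_to_file(syn, "Hello from a machine", "out/audio-1.wav").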

In the next story, we will go further by making it sound better, creating a Flask app, and deploying it on a server using Docker. So stay tuned!

