Tacotron 2 For Persian language

 · Nima Moradi

Tacotron 2

PyTorch implementation of Google's Tacotron 2: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Folder Structure

└───tacotron2
    ├───content
    │   └───tacotron2
    │       └───filelists
    ├───filelists
    ├───outdir
    │   └───logdir
    ├───text
    │   ├───data_prepare
    │   └───__pycache__
    └───waveglow
    

Setup

  • Step (0): Get your dataset. For the Persian language, the only open-source dataset is Mozilla Common Voice.
  • Step (0.1): Note that you can also use our own dataset, which is available on Kaggle.

  • Step (1): Add your own train and test data entries in filelists/. Because the Mozilla corpus contains more than 211 hours of audio, we processed only a small portion of it: we converted the clips to WAV and removed files longer than 10 seconds. You can see the resulting lists in filelists/.

  • Step (2): Install python requirements or build docker image
    • Install python requirements: pip install -r requirements.txt
  • Step (3): Install CUDA and PyTorch 1.0.
  • Step (4): Train the model using the following command:
python train.py --output_directory='/content/tts-engine/gdrive/My Drive/outdir' --log_directory='/content/tts-engine/gdrive/My Drive/logdir'
  • Step (5): Synthesize audio using tts-engine/tacotron2/inference.ipynb.
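
The length filter from Step (1) can be sketched as follows. This is a minimal illustration, not the script used for this project. It assumes the clips are already converted to PCM WAV, and it assumes a `path|transcript` line format for the filelists. The function names (`wav_duration_seconds`, `build_filelist`, `write_filelist`) are hypothetical.

```python
import wave
from pathlib import Path

MAX_SECONDS = 10.0  # clips longer than this are dropped, as described in Step (1)


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def build_filelist(entries, max_seconds=MAX_SECONDS):
    """Keep (wav_path, transcript) pairs whose audio is at most max_seconds long.

    Returns one line per kept clip in the assumed `path|transcript` format.
    """
    lines = []
    for wav_path, transcript in entries:
        if wav_duration_seconds(wav_path) <= max_seconds:
            lines.append(f"{wav_path}|{transcript.strip()}")
    return lines


def write_filelist(lines, out_path):
    """Write the filelist as UTF-8 so Persian transcripts survive intact."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

A typical use would be to call `build_filelist` once for the training pairs and once for the test pairs, then write each result to its own file under filelists/.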

Audio samples

I have listed some of the audio samples the model generated; you can listen to them on SoundCloud.

Model

The model described by the authors can be divided into two parts:

  • Spectrogram prediction network
  • WaveNet vocoder
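
The data flow between the two parts can be sketched at the level of shapes only. The functions below are hypothetical stand-ins that return dummy values, not the real networks. The constants (80 mel channels, 256 audio samples per frame, 5 frames per character) are assumptions chosen for illustration.

```python
N_MEL_CHANNELS = 80   # assumed number of mel bands per spectrogram frame
HOP_LENGTH = 256      # assumed number of audio samples covered by one frame
FRAMES_PER_CHAR = 5   # arbitrary placeholder; the real network decides this itself


def predict_mel_spectrogram(text):
    """Stand-in for the spectrogram prediction network.

    Maps a character sequence to a list of frames, each holding
    N_MEL_CHANNELS values. The values here are dummy zeros.
    """
    n_frames = len(text) * FRAMES_PER_CHAR
    return [[0.0] * N_MEL_CHANNELS for _ in range(n_frames)]


def vocode(mel_frames):
    """Stand-in for the vocoder.

    Maps each spectrogram frame to HOP_LENGTH waveform samples (dummy zeros).
    """
    return [0.0] * (len(mel_frames) * HOP_LENGTH)


def synthesize(text):
    """Chain the two stages: text -> mel spectrogram -> waveform."""
    return vocode(predict_mel_spectrogram(text))
```

The point of the split is that the first stage only has to produce a compact intermediate representation, while the vocoder handles the much longer sample-level output.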