Speech Translation

End-to-end active speech translation system.

March 25, 2020 · 1 min read

As a proof of concept, developed and deployed a bi-directional Mandarin-to-English active translation pipeline. The pipeline consisted of three components speech-to-text, neural machine translation and text-to-speech. Worked with Mozilla’s DeepSpeech 2 for speech recognition, output of which was processed by Fairseq’s convolutional encoder + BiLSTM decoder neural machine translation model and then converted back to audio using Tacotron 2 for speech synthesis.