This is a great application! I think it would require deployment of two models:

1 min readJul 6, 2019

Audio-to-Text (Mozilla’s DeepSpeech would be my starting point) https://github.com/mozilla/DeepSpeech
Neural Machine Translation (NMT) model, which is specialized to translate to/from a pair of languages. (https://www.tensorflow.org/beta/tutorials/text/nmt_with_attention)

After getting 1 and 2 working, you could improve further by making the NMT layer pluggable to support many language pairs.

Happy hacking!

Written by Leigh Johnson