An experiment in building a parallel voice corpus from freely available parallel text corpora and the Google TTS service.
I ran it for several language pairs, including Bengali-English, French-English, German-English, Hindi-English, Italian-English, Japanese-English, Spanish-English, and Tamil-English. Generally speaking, it worked well, and I am sure it will be useful for many research purposes. Of course, the quality of the TTS output files (.mp3) depends on the freely available TTS service that you use, and, as in other NLP work, you need some preprocessing (e.g. checking and editing the parallel corpus) and post-editing (e.g. removing bad .mp3 files, optionally listening to the parallel voice files). I will share the bash shell scripts that I wrote and some parallel voice data (TTS outputs) for your reference.
Thank you for visiting my GitHub. I will be very happy if this small experiment helps you in some way!
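The workflow described above (one audio file per sentence on each side, then dropping bad pairs) can be sketched roughly as follows. This is only an illustration in Python, not the bash scripts used for this experiment. The function name `build_parallel_voice`, the file naming scheme, and the `synthesize(text, lang, path)` callback are all assumptions; the callback stands in for whichever TTS service you call and should return True on success.

```python
import os


def build_parallel_voice(src_lines, tgt_lines, src_lang, tgt_lang, out_dir, synthesize):
    """Create one mp3 per sentence for both sides of a parallel corpus.

    `synthesize(text, lang, path)` is a hypothetical callback that wraps a
    TTS service, writes the audio to `path`, and returns True on success.
    Returns the list of (source_path, target_path) pairs that were kept.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("the two sides of the parallel corpus differ in length")
    os.makedirs(out_dir, exist_ok=True)

    pairs = []
    for i, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # preprocessing: skip pairs with an empty side

        # Assumed naming scheme: line number plus language code.
        src_path = os.path.join(out_dir, "%05d.%s.mp3" % (i, src_lang))
        tgt_path = os.path.join(out_dir, "%05d.%s.mp3" % (i, tgt_lang))

        ok_src = synthesize(src, src_lang, src_path)
        ok_tgt = synthesize(tgt, tgt_lang, tgt_path)
        if ok_src and ok_tgt:
            pairs.append((src_path, tgt_path))
        else:
            # post-editing: never keep only one half of a pair
            for path in (src_path, tgt_path):
                if os.path.exists(path):
                    os.remove(path)
    return pairs
```

Keeping the line number in the file name makes it easy to trace an audio pair back to the sentence pair it came from when you listen to the results.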
Ye@Lab 4 March 2018
For the English-Chinese parallel voice data, I used μtopia - Microblog Translated Posts Parallel Corpus (Release V1.1 - 19/09/2013). See: http://www.cs.cmu.edu/~lingwang/microtopia/ Please note that I uploaded only 1,000 parallel mp3 files for the English-Chinese language pair.
You can download and listen to more example parallel voice data. Brief information on each set is as follows:
Bengali-English (dictionary word pairs): bn-en-dict-indian/ (62 mp3 files, 31 Bengali-English word pairs)
This is a Python implementation of the Nonparametric Bayesian Double Articulation Analyzer (NPB-DAA). The NPB-DAA can directly acquire language and acoustic models from observed continuous speech signals.
Its generative model is called the hierarchical Dirichlet process hidden language model (HDP-HLM), which is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM.
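To give a feel for the double articulation structure, here is a toy generative sketch. It is not the HDP-HLM and not the code in this repository: it uses a fixed, finite word list with unigram word probabilities, uniform integer durations, and one-dimensional Gaussian emissions, and it leaves out the Dirichlet process priors and the blocked Gibbs sampler entirely. All names and parameters (`sample_utterance`, `duration_range`, `emission_means`) are assumptions made for illustration.

```python
import random


def sample_utterance(words, word_probs, duration_range, emission_means, n_words, rng):
    """Sample frames from a toy two-level (word -> letter -> frame) model.

    words          : list of words, each a list of letter (phoneme-like) symbols
    word_probs     : unigram probability of each word (toy language model)
    duration_range : letter -> (min, max) number of frames the letter lasts
    emission_means : letter -> mean of its 1-D Gaussian emission
    Returns (frames, letter_labels, word_labels), all of the same length.
    """
    frames, letter_labels, word_labels = [], [], []
    for _ in range(n_words):
        # Upper level: pick a latent word.
        w = rng.choices(range(len(words)), weights=word_probs)[0]
        # Lower level: the word is a fixed sequence of latent letters.
        for letter in words[w]:
            # Semi-Markov part: each letter has an explicit duration.
            duration = rng.randint(*duration_range[letter])
            for _ in range(duration):
                frames.append(rng.gauss(emission_means[letter], 0.1))
                letter_labels.append(letter)
                word_labels.append(w)
    return frames, letter_labels, word_labels


if __name__ == "__main__":
    rng = random.Random(0)
    words = [["a", "b"], ["b", "c", "a"]]
    frames, letters, word_ids = sample_utterance(
        words, [0.5, 0.5],
        {"a": (2, 4), "b": (1, 3), "c": (2, 2)},
        {"a": 0.0, "b": 1.0, "c": 2.0},
        n_words=3, rng=rng,
    )
    print(len(frames), letters, word_ids)
```

In this picture, the analyzer observes only `frames` and has to recover both label sequences, along with the word list itself.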
Description
・NPB_DAA/README - Contains an NPB-DAA tutorial in PDF. (In Japanese. English version is coming soon.)
・NPB_DAA/pyhsmm - Python Library for HDP-HSMM. You can get it at [ https://github.com/mattjj/pyhsmm ]. (Please check this VERSION at README)
・NPB_DAA/dahsmm - Python code for NPB-DAA
Requirement
・Ubuntu 12.04.5 LTS
sudo apt-get install
・python 2.7.3
・numpy 1.6.1
・matplotlib 1.1.1rc
・scipy 0.9.0
・scikit-learn 0.10
sudo pip install
・Paver 1.2.4
・pyzmq==14.4.0/14.5.0/14.6.0 (14.6.0)
・ipython 3.2.1
Usage
The tutorial is in NPB_DAA/README. Please read it. (In Japanese. English version is coming soon.)
Troubleshooting
If you run into trouble, please see the troubleshooting document. It lists the environments in which operation has been confirmed and the fixes for errors that occur often.
Spectrograms of Myanmar consonant and vowel audio that I recorded at University of Computer Studies Banmaw (UC Banmaw), Banmaw, Kachin State, Myanmar, in November 2018. For the 22 consonants and 12 vowels, I prepared audio files by recording 45 female and 37 male speakers, including native speakers as well as speakers from other ethnic groups such as Kachin, Shan, and Rakhine, using a MacBook Air built-in microphone. The spectrograms were extracted from the audio files using the Sound eXchange (SoX, the Swiss Army knife of sound processing programs) command line utility. I used these spectrograms for transfer learning experiments.
Please note that the recording was done inside a hall, so the audio contains some noise, such as mouse clicks.
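The spectrogram extraction step can be sketched as below. This is a minimal illustration, not the exact commands used for this data: it assumes the recordings are `.wav` files in one directory and relies on SoX's `spectrogram` effect with default settings (`sox INPUT -n spectrogram -o OUTPUT.png`), which requires a SoX build with PNG support. The helper names `sox_spectrogram_command` and `extract_spectrograms` are made up for this example.

```python
import subprocess
from pathlib import Path


def sox_spectrogram_command(audio_path, png_path):
    """Build the SoX command that renders one audio file as a PNG spectrogram.

    `-n` is SoX's null output file: no audio is written, the `spectrogram`
    effect only produces the image given after `-o`.
    """
    return ["sox", str(audio_path), "-n", "spectrogram", "-o", str(png_path)]


def extract_spectrograms(audio_dir, out_dir, pattern="*.wav", run=subprocess.run):
    """Render a spectrogram for every matching audio file in `audio_dir`.

    The file pattern is an assumption; adjust it to the actual audio format.
    `run` can be replaced for testing without SoX installed.
    Returns the list of PNG paths, in sorted order of the input files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio in sorted(Path(audio_dir).glob(pattern)):
        png = out_dir / (audio.stem + ".png")
        run(sox_spectrogram_command(audio, png), check=True)
        written.append(png)
    return written
```

Because each image keeps the stem of its audio file, the consonant or vowel label encoded in the file name carries over to the spectrogram.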
Ye Kyaw Thu, Win Pa Pa, Andrew Finch, Aye Mya Hlaing, Hay Mar Soe Naing, Eiichiro Sumita and Chiori Hori, "Syllable Pronunciation Features for Myanmar Grapheme to Phoneme Conversion", In Proceedings of the 13th International Conference on Computer Applications (ICCA 2015), February 5~6, 2015, Yangon, Myanmar, pp. 161-167. Paper [Best Paper Award]
Ye Kyaw Thu, Win Pa Pa, Andrew Finch, Jinfu Ni, Eiichiro Sumita and Chiori Hori, 2015, "The Application of Phrase Based Statistical Machine Translation Techniques to Myanmar Grapheme to Phoneme Conversion", In Proceedings of the Pacific Association for Computational Linguistics Conference (PACLING 2015), May 19~21, 2015, Legian, Bali, Indonesia, pp. 170-176. Paper (revised paper has been published in Springer Communication in Computer and Information Science (CCIS), ISSN:1865-0929, pp. 238-250) ☝️We used myG2P dictionary + extracted 5,276 sentences of BTEC corpus for this PACLING 2015 conference paper
Ye Kyaw Thu, Win Pa Pa, Yoshinori Sagisaka, Naoto Iwahashi, "Comparison of Grapheme–to–Phoneme Conversion Methods on a Myanmar Pronunciation Dictionary", In Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP), COLING 2016, December 11-17, 2016, Osaka, Japan, pp. 11–22. Paper