Data Required
Read speech corpora for two Indian languages and Indian English.
Channels
- Close talking microphone, on a desktop or laptop.
- Telephone, both landline and mobile.
Annotation
The data will be annotated at phoneme, syllable, word and sentence levels.
Data Collection for Isolated Speech Recognition
Channels
- Close talking microphone, on a desktop or laptop.
- Telephone, both landline and mobile.
Demography
10,000 words from 300 speakers (150 male, 150 female)
Data Collection for Text to Speech Synthesis
Data Required
Data will be collected as read-out, phonetically balanced text. The text will be chosen to cover all speech sounds of the language concerned in different prosodic and phonological contexts. The phonetically balanced text will be extracted from a large text corpus.
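One common way to extract such text is greedy coverage selection: repeatedly pick the sentence that adds the most units not yet covered. The following is a minimal sketch of that idea, not the project's prescribed method. It assumes character bigrams as a stand-in for phonetic units (a real system would first transcribe each sentence with a pronunciation lexicon), and the function names `units_of` and `select_balanced` are hypothetical.

```python
from typing import List, Set


def units_of(sentence: str) -> Set[str]:
    """Return the character bigrams of a sentence.

    Assumption: bigrams stand in for phonetic units such as diphones.
    """
    text = sentence.lower().replace(" ", "_")
    return {text[i:i + 2] for i in range(len(text) - 1)}


def select_balanced(corpus: List[str], max_sentences: int) -> List[str]:
    """Greedily pick sentences that add the most not-yet-covered units."""
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = list(corpus)
    while remaining and len(chosen) < max_sentences:
        # The best candidate is the one contributing the most new units.
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        gain = units_of(best) - covered
        if not gain:
            break  # nothing left in the corpus adds coverage
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen
```

Selection stops either at the sentence budget or when no remaining sentence contributes a new unit, which keeps the recording script short for a given level of coverage.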
Channels
Speech synthesis requires high-quality recordings made in an anechoic chamber with high-quality microphones and recording equipment.
Demography
6 speakers per language (3 male, 3 female)
Annotation
The data will be annotated at phone, phoneme, syllable, word and phrase levels.
Tools Required for Data Collection and Annotation for ASR and TTS
Standardization of tools for data collection and annotation is required. A number of tools, such as EMULAB and PRAAT, are available for annotating speech data, so convergence on a common representation, annotation and storage format is required. The project will also provide converters between the different formats.
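As an illustration of what such a converter might look like, the sketch below holds annotations in a neutral in-memory form and writes them out in the long text format of Praat TextGrid files. The `Interval` and `Tier` classes and the `to_textgrid` function are hypothetical names introduced here. The sketch assumes every tier is non-empty and that its intervals are contiguous, since Praat interval tiers are expected to cover the whole time range.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds
    label: str


@dataclass
class Tier:
    name: str  # e.g. "phoneme", "syllable", "word", "sentence"
    intervals: List[Interval]


def to_textgrid(tiers: List[Tier]) -> str:
    """Serialize interval tiers as a long-format TextGrid string."""
    xmin = min(iv.start for t in tiers for iv in t.intervals)
    xmax = max(iv.end for t in tiers for iv in t.intervals)
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        f'xmin = {xmin}',
        f'xmax = {xmax}',
        'tiers? <exists>',
        f'size = {len(tiers)}',
        'item []:',
    ]
    for ti, tier in enumerate(tiers, start=1):
        lines += [
            f'    item [{ti}]:',
            '        class = "IntervalTier"',
            f'        name = "{tier.name}"',
            f'        xmin = {xmin}',
            f'        xmax = {xmax}',
            f'        intervals: size = {len(tier.intervals)}',
        ]
        for ii, iv in enumerate(tier.intervals, start=1):
            text = iv.label.replace('"', '""')  # double any embedded quotes
            lines += [
                f'        intervals [{ii}]:',
                f'            xmin = {iv.start}',
                f'            xmax = {iv.end}',
                f'            text = "{text}"',
            ]
    return "\n".join(lines) + "\n"
```

Keeping one neutral representation in the middle means each additional format needs only a reader and a writer, rather than a converter for every pair of formats.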
Tools for Speech Recognition
The data will be annotated at phoneme, syllable, word and sentence levels. Tools need to be developed for semi-automatic annotation of speech data. These tools will also be useful for annotating speech synthesis databases. One option is to adopt the LDC practice of recording data in the NIST format. This format is comprehensive in that it contains all the information about the recording environment, the speaker, the sampling rate, the number of channels, the number of bits per sample, etc.
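To make the header information concrete, here is a minimal sketch of reading the text header of a NIST SPHERE file. It assumes the usual layout: a `NIST_1A` line, a line giving the header size in bytes, then `name -type value` lines (with `-i` for integers, `-r` for reals and `-sN` for strings) up to `end_head`. The function name `parse_sphere_header` and the `header_bytes` key are hypothetical, and the sketch does not decode the audio samples that follow the header.

```python
from typing import Dict, Union

Value = Union[int, float, str]


def parse_sphere_header(raw: bytes) -> Dict[str, Value]:
    """Parse the ASCII header of a NIST SPHERE file into a dictionary."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    # The second line holds the total header size in bytes.
    fields: Dict[str, Value] = {"header_bytes": int(lines[1].strip())}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        name, type_tag, value = line.split(None, 2)
        if type_tag == "-i":
            fields[name] = int(value)
        elif type_tag == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return fields
```

Given a header containing `sample_rate -i 16000` and `channel_count -i 1`, the returned dictionary maps `sample_rate` to the integer 16000 and `channel_count` to 1, so downstream annotation tools can check recording parameters without opening the audio data.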
Tools for Speech Synthesis
In addition to tools for annotating speech databases, speech synthesis also requires text annotation tools:
Text Annotation Tools Required
- PoS taggers, phrase boundary markers and intonation markers.
- Identification and standardization of feature vectors for speech synthesis, with features such as vocalic/non-vocalic, pause break, distance from a pause break, duration of the break, and characteristics of the intonation across the entire sound unit.
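A feature vector of the kind listed above could be computed per sound unit roughly as follows. This is a sketch under stated assumptions, not a proposed standard: the `Unit` and `UnitFeatures` classes, the `PAU` pause label and the vowel set are all hypothetical placeholders, distance is counted in units rather than in seconds, and intonation features are left out because they need pitch data that this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Optional

PAUSE = "PAU"          # assumed label for a pause break
VOWELS = set("aeiou")  # placeholder vowel inventory, language dependent


@dataclass
class Unit:
    label: str       # unit label, or PAUSE for a pause break
    duration: float  # seconds


@dataclass
class UnitFeatures:
    label: str
    vocalic: bool                         # unit contains a vowel
    follows_pause: bool                   # unit comes right after a pause
    units_to_next_pause: Optional[int]    # distance to the next pause break
    next_pause_duration: Optional[float]  # duration of that break, in seconds


def extract_features(units: List[Unit]) -> List[UnitFeatures]:
    """Compute one feature record for every non-pause unit."""
    feats: List[UnitFeatures] = []
    for i, unit in enumerate(units):
        if unit.label == PAUSE:
            continue
        # Index of the next pause after this unit, if there is one.
        nxt = next(
            (j for j in range(i + 1, len(units)) if units[j].label == PAUSE),
            None,
        )
        feats.append(UnitFeatures(
            label=unit.label,
            vocalic=any(ch in VOWELS for ch in unit.label),
            follows_pause=i > 0 and units[i - 1].label == PAUSE,
            units_to_next_pause=None if nxt is None else nxt - i,
            next_pause_duration=None if nxt is None else units[nxt].duration,
        ))
    return feats
```

Agreeing on field names and units of measurement for such records is the standardization step; the extraction logic itself can then differ between tools without affecting interchange.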