Central Institute of Indian Languages
   
4.   Long Term Goal

The grand vision of this project is to collect data to provide speech-to-speech translation from each and every language to each and every other language spoken in India (including Indian English). Such a system would include unlimited vocabulary speech synthesis and recognition systems for every Indian language coupled with machine translation systems between those languages. The block diagram given below describes the basic architecture of such a system.
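The cascade described here (recognition, then translation, then synthesis) can be sketched in a few lines of code. This is only an illustration under stated assumptions: the class name `SpeechToSpeechPipeline`, the three stage signatures and the helper `count_directed_pairs` are hypothetical and are not part of the project; audio is represented as raw bytes purely for simplicity.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these names come from the project.
Recognizer = Callable[[bytes], str]    # speech in language A -> text in A
Translator = Callable[[str], str]      # text in A -> text in B
Synthesizer = Callable[[str], bytes]   # text in B -> speech in B


@dataclass
class SpeechToSpeechPipeline:
    """Chains the three stages for one direction (language A to language B)."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def run(self, speech_a: bytes) -> bytes:
        text_a = self.recognize(speech_a)   # speech recognition in A
        text_b = self.translate(text_a)     # machine translation A -> B
        return self.synthesize(text_b)      # text-to-speech in B


def count_directed_pairs(num_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return num_languages * (num_languages - 1)
```

Because every ordered pair of languages needs its own translation direction, the number of pipelines grows roughly with the square of the number of languages, which is one reason the short-term effort is limited to a single pair.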

[Block diagram: Speech Input in Language A -> Speech Recognition in Language A -> Text in Language A -> Machine Translation from Language A to B -> Translated Text in Language B -> Text to Speech Conversion in Language B -> Speech Output in Language B]

5.   Short Term Goal

To create databases for building (a) a bi-directional speech-to-speech translation system for read speech for one pair of Indian languages, namely Hindi-Telugu, and (b) a speech recognition system for Indian English. In addition, large-vocabulary isolated-word speech data is to be collected for the 22 Scheduled Indian languages.

6.   Methodology for Short Term Effort

Methodologies for data collection and development of tools required for the short-term and long-term goals are given below:

Data collection Effort for Automatic Speech Recognition (ASR)

The data collection effort will involve collection of read and spontaneous speech.

Data Required
      Read speech corpora for two Indian languages and Indian English.

Channels

  1. Close talking microphone, on a desktop or laptop.
  2. Telephone, both landline and mobile.

Annotation
The data will be annotated at phoneme, syllable, word and sentence levels.
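One way to picture annotation at several levels is as a set of time-aligned tiers over the same recording. The sketch below is an assumption made for illustration, not the project's storage format: the names `Segment`, `Utterance`, `LEVELS` and the method `children` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four annotation levels named in the text.
LEVELS = ("phoneme", "syllable", "word", "sentence")


@dataclass
class Segment:
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    label: str


@dataclass
class Utterance:
    audio_path: str
    tiers: Dict[str, List[Segment]] = field(
        default_factory=lambda: {level: [] for level in LEVELS}
    )

    def add(self, level: str, start: float, end: float, label: str) -> None:
        """Append one labelled segment to the tier for the given level."""
        if level not in self.tiers:
            raise ValueError(f"unknown annotation level: {level}")
        if end <= start:
            raise ValueError("segment must have positive duration")
        self.tiers[level].append(Segment(start, end, label))

    def children(self, parent: Segment, child_level: str) -> List[Segment]:
        """Segments of child_level that lie entirely inside parent."""
        return [s for s in self.tiers[child_level]
                if s.start >= parent.start and s.end <= parent.end]
```

Keeping each level in its own tier means that a word can be related to its syllables or phonemes by time containment alone, without storing explicit links between levels.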

Data Collection for Isolated Speech Recognition

Channels

  1. Close talking microphone, on a desktop or laptop.
  2. Telephone, both landline and mobile.

Demography
      10,000 words from 300 speakers (150 male, 150 female)

Data Collection for Text to Speech Synthesis

Data Required
Data will be collected as read speech from phonetically balanced text, chosen to cover all the speech sounds of the language concerned in different prosodic and phonological contexts. The phonetically balanced text will be extracted from a large text corpus.
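The text does not say how the balanced text is to be extracted. A common approach, shown here only as a minimal sketch, is greedy selection: repeatedly pick the sentence that adds the most units not yet covered. The function names `units` and `select_balanced` are hypothetical, and character bigrams are used purely as a stand-in for real phonetic units, which would come from a phone-level transcription of each sentence.

```python
from typing import Iterable, List, Set


def units(sentence: str) -> Set[str]:
    """Stand-in for phonetic units: character bigrams within each word."""
    found: Set[str] = set()
    for word in sentence.lower().split():
        found.update(word[i:i + 2] for i in range(len(word) - 1))
    return found


def select_balanced(corpus: Iterable[str], max_sentences: int) -> List[str]:
    """Greedily pick sentences that add the most not-yet-covered units."""
    candidates = [(s, units(s)) for s in corpus]
    covered: Set[str] = set()
    chosen: List[str] = []
    while candidates and len(chosen) < max_sentences:
        # Sentence contributing the largest number of new units.
        best = max(candidates, key=lambda c: len(c[1] - covered))
        gain = best[1] - covered
        if not gain:
            break  # nothing new can be covered; stop early
        chosen.append(best[0])
        covered |= gain
        candidates.remove(best)
    return chosen
```

Greedy selection does not guarantee the smallest possible recording script, but it is simple and keeps the script short relative to the corpus. Covering different prosodic contexts would need richer units than the bigrams assumed here.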

Channels
Speech Synthesis requires high quality recording in an anechoic chamber using high quality microphones and recording equipment.

Demography
      6 speakers: 3 males and 3 females per language.

Annotation
The data will be annotated at the phone, phoneme, syllable, word and phrase levels.

Tools Required for Data Collection and Annotation for ASR and TTS

      Tools for data collection and annotation need to be standardized. A number of tools are available for annotating speech data, such as EMULAB and PRAAT. Convergence on representation, annotation and storage formats is required. The project will also provide converters between the different formats.
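As an illustration of what a format converter might look like, the sketch below turns a plain list of (start, end, label) intervals into a single interval tier in what we understand to be Praat's long TextGrid text layout. The function names `fill_gaps` and `to_textgrid` are hypothetical, the input is assumed to contain non-overlapping intervals, and the output layout should be checked against the Praat documentation before it is relied on.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]  # (start_sec, end_sec, label)


def fill_gaps(intervals: List[Interval], total: float) -> List[Interval]:
    """Insert empty-label intervals so the tier covers 0..total contiguously."""
    filled: List[Interval] = []
    cursor = 0.0
    for start, end, label in sorted(intervals):
        if start > cursor:
            filled.append((cursor, start, ""))
        filled.append((start, end, label))
        cursor = end
    if cursor < total:
        filled.append((cursor, total, ""))
    return filled


def to_textgrid(tier_name: str, intervals: List[Interval], total: float) -> str:
    """Render one interval tier as TextGrid text (long layout, assumed)."""
    rows = fill_gaps(intervals, total)
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        'xmin = 0',
        f'xmax = {total}',
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        '        xmin = 0',
        f'        xmax = {total}',
        f'        intervals: size = {len(rows)}',
    ]
    for i, (start, end, label) in enumerate(rows, start=1):
        text = label.replace('"', '""')  # embedded quotes are doubled
        lines += [
            f'        intervals [{i}]:',
            f'            xmin = {start}',
            f'            xmax = {end}',
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"
```

A converter in the other direction, or to another tool's format, would follow the same pattern: read into the neutral interval list, then write out in the target layout.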

Tools for Speech Recognition

      The data will be annotated at the phoneme, syllable, word and sentence levels. Tools need to be developed for semi-automatic annotation of speech data. These tools will also be useful for annotating speech synthesis databases. One option is to follow the LDC practice of recording data in the NIST format. This format is comprehensive in that it contains all the information about the recording environment, the speaker, the sampling rate, the number of channels, the number of bits per sample, and so on.
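Assuming the NIST format mentioned here is the NIST SPHERE file format, the recording information sits in a plain-text header at the start of each file, one "name type value" entry per line, terminated by `end_head`. The sketch below reads such a header from bytes. The function name `parse_sphere_header` is hypothetical, and only the header is handled, not the audio samples.

```python
from typing import Dict, Union

Value = Union[int, float, str]


def parse_sphere_header(raw: bytes) -> Dict[str, Value]:
    """Parse the text header at the start of a SPHERE-style file."""
    first, size_line, _ = raw.split(b"\n", 2)
    if not first.startswith(b"NIST_1A"):
        raise ValueError("not a NIST_1A header")
    header_size = int(size_line.strip())   # total header length in bytes
    text = raw[:header_size].decode("ascii", errors="replace")
    fields: Dict[str, Value] = {}
    for line in text.split("\n")[2:]:      # skip the two fixed lines
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue                       # skip blank or malformed lines
        key, kind, value = parts
        if kind == "-i":                   # integer field
            fields[key] = int(value)
        elif kind == "-r":                 # real-valued field
            fields[key] = float(value)
        else:                              # "-sN": string of N bytes
            fields[key] = value
    return fields
```

Given a header containing `sample_rate -i 16000` and `channel_count -i 1`, the function returns those entries as integers, so collection tools can check each recording against the agreed settings.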

Tools for Speech Synthesis
     
      In addition to tools for annotating speech databases, text annotation tools are also required for speech synthesis:

Text Annotation Tools Required

  1. PoS taggers, phrase boundary markers and intonation markers.
  2. Identification and standardization of feature vectors for speech synthesis, for example vocalic/non-vocalic, pause break, distance from a pause break, duration of the break, and characteristics of the intonation across the entire sound unit.
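Some of the features listed above can be computed directly from a sequence of labelled sound units. The sketch below is illustrative only: the names `Unit`, `UnitFeatures`, `extract_features`, `PAUSE` and `VOWELS` are hypothetical, the vowel set is a placeholder rather than any language's phone inventory, distance is counted in units rather than in seconds, and intonation features are left out because they would need pitch data.

```python
from dataclasses import dataclass
from typing import List, Optional

PAUSE = "<pause>"          # hypothetical label marking a pause break
VOWELS = set("aeiou")      # placeholder; a real system uses the phone set


@dataclass
class Unit:
    label: str             # sound unit label, or PAUSE for a break
    duration: float        # seconds


@dataclass
class UnitFeatures:
    label: str
    vocalic: bool
    units_to_next_pause: Optional[int]     # None if no pause follows
    next_pause_duration: Optional[float]   # None if no pause follows


def extract_features(units: List[Unit]) -> List[UnitFeatures]:
    """Compute per-unit features relative to the next pause break."""
    features: List[UnitFeatures] = []
    distance: Optional[int] = None
    pause_duration: Optional[float] = None
    # Walk right to left so each unit already knows the next pause.
    for unit in reversed(units):
        if unit.label == PAUSE:
            distance, pause_duration = 0, unit.duration
            continue
        if distance is not None:
            distance += 1
        features.append(UnitFeatures(
            label=unit.label,
            vocalic=unit.label[:1] in VOWELS,
            units_to_next_pause=distance,
            next_pause_duration=pause_duration,
        ))
    features.reverse()      # restore original left-to-right order
    return features
```

Agreeing on a fixed record like `UnitFeatures` is what standardization of the feature vector would amount to in practice: every synthesis tool would then read and write the same fields.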

Copyright © 2005. Central Institute of Indian Languages. All rights reserved worldwide.