Central Institute of Indian Languages
   
7.   Applications

  • Speech-to-speech translation for a pair of Indian languages, namely Hindi and Telugu.
  • Command and control applications.
  • Multimodal interfaces to the computer in Indian languages.
  • E-mail readers over the telephone.
  • Readers for the visually disadvantaged.
  • Speech-enabled office suite.

      The effort for both Speech Recognition and Speech Synthesis will be repeated across all 22 Scheduled languages. For Speech Recognition, spontaneous speech data will be collected along with read speech. For Speech Synthesis, data will be collected from professional speakers with very good voice quality. Additional speech data will be collected to build models of prosody (intonation, duration, etc.) that improve the naturalness of synthesized speech. A database (lexicon) of proper names of Indian origin will be created, with the equivalent phonetic representation for each name.

 

II  Character Recognition

1.   Introduction

Character Recognition refers to the conversion of printed or handwritten characters to a machine-interpretable form, in other words, the “reading” of text. The term has been used to cover three distinct language technologies with different applications.

      “Online” handwriting recognition, or Online HWR, refers to the interpretation of handwriting captured dynamically using a handheld or tablet device. It allows the creation of more natural, handwriting-based alternatives to keyboards for data entry in Indian scripts, and can also be used for teaching handwriting skills using computers.

      “Offline” handwriting recognition or Offline HWR refers to the interpretation of handwriting captured statically as an image. It can be used for the interpretation of handwriting already recorded on paper, ranging from filled-in forms to handwritten manuscripts. 

      Optical character recognition or OCR refers to the interpretation of printed text captured as an image. It can be used for conversion of printed or typewritten material such as books and documents into electronic form.

      These different areas of language technology require different algorithms and linguistic resources. However, for convenience, they have been combined under the “character recognition” umbrella. They are all hard research problems because of the variety of writing styles and fonts encountered. Of these, OCR has seen some research in a few Indian scripts because of support from the TDIL program. However, the technology is not yet mature and there is only one commercial offering. Also, there are no common linguistic resources that can be used by the community. Online and Offline HWR have seen very little research in the context of Indian scripts, and no linguistic resources exist for them.
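
      One way to see why Online and Offline HWR call for different algorithms and resources is to compare the data each one starts from. The sketch below is only an illustration: the `OnlineSample` type, the `rasterize` helper and the tiny 3x3 grid are hypothetical names and values, not part of any LDC-IL standard. It shows that two online samples with different stroke order and direction reduce to the same offline image, so the temporal information is available only to online recognition.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pen position on a small integer grid


@dataclass
class OnlineSample:
    """Handwriting captured dynamically: strokes in the order they were written."""
    strokes: List[List[Point]]


def rasterize(sample: OnlineSample, width: int, height: int) -> List[List[int]]:
    """Turn an online sample into an offline-style bitmap (1 = ink, 0 = blank).

    Only the sampled points are marked; stroke order and direction are lost.
    """
    image = [[0] * width for _ in range(height)]
    for stroke in sample.strokes:
        for x, y in stroke:
            if 0 <= x < width and 0 <= y < height:
                image[y][x] = 1
    return image


# Two hypothetical samples of the same shape, written in a different
# stroke order and direction.
a = OnlineSample(strokes=[[(0, 0), (1, 0), (2, 0)], [(1, 0), (1, 1), (1, 2)]])
b = OnlineSample(strokes=[[(1, 2), (1, 1), (1, 0)], [(2, 0), (1, 0), (0, 0)]])

same_image = rasterize(a, 3, 3) == rasterize(b, 3, 3)  # identical bitmaps
same_strokes = a.strokes == b.strokes                  # different pen trajectories
```

      A real online capture would normally also carry timestamps and possibly pen pressure; those fields are left out here to keep the sketch short.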

2.   Objectives

Long-term objectives

    • Development of standards, tools and linguistic resources (datasets) for the fields of Online HWR, Offline HWR and OCR.
    • Promotion of the development of these technologies.
    • Promotion of the development of important and challenging applications of these technologies in the context of Indic languages and scripts.

 

This will be achieved in a variety of ways:

  • Standards development will primarily be via a mixture of email discussions and face-to-face meetings of working group members organized under the aegis of LDC-IL.
  • Tool development will be given as projects to technology institutions with the necessary inclination, skills and resources.
  • Linguistic data collection, annotation and validation will be given as projects to linguistics and computational linguistics departments of institutes and universities with the necessary inclination, skills and resources. However, for each linguistic resource developed, validation will be performed by a different institution from the one doing the collection and annotation. Use of the linguistic resources for technology development will be promoted by arranging periodic competitions (for example, for recognition of online handwritten words in specific scripts) and by objective evaluation of performance.
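
The text does not say which measure would be used for objective evaluation. As a minimal sketch, assuming character error rate (edit distance between the recognized word and the reference transcription, divided by the reference length) is an acceptable choice, a competition score could be computed as follows. The function names and the two sample words are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str],
                         hypotheses: Sequence[str]) -> float:
    """Total edit distance divided by total reference length.

    Note: Python strings are compared per Unicode code point, so a vowel
    sign or virama counts as its own unit. Scoring per akshara would need
    an extra segmentation step that is not shown here.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_errors = sum(edit_distance(r, h)
                       for r, h in zip(references, hypotheses))
    return total_errors / total_chars


# Hypothetical ground truth and recognizer output for two Devanagari words.
refs = ["नमस्ते", "भारत"]
hyps = ["नमस्त", "भारत"]  # the first word is missing its final vowel sign
cer = character_error_rate(refs, hyps)  # 1 error over 10 code points
```

Keeping validation at a different institution from collection and annotation, as described above, means the reference transcriptions used in such a score are checked independently of the team that produced them.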

 
Copyright © 2005. Central Institute of Indian Languages. All rights reserved worldwide.