Character recognition refers to the conversion of printed or handwritten characters into a machine-interpretable form; in other words, the “reading” of text. The term has been used to cover three quite distinct language technologies, each with different applications.
“Online” handwriting recognition, or online HWR, refers to the interpretation of handwriting captured dynamically, as it is being written, using a handheld or tablet device. It makes possible more natural, handwriting-based alternatives to keyboards for data entry in Indian scripts, as well as computer-based teaching of handwriting skills.
“Offline” handwriting recognition, or offline HWR, refers to the interpretation of handwriting captured statically as an image. It can be used to interpret handwriting already recorded on paper, ranging from filled-in forms to handwritten manuscripts.
Optical character recognition, or OCR, refers to the interpretation of printed text captured as an image. It can be used to convert printed or typewritten material, such as books and documents, into electronic form.
These areas of language technology require different algorithms and linguistic resources; they have been grouped under the “character recognition” umbrella only for convenience. All of them are hard research problems because of the wide variety of writing styles and fonts encountered. Of the three, OCR has seen some research in a few Indian scripts, thanks to support from the TDIL program. However, the technology is not yet mature, and only one commercial offering exists. There are also no common linguistic resources that the research community can use. Online and offline HWR, by contrast, have seen very little research on Indian scripts, and no linguistic resources exist for them.