
Documenting Indian Languages

The LIS-India Experience

S.S.Bhattacharya

B.D.Jayaram

N.H.Itagi

      Language Information Services (LIS)-India is envisaged as a web-based, multimedia, comprehensive, authentic and ongoing information source in the public domain on the Indian languages. In its coverage of content and languages, it aims to address every question one might ask about any or all of the Indian languages, including English: their grammar, history, functions and scripts; the number of speakers and their spread, including diasporas; bilingualism and multilingualism; literacy and education; language technologies and digitracy; literatures and litterateurs, including translations and translators; and all linguistic artifacts, from sign boards and place names to books, newspapers, periodicals and other mass media. Such information would interest the layman as well as the linguist or any other specialist concerned with the Indian linguistic landscape, with its commonalities and differences, its richness, dynamics and vitality. It is also envisaged to be useful for language planning and social development, with scope for social engineering.

      Now a century old, Sir George Grierson's monumental Linguistic Survey of India (LSI) (1897-1927) remains to date the only nearly pan-Indian survey giving a descriptive account of the grammatical and other aspects of most of the Indian languages. In the post-Independence period there have been several other pan-Indian surveys, but of limited scope: the survey of the written languages of India, dealing with the degree and modes of use, by B.P. Mahapatra, G.D. McConnell and S.S. Bhattacharya of the Language Division of the Office of the Registrar General of India; a survey of Indian languages and scripts as part of the People of India project on Indian communities by the Anthropological Survey of India under the leadership of Kumar Suresh Singh; and a survey of English in India by Rama Kant Agnihotri and A.L. Khanna, problematising the status of English in India. There have also been regional surveys, such as the linguistic survey of the Punjab by Harjeet Singh Gill, dealing with dialect variation in Punjabi; the dialect survey of Marathi by A.M. Ghatge of the Deccan College, Pune; a survey of language use in Himachal Pradesh by the Central Institute of Indian Languages, Mysore, under the leadership of Bal Gobind Misra and H.R. Dua; and an encyclopedic survey of the Dravidian languages by the International School of Dravidian Languages under the leadership of V.I. Subramoniam. These are apart from the numerous studies of various kinds on many specific Indian languages carried out in the pre- and post-Independence periods in India and abroad. A special mention must be made of the

      Grierson's LSI, though monumental, is nevertheless often found partial, inadequate and outdated in coverage and content. A replication of the Survey, though a dire need, would in the changed circumstances at the beginning of this new millennium mean an entirely new enterprise in form, method and content. There is also an attempt at a new LSI by the Language Division of the Census, using modern methodology but covering mainly the grammatical structure of all the languages, and of the dialects of the major languages, State by State; the volume on Orissa has been published and those on Dadra & Nagar Haveli and Sikkim are under publication. Language Information Services (LIS)-India, a major project of the Central Institute of Indian Languages (CIIL) started under the country's X Plan, is envisaged to meet this need. Taking advantage of the fairly large pool of expert human resources in linguistics developed over the decades in the country, advances in knowledge and developments in digital technology, it is envisaged as a national, ongoing and cumulative process rather than a once-in-a-period exercise.

      In terms of geographical coverage, the LSI excluded from its sphere of operation the then Provinces of Madras and Burma and the States of Hyderabad and Mysore, though the fourth volume of the Survey was devoted to the Munda and Dravidian languages. LIS would cover all Indian languages, not at once as in the case of Grierson's LSI but in phases. Any work of this type has to start with an existing inventory of languages. In the case of India there has been a happy coincidence of language censuses, which are in some sense general surveys, and the specific linguistic surveys. Grierson's LSI, started in 1894, could begin with an inventory of languages made available largely by the 1891 Census. Information on the dialects was obtained through the local officers, based on their local knowledge and local inquiries, so that there was bound to be a discrepancy between the Census and the Survey, especially as to dialects, which continues even to date. The LSI recorded and described as many as 723 speech forms (179 languages and 544 dialects). The LIS starts with the list of 114 languages (Indo-European 20, Dravidian 17, Austro-Asiatic 14, Tibeto-Burman 62 and Others 1, namely Arabic of the Semito-Hamitic family), inclusive of the 102 speech forms with 10,000 or more speakers each at the all-India level grouped under them, as per the 1991 Census. It would not be confined to this list, however, for some of the smaller, minor or endangered languages need to be covered as well. (Map-1)
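The inventory figures quoted above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch in Python: the variable names are hypothetical and the numbers are simply those quoted in the text, not an independent data source.

```python
# Family-wise counts of the LIS starting list, as quoted in the text
# (grouping as per the 1991 Census). Names here are illustrative only.
family_counts = {
    "Indo-European": 20,
    "Dravidian": 17,
    "Austro-Asiatic": 14,
    "Tibeto-Burman": 62,
    "Others (Arabic)": 1,
}

# Total size of the LIS starting inventory.
lis_total = sum(family_counts.values())

# Speech forms recorded by Grierson's LSI: languages plus dialects.
lsi_languages, lsi_dialects = 179, 544
lsi_total = lsi_languages + lsi_dialects

# Share of each family in the LIS list, in percent.
family_share = {
    name: round(100 * count / lis_total, 1)
    for name, count in family_counts.items()
}

print(lis_total, lsi_total)
for name, share in family_share.items():
    print(f"{name}: {share}%")
```

The two totals agree with the figures of 114 languages and 723 speech forms given in the text. The shares show that Tibeto-Burman alone accounts for more than half of the listed languages.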

      At present the following 55 languages have been taken up; of these, reports on 27 languages, with varying degrees of coverage of different aspects, have been received so far.

Languages Taken up for Content Development Under LIS-India

Sl.No.  Language

Austro-Asiatic
1       Gadaba
2       Ho*
3       Juang*
4       Kharia
5       Khasi*
6       Korku*
7       Mundari
8       Nicobarese
9       Santali

Dravidian
10      Gondi
11      Kannada*
12      Kodava
13      Kolami
14      Kurux
15      Malayalam*
16      Tamil*
17      Telugu
18      Tulu

Indo-Aryan/Germanic
19      Assamese*
20      Bengali*
21      Bhili*
22      Bishnupuria
23      Dogri*
24      Gujarati
25      Halabi
26      Hindi
27      Kashmiri*
28      Konkani
29      Kului
30      Kumauni*
31      Lahnda*
32      Maithili
33      Marathi*
34      Nepali*
35      Oriya*
36      Punjabi*
37      Sanskrit*
38      Sindhi*
39      Urdu*
40      Indian English*

Tibeto-Burman
41      Angami
42      Ao
43      Bodo/Boro*
44      Galo
45      Kinnauri
46      Kok Borok (Tripuri)
47      Kom*
48      Lahuli
49      Lepcha
50      Manipuri*
51      Mao
52      Miri*
53      Mizo/Lushai*
54      Nissi
55      Rabha

* Data received so far on different aspects in different proportions.


      In terms of content, the LSI was mainly concerned with the grammatical description of the languages and dialects, though it also gave information on the number of speakers, based on the first Census of 1891, and their spread, together with a brief history of the language and literature. Grierson's main considerations in giving information on these related aspects were basically linguistic: as he rightly thought, the difference between a language and a dialect is not merely a matter of the mutual intelligibility of the codes but also of nationality and of a literature with a history of its own. The present redefinition of the boundaries of some languages, such as Dogri and Maithili vis-a-vis Punjabi and Hindi, vindicates the observations he made on the perennial question of language and dialect as early as the beginning of the last century. Nevertheless, or because of this, the information on the other aspects related to languages remained scanty in his survey, although it is also true that the bulk of the literature in the modern Indian languages, especially in genres such as prose in general and the novel in particular, is a phenomenon of the 20th century. Be that as it may, the point is that his survey was mainly linguistic and was aptly called so. The present programme is rightly called Language Information Services, not merely for certain politico-administrative technical reasons or to be in sync with the present times, but in terms of its coverage of content as well. LIS covers 12 broad aspects: History and Classification, Structure, Variation, Script and Spelling, Speech Community, Demography, Language Management, Literature, Language Use, Culture, Technology and Annotated Bibliography. The details were worked out in a meeting of experts after a brief review of the methodology and formats of the previous attempts mentioned at the beginning, starting with the LSI.
The data as per the content list are collected or generated from secondary and primary sources, with field work wherever necessary, by competent persons from different parts of the country working under the supervision of an expert, generally in linguistics or the language concerned. The LSI, by comparison, was based on three specimens of data, namely a translation of the Parable of the Prodigal Son, a locally selected passage, and a list of words and sentences, collected by the local administrative machinery of district officers. LIS should therefore be more authentic. After editing, the reports would be made available in the public domain on the Institute's website with proper search options. They would be continually revised and updated, as the technology easily allows. The technology also allows for multimedia documentation and presentation of these language-related aspects, with maps generated through a Geographical Information System (GIS) and spectrograms, besides other audio-visual materials.

      With its wide coverage, LIS could be a rich source of information for the exploration of India as a linguistic, sociolinguistic, literary, cultural and semantic area. Apparently, the emergence of India as a geopolitical entity (as distinct from the Indian sub-continent, with which the term 'Indian Linguistic Area' is associated) has facilitated greater sharing of sociolinguistic features, though the redrawing of the boundaries of its internal geopolitical units on a linguistic basis also seems to have promoted the consolidation of the boundaries of the major languages in terms of identity and language use. The languages of the different families, traditionally identified mainly with certain regions, are increasingly spread out as their speakers move from their traditional homelands into other parts of the country, with increased contact with languages of other families. The four maps from the 1991 Census Language Atlas showing the distribution of the four language families by speakers reveal that most of the states in India, formed mostly on a linguistic or cultural basis, have sizable populations of speakers of different language families, leading to increased contact and bilingualism and facilitating the sharing of different features. Figures for bilingualism and trilingualism by language family are yet to be worked out. But the general incidence of bilingualism has risen from 9.70% in 1961 to 19.44% in 1991, with the incidence of trilingualism (tabulated for the first time) at 7.9%. Across the states and union territories the incidence of bilingualism ranges from 9.58% in Rajasthan and Uttar Pradesh to 66.86% in Goa. It is significant that the incidence of bilingualism increases from the "inner" regions of Indo-Aryan to the "outer" and non-Indo-Aryan regions, with the exception of Himachal Pradesh, Punjab and Haryana.
Similar is the case with trilingualism, which ranges from 1.67% in Rajasthan to 42.55% in Sikkim. Even more significant is the incidence of bilingualism in the total Scheduled Tribe population, ranging from 6.46% in Rajasthan to 63.50% in Sikkim, and of trilingualism, from 1.22% in Uttar Pradesh to 26.53% in Sikkim. This pattern of incidence and distribution of bilingualism and trilingualism is significant not merely for language contact and convergence but for language loss as well, especially in the case of speakers of minor and tribal languages, though in the Indian context as a whole, as has been observed, language maintenance is the norm and loss the exception. The pattern noted above may vindicate what Grierson had observed: "In India, the Indo-Aryan languages- the tongues of civilization and of the caste system with all the power and superiority which that system confers upon those who live under its sway- are continually superseding what may, for shortness be called aboriginal languages such as those belonging to the Dravidian, the Munda and the Tibeto-Burman families..It may be added that nowhere do we see the reverse process of a non-Aryan language superseding an Aryan. It is even rare for one Aryan-speaking nationality to abandon its language in favour of another Aryan tongue. We continually find tracts of country on the borderland between two languages, which are inhabited by communities, living side by side and each speaking its own language." However, the situations of language contact and convergence, or of language loss and maintenance, could be said to involve major languages, whether Indo-Aryan or Dravidian, on the one hand and minor or tribal languages on the other.
The average incidence of bilingualism among the speakers of the 18 Scheduled languages is 18.72%, ranging from 11.01% among Hindi speakers to 74.20% among Konkani speakers, as compared to 38.14% among the 96 non-Scheduled languages, ranging from 9.88% among speakers of Lushai/Mizo to 86.46% among speakers of Coorgi/Kodagu. Similarly, the average incidence of trilingualism among the speakers of the 18 Scheduled languages is 7.26%, ranging from 2.16% among Tamil speakers to 44.48% among Konkani speakers, as compared to 7.26% among the 96 non-Scheduled languages, ranging from 0.9% among Juang speakers to 49.0% among Coorgi/Kodagu speakers. English and Hindi are the preferred subsidiary languages: 8% and 3.15% of the Scheduled language speakers reported English as the second and third language known respectively, and 6.16% and 2.60% reported Hindi as the second and third language known. This, in sum, is a broad and skeletal synchronic demographic context of language contact in an emergent India as a linguistic area. LIS-India is not in itself an inventory of the commonalities or shared features (linguistic, sociolinguistic, literary, cultural and semantic) that have diffused across languages and speech communities of different origins through cohabitation and contact over the millennia within a marked geophysical and ecological entity, the Indian sub-continent; it can only be a source in which interested scholars may look for them. For the present, such an exercise has not been feasible, as the data being received are in different proportions and await editing, consolidation, collation and scrutiny.

      One may end this report on a note which may not be unwarranted, is certainly not preempted, and should not be lost sight of: though India has emerged as a linguistic area over the millennia, it is not without anomalies and asymmetries. To mention only a few, there is the asymmetrical relation between the family-wise distribution of the number of languages and that of their speakers, as noted earlier; the Indian linguistic area as comprising stable and fluid zones in terms of language identities, as noted by Prof. Khubchandani; and the asymmetry between structural and lexical borrowing between Indo-Aryan and Dravidian, as observed by Prof. Rajendra Singh.





 
