Abstracts
1. Evolution of WordNet-like Lexicon
Yu Jiangsheng
The various specific applications of the WordNet-like lexicon in NLP, viewed from the standpoint of Computational Lexicology, require its semantic representations to be diversified. That is, the structure of the concept net must sometimes be changed to suit a specific purpose. Since a node or subtree can be deleted from the original lexicon as the WordNet-like lexicon evolves, some analysis of its knowledge structure seems necessary. The author defines the degree of structural destruction for the evolution of a WordNet-like lexicon as an explicit standard to restrict the variation of knowledge structure, which is related to the well-defined deductive rules in the lexicon. Lastly, the visualized auxiliary construction of the Chinese Concept Dictionary is introduced as an approach to a WordNet-like lexicon.
2. Creating a Bilingual Ontology: A Corpus-Based Approach for Aligning WordNet and HowNet
Marine Carpuat, Grace Ngai,
Pascale Fung, Kenneth W. Church
The growing importance of multilingual information retrieval and machine translation has made multilingual ontologies an extremely valuable resource. Since the construction of an ontology from scratch is a very expensive and time consuming undertaking, it is attractive to explore ways of automatically aligning monolingual ontologies which already exist. This paper presents a language-independent, corpus-based method that borrows from techniques used in information retrieval and machine translation, for creating a bilingual ontology by aligning WordNet with an existing Chinese ontology called HowNet. We will present results to show that our method is capable of efficiently aligning ontologies with very different structures, as well as ontologies from languages that are very different from each other.
3. Visualizing WordNet Structure
Jaap Kamps
Representations in WordNet are not on the level of individual words or word forms, but on the level of word meanings (lexemes). A word meaning, in turn, is characterized by simply listing the word forms that can be used to express it in a synonym set (synset). As a result, the meaning of a word in WordNet is determined by its sets of synonyms. This is essentially a recursive definition of word meaning. Hence meaning in WordNet is a structural notion: the meaning of a concept is determined by its position relative to the other words in the larger WordNet structure. We have implemented a set of scripts that visualize the WordNet structure from the vantage point of a particular word in the database.
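The structural view of meaning described in this abstract can be illustrated with a small sketch. It is not the authors' scripts: the synsets below are hypothetical toy data, and the function names build_synonymy_graph and neighbourhood are assumptions made for this example. The sketch links word forms that share a synset and then lists the words reachable from a chosen vantage word, with their distances.

```python
from collections import deque

# Toy synsets (hypothetical data, not taken from WordNet).
SYNSETS = [
    {"good", "well"},
    {"good", "right"},
    {"right", "correct"},
    {"big", "large"},
]


def build_synonymy_graph(synsets):
    """Link two word forms whenever they appear together in a synset."""
    graph = {}
    for synset in synsets:
        for word in synset:
            graph.setdefault(word, set()).update(synset - {word})
    return graph


def neighbourhood(graph, start, max_depth):
    """Breadth-first walk: map each word within max_depth to its distance from start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if distances[word] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in sorted(graph.get(word, ())):
            if neighbour not in distances:
                distances[neighbour] = distances[word] + 1
                queue.append(neighbour)
    return distances
```

Calling neighbourhood(build_synonymy_graph(SYNSETS), "good", 2) gives the part of the toy structure seen from "good"; a visualization could draw exactly that subgraph.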
4. Words with Attitude
Jaap Kamps, Maarten Marx
The traditional notion of word meaning used in natural language processing is literal or lexical meaning as used in dictionaries and lexicons. This relatively objective notion of lexical meaning is different from more subjective notions of emotive or affective meaning. Our aim is to come to grips with subjective aspects of meaning expressed in written texts, such as the attitude or value expressed in them. This paper explores how the structure of the WordNet lexical database might be used to assess affective or emotive meaning. In particular, we construct measures based on Osgood’s semantic differential technique.
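The abstract does not spell out the measures themselves. As a minimal sketch of one way a bipolar, semantic-differential style score could be derived from graph structure, the example below assumes (this is an assumption, not a detail stated in the abstract) that a word is scored by comparing its shortest-path distance to two pole words. The edge list is hypothetical toy data and the names distance and polarity are illustrative.

```python
from collections import deque

# Hypothetical synonymy links between adjectives (toy data).
EDGES = [
    ("good", "fine"), ("fine", "okay"), ("okay", "mediocre"),
    ("mediocre", "poor"), ("poor", "bad"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of word pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, source, target):
    """Shortest-path length between two words, or None if unreachable."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        word = queue.popleft()
        if word == target:
            return seen[word]
        for nxt in graph.get(word, ()):
            if nxt not in seen:
                seen[nxt] = seen[word] + 1
                queue.append(nxt)
    return None


def polarity(graph, word, positive="good", negative="bad"):
    """Score in [-1, 1]; positive when the word lies closer to the positive pole."""
    d_pos = distance(graph, word, positive)
    d_neg = distance(graph, word, negative)
    d_poles = distance(graph, positive, negative)
    if None in (d_pos, d_neg, d_poles) or d_poles == 0:
        return None  # word or poles not connected: no score
    return (d_neg - d_pos) / d_poles
```

Dividing by the distance between the poles keeps the score within [-1, 1], since the difference of the two distances can never exceed the distance between the poles.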
5. An Unsupervised Method for General Named Entity Recognition and Automated Concept Discovery
Enrique Alfonseca, Suresh Manandhar
Knowledge Acquisition is still the bottleneck in building many kinds of applications, such as inference engines. We describe here a procedure to automatically extend an ontology with domain-specific knowledge. The main advantage of our approach is that it is completely unsupervised, so it can be applied to different languages and domains. Our initial results have been highly successful and we believe that with some improvement in accuracy it can be applied to large ontologies.
6. Distinguishing Concepts and Instances in WordNet
Enrique Alfonseca, Suresh Manandhar
Many lexical databases make a distinction between concepts (synsets that represent a class of things of interest) and instances (examples of concepts). However, that information is not present in WordNet. We use empirical evidence that concepts and instances are treated in different ways in language, to show that the distinction is not merely theoretical, but that it also affects how a word is used. We also describe several NLP applications that could benefit from it, and propose a criterion to annotate WordNet with that information.
7. Chinese Characters and Top Ontology in EuroWordNet
Shun Ha Sylvia Wong, Karel Pala
In this paper we continue the work by Wong & Pala (2001) and compare a selected collection of Chinese radicals, Chinese characters and their meanings with the Top Ontology (TO) developed in the framework of the EuroWordNet 1, 2 project (EWN). The main focus is on exploring the way(s) concepts are organized in the Chinese language and how such organization differs from and is similar to that in the EWN TO. The two main issues examined are: how the basic concepts in the EWN 3rdOrderEntities are represented in Chinese, and the domains in which concepts are grouped under each Chinese radical. The result of the present study sheds light on how to improve the organization of basic concepts in existing ontologies. We discuss what potential implications this organization may have on the future development of EWN.
8. EUROTERM: Extending EWN using both the Expand and Merge Model
Stamou Sofia, Ntoulas Alexandros,
Hoppenbrouwers Jeroen,
Saiz-Noeda Maximiliano,
Christodoulakis Dimitris
EuroTerm aims at expanding EuroWordNet with domain specific terminology for a set of European languages. EuroWordNet is a lexical database representing semantic relations among basic concepts for West European languages, which are combined with a so-called Inter-Lingual-Index. EuroTerm’s main purpose is to combine effectively multilingual domain specific terminology into a common lexical database through a Terminology Alignment System, in order to expand EuroWordNet and the Inter-Lingual-Index with terms restricted to the conceptual domain of environment.
9. Expanding EWN with Domain-Specific Terminology Using Common Lexical Resources: Vocabulary Completeness and Coverage Issues
Stamou Sofia, Ntoulas Alexandros,
Kyriakopoulou Maria, Christodoulakis Dimitris
EuroTerm is a multilingual semantic network comprising domain-specific terminology for Greek, Dutch and Spanish, which will be linked to the EuroWordNet lexical database. Two approaches have been widely adopted for the development of WordNets, namely the merge and the expand model. The former is considered as the one that ensures a better representation of language particularities in a lexical database whereas the latter assures sufficient overlap in the coverage of WordNets. For the development of EuroTerm a combination of both models was followed in order to ensure vocabulary completeness and coverage across concepts.
10. Bulgarian WordNet as a Source for (Psycho) Linguistic Studies
Krassimira Petrova, Toma Nikolov
The lexical items for “Time” and “Periods of time” from WordNet and from the newly built noun core of the Bulgarian WordNet are used as a source for cross-language comparison of concepts, lexemes, and lexical-semantic relations in Bulgarian and English.
11. VisDic - A New Tool for WordNet Editing
Tomas Pavelek, Karel Pala
This contribution describes a new tool (named VisDic) for browsing and editing WordNet databases. It was developed in the Natural Language Processing Laboratory at the Faculty of Informatics, Masaryk University. It is not designed as a specialized tool for processing WordNet data only; rather, it has been developed as a general tool for viewing and editing any lexical database, e.g. multilingual dictionaries, monolingual dictionaries, corpora, etc. From this point of view, WordNet can also be understood as a dictionary with special features.
12. A graphical tool for browsing, searching and annotating WordNet
Arthur Cater
The paper reports on a tool for working with WordNet. The tool has four major components with interacting features. Its “Tangle Browser” summarizes the index information for an input word and related multiwords, and gives one-click access to graphical mouse-sensitive representations of the indexed synsets. Using it, portions of the WordNet linked to a synset may be quickly apprehended. The tool’s “Tree Browser” shows selected parts of the collections of noun or verb synsets (footnote 1), organized as a tree based on the hypernym relation commonplace in those kinds of synset. The tool’s “Scanner” allows synsets having virtually unrestricted combinations of properties to be identified. The tool’s “Annotator” allows additional information to be recorded for synsets, in a systematic way whilst also allowing freestyle commentary. The tool may be used with versions 1.5, 1.6 and 1.7 of WordNet.
Footnote 1: The term “synset” is a standard abbreviation for “synonym set”. Each distinguished sense of a word is grouped with nearly synonymous senses of other words, and a variety of information is provided which generally applies to all grouped word-senses. See Fellbaum (1998) for further details.
13. Cleaning-up WordNet's Top-Level
Aldo Gangemi, Nicola Guarino,
Alessandro Oltramari, Stefano Borgo
In this paper we propose an analysis and an upgrade of WordNet's top-level synset taxonomy of nouns. We briefly review WordNet and identify its main semantic limitations. Some principles from a forthcoming OntoClean methodology are applied to the ontological analysis of WordNet. A revised top-level taxonomy is proposed, which is meant to be conceptually more rigorous, cognitively transparent, and efficiently exploitable in several applications. This work is a revision and extension of [3].
14. An Ontology and a Semantic Network for Danish Time Adverbs - based on the Simple Lexicon Model
Sanni Nimb
The aim of a recently initiated Ph.D. project on Danish adverbs is to give a semantic lexical description of Danish lexical adverbs, in order to extend a Danish computational lexicon – the SIMPLE lexicon – with this word category. The Danish SIMPLE lexicon contains encodings of approx. 10,000 word senses, performed on the basis of a unified, ontology-based semantic model representing an extended qualia structure. In this paper we describe the classification of approx. 120 lexical time adverbs into different subtypes, and the establishment of a corresponding subontology of these adverbs according to the SIMPLE model. It is shown how some of the adverbs inherit information from several nodes in the ontology. Finally we discuss how the semantic relations and features used for the description of time and Aktionsart, which are already implemented in the SIMPLE lexicons, can also be used in the lexical description of time adverbs, and how semantic relations between the different adverbs can be encoded in the lexical entry.
15. Characterizing the Definitions of Anatomical Concepts in WordNet and Specialized Sources
Olivier Bodenreider, Anita Burgun
Objectives: The objective of this study is to characterize the definitions of anatomical concepts in a general terminological system (WordNet) and a domain-specific one (a medical dictionary). Methods: Definitions were first classified into five groups with respect to the nature of the definition. The principal noun phrase (or head) of the definiens was then compared to the definiendum through a reference hierarchy of anatomical concepts. Results: This study confirms the predominance of genus-differentia definitions for anatomical terms. Hierarchical relationships are, as expected, the principal type of relationships found between the definiendum and the head of the definiens. Discussion: Differences in the characteristics of the definitions between WordNet and medical dictionaries are presented and discussed.
16. Induction of Classification from Lexicon Expansion: Assigning Domain Tags to WordNet Entries
Echa Chang, Chu-Ren Huang,
Sue-Jin Ker, Chang-Hua Yang
The goal of this paper is to present a series of induced methods to assign domain tags to WordNet entries. Our primary objective is to enrich the contextual information in WordNet specific to each synset entry. By using the available lexical sources such as Far East Dictionary and the contextual information from WordNet itself, we can find a foundation upon which we can base our categorization. Next we further examine the similarity between common lexical taxonomy and the semantic hierarchy of WordNet. Based on this observation and the knowledge of other semantic relations we enlarge the coverage of our domain assignment in a systematic way. In the end it is found that the accuracy reflects a promising result.
17. Estonian WordNet Benefits from Word Sense Disambiguation
Neeme Kahusk, Kadri Vider
The effect of a lexical resource on the Word Sense Disambiguation (WSD) task is bidirectional. The results of WSD depend on the quality of the lexicon, and the lexicon can be improved in the process of WSD. About 10 000 content words in texts are manually disambiguated according to Estonian WordNet (EstWN) word senses. The main aim of the study is, besides gaining experience in WSD, to find out how well the existing EstWN covers real language usage in texts.
18. Methodological Issues in the Building of the Basque WordNet: Quantitative and Qualitative Analysis
Eneko Agirre, Olatz Ansa,
Xabier Arregi, Jose Mari Arriola,
Arantza Diaz de Ilarraza, Eli Pociello,
Kepa Sarasola and Larraitz Uria
This paper describes the methodology we have adopted to ensure the quality of the Basque WordNet in terms of coverage, correctness, completeness and adequacy. The Basque WordNet follows the EuroWordNet framework and, basically, it is produced using a semi-automatic method that links Basque words to the English WordNet. We have found that in order to ensure proper linguistic quality and avoid excessive English bias, a double manual pass on the automatically produced Basque synsets is desirable: a first concept-to-concept pass to ensure correctness of the Basque words linked to the synsets, and a word-to-word pass to ensure the completeness of the word senses linked to the words. By this method, we expect to combine quick progress (as allowed by a development based on the English WordNet) with quality (as provided by a development based on a native dictionary). We have completed the concept-to-concept review of the automatically produced links for the nominal concepts, and are currently performing the word-to-word review.
19. BALKANET: A Multilingual Semantic Network for Balkan Languages
Stamou Sofia, Oflazer Kemal, Pala Karel,
Christodoulakis Dimitris, Cristea Dan,
Tufis Dan, Koeva Svetla, Totkov George,
Dutoit Dominique, Grigoriadou Maria
BalkaNet aims at building a multilingual lexical database consisting of WordNets in several Central and Eastern European languages. Even though it will be built in a similar way to EuroWordNet, new features will be implemented, ranging from structuring the Inter-Lingual-Index to ensure linking of conceptual equivalences across WordNets to the development of an inter-networked WordNet Management, so that each partner retains full responsibility for and independence of their local WordNet while at the same time being able to view other WordNets and check their compatibility.
20. Extending Synsets with Medical Terms
Paul Buitelaar, Bogdan Sacaleanu
An important problematic issue with general semantic lexicons like WordNet or GermaNet is that they do not cover many terms and concepts specific to certain domains. Therefore, these resources need to be tuned to a specific domain at hand. This involves selecting those senses that are most appropriate for the domain, as well as extending the sense inventory with novel terms and novel senses that are specific to the domain. In this paper we focus on extending GermaNet synsets with domain specific terms, taking into account the domain relevance of senses (i.e. synsets).
21. Storing and Retrieving WordNet Database (and other Structured Dictionaries) in XML Lexical Database Management System
Pavel Smrz
This paper deals with an efficient storage and retrieval of various kinds of lexical information in a specialized lexical database management system. Relevant aspects of the XML format and many related technologies are surveyed first. The second section describes motivations and internals of the designed and implemented client/server system. The last section brings information about one specific sub-module of our system that will integrate lexical rules for regular polysemy and derivational morphology paradigms.
22. WordNet Web Navigation Interface: A Fast Interface to Navigate EuroWordNet Hierarchies
Eneko Agirre, Olatz Ansa,
Xabier Arregi, Kike Fernandez
This paper introduces WWNI, a new web interface for multilingual WordNets. The main features of this interface are the following: all items shown are clickable, multilingual information can be shown if desired, and the user can navigate across the hypernymy and meronymy hierarchies in a straightforward way. The multilingual WordNet database is implemented in mSql, and the CGI scripts are written in Perl and produce dynamic HTML pages based on the W3C DOM model and JavaScript. Because of the DOM model, the interface requires Netscape 6 or Internet Explorer 5, or a later version of either. As far as we know, it is the first web interface that allows for fast hierarchy navigation. It is accessible at the following URL: http://ixa.si.ehu.es/tresnak/wwni/index.html.
23. ItalWordNet: A Large Semantic Database for the Automatic Treatment of the Italian Language
Adriana Roventini, Antonietta Alonge,
Francesca Bertagna, Nicoletta Calzolari,
Rita Marinelli, Bernardo Magnini,
Manuela Speranza, Antonio Zampolli
This paper describes the main characteristics of the ItalWordNet semantic database, built within the SI-TAL Italian National Project. The database was created by extending the Italian wordnet developed within the EuroWordNet project by adding i) adjectives, adverbs and proper nouns (not dealt with within EuroWordNet); ii) a terminological subset related to the economic-financial domain. The relevant changes involved by these extensions both in the linguistic model and in the data structure are illustrated.
24. Word Sense Disambiguation Using Semantic Graph
Narayanan Unny E, Pushpak Bhattacharyya
This work describes a method of word sense disambiguation by finding similar words in a text. We have used some characteristic properties of the text and its constituent words for the disambiguation task. Using the WordNet, the algorithm constructs a semantic structure on the text illustrating the relations among the words of the text. This structure is then used for disambiguating the constituent words.
25. Automated Discovery of Telic Relations for WordNet
Marco De Boni, Suresh Manandhar
A method is presented for automatically extending WordNet with the telic relationships proposed in Pustejovsky’s lexicon model. The method extracts telic relationships from WordNet glosses by first selecting a telic word through a pattern matcher aided by a part-of-speech tagger and then employing a word disambiguation module to select the specific meaning (synset) of the telic word. The method is shown to be fruitful, inferring a number of useful relationships.
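The first step of the method, selecting a telic word from a gloss by pattern matching, can be sketched roughly as follows. The abstract does not list the actual patterns, so the regular expression below is a hypothetical stand-in, and the sketch leaves out the part-of-speech tagger and the word sense disambiguation step that the method also relies on.

```python
import re

# Hypothetical purpose-signalling patterns; the paper's real patterns
# are not given in the abstract.
TELIC_PATTERN = re.compile(r"\b(?:used|designed|intended) (?:for|to) (\w+)")


def telic_candidates(gloss):
    """Return the words that follow a purpose pattern in a gloss."""
    return TELIC_PATTERN.findall(gloss.lower())
```

For a made-up gloss such as "an instrument used for cutting", the function returns the single candidate "cutting"; a later stage would still have to decide which synset of that word is meant.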
26. MultiWordNet: Developing an Aligned Multilingual Database
Emanuele Pianta, Luisa Bentivogli,
Christian Girardi
This paper illustrates the MultiWordNet project, aimed at producing an Italian WordNet strongly aligned with the Princeton WordNet. The main conceptual differences between the MultiWordNet and the EuroWordNet conceptual models are presented first. Then two automatic procedures capable of speeding up the work of lexicographers are described. Finally, we give some details about the adopted data model and we present a graphical user interface that can be used to browse and update the aligned database.
27. Cross-linguistic Discovery of Semantic Regularity
Wim Peters, Louise Guthrie,
Yorick Wilks
The question of whether metonymy carries across languages has always been interesting for language representation and processing. Until now attempts to answer this question have always been based on small-scale analyses. With the advent of EuroWordNet (Vossen 1998), a multilingual thesaurus covering eight languages and organized along the same lines as WordNet (http://www.cogsci.princeton.edu/~wn/) we have a unique opportunity to research this question on a large scale. In this paper we systematically explore sets of concepts comprising possible metonymic relations that have been identified in WordNet. The sets of concepts are evaluated, and a contrastive analysis of their lexicalization patterns in English, Dutch and Spanish is performed. Our investigation gives insight into the cross-linguistic nature of metonymic polysemy and defines a methodology for dynamic extensions of semantic resources.
28. A Tree-structure Solution for the Development of ChineseNet
Liu Yang, Yu Jiangsheng
Yu Shiwen
In this paper, we put forth the notion of tree-structure in the development of a WordNet-compatible concept dictionary. After successfully extracting the full hyponymy information from WordNet, we further implemented a visual tree-structure control, which enables lexicographers to operate interactively on the view of the hyponymy tree, with corresponding automatic modifications of the database in the background. Semantics can thus be expressed during development in a much more intuitive and efficient way. ICL (the Institute of Computational Linguistics) has benefited a lot from employing this new solution for the development of CCD (the Chinese Concept Dictionary), our ChineseNet, here at Peking University.
29. Metaphoric Expressions: an Analysis of Data from a Corpus and the ItalWordNet Database
Antonietta Alonge, Margherita Castelli
This paper reports on the work carried out so far, in the context of the ISLE project, to determine whether and how information on metaphoric expressions should be encoded in a (multilingual) lexical entry for NLP applications. When analysing corpus data we find a huge number of metaphoric expressions that can hardly be dealt with using already developed resources as reference databases. Thus, the problem we are facing is to what extent and how information on metaphors could be encoded in a general lexicon. In this paper we address the narrower issue of what is encoded about metaphoric expressions in a WordNet-like resource – ItalWordNet – and what could instead be encoded and how. We explore the issue by taking into account the occurrences of the verb colpire (to hit/to strike/to shoot) and the noun colpo (blow/stroke/shot) in a corpus of Italian and in the ItalWordNet database.
30. EuroWordNet as a Resource for Learning Spanish Verbs
Roser Morante, M. Antònia Martí
This paper presents the ongoing research for applying EuroWordNet (EWN) as a tool for learning Spanish verbs in a CALL context. Current tendencies in L2 acquisition research encourage semantic based approaches for vocabulary teaching, defending the view that the meaning of a word is the starting point of the learning process. This paper develops a didactic proposal for using EWN for learning and teaching purposes based on the hypothesis that the process of lexicon acquisition can be facilitated by means of instruction. An instructional method should focus on helping the learner to establish lexical connections between the L1 and the L2, to semantically connect words in the L2 lexicons and to learn how to use words in context. Since EWN is a multilingual semantic network, it is a potential resource to be integrated in a system for CALL purposes. It lacks, however, information about the use of words. In order to improve EWN as a learning tool, it has been enriched by establishing links to a diathesis alternations database and to a semantically tagged corpus.
31. An Architecture for Engineering Sublanguage WordNets
Kalyan Moy Gupta, David W. Aha,
Elaine Marsh, Tucker Maney
We describe a software architecture to interactively acquire and maintain sublanguage WordNets. The architecture builds upon WordNet semantic structure and includes integrated capabilities for concept element discovery, concept identification, and concept maintenance. We describe the completed components of our on-going implementation by application to Navy Lessons Learned documents. Our preliminary observations indicate that there is very little overlap between concepts discovered from sublanguage documents and WordNet.
32. Adjectives in WordNet-type Thesaurus: Estonian Experience
Heili Orav
There has long been a need for computer thesauri in Estonian lexicography. By 1997 it was clear that, besides morphological and syntactic analysis, a lexical database based on word semantics was needed. Compilation of the Estonian WordNet started in 1997 and the work is still in progress. The existing Estonian WordNet contains nouns, verbs and some adjectives. The present paper concentrates on adjectives. The reason: there is no common solution for the treatment of adjectives in research on the practical compilation of thesauri.
33. Nouns in WordNet and HowNet: An Analysis and Comparison of Semantic Relations
Ping-Wai Wong, Pascale Fung
This paper compares and analyses the semantic relations of nouns in WordNet and HowNet. Their main difference lies in the theory of meaning representation. WordNet, adopting the differential approach, uses synsets to differentiate one concept from the other. HowNet, using the constructive approach, uses sememes (the basic unit of meaning) to build up the meaning of a concept. Regardless of the difference in meaning representation, the semantic relations of nouns are quite similar in WordNet and HowNet. For example, hyponymy, synonymy, antonymy, meronymy and value-attribute relations are represented in both. The differences will be described in detail in the paper.
34. Experiences in Building the Indo WordNet - A WordNet for Hindi
Debasri Chakrabarti, Dipak Kumar Narayan,
Prabhakar Pandey, Pushpak Bhattacharyya
It is increasingly understood by practitioners of information retrieval, natural language processing and knowledge engineering that a rich lexical knowledge base is at the heart of any intelligent information processing system. The words of a language are extremely powerful units that bind together in extensive and unique ways to create a knowledge web. Indo WordNet (as we call the WordNet for Hindi) is an on-line lexical database. It is an attempt to build a lexical reference system for the Hindi language. The design has been inspired by the famous English WordNet. For each word we find the synonym set, representing one lexical concept. These synonym sets are linked with other synonym sets through the well-known semantic relationships of hypernymy, hyponymy, meronymy, holonymy, antonymy and so on. The Indo WordNet has some unique features like graded antonymy and meronymy. It also addresses uniquely Indian language phenomena like causative, compound and conjunct verbs, both at the conceptual level and in the implementation. It has an efficient underlying database design. The web interface for querying the Indo WordNet has been implemented using the PHP4 scripting language. The data entry interface, implemented using Java/JFC, is also simple and elegant.
35. What Does it Mean to be a Shelf? Semantic Bleaching and WordNet
Sandiway Fong
In English, denominal verbs incorporate in varying degrees the meaning of the root noun as part of the verb’s meaning. For example, one can box a present in a gift box but not in a paper bag, shelve a book on the mantelpiece but not on a spike. Other verbs such as land and warehouse exhibit bleaching to a much greater degree; for example, one can land a hydroplane on water, or warehouse parts in a barn, silo or any structure. In this paper, we describe the advantages and shortcomings in modeling semantic bleaching using WordNet’s hypernym/hyponym hierarchy, suggesting, along the way, directions for further refinement of the isa-relation.
36. Comparing Ontology-based and Corpus-based Domain Annotations in WordNet
Bernardo Magnini, Carlo Strapparava,
Giovanni Pezzulo, Alfio Gliozzo
Domain information has been regarded as an emerging topic of interest in relation to WordNet. A lexical resource, WordNet Domains, is presented, where WordNet synsets have been annotated with domain labels such as Medicine, Architecture and Sport. This annotation reflects the lexico-semantic criteria adopted by humans involved in the annotation. However, from a corpus-based perspective, domains reflect term distribution in a given text collection. The paper proposes a preliminary investigation aiming at comparing and integrating ontology-based and corpus-based domain information.
37. Integrating Selectional Preferences in WordNet
Eneko Agirre, David Martinez
Selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous statistical models to class-to-class preferences, and presents a model that learns selectional preferences for classes of verbs, together with an algorithm to integrate the learned preferences in WordNet. The theoretical motivation is twofold: different senses of a verb may have different preferences, and classes of verbs may share preferences. On the practical side, class-to-class selectional preferences can be learned from untagged corpora (the same as word-to-class), they provide selectional preferences for less frequent word senses via inheritance, and more important, they allow for easy integration in WordNet. The model is trained on subject-verb and object-verb relationships extracted from a small corpus disambiguated with WordNet senses. Examples are provided illustrating that the theoretical motivations are well founded, and showing that the approach is feasible. Experimental results on a word sense disambiguation task are also provided.
38. Structured Access to Scientific Information
Caterina Caracciolo, Maarten de Rijke
We report on an ongoing project aimed at providing an exemplary architecture for an electronic dissemination environment for scientific handbooks. We focus on our way of facilitating navigation through and access to electronic handbooks by using a WordNet-like concept hierarchy consisting of synsets that are connected to each other and to external sources by semantic relations for navigational purposes.
39. The WordNet as a Vocabulary Management Tool For Indexing Language
Hemalata Iyer, B.A.Sharada
This paper reports an exploratory study of the ways in which WordNet can be used in conjunction with indexing languages, such as faceted classification schemes and thesauri, to enhance information search and retrieval. It compares the features of indexing languages and natural language. It suggests applying WordNet for broadening and narrowing down searches, for providing a wider map of knowledge by linking domain thesauri, for updating the faceted schemes, for full text searches and as a pre-search tool for lay users.
40. Validity of Noun Semantic Networks for Korean Word-Sense Disambiguation
Yoo-Jin Moon, Kyungho Min
This paper presents the method to verify validity of Korean noun semantic networks that are used for the construction of the selectional restriction relation by applying the networks to the syntactic and semantic properties. In addition, this paper utilizes the integrated Korean noun and verb networks for word-sense disambiguation in the Korean sentences, through the selectional restriction relation in the sentences. Integration of Korean Noun Networks into the SENKOV system will provide the accurate and efficient knowledge base for the semantic analysis of Korean NLP.
41. Lexicons in an Object-Oriented Grammatical Model For Universal Grammar-Based Machine Translation (UGBMT)
Yukiko Sasaki Alam, Shahid Alam
This paper presents ongoing work on designing an object-oriented machine translation model, termed Universal Grammar-based machine translation (UGBMT). This model benefits from recent developments in object-oriented programming and linguistic research. Language-independent linguistic entities such as categories of meanings and semantic verb classes are implemented as Java classes in the package representing Universal Grammar. So are prototypical syntactic categories and syntactic verb classes such as sentence, noun phrase, intransitive verb and ditransitive verb. Classes representing language-specific linguistic entities inherit attributes and methods from their superclasses in Universal Grammar, thus avoiding unnecessary repetition of features common to individual languages while highlighting the idiosyncratic properties of individual languages.
Meanings listed in lexical entries in the lexicons of individual languages function as pointers to entries in the Universal Lexicon that contain semantic information such as their semantic categories and argument structure.
The understanding of a sentence is represented at Surface Structure with syntactic information, at Deep Structure with meanings, and at Universal Structure with richer semantic information. What mediates the process from source language to target language is meaning.
This paper demonstrates that the architecture of this model results in an intuitively transparent mechanism of translation as well as economy in programming.
43. Tamil WordNet
Devi Poongulhali P, Kavitha Noel N,
Preeda Lakshmi R
This paper on ‘Tamil WordNet’ presents the design and implementation issues involved in creating a lexical database for the Tamil language. The infrastructure of the Tamil WordNet differs from its standard prototypes in order to accommodate the unique features that are characteristic of Tamil. The linguistic aspects of Tamil dictate the design of the WordNet. Implementation details such as the design of the lexicographic files, database tables, and the grinder utility are discussed in the course of the paper. An application demonstrating the use of the Tamil WordNet is also examined.
44. Adapting GermaNet for the Web
Claudia Kunze, Lothar Lemnitzer
This paper deals with the adaptation of the lexical-semantic wordnet GermaNet for web-based applications. The GermaNet data have been converted into XML-conformant documents, which represent the concepts and all the basic relations defined between them, while accounting for the peculiarities of the German wordnet. We also compare GermaNet to the Princeton WordNet in order to unify the diverging representations for use within a polylingual framework.
45. Tamil WordNet
S Rajendran, S Arulmozi,
B Kumara Shanmugam,
S Baskaran, S Thiagarajan
A WordNet plays an important role both in the development of NLP applications, such as machine translation and question-answering systems, and in lexical studies of a language. While WordNets have been compiled for most of the European languages, these resources do not exist for Indian languages. This paper presents the lexicographic and computational issues faced in an attempt to build a 'Tamil WordNet'. A working model will be ready at the time of presenting the paper.
46. Semi-Automatic Construction of Korean Noun Thesaurus by Utilizing monolingual MRD and an Existing Thesaurus
Juho Lee, Koaunghi Un,
Key-Sun Choi
A thesaurus is used as a knowledge resource in many natural language processing systems, so it is useful, and often necessary, for high-quality systems, especially those dealing with semantics. In this paper, we introduce a semi-automatic method for constructing a Korean noun thesaurus. The method consists of three stages and uses a monolingual machine-readable dictionary (MRD) and an existing thesaurus.
47. Oriya WordNet
S.Mohanty, R.C.B.Ray, P.K.Santi
Machine translation (MT) for the Oriya language is in its infancy. The lack of a proper electronic dictionary has hampered our efforts to tackle the MT problem. This inspired us to develop a WordNet for the Oriya language. We have tried to design the WordNet to take the special features of this language into account. In this WordNet, the behaviour of each word and its category are explained.
48. Semantic Based Text Mining
D.Manjula, Malliga, T.V Geetha
This paper discusses the incorporation of semantics into the various phases of text mining. Text mining is the process of extracting implicit, previously unknown and potentially useful information from textual documents. Domain concepts are also needed in information extraction because document collections exist at the linguistic, domain-specific and application levels. WordNet is a lexical database that lacks detailed domain knowledge. Domain knowledge is therefore introduced into the text mining process to improve retrieval performance. Interlinked domain concept trees are to be created to represent the domain knowledge.
50. ItalWordNet in an Annotation Task: a Chance for Discussion
Claudia Soria, Francesca Bertagna,
Nicoletta Calzolari
In this paper we suggest how the Senseval exercise can be profitably used for evaluating the lexical reference resources used for annotation. Some general reflections on the adequacy of the ItalWordNet database as a reference resource for an annotation task are offered, focusing in particular on the lessons learned from the Senseval-2 experiment.
51. Indigenous Knowledge Systems in the Global WordNet: Focus on Car Nicobarese
R. Elangaiyan
This paper is a proposal for building WordNets for lesser-known languages, especially tribal languages like Car Nicobarese. Such languages do not have adequate written literature, and hence any attempt to build WordNets for them depends mostly on data obtained from fieldwork conducted by linguists. As these languages are unique in their knowledge systems, linking them to the Global WordNet will be of immense use to social scientists in general and linguists in particular. This paper discusses the uniqueness of the Car Nicobarese language and the unique primitive semantic components that will have to be considered in the making of the WordNet. The concept of 'unique beginners' is also discussed.
52. A MultilinguAl Document Authoring System
Maistros Yanis, Markantonatou Stella,
Madsen Bodil Nistrup,
Lefranc Marie-Paule, Badia Toni
The AMADA project aims to develop an authoring system for users who wish to compose Web pages in their own language. The documents produced will belong to a specific thematic domain. The user will have access to a set of linguistic resources, in particular terminological ones, in the chosen language, as well as ontologies for a selected domain. The system will facilitate the authoring of linguistically sound and terminologically consistent hypertexts in a given domain and will certify the documents created. Users may then publish their semantically annotated documents on the Web.
|