One of the most pervasive phenomena in natural language is ambiguity. This problem confronts language learners and natural language processing systems alike; by the same token, it confronts linguists compiling a lexicon for a language. The notion of context enforcing a certain reading of a word (i.e. selecting for a particular word sense) is central both to global dictionary entry design (the question of how to break a word into word senses) and to the local composition of individual sense definitions.
However, current dictionaries reflect a particular ‘static’ approach to dealing with this problem: the number of senses within an entry, and the distinctions between them, are ‘frozen’ into the lexicon at compile time; furthermore, definitions make hardly any provision for the notion that boundaries between word senses may (and do, as we show below) shift with context.

All natural languages have two types of ambiguity: syntactic and semantic. Syntactic ambiguities affect the shape of parse trees and are therefore called structural ambiguities. There are four major kinds:

1. Multiple parts of speech for a single word;
2. Different parse trees for the same sentence;
3. Unresolved referents for pronouns and definite noun phrases;
4. Unclear scopes of quantifiers and negation.

Semantic ambiguities, also called lexical ambiguities since they depend on the meanings of words, have been largely neglected in formal theories. Their origin and nature, however, touch upon a number of central issues that must be addressed by any theory of semantics. There are two major kinds of lexical ambiguity:

1. Homonymy, where two or more historically distinct words happen to acquire the same pronunciation and often the same spelling as well;
2. Polysemy, where a word has a number of closely related meanings.

Examples of homonymy include page in a book vs. page as an attendant, or ball as a rounded object vs. ball as a dance. Polysemy is a more common kind of lexical ambiguity, in which the differences between senses tend to be small, subtle, and hard to distinguish. One example of polysemy is the word support, with its multiple meanings that were discussed earlier. Another example is the verb yield in the following sentences:

Two molecules of H2 and one molecule of O2 yield two molecules of H2O.
Vehicles approaching from the entrance ramp must yield to oncoming traffic.

What distinguishes homonymy from polysemy is a clear break in the range of meanings. For polysemous words, different dictionaries usually list different numbers of meanings, with each meaning blurring into the next. For homonyms, however, dictionaries usually agree upon the number of distinct groups of meanings. The word ball as a rounded object, for example, is derived from an Old English word with similar meaning; the word ball as a dance was borrowed from French in the 17th century.
The page in a book comes from the Latin pagina, and the page as an attendant comes from the Italian paggio. Each of these homonyms has polysemous variants, but there are no intermediate meanings of ball or page that blur the distinction between the homonyms. As these examples illustrate, homonyms arise from distinct word forms that accidentally come together, either because of borrowing (as with ball) or because of sound changes that lose distinctive features (as with the merger of pagina and paggio to form page).
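The contrast between homonymy and polysemy can be pictured as a two-level lexicon: each spelling maps to one or more historically distinct homonyms, and each homonym carries its own list of related senses. The following is only a minimal sketch of that idea, not a description of any existing dictionary format. The class name Homonym, the LEXICON table, and the helper functions are hypothetical names introduced for illustration; the origins and glosses are limited to those given in the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Homonym:
    """One historically distinct word; its senses are polysemous variants."""
    origin: str
    senses: List[str] = field(default_factory=list)


# Hypothetical mini-lexicon holding only the examples named in the text.
LEXICON: Dict[str, List[Homonym]] = {
    "ball": [
        Homonym("Old English", ["rounded object"]),
        Homonym("French, 17th century", ["dance"]),
    ],
    "page": [
        Homonym("Latin pagina", ["page in a book"]),
        Homonym("Italian paggio", ["attendant"]),
    ],
}


def homonym_count(word: str) -> int:
    """Number of historically distinct words sharing this spelling."""
    return len(LEXICON.get(word, []))


def is_homonymous(word: str) -> bool:
    """True when the spelling covers more than one distinct word."""
    return homonym_count(word) > 1
```

In this layout the number of Homonym entries per spelling is the figure dictionaries tend to agree on, while the length of each senses list is where they would be expected to differ.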
Unlike homonyms, which result from linguistic processes of borrowing and sound change, polysemous variants result from the complexities of mapping language to the world. As an example of polysemy, consider the term oil well. Most dictionaries give only one meaning for the term, and most MT systems would have no difficulty in translating it into another language. Yet one oil company found a serious ambiguity in its definition. In their geological database, an oil well was defined as any hole in the ground drilled or dug for the purpose of obtaining oil, whether or not the hole proved to be dry.
In their financial database, however, an oil well was defined as a pipe connected to one or more holes in the ground that produce oil. The financial database therefore ignored all the dry holes and omitted details about individual holes that were grouped with others in a single ‘oil well’. The discrepancy was unimportant as long as the two databases were kept separate. But when management wanted to correlate rock formations with production, they found that they could not merge the two databases.
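The mismatch between the two definitions can be made concrete with a small sketch. It assumes a single hypothetical list of holes and applies each definition of ‘oil well’ to it. The Hole record, its field names, and the sample values are all invented for illustration and do not reflect the company's actual schemas.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hole:
    hole_id: str
    pipe: Optional[str]   # pipe the hole is connected to, if any (assumed field)
    produces_oil: bool


# Hypothetical data: two producing holes share one pipe, one hole is dry.
HOLES = [
    Hole("H1", pipe="P1", produces_oil=True),
    Hole("H2", pipe="P1", produces_oil=True),
    Hole("H3", pipe=None, produces_oil=False),
]


def geological_wells(holes: List[Hole]) -> List[str]:
    """Geological definition: every drilled hole is a well, dry or not."""
    return [h.hole_id for h in holes]


def financial_wells(holes: List[Hole]) -> List[str]:
    """Financial definition: a well is a pipe fed by one or more producing holes."""
    return sorted({h.pipe for h in holes if h.produces_oil and h.pipe is not None})


print(geological_wells(HOLES))  # three wells under the geological definition
print(financial_wells(HOLES))   # one well under the financial definition
```

Under these assumptions the same ground yields three wells in one database and one in the other. The dry hole has no financial counterpart, and the two producing holes collapse into a single record, so rows labelled ‘oil well’ cannot be matched one to one.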