The Knowledge Engineering Review, Vol. 23:1, 101–115. © 2007, Cambridge University Press. DOI: 10.1017/S0269888907001270. Printed in the United Kingdom.
A context-sensitive framework for lexical ontologies
TONY VEALE and YANFEN HAO
School of Computer Science and Informatics, University College Dublin
E-mail: [email protected], [email protected]
Human categorization is neither a binary nor a context-free process. Rather, the criteria that govern the use and recognition of certain concepts may be satisfied to different degrees in different contexts. In light of this reality, the idealized, static structure of a lexical ontology like WordNet appears both excessively rigid and unduly fragile when faced with real texts that draw upon different contexts to communicate different world-views. In this paper we describe a syntagmatic, corpus-based approach to redefining the concepts of a lexical ontology like WordNet in a functional, gradable and context-sensitive fashion. We describe how the most diagnostic properties of concepts, on which these functional definitions are based, can be automatically acquired from the web, and demonstrate how these properties are more predictive of how concepts are actually used and perceived than properties derived from other sources (such as WordNet itself).
Different contexts encourage different ways of speaking. This variation comprises more than differences in terminology and vocabulary – perhaps the most obvious reason for designing context-sensitive ontologies – but more insidiously comprises subtle differences in how common terms and their underlying concepts are employed. Indeed, this variation in how ontological concepts are expressed linguistically can give rise to context-defining shibboleths; the plural term “ontologies”, for instance, is more likely to identify its user as a computer scientist than as a philosopher (Guarino, 1998).
An ontology is a formalized and highly structured system of concepts in which the meanings of semantic structures can be grounded. Guarino (1998) notes that such “an engineering artifact, constituted by a specific vocabulary” is used to describe “a certain reality”, and so one expects this system to be fairly stable if it is to serve as a reliable bedrock of meaning. However, concepts are no more than perspectives on the world, and these perspectives can change from context to context. For instance, when speaking of man-made objects, one can distinguish between the perspectives of designed functionality and ad-hoc functionality (see Barsalou, 1983). Banks do not design credit-cards so that they may be surreptitiously used to open locks, but in the context of certain movies and genres of fiction, this is an apparently frequent usage. Likewise, lamp-stands are not designed to be used as blunt instruments, or dinner plates as projectiles, yet these can be contextually appropriate functions for such objects. Clearly then, the categorization of a concept depends not on the intrinsic classification of the concept as defined a priori in an ontology (though this will obviously be an important factor), but on how the concept is perceived in a particular context relative to a particular goal.
Once we accept that categorization is sensitive to context, all cognitive decisions that follow from categorization – such as the perception of similarity between concepts – become context-sensitive also. We see this effect in experiments performed by Morris (2006), which reveal that, in the context of an article about the effects of movies on suggestible teenagers, subjects reported a stronger semantic relationship between the terms “sex”, “drinking” and “drag-racing”. The context in question served to highlight the danger inherent in each of these activities, prompting the subjects to lump these ideas together under an ad-hoc and highly context-sensitive concept of “dangerous behaviors” (see also Barsalou, 1983; Lakoff, 1987; Budanitsky and Hirst, 2006). This concept is context-sensitive insofar as the specific behaviors that comprise it are contextually determined. Smoking, for instance, is often considered a dangerous behavior in a medical context, but hardly seems to meet the diagnostic criteria for this concept when viewed from the contexts of bomb-disposal, undercover police-work or high-wire acrobatics.
A high-level division of labor between ontologies and contexts thus suggests itself: ontologies provide intensional definitions for those concepts that are meaningful and stable across many contexts, while contexts provide local evidence that these concepts can be specialized in different ways and to different degrees in specific frames of reference. As such, we see contexts and ontologies as comprising two complementary pieces of the larger knowledge-representation puzzle, a view consistent with that of Giunchiglia (1993), Ushold (2000) and Segev and Gal (2005).
This work is thus a computational exploration of the common intuition that language use reflects conceptual structure. As noted by De Leenheer and de Moor (2005), ontologies are, in the end, lexical representations of concepts, so we should expect that the effects of context on language use will closely reflect the effects of context on ontological structure. An understanding of the linguistic effects of context, as expressed through syntagmatic patterns of word usage, should therefore lead to the design of more flexible ontologies that naturally adapt to their contexts of use. Given this linguistic bias, we focus our attention in this paper on the class of ontology known as lexical ontologies. These are ontologies like WordNet (Fellbaum, 1998), HowNet (Dong and Dong, 2006) and the Generative Lexicon (Pustejovsky, 1995) that aim to serve as a formal ontological basis for a lexical semantics by combining knowledge of words with knowledge of the world. Since many words and word-senses are inherently suited to some contexts of use more than others, the problem of context is one of particular importance to the proper working of such ontologies. Our focus on WordNet-like ontologies, lightweight as they are, is largely motivated by the fact that these ontologies have hitherto ignored the role of context in their design.
We begin in section 2 by considering the interlocking roles of contexts and ontologies. We view each as a complementary kind of knowledge-representation, the primary distinction being one of stability: an ontology is a formal representation of concepts and their inter-relationships that is stable across different frames of reference, while a context is a changeable set of subsumption mappings from the core concepts of an ontology to the specific concepts employed in a given reference frame. The problem of “contextualizing an ontology” (Bouquet et al., 2003; Obrst and Nichols, 2005) is thus seen as one of local categorization, in which concepts are locally imbued with the properties needed to allow them to be subsumed by the context-independent definitions of a base ontology. In section 3 we describe how this contextualization can be computationally realized for a lexical ontology, not by modeling contexts directly and explicitly, but by using representative text corpora as sources of indicative linguistic behavior. These corpora yield local knowledge in the form of syntagmatic patterns, whereby, for instance, the patterns “X-addicted”, “X-addled” and “X-crazed” suggest that the entity X is a kind of drug, the pattern “X-wielding” suggests that X is a kind of weapon, and “barrage of X” suggests that X is a kind of projectile. In section 4 we describe how stable ontological definitions can be automatically constructed from syntagmatic associations that are distilled from patterns of textual data on the web, in an approach that extends that of Almuhareb and Poesio (2005). We evaluate the reliability of these efforts in section 5, before concluding with some final remarks in section 6.
As with any modeling task, ontological description is as much a matter of representational choice as it is one of representational verisimilitude. An ontology (qua engineering artifact) does not capture “objective reality”, or even a small portion thereof, but merely, as Guarino (1998) is careful to point out, “a certain reality”. While a plurality of “realities” may be as confounding as a plurality of “ontologies”, the term “reality” is nonetheless appropriate insofar as an ontology is designed to encode a common world-view that is shared by multiple (if not all) parties (see Guarino, 1998; Patel-Schneider et al., 2003). The representational choice inherent in ontological design reflects the wide range of perspectives, biases, levels of detail and subject-oriented divisions that are available (consciously or otherwise) to the knowledge engineer. Regardless of the label one uses to motivate these choices, the notion of “context” seems to play a key role in defining the particular realities of different ontologies (e.g., see Bouquet et al.).
In distinguishing between ontologies and contexts, the former is often conceived as an inherently stable world-view, while the latter is conceived as an altogether more fluid and changeable frame of reference that is mapped to the former, or in which the former is applied. For instance, Obrst and Nichols (2005) conceive of contexts as user-dependent and task-dependent views on an underlying ontology, while Bouquet et al. (2003) similarly conceive of contexts as local and private perspectives onto a shared encoding of a domain. One role of context is to provide an additional layer of knowledge that better informs how an ontology can be used in a given set of circumstances. In particular, Obrst and Nichols suggest that context can serve to annotate or label the shared concepts and relations of an underlying ontology, e.g., to express the security-level and provenance of those elements. Segev and Gal (2005) see ontologies and contexts as complementary views of a domain of discourse, where the ontology – a carefully-engineered domain model – serves as a unified, global knowledge representation onto which contexts – partial and task-specific or user-specific views – are mapped. These latter authors see the process of reconciling local contexts to global ontologies as a core part of applications as diverse as email routing, opinion analysis and topic clustering.
Another role of context is to separate those parts of a knowledge-representation that are mutually inconsistent into different, but complementary, perspectives, each perhaps owned by a different agent. As used in the Cyc ontology (Lenat and Guha, 1990), these perspectives are called microtheories; their propositional content remains local and private unless explicitly inherited by other microtheories or made visible through a process of lifting (Guha, 1991). For instance, the concept Sherlock-Holmes can be ontologized as a kind of fictional character, and thus, a kind of mental product, or it might be ontologized as a kind of detective. WordNet opts for the former course rather than the latter, thus sacrificing the ability to reason about Holmes as if he were an actual detective, or even an actual person. In an ideal ontology, both ontological perspectives would be made available for reasoning purposes, perhaps by representing each in a separate microtheory, or by representing each in different ontologies and providing a detailed system of mappings between each (e.g., as in Bouquet et al.).
Each of these apparent roles sees context as a means of partitioning ontological content into alternate views of reality. Indeed, the microtheory labels used by Cyc, such as HealthMt, HistoryMt and so on, can be seen as annotations on propositions that allow Cyc’s inference processes to selectively include or exclude large swathes of the ontology’s content in a given reasoning process. In this vein, another related role of context is to provide a bridge between the stable definitions of an ontology and the contingent facts of a particular world-view. For instance, an ontology of chemical substances may be agnostic with respect to how those substances are used, so that the same substance might be categorized as a medicine in one context, an illegal drug in another, and a poison in yet another. It is this division of labor between ontologies and contexts that interests us most in this current work: how can we create ontologies as collections of stable definitions that apply in all contexts, yet which are realized differently, by different concepts and to different degrees, in specific contexts? Given the significant design and engineering efforts that are employed in the construction of well-formed ontologies (e.g., see Gangemi et al., 2001), this division of labor should be a clean one, so that the base ontology only posits relationships that are safe in all contexts, and each context only posits relationships that complement, rather than contradict, those of the base ontology.
This division of labor requires a solution to two related computational problems: how do we acquire and represent the stable concept definitions that comprise the ontology; and how do we acquire the local contextual distinctions that cause these definitions to be instantiated by different entities in different frames of reference? Almuhareb and Poesio (2005) describe a web-based approach to acquiring the property structure of concepts via text analysis of internet content, as indexed by a search engine like Google. Their approach indicates how both stable concept definitions and contingent realizations of those definitions can be inferred from simple processes of text analysis. Almuhareb and Poesio use highly diagnostic search queries such as “the * of a|an|the C is|was” to identify property values for a given concept C in web texts. By acquiring properties (such as the fact that beverages have an associated temperature and strength) as opposed to property values (such as “hot” and “cold” for coffee), these authors acquire a general frame structure for each concept that can be instantiated differently in different contexts.
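To make the query frame concrete, the following minimal sketch expands “the * of a|an|the C is|was” into its individual wildcard queries and shows how the word filling the * slot might be read off a snippet of returned text. The function names `expand_queries` and `wildcard_fillers`, and the sample snippet, are illustrative assumptions rather than anything specified by Almuhareb and Poesio; a real system would also need to handle multi-word concepts and noisy web text.

```python
import re
from itertools import product
from typing import List


def expand_queries(concept: str) -> List[str]:
    """Expand the frame "the * of a|an|the C is|was" into quoted wildcard queries."""
    determiners = ["a", "an", "the"]
    copulas = ["is", "was"]
    return [f'"the * of {det} {concept} {cop}"'
            for det, cop in product(determiners, copulas)]


def wildcard_fillers(concept: str, text: str) -> List[str]:
    """Return the words that fill the * slot of the frame in a snippet of text."""
    frame = re.compile(
        rf"\bthe (\w+) of (?:a|an|the) {re.escape(concept)} (?:is|was)\b",
        re.IGNORECASE,
    )
    return [m.group(1).lower() for m in frame.finditer(text)]


# Hypothetical snippet standing in for text returned by a search engine.
snippet = ("The temperature of the beverage is important. "
           "The strength of a beverage was noted.")
queries = expand_queries("beverage")
fillers = wildcard_fillers("beverage", snippet)
```

Here `queries` holds the six determiner and copula combinations, and `fillers` holds the slot fillers found in the snippet.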
We too employ a large-scale analysis of web-text to acquire stable concept definitions that will transcend context boundaries. However, we do not currently focus on the acquisition of property structure, but on prototypical property values. While Almuhareb and Poesio (2005) demonstrate that generic properties such as Temperature and Colour are more revealing about conceptual structure than specific values such as “hot” and “red” (since these values can change without affecting the nature of the concept), we do not collect arbitrary contingent attributions (such as the fact that coffee can be cold) but highly diagnostic and concept-defining attributions (e.g., that espresso should be strong, that surgeons should be delicate, that gurus should be wise, and so on). To identify which property values are truly central to the consensus definition of a concept, we use the highly specific comparison frame “as * as a|an C” to collect similes involving a given concept from the web. Once acquired and validated, we articulate the prototypical properties for a given concept as a set of logical constraints that serves as a functional definition for that concept. The form of these functions is presented in section 3, while the web-based acquisition of each function’s content is described in section 4.
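As a rough illustration of the comparison frame “as * as a|an C”, the sketch below harvests (property, concept) pairs from a handful of sentences with a regular expression. The name `collect_similes` and the example sentences are assumptions made for illustration; the sketch covers only single-word properties and concepts and performs none of the validation mentioned above.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the frame "as ADJ as a|an NOUN" with single-word slots.
SIMILE = re.compile(r"\bas (\w+) as an? (\w+)\b", re.IGNORECASE)


def collect_similes(texts: Iterable[str]) -> Counter:
    """Count (adjective, noun) pairs that occur in the simile frame."""
    counts: Counter = Counter()
    for text in texts:
        for adjective, noun in SIMILE.findall(text):
            counts[(adjective.lower(), noun.lower())] += 1
    return counts


# Hypothetical sentences standing in for web text.
simile_counts = collect_similes([
    "He was as wise as a guru.",
    "Steady hands, as delicate as a surgeon; as wise as a guru.",
])
```

Pairs that recur across many texts, such as (wise, guru) here, are the kind of candidates one would carry forward as diagnostic property values.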
De Leenheer and de Moor (2005) see a context as a mapping from a set of lexical (and potentially ambiguous) labels to a set of language-neutral concept identifiers. In this view, the same words can denote specializations of different concepts in different contexts. For instance, “cocaine” can denote a kind of local anesthetic in a medical context and a kind of illegal drug in a law-enforcement context. This is more than a matter of lexical ambiguity; the same sense is intended in each context (i.e., the same substance) but a different ontological categorization is implied in each. Combining a mapping-theoretic view of context (e.g., Obrst and Nichols, 2005; Bouquet et al., 2003) with the lexical emphasis offered by De Leenheer and de Moor, it is possible to obtain many of the reasoning benefits of a context without an explicit logical representation of context. For example, the similarity between chocolate and a narcotic like heroin will, in most contexts, simply reflect the ontological fact that both are kinds of substances; certainly, taxonomic measures of similarity as discussed in Budanitsky and Hirst (2006) would capture little more than this basic categorization. However, in a context in which the addictive properties of chocolate are highly salient (in an on-line dieting forum, for instance), chocolate is more likely to be categorized as a drug and thus be considered more similar to heroin. Look, for instance, at the similar ways in which these words can be used: one can be “chocolate-crazed” or “chocolate-addicted” and suffer “chocolate-induced” symptoms (each of these uses is to be found in chocolate-related Wikipedia articles). In a context that gives rise to these syntagmatic patterns, it is unsurprising that chocolate should appear altogether more similar to a harmful narcotic.
A given corpus may employ syntagmatic patterns which reflect the fact that the corresponding context views chocolate as a kind of drug, or military robots as soldiers, or certain kinds of criminal as predators. By augmenting a base ontology with these categorizations, the ontology may become sufficiently contextualized to reason fluently in this context. The model of corpus-based ontology augmentation we describe in this paper is consistent with, and complementary to, the Theory of Norms and Exploitations (TNE) proposed by Hanks (2004), in which corpus analysis is used to identify both the syntagmatic norms of word usage (i.e., highly conventional and normative uses) and meaning-coercing exploitations.
This current work attempts a synthesis of ideas from the fields of cognitive science and cognitive linguistics (as exemplified by the work of Lakoff, 1987), corpus linguistics (as exemplified by the work of Hanks, 2004) and lexical semantics (as exemplified by the work of Pustejovsky, 1995). Throughout this paper, the term “concept” is used in the set-theoretic sense employed by the ontology and description-logic literature (e.g., see Welty and Jenkins, 1999), which in turn corresponds to the use of the word “category” in the cognitive science literature (e.g., see Barsalou, 1983). As such, we view a concept as a hierarchically-positioned cognitive structure to which properties can be ascribed and in which membership can be asserted. As in a description-logic, membership in and subsumption by a concept can be stated explicitly, or computed as necessary (Welty and Jenkins, ibid.) by an application of the logical criteria that define the concept to the specific properties of a putative member. These logical criteria that define a concept form part of the ontology proper, while additional properties of a concept may be provided by a specific context. For instance, an ontology may stipulate that the concept Insurgent is subsumed by the concept Person, so that all insurgents, in any context, must be human (e.g., see Gangemi et al., 2001). Nonetheless, different contexts may ascribe different additional properties to Insurgent. In one context, insurgents may be seen as craven and evil, supporting a local categorization of Insurgent as a kind of Terrorist or Criminal; in another, insurgents may be seen as upright and noble, supporting a local categorization of Insurgent as a kind of Champion or Defender.
2.2 Deriving context-specific insights from text
A syntagmatic approach to deriving ontological insights from text is hardly novel. Hearst (1992) describes a syntagmatic technique for identifying hyponymy relations in free text by using frequently occurring genre-crossing patterns like NP0 such as NP1, NP2, ..., NPn. Like the approach of Charniak and Berland (1999), Hearst’s patterns seek out explicit illustrations of inter-concept subsumption, as in the phrase “drugs like Prozac, Zoloft and Paxil”. Such techniques are useful because contexts frequently introduce new terms that are locally meaningful. Nonetheless, such techniques do not reveal the subtle and shifting nuances of concept usage that underpin a particular context. These differences are implicit precisely because the existence of a context presupposes the existence of a shared body of knowledge and a common world-view. Context-specific corpora only reveal this shared knowledge indirectly, insofar as it is presupposed in the way that language is used.
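The explicit patterns described above can be approximated with a regular expression. The sketch below is a deliberately simplified stand-in for Hearst-style matching: it treats every noun phrase as a single word, handles only the “such as” and “like” cues, and uses the hypothetical function name `hyponym_pairs`. A realistic implementation would operate over parsed noun phrases rather than raw tokens.

```python
import re
from typing import List, Tuple

# NP0 (such as|like) NP1, NP2, ... (and|or) NPn, with one-word noun phrases.
HEARST = re.compile(
    r"\b(\w+) (?:such as|like) "
    r"((?:\w+, )*\w+(?: (?:and|or) \w+)?)"
)


def hyponym_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs signalled by the pattern."""
    pairs = []
    for match in HEARST.finditer(sentence):
        hypernym = match.group(1)
        hyponyms = re.split(r", | and | or ", match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms)
    return pairs


pairs = hyponym_pairs("Patients took drugs like Prozac, Zoloft and Paxil daily.")
```

Applied to the example phrase from the text, `pairs` links each of Prozac, Zoloft and Paxil to the hypernym “drugs”.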
Closer to the current approach is that of Cimiano, Hotho and Staab (2005), who do not look for unambiguous “silver bullet” patterns in a text, but who instead characterize each lexical term and its underlying concept according to the syntagmatic patterns in which it participates. These patterns include the use of the term as the subject, object or prepositional complement of a verb. The key intuition, expressed also in Weeds and Weir (2005), is that terms with similar distribution patterns will denote ideas that are themselves similar. Cimiano et al. exploit the phrasal dependencies of a term as features of that term that can be used, through a process of conceptual clustering called Formal Concept Analysis (FCA), as introduced by Ganter and Wille (1999), to determine subsumption relations between different terms. At no point are explicit expressions of these relations sought in a text. Rather, from a tabular mapping of terms to their syntagmatic properties (called a Formal Context), FCA is used to infer these relations by determining which terms possess property descriptions that are a superset or subset of other descriptions. These attributive descriptions serve a dual purpose: they allow an extensional comparison of different concepts to determine which is more general and inclusive; but they also serve as an explicit intensional representation of the conceptual terms that are ontologized. For example, the term “bike” is rideable, bookable and rentable because of its use as an object with the verbs “ride”, “book” and “rent”, so the set {rideable, bookable, rentable} provides an intensional picture of Bike.
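The superset and subset comparison at the heart of this idea can be sketched in a few lines. The code below is not Formal Concept Analysis proper (it builds no concept lattice); it only orders terms by proper inclusion of their attribute sets. The toy formal context, including the entries for “apartment” and “hotel”, and the name `attribute_order` are assumptions added for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical formal context: each term mapped to its syntagmatic properties.
formal_context: Dict[str, Set[str]] = {
    "bike": {"rideable", "bookable", "rentable"},
    "apartment": {"bookable", "rentable"},
    "hotel": {"bookable"},
}


def attribute_order(context: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (more_specific, more_general) pairs of terms.

    A term is more specific than another when its attribute set is a
    proper superset of the other's, i.e. it satisfies every property
    of the more general term and at least one more.
    """
    pairs = []
    for specific, specific_attrs in context.items():
        for general, general_attrs in context.items():
            if general_attrs < specific_attrs:  # proper subset test
                pairs.append((specific, general))
    return sorted(pairs)


ordering = attribute_order(formal_context)
```

Note that the resulting pairs describe positions in the attribute ordering, not is-a claims between the terms themselves; in FCA it is the shared attribute sets (such as {bookable}) that become the more general concepts.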
Segev and Gal (2005) likewise see contexts as domain perspectives that are discernible from representative texts. Since these authors see contexts as partial, user-specific domain views, they argue that contexts can be conveyed by texts as small as email messages, and describe a means of extracting contexts from frequency-weighted bags of words. By discerning contexts in texts in this way, contexts can be mapped to the appropriate ontological concepts, so that the category, opinion or topic of the text can also be discerned. In mapping contexts to ontology concepts, the texts themselves are thus placed into appropriate ontological buckets.
In this current work, we do not aim to extract contexts (as first-order objects or formal representations of such) from texts; rather, we see texts as a useful proxy for the knowledge provided by a context. As such, our goal is to employ the syntagmatic features of a text to infer the context-sensitive categorizations that are communicated by the text, and to augment the classification structures of the base ontology with these additional context-specific viewpoints. Achieving this requires an understanding of the diagnostic properties of concepts on which categorization in those concepts is based, as well as an understanding of how these properties are communicated in a text by particular word choices and syntagmatic patterns.
3 Conceptual norms and contextual exploitations
In this section we present a functional framework for defining concepts in terms of the linguistic cues used to signal their usage in text. These cues or expectations are articulated as syntagmatic norms (Hanks, 2004) that capture, e.g., the most diagnostic adjectival modifiers that contribute to a lexical description of the concept, the kinds of verbs for which the concept typically acts as an agent or a patient, the kind of group terms (like “army”, “herd”, “flock”, etc.) that are typically used to describe aggregations of the concept, and so on. Each concept is assigned a different functional form that expresses the appropriate syntagmatic expectations. This functional form is not Boolean in construction, but yields a continuous output in the range [0, 1], where 1 indicates total satisfaction of the syntagmatic expectations (and thus, indirectly, the logical criteria) that comprise the concept definition.
A continuous or fuzzy categorization function allows concept definitions to be viewed as radial categories in the sense of Lakoff (1987). For example, to the extent that a collocation like “army of X” is found in a corpus, the associated context can be said to categorize X as a sub-type of Soldier. Likewise, to the extent that the syntagm “X-addicted” has currency in a corpus, X should be seen as a kind of Drug. Interestingly, some of the most stable and unambiguous syntagmatic patterns are associated with metaphoric conceptualizations. Thus, the syntagmatic schema “barrage of X” identifies X as a projectile, whether X is an arrow, a pointed question or an angry email. The frequency of these patterns in a corpus yields a sliding scale of inter-concept subsumption in the associated context. Thus, something may be more representative of a particular concept in one context (e.g., Chocolate as a Narcotic in a dieting context) than in another.
We begin by supposing a function (attr arg0 arg1) that returns a real number in the range [0, 1] that reflects the frequency of arg0 as an adjectival modifier for the noun arg1 in a corpus. Suppose also a function (%isa arg0 arg1) that returns a number in [0, 1] reflecting the proportion of senses of arg0 that are descendants of arg1 in a base-ontology like WordNet. (That is, %isa is not a binary relation but a gradated function, and we mark it with “%” to reflect this quantitative distinction.) We can now define the concept Fundamentalist in a functional fashion:
(define Fundamentalist (arg0)
   ... (attr violent arg0) (attr radical arg0) ...)

Figure 1: A functional description of the concept Fundamentalist
That is, any extreme, violent or radical person or group that is either political or religious deserves to be categorized as a fundamentalist. The extent to which this person or group is a fundamentalist depends entirely on the contextual evidence for these criteria, as captured by the use of attr. The precise workings of attr can be implemented in a number of ways, using any of a variety of corpus-based distributional similarity metrics, such as Dice’s coefficient or the Jaccard measure (see Lee, 1999; Weeds and Weir, 2005). Whatever measure is used, it must either return a value in the range [0, 1] or be scaled to do so, so that each concept-defining function like that of Fundamentalist will similarly return a value in the [0, 1] range. The value returned by each conceptual function thus corresponds to a context-sensitive degree of categorization in the corresponding radial category (Lakoff, 1987). For instance, in texts that are representative of a left-leaning liberal world-view, (Fundamentalist evangelical) should return a value closer to 1.0 than in texts with a right-leaning conservative bias. If (Fundamentalist evangelical) returns a value of 0.61 for a given corpus, one can consider evangelical believers to be highly representative, even exemplary, instances of Fundamentalist in the associated context.

Table 1 Basic concept-defining functions and their syntagmatic correspondences
(Table body not recoverable from the source; the surviving fragments include the simile frame “as adj0 as a|an ...”, an “... of noun ...” frame, passive “verb+past by” frames, and hyphenated “...+past” forms, e.g., egg-shaped, bite-sized.)
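The prose description of Fundamentalist can be rendered as a small executable sketch. The arrangement of `max`, `min` and multiplication below is one plausible reading of the description (a person or group, that is political or religious, and that is extreme, violent or radical); it should not be taken as the exact form of Figure 1. The lookup tables `ATTR_SCORES` and `SENSE_ANCESTORS` are hypothetical stand-ins for corpus statistics and a WordNet-like sense inventory, and their values are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical corpus-derived association scores for (adjective, noun) pairs.
ATTR_SCORES: Dict[Tuple[str, str], float] = {
    ("religious", "evangelical"): 0.8,
    ("political", "evangelical"): 0.3,
    ("radical", "evangelical"): 0.5,
    ("extreme", "evangelical"): 0.4,
}

# Hypothetical sense inventory: one set of ancestor concepts per word sense.
SENSE_ANCESTORS: Dict[str, List[Set[str]]] = {
    "evangelical": [{"person", "believer"}],
}


def attr(adjective: str, noun: str) -> float:
    """Degree in [0, 1] to which the adjective modifies the noun in the corpus."""
    return ATTR_SCORES.get((adjective, noun), 0.0)


def isa(term: str, ancestor: str) -> float:
    """Proportion of the term's senses that descend from the ancestor (the %isa idea)."""
    senses = SENSE_ANCESTORS.get(term, [])
    if not senses:
        return 0.0
    return sum(1 for ancestors in senses if ancestor in ancestors) / len(senses)


def fundamentalist(term: str) -> float:
    """Graded membership of a term in Fundamentalist, under the assumed structure."""
    kind = max(isa(term, "person"), isa(term, "group"))
    stance = max(attr("political", term), attr("religious", term))
    manner = max(attr("extreme", term), attr("violent", term), attr("radical", term))
    return kind * min(stance, manner)  # * and min act as fuzzy "and", max as fuzzy "or"


score = fundamentalist("evangelical")
```

Swapping in scores from a different corpus changes the returned degree of membership without touching the definition, which is the sense in which the definition is stable while its realization is context-sensitive.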
The programmatic, LISP-like structure of these conceptual functions explicitly mirrors the logical structure of the concept’s intensional form, inasmuch as each function can be given an obvious logical interpretation. In the example of Fundamentalist, note how the mathematical functions min and * (multiplication) are essentially used to encode a fuzzy-logic equivalent of the logical operator and, while the function max is used to encode a fuzzy-logic equivalent of the logical operator or. Conceptual functions can be built using all of the syntagmatic patterns employed by Cimiano et al. (2005), with some additions (see Table 1). For instance, the “GROUP of NOUN+plural” pattern employs WordNet to identify group membership descriptions in a corpus, where GROUP is any group-denoting WordNet term (e.g., swarm, army) or group activity (e.g., barrage, invasion, influx). This syntagm is given functional form via the function (group arg0 arg1), which returns the extent (again in the range [0, 1]) to which arg1 is described as a member of the group arg0 in a given corpus. For instance, using the text of the encyclopaedia Wikipedia as a corpus, and using Dice’s coefficient as a measure of association, we find scores for (group influx immigrant) and (group army mercenary). The base functions of Table 1 thus serve as the interface between a context-independent intensional description, like that of Fundamentalist in Figure 1, and the specific linguistic evidence of a context-discerning corpus.
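One way to ground the group function in corpus counts is sketched below, using Dice’s coefficient, 2·f(x, y) / (f(x) + f(y)), over the “GROUP of NOUN+plural” pattern. How the counts are actually gathered in the original system is not stated, so the counting scheme here is an assumption: the pair count is the number of “GROUP of NOUNs” matches, the marginal counts are raw occurrences of each term, and plurals are formed naively by appending “s”. The toy corpus is likewise invented.

```python
import re


def dice(pair_count: int, left_count: int, right_count: int) -> float:
    """Dice's coefficient: 2 * f(x, y) / (f(x) + f(y)), or 0.0 with no occurrences."""
    total = left_count + right_count
    return 0.0 if total == 0 else 2.0 * pair_count / total


def group(group_term: str, member: str, corpus: str) -> float:
    """Extent in [0, 1] to which `member` is described as part of `group_term`."""
    text = corpus.lower()
    g, m = re.escape(group_term), re.escape(member)
    pair = len(re.findall(rf"\b{g} of {m}s\b", text))   # e.g. "influx of immigrants"
    left = len(re.findall(rf"\b{g}\b", text))            # all uses of the group term
    right = len(re.findall(rf"\b{m}s?\b", text))         # singular or naive plural
    return dice(pair, left, right)


# Hypothetical three-sentence corpus.
toy_corpus = ("An influx of immigrants arrived. "
              "Each immigrant found work. The influx continued.")
influx_score = group("influx", "immigrant", toy_corpus)
```

Because the pair count can never exceed either marginal count, the result stays within [0, 1], as the framework requires of every base function.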
Some syntagmatic patterns are more directly associated with specific concepts than others. For instance, the pattern “mint-flavored” clearly indicates that Mint is a flavor. Such hyphenated forms can be used to find figurative usage of concepts in context, as in:

(define Causal-Agent (arg0) (hyphen induce arg0)) % e.g., drug-induced
A conceptual function may need to marshal different kinds of syntagmatic evidence to yield an overall categorization score, and the basic function combine allows us to combine this variety of contextual evidence into a score in the [0, 1] range. If $e_0$, $e_1$, etc. are the scores associated with various pieces of evidence (as returned by the functions of Table 1), then combine adds these scores to yield another in the [0, 1] range as follows:

$$(\mathit{combine}\ e_0\ e_1) = e_0 + e_1 - e_0 e_1$$

$$(\mathit{combine}\ e_0\ e_1 \ldots e_n) = (\mathit{combine}\ e_0\ (\mathit{combine}\ e_1 \ldots e_n))$$

The combine function is thus a naïve probabilistic or function, one that naively assumes independence among the evidence it combines to generate scores that asymptotically approach 1.0. If a piece of evidence is included multiple times (to reflect a greater diagnostic value), it is counted multiple times, but with a diminishing effect. Consider the use of combine in a concept definition for Invader in Figure 2. Note how four types of information are synthesized in this definition: general taxonomic knowledge (via the %isa function); adjectival modification (via the attr function); subject-verb knowledge (via agent); and group membership knowledge (via group).
(define Invader (arg0)
   (combine (* 0.3 (max (%isa arg0 Person) (%isa arg0 ...)))
            (agent invade arg0)
            (attr invasive arg0)
            (group invasion arg0)
            (group influx arg0)
            ≥ 2))

Figure 2: A functional description of the concept Invader
We refer to the final clause of Figure 2, ≥ 2, as a “quantitative cut”: it specifies the number of non-zero pieces of evidence that combine must have processed prior to this cut if it is to perform its normal function; if this threshold is not met, then combine aborts (i.e., cuts) early and simply returns a 0. Therefore, any concept in a given context that meets two or more of these intensional criteria (e.g., people or groups that invade, non-human invasive organisms that form an influx, etc.) is categorized as an Invader to a degree that reflects the linguistic evidence of the corpus. Note how the contribution of WordNet (or whatever ontology underpins the %isa function) is here scaled by a small multiplier of 0.3. This prevents the %isa clause – which merely serves as a soft taxonomic preference rather than a hard constraint – from making an undue contribution to the overall categorization score.
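The behaviour of combine and its quantitative cut can be sketched as follows. The keyword parameter `min_nonzero` is a name introduced here for the cut (the ≥ 2 clause), and the `invader` helper takes its five evidence scores as plain numbers rather than computing them from a corpus; both are illustrative assumptions, not the original implementation.

```python
from functools import reduce


def combine(*evidence: float, min_nonzero: int = 0) -> float:
    """Naive probabilistic "or" over evidence scores in [0, 1].

    Pairwise rule: combine(a, b) = a + b - a * b. If fewer than
    `min_nonzero` scores are non-zero, the quantitative cut applies
    and the result is 0.0.
    """
    if sum(1 for e in evidence if e > 0) < min_nonzero:
        return 0.0
    return reduce(lambda acc, e: acc + e - acc * e, evidence, 0.0)


def invader(isa_score: float, agent_invade: float, attr_invasive: float,
            group_invasion: float, group_influx: float) -> float:
    """Graded membership in Invader from five evidence scores, with a cut of 2."""
    return combine(0.3 * isa_score,      # taxonomic evidence, damped to a soft preference
                   agent_invade,
                   attr_invasive,
                   group_invasion,
                   group_influx,
                   min_nonzero=2)


twice = combine(0.5, 0.5)          # 0.5 + 0.5 - 0.25
thrice = combine(0.5, 0.5, 0.5)    # repeated evidence counts again, with diminishing effect
taxonomy_only = invader(1.0, 0.0, 0.0, 0.0, 0.0)
```

The last call shows why the cut matters: being a Person according to the base ontology is, on its own, never enough to be categorized as an Invader.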
Consider a conceptual function for the concept Pet which, as formulated in Figure 3, combines several different types of evidence to diagnose “pet-hood”. The definition asks the following questions of each potential member: is it a kind of animal? Is it docile or domesticated? Is it cute? Is it something that one can own and care for? For those concepts that appear to meet two or more of these demands in a corpus, this definition can be used to introspectively explain why.
(max (of owner arg0) (of care arg0))
(max (attr docile arg0) (attr domesticated arg0))
(max (attr cute arg0) …)
Figure 3: A conceptual function for the highly context-sensitive notion of Pet
The definition of Figure 3 is also constructed to make animal-ness a soft preference rather than a hard constraint for pet-hood, since one can conceive of human pets (favored children, slaves) and even artificial pets (toys, robots, etc.). Suppose, in a given context, the above function assigns a categorization score of 0.12 to the term Iguana. Introspecting over the symbolic structure of the definition, the system can explain why this score was assigned in this context, by pointing out that the associated corpus speaks of iguanas as cuddly or cute with this much frequency, and as docile with that much frequency, and so on. Now suppose a zero categorization score is given to Piranha. The system can use a similar process to perform a what-if analysis, as in a spreadsheet. Looking at the taxonomic placement of Piranha in a base ontology like WordNet, the system can determine which of the elements in the functional definition (such as animal-ness) are applicable to Piranha. Noting from the corpus that cuddliness, cuteness and docility are collocates of “animal”, it can then explain that Piranha is not a Pet because it is seen as neither cute, cuddly nor docile in the associated context.
4 Web-based acquisition of conceptual functions
The functions of Figures 1, 2 and 3 make no reference to any kind of context. Rather, they encode diagnostic knowledge of a general character about individual concepts – what one might describe as the
conventional wisdom about these concepts. Our definitions of Pet, Invader, Fundamentalist, Drug, etc. are intended to represent quantitative categorization functions for the correspondingly-named concepts in a broad-scope lexical ontology like WordNet. They should thus be seen not as comprising a local ontology of their own, but as additions to a base ontology (see Giunchiglia, 1993; Uschold, 2000). Nonetheless, these functions are inherently context-sensitive, in two crucial respects. Firstly, they encode conventional wisdom about conceptual structure in a flexible manner, not as hard constraints but as soft preferences. In this way, they anticipate that certain contexts may observe certain diagnostic requirements and not others, e.g., that invaders are not always human, or that pets may not always be animals.
Conventional wisdom has its own syntagmatic norms of expression. For instance, when one wishes to highlight a specific property in a given concept, it is commonplace to compare that concept to one for which that property is widely agreed to be diagnostic. Comparisons of the form “as ADJ as a|an NOUN” work best when the exemplar that is used (e.g., “dry as sand”, “hot as the sun”) is familiar to the target audience and is truly exemplary of the given property in a context-independent manner. That is, such simile-based comparisons work best when they are generally self-evident and not dependent on a private or inaccessible context to give them meaning. By searching the web for comparisons of this form, we achieve two important ends: we identify the exemplar concepts that are most frequently used as a basis of comparison, which one can expect to be most stable across varying contexts, and which are thus most deserving of representation in a base ontology; and, we identify the most salient properties of those concepts, thereby allowing us to assign to them a corresponding functional form.
As in Almuhareb and Poesio (2005), we use the Google API to find instances of our search patterns on the web. We use two simile patterns, one in which the wildcard operator * substitutes for the adjectival property (where an exemplar noun is explicitly given), and one in which the wildcard operator substitutes for the noun (while the adjective is given). The first pattern collects salient adjectival properties for a given noun, while the second collects the most common noun concepts that exemplify a given adjectival property. For purposes of radial category construction, we expect that adjectives which denote an end-point on a sliding scale, such as “brave” (versus “cowardly”), “hot” (versus “cold”) and “rich” (versus “poor”) will be the most commonly used adjectives in comparative phrases, and will yield the most diagnostic properties for categorization. We initially limit our attention then to WordNet adjectives that are defined relative to an antonymous term. For every adjective ADJ on this list, the query “as ADJ as *” is sent to Google and the first 200 snippets returned are scanned to extract different noun bindings (and their relative frequencies) for the wildcard *. The complete set of nouns extracted in this way is then used to drive a second phase of the search, in which the query template “as * as a NOUN” is used to acquire similes that may have lain beyond the 200-snippet horizon of the original search, or that hinge on non-antonymous adjectives that were not included on the original list. Together, both phases collect a wide-ranging series of core samples (of 200 hits each) from across the web, yielding a set of 74,704 simile instances (of 42,618 unique types) relating 3769 different adjectives to 9286 different nouns.
The simile frame “as ADJ as a NOUN” is relatively unambiguous as such patterns go, but a non-trivial quantity of unwanted or noisy data is nonetheless retrieved. In some cases, the NOUN value forms part of a larger noun phrase that is not lexicalized in WordNet: it may be the modifier of a compound noun (e.g., “bread lover”), or the head of a complex noun phrase (such as “gang of thieves” or “wound that refuses to heal”). In other cases, the association between ADJ and NOUN is simply too ephemeral or under-specified to function well in the null context of a base ontology. As a general rule, if one must read the original document to make sense of the association, it is rejected. More surprisingly, perhaps, a substantial number of the retrieved similes are ironic, in which the literal meaning of the simile is contrary to the meaning dictated by common sense. For instance, “as hairy as a bowling ball” (found once) is an ironic way of saying “as hairless as a bowling ball” (also found just once). Many of the ironies we found exploit contingent world knowledge, such as “as sober as a Kennedy” and “as tanned as an Irishman”.
Given the creativity involved in these constructions, one cannot imagine a reliable automatic filter to safely identify bona-fide similes. For this reason, the filtering task is performed by human judges, who annotated 30,991 of these simile instances (for 12,259 unique adjective/noun pairings) as non-ironic and meaningful in a null context; these similes relate a set of 2635 adjectives to a set of 4061 different nouns. In addition, the judges annotated 4685 simile instances (of 2798 types) as ironic; these similes relate a set of 936 adjectives to a set of 1417 nouns. Surprisingly, ironic pairings account for over 13% of all annotated simile instances and over 20% of all annotated simile types.
WordNet is used as a source for the adjectives that drive the simile retrieval process; it is also used to validate the nouns (unitary or multi-word) that are described by these similes. By sense-disambiguating these nouns relative to the noun-senses found in WordNet, we can use their associated adjectival properties to assign functional forms to each of these WordNet senses. As such, we automatically construct context-sensitive categorization functions for the most commonly used concepts in the WordNet noun ontology.
Disambiguation is trivial for nouns with just a single sense in WordNet. For nouns with two or more fine-grained senses that are all taxonomically close, such as “gladiator” (two senses: a boxer and a combatant), we consider each sense to be a suitable target. In some cases, the WordNet gloss for a particular sense will literally mention the adjective of the simile, and so this sense is chosen. In all other cases, we employ a strategy of mutual disambiguation to relate the noun vehicle in each simile to a specific sense. Two similes “as A0 as N1” and “as A0 as N2” are mutually disambiguating if N1 and N2 are synonyms in WordNet, or if some sense of N1 is a hypernym or hyponym of some sense of N2 in WordNet. For instance, the adjective “scary” is used to describe both the noun “rattler” and the noun “rattlesnake” in bona-fide (non-ironic) similes; since these nouns share a sense, we can assume that the intended sense of “rattler” is that of a dangerous snake rather than a child’s toy. Similarly, the adjective “brittle” is used to describe both saltines and crackers, suggesting that it is the bread sense of “cracker” rather than the hacker, firework or hillbilly senses (all in WordNet) that is intended.
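The mutual disambiguation test can be sketched over a toy sense inventory. The `SENSES` table below is a hypothetical stand-in for WordNet (its sense identifiers and links are invented for illustration), and the sketch checks only direct hypernym links, which is a simplifying assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical miniature sense inventory standing in for WordNet:
# sense id -> (lemmas that express the sense, direct hypernym sense ids)
SENSES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "rattlesnake.0": ({"rattlesnake", "rattler"}, {"snake.0"}),
    "rattler_toy.0": ({"rattler"}, {"toy.0"}),
    "cracker.0": ({"cracker"}, {"bread.0"}),
    "cracker.1": ({"cracker", "hacker"}, {"programmer.0"}),
    "saltine.0": ({"saltine"}, {"cracker.0"}),
}


def senses_of(lemma: str) -> List[str]:
    """All sense ids in the toy inventory that the lemma can express."""
    return [sid for sid, (lemmas, _) in SENSES.items() if lemma in lemmas]


def mutually_disambiguate(n1: str, n2: str) -> Optional[Tuple[str, str]]:
    """Return a (sense of n1, sense of n2) pair licensed by a shared sense
    or by a direct hypernym/hyponym link, or None if no pair qualifies."""
    for s1 in senses_of(n1):
        for s2 in senses_of(n2):
            if s1 == s2:  # shared sense: the two nouns are synonyms
                return s1, s2
            if s2 in SENSES[s1][1] or s1 in SENSES[s2][1]:
                return s1, s2
    return None


rattler_pair = mutually_disambiguate("rattler", "rattlesnake")
cracker_pair = mutually_disambiguate("saltine", "cracker")
```

In this toy setting the two calls select the snake sense of “rattler” and the bread sense of “cracker”, in line with the examples in the text.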
These heuristics allow us to automatically disambiguate 10,378 bona-fide simile types (85%), yielding a mapping of 2124 adjectival properties to 3778 different WordNet noun-senses. Likewise, 77% (2164) of ironic simile types are disambiguated automatically. A remarkable stability is observed in the alignment of simile nouns to WordNet senses, which suggests that the disambiguation process is consistent and accurate: 100% of the ironic vehicles always denote the same sense, no matter the adjective involved, while 96% of bona-fide vehicles always denote the same sense.
4.3 From similes to membership functions
The above filtering and word-sense disambiguation processes associate the properties stealthy, silent
with the person sense of “ninja” (denoted here as Ninja.0), leading to the following function:
(combine (attr stealthy arg0)
         (attr silent arg0)
         (attr agile arg0)
         …
Figure 4: A web-derived functional form for the concept Ninja
As we cannot know which subset of these properties is sufficient for categorization, we use the quantitative cut ≥ 2 to ensure that more than one property is contextually present to support a categorization as a ninja. The more properties that are present, the higher the resulting categorization score (aggregated via the combine operator) will be. The hard constraint (…)
as the taxonomic constraint for all conceptual functions that represent a specialized kind of person in WordNet, ensuring that the context does not suggest categorizations that undermine the logical core of the ontology.
The nouns most commonly used in similes on the web will typically provide an even richer set of
properties on which to base categorization, as illustrated by the functional form of Snake in Figure 5.
(combine (attr cunning arg0) …
Figure 5: A web-derived functional form for the animal sense of “snake” (snake.0)
Note that the taxonomic constraints (…) serve to ensure that the resulting in-context categorizations are broadly literal w.r.t. WordNet. By weakening (i.e., generalizing) or removing these constraints, one could allow for contextually-appropriate metaphoric categorizations to be made, e.g., that agile animals, stealthy viruses or silent and clandestine organizations might be seen as ninjas, or that cunning and slippery people might be seen as snakes. The distinction between literal and metaphoric categorization in a given context is often blurred, and may, in principle, be impossible to delineate. Is chocolate really an addictive drug in some dieting contexts, or is such a categorization a creative over-use of the word “drug”? While this rather vexing question falls outside the scope of the current paper, we note that the framework of conceptual functions described here provides an ideal mechanism for exploring the contextual boundaries of literal and metaphoric categorization in future research.
In this section we provide empirical support for the two main claims of this paper. The first is the relatively uncontroversial claim that syntagmatic patterns of usage at the textual level reflect distinctions in concept usage at the ontological level (e.g., Cimiano et al., 2005; Hanks, 2006), so that the syntagmatic patterns in a given corpus can be taken to be indicative of categorization patterns in the corresponding context. The second is the more novel claim that web similes are sufficiently revealing about the diagnostic properties of concepts to allow accurate categorization functions to be constructed for each.
We test the first claim using the HowNet ontology of Dong and Dong (2006). HowNet differs from WordNet in many respects (e.g., the former is bilingual, linking the same definitions to both English and Chinese labels) but the key difference is that HowNet defines the meaning of each word sense via a simple conceptual graph. For instance, HowNet specifies that a Knight is the agent of the activity Fight, while Assassin is the agent of the activity Kill. Additionally, it states (in explicit logical terms) that the killing performed by an Assassin has the property means. Each of these logical definitions is hand-crafted, allowing us to test whether the syntagmatic patterns exhibited by a corpus can suggest much the same semantic distinctions as made by an ontologist. For simplicity, we focus here on those concepts in HowNet that are defined as the agent of a given activity, like Knight and Assassin. Using the complete text of Wikipedia as our corpus (2 gigabytes, from a June 2005 download), we find 1626 different nouns that have at least one sense that fills the agent role of a HowNet activity concept. In all, HowNet uses 262 unique verbs, such as kill and buy, to describe those activities. Using Dice’s coefficient (Lee, 1999) to measure the association between each noun and each verb for which the noun is used as an active subject, we find that for 69% of nouns, the highest rating is given to the verb that is used to capture the noun’s meaning in HowNet. Though one could not argue from this result that an ontology could be automatically constructed from simple syntagmatic evidence alone, a result of 69% does strongly suggest that the semantic criteria that guide ontology construction are readily discernible from patterns of language use in a given context.
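The noun-verb association test can be illustrated with a small sketch. It assumes the common frequency-based form of Dice's coefficient, 2·f(n,v) / (f(n) + f(v)), computed over (subject noun, verb) pairs; the function names, the pair list and its counts are hypothetical and are not drawn from the Wikipedia corpus or from HowNet.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def dice(pair_count: int, noun_count: int, verb_count: int) -> float:
    """Dice's coefficient from co-occurrence and marginal frequencies."""
    total = noun_count + verb_count
    return 2.0 * pair_count / total if total else 0.0


def best_verb_by_dice(
    noun: str, subject_verb_pairs: Iterable[Tuple[str, str]]
) -> Optional[str]:
    """Return the verb most strongly associated with `noun` as subject."""
    pairs = list(subject_verb_pairs)
    pair_freq = Counter(pairs)
    noun_freq = Counter(n for n, _ in pairs)
    verb_freq = Counter(v for _, v in pairs)
    scores = {
        v: dice(count, noun_freq[noun], verb_freq[v])
        for (n, v), count in pair_freq.items()
        if n == noun
    }
    return max(scores, key=scores.get) if scores else None


# Hypothetical (subject noun, verb) observations.
observations = (
    [("assassin", "kill")] * 3
    + [("assassin", "buy")]
    + [("knight", "fight")] * 4
    + [("knight", "kill")]
    + [("merchant", "buy")] * 5
)
assassin_verb = best_verb_by_dice("assassin", observations)
knight_verb = best_verb_by_dice("knight", observations)
```

A noun counts as a hit in the evaluation when the top-ranked verb matches the activity used in its hand-crafted definition.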
Our second claim concerns the simile-gathering process of the last section, which, aided by Google’s practice of ranking pages according to popularity, should reveal the most frequently-used nouns in comparisons on the web, and thus, the most useful concepts to functionally describe in a lexical ontology like WordNet. But the descriptive sufficiency of these functional forms is not guaranteed unless the diagnostic properties employed by each can be shown to be collectively rich enough, and individually salient enough, to predict how each lexical concept is perceived and used by members of a language community. If similes are indeed a good basis for mining the most salient and diagnostic properties of concepts, we should expect the set of properties for each concept to accurately predict how the concept is perceived as a whole. One measurable clue as to how a concept is perceived is its affective rating.
For instance, humans – unlike computers – tend to associate certain positive or negative feelings, or affective values, with particular concepts. Unsavoury activities, people and substances generally possess a negative affect, while pleasant activities and people possess a positive affect. Whissell (1989) reduces the notion of affect to a single numeric dimension, to produce a dictionary of affect that associates a numeric value in the range 1.0 (most unpleasant) to 3.0 (most pleasant) with over 8000 words in a range of syntactic categories (including adjectives, verbs and nouns). So to the extent that the adjectival properties yielded by processing similes paint an accurate picture of each noun concept, we should be able to predict the affective rating of each concept by using a weighted average of the affective ratings of the adjectival properties ascribed to these concepts (i.e., where the affect rating of each adjective contributes to the estimated rating of a noun concept in proportion to its frequency of co-occurrence with that concept in our web-derived simile data). More specifically, we should expect ratings estimated via these simile-derived properties to exhibit a higher correlation with the independent ratings of Whissell’s dictionary than ratings estimated from properties derived from other sources (such as WordNet itself) or from other syntagmatic patterns.
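The frequency-weighted estimate described above can be sketched as follows. The function name, the adjective counts and the affect values are hypothetical (the values are not taken from Whissell's dictionary), and adjectives without a known rating are simply skipped, which is an assumption of this sketch.

```python
from typing import Dict, Optional


def predict_affect(
    adjective_counts: Dict[str, int], affect: Dict[str, float]
) -> Optional[float]:
    """Frequency-weighted mean affect of a noun's simile-derived adjectives.

    Adjectives missing from the affect dictionary are ignored; None is
    returned when no adjective has a known rating.
    """
    weighted_sum = 0.0
    total_weight = 0
    for adjective, count in adjective_counts.items():
        if adjective in affect:
            weighted_sum += count * affect[adjective]
            total_weight += count
    return weighted_sum / total_weight if total_weight else None


# Hypothetical ratings on the 1.0 (unpleasant) to 3.0 (pleasant) scale.
toy_affect = {"cunning": 1.6, "slippery": 1.5, "sleek": 2.2}
# Hypothetical simile frequencies for the noun "snake".
snake_adjectives = {"cunning": 3, "slippery": 1, "sleek": 1, "sinuous": 2}
snake_rating = predict_affect(snake_adjectives, toy_affect)
```

Here the estimate is (3 × 1.6 + 1 × 1.5 + 1 × 2.2) / 5 = 1.7, since “sinuous” has no rating in the toy dictionary.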
To determine if this is indeed the case, we calculate and compare this correlation between predicted
and reported affect-ratings using the following data sources:
A. Adjectives derived from annotated bona-fide (non-ironic) similes only.
B. Adjectives derived from all annotated similes (both ironic and non-ironic).
C. Adjectives derived from ironic similes only.
D. All adjectives used to modify a given noun in a large corpus (e.g., all possible uses of the function attr for a corpus). We use 2 gigabytes of text from the online encyclopaedia Wikipedia as our corpus.
E. A set of 63,935 unique property-of-noun pairings extracted via the shallow-parsing of WordNet glosses; e.g., properties such as strong are extracted from the gloss for Espresso (“strong black coffee brewed by forcing steam under pressure.”).
Predictions of affective rating were made from each of these data sources and then correlated with the ratings reported in Whissell’s dictionary of affect using a two-tailed Pearson test (p < 0.01). As expected, property values derived from bona-fide similes only (A) yielded the best correlation (+0.514) while property values derived from ironic similes only (C) yielded the worst (-0.243); a middling correlation coefficient of 0.347 was found for all similes together (B), reflecting the fact that bona-fide similes outnumber ironic similes by a ratio of 4 to 1. A weaker correlation of 0.15 was found using the corpus-derived adjectival modifiers for each noun (D); while this data provides quite large value sets for each noun, these property values merely reflect the potential rather than intrinsic properties of each concept and so do not reveal what is most diagnostic about the concept. As also noted by Almuhareb and Poesio (2005), such values reveal very little about the actual structure of a concept. Those authors address this problem by instead seeking to mine property types (such as Temperature) rather than their values (such as hot), while we address the problem by only mining the most diagnostic property values.
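The comparison between predicted and reported ratings rests on Pearson's correlation coefficient. A minimal plain-Python sketch of the coefficient is given here; the function name and the rating lists are hypothetical, and the sketch does not compute the accompanying significance test.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r between two equal-length sequences of ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted vs. reported affect ratings for four nouns.
predicted = [1.7, 2.4, 1.2, 2.9]
reported = [1.5, 2.6, 1.4, 2.8]
r = pearson(predicted, reported)
```

Running one such comparison per data source (A to E) gives the kind of side-by-side coefficients reported in this section.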
More surprisingly, perhaps, property values derived from WordNet glosses (E) are also poorly predictive, yielding a correlation with Whissell’s affect ratings of just 0.278. Our goal in this paper has
been to describe a framework for augmenting WordNet’s concepts with functional forms that both reflect the diagnostic properties of these concepts and that allow them to categorize different concepts in different contexts. These results suggest that the properties needed to construct these categorization functions are not to be found within WordNet itself, but must be acquired by observing how people actually use concepts to construct and convey meanings in everyday language.
In this paper we have presented a context-sensitive functional framework for concept description in WordNet and other lexical ontologies. This framework serves as a flexible interface between, on one hand, the need for ontological clarity and a commitment to explicit logical definitions, and on the other, the context-sensitive utilization of these definitions in a representative body of text. These conceptual functions establish their own categorization boundaries based on the context, and – under the ontologist’s control – can blur the traditional line between the literal and metaphoric usage of a concept when it is ontologically useful to do so (e.g., see Hanks, 2006).
This programmatic approach to concept definition complements the syntagmatic approach to ontology construction outlined in Cimiano et al. (2005), since the ontologist is here given access to the syntagmatic features of a context via a flexible but powerful representation language. We have also described how concept definitions can, like those of Cimiano et al. and Almuhareb and Poesio (2005), be created automatically, by identifying the most diagnostic properties of each concept as expressed in similes on the web. By associating conceptual functions with the most commonly used WordNet noun senses, we achieve a pair of related goals: WordNet is augmented with a robust, non-classical view of conceptual structure; and, more importantly in the context of this special issue, WordNet is remade in a context-sensitive form. Ultimately, these two goals are flip-sides of the same coin, for insofar as context alters the perceived boundaries of familiar concepts, classically-structured ontologies like WordNet cannot be made context-sensitive without first being augmented with a flexible sense of categorization.
Much work remains to be done on the current framework, not least on the development of a more formal treatment of how our approach serves to augment WordNet (or WordNet-like resources) with concept descriptions that can be used both to categorize in context and to reason about those categorizations. Such a formal treatment would likely parallel that offered to classification structures by Giunchiglia et al. (2005), who formalize the notion of a classification system by expressing topic labels in a propositional concept language. As a lightweight lexical ontology, WordNet is itself little more than a classification hierarchy, and the conceptual functions we now assign to its lexical entries serve much the same purposes (i.e., categorization and introspective reasoning) as the concept-language labels employed by Giunchiglia et al.
More speculatively, more exploratory effort is clearly merited by the tantalizing issue of where literal
categorization ends and metaphoric categorization begins, and the role of context in blurring this boundary.
Almuhareb, A. and Poesio, M. 2005. Concept Learning and Categorization from the Web. In Proceedings of CogSci 2005, the 27th Annual Conference of the Cognitive Science Society. New Jersey: Lawrence Erlbaum.
Bouquet, P., Giunchiglia, F., van Harmelen, F., Serafini, L. and Stuckenschmidt, H. 2003. C-OWL: Contextualizing Ontologies. In Proceedings of 2nd International Semantic Web Conference, LNCS vol. 2870:164-179. Springer Verlag.
Budanitsky, A. and Hirst, G. 2006. Evaluating WordNet-based Measures of Lexical Semantic Relatedness.
Barsalou, L. W. 1983. Ad hoc categories. Memory and Cognition, 11:211-227.
Charniak, E. and Berland, M. 1999. Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Assoc. for Computational Linguistics, pp. 57-64.
Cimiano, P., Hotho, A., Staab, S. 2005. Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis. Journal of AI Research, 24: 305-339.
De Leenheer, P. and de Moor, A. 2005. Context-driven Disambiguation in Ontology Elicitation. In Shvaiko P. & Euzenat J. (eds.), Context and Ontologies: Theory, Practice and Applications, AAAI Technical Report WS-05-01:17-24. AAAI Press.
Dong, Z. and Dong, Q. 2006. HowNet and the Computation of Meaning. World Scientific: Singapore.
Fellbaum, C. (ed.). 1998. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA.
Ganter, B. and Wille, R. 1999. Formal concept analysis: mathematical foundations. Springer Verlag, Berlin.
Gangemi, A., Guarino, N. and Oltramari, A. 2001. Conceptual Analysis of Lexical Taxonomies: The Case of WordNet’s Top-Level. In C. Welty and S. Barry (eds.), Formal Ontology in Information Systems. Proceedings of FOIS2001. ACM Press: 285-296.
Guarino, N. (ed.) 1998. Formal Ontology and Information Systems. Amsterdam: IOS Press. Proceedings of …, June 6-8, Trento, Italy.
Giunchiglia, F. 1993. Contextual reasoning. Special issue on I Linguaggi e le Macchine.
Giunchiglia, F., Marchese, M. and Zaihrayeu, I. 2005. Towards a Theory of Formal Classification. In Shvaiko P. & Euzenat J. (eds.), Context and Ontologies: Theory, Practice and Applications, AAAI Technical Report WS-05-01.
Guha, R. V. 1991. Contexts: a formalization and some applications. Technical Report STAN-CS-91-1399, Computer Science Dept., Stanford, California.
Hanks, P. 2004. The syntagmatics of metaphor. International Journal of Lexicography.
Hanks, P. 2006. Metaphoricity is a Gradable. In A. Stefanowitsch and S. Gries (eds.), Corpora in Cognitive Linguistics. Vol. 1: Metaphor and Metonymy. Berlin and New York: Mouton de Gruyter.
Hearst, M. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics, pp. 539-545.
Lakoff, G. 1987. Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. University of …
Lee, L. 1999. Measures of distributional similarity. In Proceedings of the 37th Annual Meeting of the Association for …, pp. 25-32.
Lenat, D. and Guha, R. V. 1990. Building Large Knowledge-based Systems: Representation and Inference in the …
Morris, J. 2006. Readers’ Perceptions of Lexical Cohesion and Lexical Semantic Relations in Text. Ph.D. thesis.
Obrst, L. and Nichols, D. 2005. Context and Ontologies: Contextual Indexing of Ontological Expressions. In Proceedings of the AAAI 2005 Workshop on Context and Ontologies, Pittsburgh, Pennsylvania.
Patel-Schneider, P. F., Hayes, P. and Horrocks, I. 2003. Web Ontology Language (OWL) Abstract Syntax and Semantics. Technical report, W3C, www.w3.org/TR/owl-semantics.
Pustejovsky, J. 1995. Generative Lexicon. The MIT Press, Cambridge, MA.
Segev, A. and Gal, A. 2005. Putting things in context: a topological approach to mapping contexts and ontologies. In Shvaiko P. & Euzenat J. (eds.), Context and Ontologies: Theory, Practice and Applications, AAAI Technical Report WS-05-01. AAAI Press.
Uschold, M. 2000. Creating, integrating and maintaining local and global ontologies. In Proceedings of the 1st Workshop on Ontology Learning (OL-2000), part of ECAI 2000.
Weeds, J. and Weir, D. 2005. Co-occurrence retrieval: A flexible framework for lexical distributional similarity.
Welty, C. and Jenkins, J. 1999. An Ontology for Subject. Journal of Data and Knowledge Engineering.
Whissell, C. 1989. The dictionary of affect in language. In Plutchik, R. & Kellerman H. (Eds.) Emotion: Theory … New York: Harcourt Brace, 113-131.