Modeling Web Knowledge for Answering Event-based Questions

Hui Yang
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
(65)68744895
yangh@comp.nus.edu.sg
Tat-Seng Chua
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
(65)68742505
chuats@comp.nus.edu.sg
Shuguang Wang
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
(65)68744774
wangshug@comp.nus.edu.sg

ABSTRACT

For TREC-style questions, the query terms derived from the original question are often too brief or do not match the relevant information in the corpus well. This surface string mismatch makes it very difficult to find an exact answer in a large corpus. To address this problem, we present a question answering system called QUALIFIER, which employs a novel approach to structurally model external knowledge from the Web and other resources for event-based question answering. Results on the TREC-11 QA corpus demonstrate that the approach is effective.

Keywords

Question answering, web knowledge modeling, query formulation, semantic grouping

1. INTRODUCTION

Open Domain Question Answering (QA) is an information retrieval (IR) paradigm. Modern QA systems [1, 2] combine the strengths of traditional IR, natural language processing and information extraction to retrieve concise answers to open-domain natural language questions from the QA corpus. Most QA systems employ a framework comprising question analysis, query formulation, document retrieval, and answer extraction and validation modules. Because of the abundance of information on the Web, researchers [1, 3, 5, 6] have started to seek quick answers on the Web for simple, factoid questions. Unlike those Web-based QA systems that find answers on the Web directly, we use the Web as an external knowledge base to aid query formulation and to locate the answers in the QA corpus. In TREC-11 [7], our group employed an innovative approach that models lexical and world knowledge from the Web and WordNet to support effective QA. This paper investigates the integration and structured use of both world and linguistic knowledge for QA. In particular, we describe a high-performance question answering system called QUALIFIER (QUestion Answering by LexIcal FabrIc and External Resources) and analyze its effectiveness using the TREC-11 benchmark.

2. QUESTION ANSWERING EVENTS

We propose a novel way to investigate the QA problem, which we call Event-based Question Answering. The world consists of two basic types of things: entities ("anything having existence (living or nonliving)") and events ("something that happens at a given place and time"), and people often ask questions about them. Applying this taxonomy to the TREC-style QA task, questions can be viewed as enquiries about either entities or events. Questions typically concern one or more aspects or elements of QA events, namely Location, Time, Subject, Object, Quantity, Description and Action. Table 1 shows the correspondence between the most common WH-question classes and the QA event elements.

Table 1: Correspondence of WH-Questions & Event Elements

WH-Question        QA Event Elements
Who/Whose/Whom     Subject, Object
Where              Location
When               Time
What               Subject, Object, Description, Action
Which              Subject, Object
How                Quantity, Description
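As an illustration only, the correspondence in Table 1 can be captured as a simple lookup structure. The Python sketch below is a hypothetical encoding, not part of QUALIFIER itself; it maps the leading WH-word of a question to its candidate event elements.

```python
# Hypothetical encoding of Table 1: leading WH-word -> candidate QA event elements.
WH_TO_ELEMENTS = {
    "who":   ["Subject", "Object"],
    "whose": ["Subject", "Object"],
    "whom":  ["Subject", "Object"],
    "where": ["Location"],
    "when":  ["Time"],
    "what":  ["Subject", "Object", "Description", "Action"],
    "which": ["Subject", "Object"],
    "how":   ["Quantity", "Description"],
}

def candidate_elements(question: str) -> list:
    """Return the event elements a question likely asks for, based on its
    leading WH-word; an empty list means the question class is unknown."""
    words = question.strip().lower().split()
    return WH_TO_ELEMENTS.get(words[0], []) if words else []
```

For example, candidate_elements("When was the Mississippi River discovered?") returns ["Time"].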

Our main observation is that a QA event shows strong cohesive affinity with all its elements, and the elements are in turn closely coupled through the event. Normally, the question itself provides some known elements and asks for the unknown element(s). In most cases, however, it is difficult to find the correct answer, i.e., the correct unknown element(s), directly. To overcome the problems of insufficient known elements and inexact known elements, we model Web and linguistic knowledge to perform effective QA.

3. EMPLOYING WEB KNOWLEDGE

As the Web is the most rapidly growing and most comprehensive knowledge resource in the world, QUALIFIER uses it as an external knowledge source to overcome the problem of insufficient known elements. The terms in the relevant web documents are likely to be similar or even identical to those in the QA corpus, since both report the same natural facts (QA Entities) or current and historical events (QA Events).

QUALIFIER uses the original content words in q(0) as a query to retrieve the top Nw documents from a Web search engine (e.g., Google), and then extracts the terms in those documents that are highly correlated with the original query terms. That is, for each qi(0) ∈ q(0), it extracts the list of nearby non-trivial words, wi, that occur in the same sentence or snippet as qi(0). We compute the weight of each term tik ∈ wi as:

    weight(tik) = ds(tik ∧ qi(0)) / ds(tik ∨ qi(0))                      (1)

where ds(tik ∧ qi(0)) gives the number of web snippets or sentences that contain both tik and qi(0), and ds(tik ∨ qi(0)) gives the number that contain either tik or qi(0).
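As a minimal sketch of Eqn (1), assuming the web snippets have already been fetched as plain strings and using naive whitespace tokenization (QUALIFIER's actual preprocessing is more elaborate), the weight can be computed as follows:

```python
def term_weight(term, query_term, snippets):
    """Sketch of Eqn (1): ratio of snippets containing both the candidate
    term and the query term to snippets containing either of them."""
    both = either = 0
    for snippet in snippets:
        tokens = set(snippet.lower().split())  # naive tokenization (assumption)
        has_t = term.lower() in tokens
        has_q = query_term.lower() in tokens
        if has_t and has_q:
            both += 1
        if has_t or has_q:
            either += 1
    return both / either if either else 0.0
```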

Finally, QUALIFIER merges all wi to form Cq for q(0). It then uses WordNet as a filter to adjust the term weights. The final weight of each term is normalized, and the top m terms with weights above the cut-off threshold σ are selected to expand the original query:

    q(1) = q(0) + {top m terms ∈ Cq with weights greater than σ}                      (2)

where m is initially set to 20 in our experiments.
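The expansion step of Eqn (2) can be sketched as below. We assume here that Cq is a mapping from candidate terms to their WordNet-adjusted weights, and that normalization is by the maximum weight; the paper does not specify the normalization scheme, so that part is an assumption.

```python
def expand_query(q0, c_q, m=20, sigma=0.2):
    """Sketch of Eqn (2): q(1) = q(0) + top m terms in Cq above sigma.
    c_q maps candidate terms to weights; normalization by the maximum
    weight is an assumption, as the paper leaves the scheme unspecified."""
    max_w = max(c_q.values(), default=0.0) or 1.0  # guard against empty/zero
    ranked = sorted(c_q, key=c_q.get, reverse=True)
    expansion = [t for t in ranked[:m]
                 if c_q[t] / max_w > sigma and t not in q0]
    return list(q0) + expansion
```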

The expanded query should contain more known elements of the QA event. The terms in q(1) in fact correspond to one or more of the QA event elements discussed in Section 2. We explore the use of semantic grouping to structurally utilize the external knowledge extracted from the Web. Given any two distinct terms ti and tj, we compute their pairwise similarity.

We then cluster the terms into different semantic groups using a modified version of the algorithm outlined in [4]. We believe that the knowledge embedded in the Web about a QA event can be modeled and represented by these semantic groups. From the derived semantic groups, we form a structured query by selecting key terms from each group. The structured query is then passed to the document retrieval and answer selection engines to extract exact answers from the top returned documents. For example, for the question "What Spanish explorer discovered the Mississippi River?", the semantic groups obtained after knowledge modeling and clustering are illustrated in Figure 1. The figure shows that we are able to extract different aspects (or elements) of the QA event, such as the time (1541), the name of the river (Mississippi), the name of the explorer (Hernando De Soto), the nationality of the explorer (Spanish or French), and other descriptions (First, River, European). One promising advantage of our approach is that we can answer any factual question about the elements of this QA event, for instance "When was the Mississippi River discovered?" and "Which river was discovered by Hernando De Soto?". A simplified sketch of the grouping step follows Figure 1.

Figure 1: Example for Structured Query Formulation
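The grouping step can be approximated by a greedy single-link clustering over the pairwise term similarities. The sketch below is a simplified stand-in, not the modified algorithm of [4] that QUALIFIER actually uses; sim() may be any symmetric similarity in [0, 1], e.g. the co-occurrence weight of Eqn (1).

```python
def group_terms(terms, sim, threshold=0.5):
    """Greedy single-link grouping: a term joins the first existing group
    containing a sufficiently similar member, else it starts a new group.
    sim(a, b) is a symmetric similarity in [0, 1]; the 0.5 threshold is
    illustrative only."""
    groups = []
    for term in terms:
        for group in groups:
            if any(sim(term, member) >= threshold for member in group):
                group.append(term)
                break
        else:  # no existing group was similar enough
            groups.append([term])
    return groups
```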

4. EVALUATION

4.1 TREC-11 Evaluation

In the NIST human assessment on the 500 questions of the TREC-11 QA track, QUALIFIER answered 290 questions correctly with a confidence-weighted score of 0.610, which places us among the top three performing systems. Table 2 gives the statistics of our system, and Table 3 compares its performance with that of other Web-based QA systems in TREC-11 [7].

Table 2: Performance over TREC-11 500 Questions

# right          290      Precision                       0.580
# unsupported     18      Confidence-weighted score       0.610
# inexact         17      Precision of recognizing NIL    0.241
# wrong          175      Recall of recognizing NIL       0.891

Table 3: Comparison with other Web-based QA systems in TREC-11

Runs                                     Precision    Confidence-weighted score
nsirgn (Radev et al., U. Michigan)       0.178        0.283
aranea02a (Lin et al., MIT)              0.304        0.433
uwmtB3 (Clarke et al., U. Waterloo)      0.368        0.512
pris2002 (QUALIFIER)                     0.580        0.610

4.2 Effects of Web Search Strategies

For Web search, we adopt Google as the search engine and examine snippets instead of full web pages, as reported in [8, 9]. We study the performance of QUALIFIER by varying the number of top-ranked web pages returned, Nw, and the cut-off threshold σ (see Eqn 2) for selecting the terms in Cq. Table 4 summarizes the effects of these variations on the TREC-11 questions in terms of precision, i.e., the fraction of questions answered correctly by QUALIFIER. The best result is obtained with Nw = 75 and σ = 0.2 (a sketch of this parameter sweep follows Table 4).

Table 4: The Precision Score of 25 Web Runs

σ \ Nw    10       25       50       75       100
0.1       0.492    0.492    0.494    0.500    0.504
0.2       0.536    0.536    0.538    0.548    0.544
0.3       0.506    0.506    0.512    0.512    0.512
0.4       0.426    0.426    0.430    0.432    0.428
0.5       0.398    0.398    0.412    0.418    0.412
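The parameter sweep behind Table 4 amounts to a small grid search. The sketch below assumes a hypothetical evaluate() callable, not part of QUALIFIER's actual code, that runs the full QA pipeline once for a given parameter setting and returns the precision.

```python
def sweep(evaluate):
    """Grid search behind Table 4: evaluate(n_w, sigma) is a hypothetical
    callable that runs the full QA pipeline once and returns precision."""
    results = {}
    for sigma in (0.1, 0.2, 0.3, 0.4, 0.5):
        for n_w in (10, 25, 50, 75, 100):
            results[(sigma, n_w)] = evaluate(n_w, sigma)
    best = max(results, key=results.get)  # Table 4: sigma = 0.2, Nw = 75
    return results, best
```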

4.3 Event-based Query Formulation

We conducted several tests of knowledge modeling for event-based QA. For each run, we compute P, the precision, and CWS, the confidence-weighted score. Table 5 summarizes the results:

Table 5: Results of Different Query Formulation Methods

Method                                               P        CWS
a) Baseline                                          0.438    0.640
b) Baseline + Web                                    0.548    0.754
c) Baseline + Web + WordNet                          0.588    0.795
d) Baseline + Web + WordNet + semantic grouping      0.634    0.824

From Table 5 we can draw the following observations: Web knowledge alone raises precision substantially over the baseline (from 0.438 to 0.548); filtering with WordNet gives a further improvement to 0.588; and semantic grouping of the expanded terms yields the best overall performance, with a precision of 0.634 and a CWS of 0.824. Each layer of knowledge modeling thus contributes to the final result.

5. RELATED WORK

Other researchers have recently looked to the Web as a resource for question answering. The Mulder system described by Kwok et al. [5] submits multiple queries per question to a web search engine and analyzes the results. Mulder performs sophisticated parsing of the query and of the full text of retrieved pages in order to locate the answers; however, the system was not evaluated on the TREC queries. Brill et al. [1] and Clarke et al. [2] investigated the importance of redundancy in their question answering systems. Clarke et al. use web data to reinforce the scores of promising candidate answers through the additional redundancy of an auxiliary web corpus and global term weighting. Brill et al. [1] use search engine summaries as the primary source of redundancy, and operate without a full-text index of documents or a database of global term weights; similar to Mulder, they extract answers from the Web directly. Radev et al. [6] describe a number of probabilistic approaches to the passage extraction, phrase extraction, and answer ranking modules employed in typical QA systems.

In contrast, our approach differs from existing uses of the Web in one or more of the following aspects. First, our system focuses on using the Web to support query formulation, instead of using it as the primary source of answers. Second, we fuse knowledge from both the Web and WordNet. Third and most distinctively, we perform structured modeling of the Web and WordNet knowledge to extract most event elements and support event-based QA.

6. CONCLUSION

We have presented the techniques used in the QUALIFIER system, which employs a novel approach to event-based QA based on the modeling of Web knowledge. Using structured query formulation, we achieve an answer precision of 0.634 and a CWS of 0.824, which demonstrates the effectiveness of our approach.

7. REFERENCES

  1. E. Brill, J. Lin, M. Banko, S. Dumais, and A. Ng. Data-intensive question answering. In Proceedings of TREC 2001.
  2. C. Clarke, G. Cormack, and T. Lynam. Exploiting redundancy in question answering. In Proceedings of SIGIR 2001.
  3. J. Lin, A. Fernandes, B. Katz, G. Marton, and S. Tellex. Extracting answers from the Web using knowledge annotation and knowledge mining techniques. In Proceedings of TREC 2002.
  4. J. Liu and T.-S. Chua. Building semantic perceptron net for topic spotting. In Proceedings of ACL 2001.
  5. C. Kwok, O. Etzioni, and D. S. Weld. Scaling question answering to the Web. In Proceedings of WWW 2000.
  6. D. Radev, W. Fan, H. Qi, H. Wu, and A. Grewal. Probabilistic question answering on the Web. In Proceedings of WWW 2002.
  7. E. M. Voorhees. Overview of the TREC 2002 question answering track. In Proceedings of TREC 2002.
  8. H. Yang and T.-S. Chua. The integration of lexical knowledge and external resources for question answering. In Proceedings of TREC 2002.
  9. H. Yang and T.-S. Chua. QUALIFIER: Question answering by lexical fabric and external resources. In Proceedings of EACL 2003. To appear.