Natural language processing track
XML is rapidly becoming an accepted standard for storage, communication,
and exchange of information. Most information in typical XML documents is
expressed in natural language texts. For the third year, INEX investigates
the idea of using the specifics of XML retrieval to allow users to address
content and structural needs intuitively via natural language queries.
As in traditional information retrieval, the user need is loosely specified,
linguistic variations are frequent, and answers are a ranked list of
relevant elements. As in database querying, structure is important and
a simple list of keywords may not be sufficient to define a query.
Structured query languages have been developed, but appear to be difficult
to use. Furthermore, the size of the unit of information retrieved is
variable, and XML elements naturally overlap in the document tree. Therefore,
developing natural language interfaces for XML-IR is a separate research
domain requiring its own innovative solutions.
The ultimate goal is to design and build software that will analyse,
understand, and generate results in response to queries that humans express
naturally. The primary objective of retrieval would be to interpret both
structural and content constraints of an information need expressed in a
natural language query (as opposed to the rigid syntax of XPath or XQuery).
The IR system would not only select and rank suitable documents, but also
select the most suitable XML elements within documents, those that best
satisfy the information need (both accurately and concisely).
Tasks
There are two distinct tasks in the NLP track in 2005: NLQ2NEXI and NLP.
- NLQ2NEXI: a simplified task that does not require participants to index
the collection or to implement a search engine. Instead, NLQ2NEXI requires
the translation of a natural language query, provided in the element of a
topic, into a formal NEXI query. NEXI is a much simplified version of
XPath, with an IR flavour and interpretation. NEXI is used to define all the
formal queries at INEX. The submissions of all NLQ2NEXI participants will be
evaluated by running the automatically generated formal queries on search
engine(s) that can operate on formal NEXI expressions. The objective is to
compare the results obtained with natural language queries (translated into
NEXI) with the results that are obtained by the same search engine(s) when
using the original human-crafted formal NEXI expressions. This task is
designed to allow new participants with NLP expertise to join the INEX
workshop without the need to develop a search engine.
- NLP: this task places no restrictions on the use of NLP techniques to
interpret the queries as they appear in the element of a
topic. Here participants are required to submit retrieval runs, but enjoy
the freedom to implement any NLP techniques in their search engine. The
objective is not only to compare different NLP-based systems, but also to
compare the results obtained with natural language queries with the
results obtained with NEXI queries by any other system in the Ad-hoc track.
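To make the NLQ2NEXI task more concrete, the following is a minimal sketch
of a rule-based translator from a natural language query to a NEXI
expression. It is an illustration only, not a method prescribed by the
track: the function name nlq_to_nexi, the word-to-tag mapping (article, sec,
p), the stopword list, and the single query pattern it recognises are all
assumptions made for this example. A real submission would need far broader
linguistic coverage.

```python
import re

# Hypothetical mapping from everyday words to element names. The tag names
# are assumptions for illustration, not taken from the track description.
TAG_FOR_WORD = {
    "article": "article", "articles": "article",
    "section": "sec", "sections": "sec",
    "paragraph": "p", "paragraphs": "p",
}

# Small illustrative stopword list (an assumption, not an official list).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "that", "which"}

# Toy pattern: "<verb> <unit> about <topic> [in <unit> about <topic>]"
PATTERN = re.compile(
    r"^(?:find|retrieve|return)\s+(?P<target>\w+)\s+about\s+(?P<topic>.+?)"
    r"(?:\s+in\s+(?P<ctx>\w+)\s+about\s+(?P<ctx_topic>.+))?$",
    re.IGNORECASE,
)


def keywords(text):
    """Lowercase the text and drop stopwords, keeping word order."""
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return " ".join(w for w in words if w not in STOPWORDS)


def step(tag, topic):
    """Build one NEXI path step with an about() filter on the element."""
    return f"//{tag}[about(., {keywords(topic)})]"


def nlq_to_nexi(query):
    """Translate a toy natural language query into a NEXI expression."""
    match = PATTERN.match(query.strip().rstrip("."))
    if match is None:
        # No structural hint recognised: fall back to a content-only query.
        return f"//*[about(., {keywords(query)})]"
    target = TAG_FOR_WORD.get(match["target"].lower(), "*")
    nexi = step(target, match["topic"])
    if match["ctx"]:
        # The containing element becomes the leading path step.
        ctx = TAG_FOR_WORD.get(match["ctx"].lower(), "*")
        nexi = step(ctx, match["ctx_topic"]) + nexi
    return nexi


print(nlq_to_nexi(
    "Find sections about natural language interfaces "
    "in articles about XML retrieval"
))
```

Under these assumptions, the structural hint ("sections ... in articles")
becomes the path part of the NEXI expression and the topical phrases become
about() filters. Queries that do not fit the pattern fall back to a
content-only expression over any element.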
We wish to test whether natural language queries in the XML domain are
effective alternatives to formal queries and to quantify the trade-off in
performance. The hypothesis is that the XML mark-up should allow NLP
techniques to be more accurate with XML collections than they are with plain
text collections. We hypothesize that this becomes possible by exploiting
the richer semantics of XML documents compared with plain text documents,
and by operating on more precise queries, which disambiguate meaning
through references to structure in the data.