Natural language processing track

XML is rapidly becoming an accepted standard for storage, communication, and exchange of information. Most information in typical XML documents is expressed in natural language texts. For the third year, INEX investigates the idea of using the specifics of XML retrieval to allow users to address content and structural needs intuitively via natural language queries.

As in traditional information retrieval, the user need is loosely specified, linguistic variation is frequent, and the answer is a ranked list of relevant elements. As in database querying, structure matters and a simple list of keywords may not be sufficient to define a query. Structured query languages have been developed, but they appear to be difficult to use. Furthermore, the size of the unit of information retrieved is variable, and XML elements naturally overlap in the document tree. Developing natural language interfaces for XML-IR is therefore a separate research domain requiring its own innovative solutions.

The ultimate goal is to design and build software that will analyse, understand, and generate results in response to queries that humans express naturally. The primary objective of retrieval is to interpret both the structural and the content constraints of an information need expressed in a natural language query (as opposed to the rigid syntax of XPath or XQuery). The IR system should not only select and rank suitable documents, but also select the XML elements within those documents that best satisfy the information need, both accurately and concisely.

Tasks

There are two distinct tasks in the NLP track in 2005: NLQ2NEXI and NLP.

  • NLQ2NEXI: a simplified task that does not require participants to index the collection or to implement a search engine. Instead, NLQ2NEXI requires the translation of a natural language query, provided in the element of a topic, into a formal NEXI query. NEXI is a much simplified version of XPath, with an IR flavour and interpretation, and it is used to define all the formal queries at INEX. The submissions of all NLQ2NEXI participants will be evaluated by running the automatically generated formal queries on search engine(s) that can operate on formal NEXI expressions. The objective is to compare the results obtained with natural language queries (translated into NEXI) with the results obtained by the same search engine(s) when using the original human-crafted formal NEXI expressions. This task is designed to allow new participants with NLP expertise to join the INEX workshop without the need to develop a search engine.
  • NLP: this task places no restrictions on the NLP techniques used to interpret the queries as they appear in the element of a topic. Here participants are required to submit retrieval runs, but are free to implement any NLP techniques in their search engine. The objective is not only to compare different NLP-based systems with one another, but also to compare the results obtained with natural language queries with the results obtained with NEXI queries by any other system in the Ad-hoc track. We wish to test whether natural language queries in the XML domain are effective alternatives to formal queries, and to quantify the trade-off in performance. The hypothesis is that XML mark-up should allow NLP techniques to be more accurate on XML collections than they are on plain text collections. We hypothesize that this becomes possible by exploiting the richer semantics of XML documents compared with plain text documents, and by operating on more precise queries, which disambiguate meaning through references to structure in the data.
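
To make the NLQ2NEXI task concrete, the following is a minimal sketch of a rule-based translator from a natural language query to a NEXI expression. It is an illustration only, not a method used by any participant: the sentence pattern, the stopword list, the mapping from words to element names (article, sec, p), and the function names are all hypothetical assumptions, and a real system would need far richer linguistic analysis.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to", "with"}

# Hypothetical mapping from everyday words to element names; the tag names
# are assumptions and depend on the collection's actual DTD.
TAG_SYNONYMS = {
    "article": "article", "articles": "article",
    "document": "article", "documents": "article",
    "section": "sec", "sections": "sec",
    "paragraph": "p", "paragraphs": "p",
}

# Matches "Find <target> about <topic>" with an optional
# "in <context> about <context topic>" suffix.
PATTERN = re.compile(
    r"^(?:find|retrieve|return)\s+(\w+)\s+about\s+(.+?)"
    r"(?:\s+in\s+(\w+)\s+about\s+(.+))?$",
    re.IGNORECASE,
)


def keywords(text):
    """Lowercase the text, tokenise it, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def nlq_to_nexi(query):
    """Translate a natural language query into a NEXI expression.

    Queries that do not fit the pattern fall back to a content-only
    query, i.e. a bare list of keywords.
    """
    match = PATTERN.match(query.strip().rstrip("."))
    if match is None:
        return keywords(query)

    target, topic, context, context_topic = match.groups()
    target_tag = TAG_SYNONYMS.get(target.lower())
    if target_tag is None:
        return keywords(query)

    target_step = f"//{target_tag}[about(., {keywords(topic)})]"
    context_tag = TAG_SYNONYMS.get(context.lower()) if context else None
    if context_tag is None:
        return target_step

    # The context element comes first in the path; the target element is
    # the last step, which is what the query asks to have returned.
    context_step = f"//{context_tag}[about(., {keywords(context_topic)})]"
    return context_step + target_step


if __name__ == "__main__":
    print(nlq_to_nexi(
        "Find sections about query translation in articles about XML retrieval."
    ))
```

Under these assumptions, the structural hint ("sections ... in articles") becomes the path of the NEXI expression, and the content words become the arguments of the about() clauses. A query without a recognisable structural hint is treated as content-only and reduced to its keywords.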

Schedule

The NLP track will follow the general schedule exactly. There is no need or plan for any deviation at this stage.

May 31: Participants will be provided with detailed instructions and formatting criteria for candidate topics/queries.
Apr 21: Submission deadline for candidate topics.
May 05: Distribution of the final set of topics/queries to participants, along with detailed information on the formatting requirements of the search results.
Jul 14: Submission deadline for search results.

Organisers

Shlomo Geva
Faculty of Information Technology
Queensland University of Technology
126 Margaret Street
GPO Box 2434
Brisbane, Q 4001
Australia
Email: s.geva@qut.edu.au

Xavier Tannier
Ecole des Mines de Saint-Etienne,
Centre G2I, Département RIM, Equipe COCRI
Espace Fauriel, bureau 404
158, cours Fauriel
F-42023 Saint-Etienne cedex 2
FRANCE
http://www.emse.fr/~tannier/
Email: tannier@emse.fr
Phone: +33 (0)4 77 42 02 01