
Adhoc Track

In INEX 2006, participating organisations will be able to compare the retrieval effectiveness of their XML retrieval systems and will contribute to the construction of a new XML test collection based on Wikipedia. The test collection will also give participants a means of running future comparative and quantitative experiments. Please note that only participating organisations will have access to the constructed test collection.

INEX test collection

The test collection consists of a set of XML documents, topics and relevance assessments. The topics and the relevance judgments are obtained through a collaborative effort from the participants. Detailed guidelines on the on-line topic submission, retrieval result submission, relevance assessment task, and evaluation metrics will be provided by INEX.

Documents

INEX 2006 uses a document collection built from English-language Wikipedia documents. The collection currently consists of the full text, marked up in XML, of 659,388 articles of the Wikipedia project, covering a hierarchy of 113,483 categories and totalling more than 60 gigabytes (4.6 gigabytes without images). Its structure contains text, more than 300,000 images, and some structured parts corresponding to the Wikipedia templates (about 5,000 different tags). On average an article contains 161.35 XML nodes, and the average depth of an element is 6.72.
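As an illustration of the per-article statistics quoted above, the following minimal sketch counts the elements of one XML document and computes their mean depth. It is not the procedure used to produce the published figures: the function name `element_stats` and the sample document are hypothetical, only element nodes are counted (text nodes are ignored), and the root element is assumed to sit at depth 1.

```python
import xml.etree.ElementTree as ET


def element_stats(xml_text):
    """Return (number of elements, mean element depth) for one XML document.

    Assumption: the root element is at depth 1 and only element nodes count.
    """
    root = ET.fromstring(xml_text)
    depths = []
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        depths.append(depth)
        # Iterating over an Element yields its direct child elements.
        for child in node:
            stack.append((child, depth + 1))
    return len(depths), sum(depths) / len(depths)


# Hypothetical miniature article, far smaller than a real collection document.
sample = "<article><name>XML</name><body><p>Text <b>bold</b></p></body></article>"
count, mean_depth = element_stats(sample)
print(count, mean_depth)
```

Averaging the two returned values over all articles would give collection-level figures comparable in kind to those reported above, subject to the counting assumptions just stated.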

Topics

Each participating group will be asked to create a set of candidate topics that are representative of a range of real user needs over the XML collection. The queries may be content-only (CO) or content-and-structure (CAS) queries, and may express broad or narrow topics. CO queries are free-text queries, like those used in TREC, for which the retrieval system should retrieve relevant XML elements of varying granularity, while CAS queries contain explicit structural constraints, such as containment conditions. From the pooled set of candidate topics, INEX will select a final set of topics to form part of the INEX test collection.

Tasks

The main retrieval task to be performed in INEX is the ad-hoc retrieval of XML documents. In the information retrieval literature, ad-hoc retrieval is described as a simulation of how a library might be used, and it involves searching a static set of documents using a new set of topics. While the principle is the same, the difference for INEX is that the library consists of XML documents, the queries may contain both content and structural conditions and, in response to a query, arbitrary XML elements may be retrieved from the library. Within the main ad-hoc retrieval task in INEX 2005, three sub-tasks were identified depending on how structural constraints are expressed in queries.

  1. In the Content-Only (CO) sub-task, queries ignore the document structure and contain only content-related conditions.
  2. The +S sub-task extends the CO sub-task with structural hints: a user may add structural hints to their query to narrow down the number of elements returned for a CO query.
  3. In the Content and Structure (CAS) sub-task, structural constraints are explicitly stated in the query. They can refer both to where to look for the relevant elements (i.e. support elements) and to what type of elements to return (i.e. target elements). A structural constraint can be interpreted as strict (i.e. the structural requirements must be followed strictly) or vague (i.e. the structural constraints are interpreted as hints and the main goal is to satisfy the overall information need). Strict and vague interpretations can be applied to both support and target elements, giving a total of four strategies for the CAS sub-task.
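The four CAS strategies follow from combining two interpretations with two kinds of element. The sketch below enumerates them and shows one possible reading of how a target constraint could be applied to a ranked list. All names (`cas_strategies`, `apply_target_constraint`, the `tag` field) are hypothetical, and treating a vague constraint as "filter nothing" is an assumption made for illustration, not an INEX definition.

```python
from itertools import product

INTERPRETATIONS = ("strict", "vague")


def cas_strategies():
    """Enumerate every (target, support) interpretation pair."""
    return [
        {"target": target, "support": support}
        for target, support in product(INTERPRETATIONS, repeat=2)
    ]


def apply_target_constraint(ranked, target_tag, interpretation):
    """Apply a target-element constraint to a ranked list of elements.

    Assumption: a strict constraint discards elements of any other type,
    while a vague constraint is only a hint and therefore filters nothing.
    """
    if interpretation == "strict":
        return [e for e in ranked if e["tag"] == target_tag]
    return list(ranked)


strategies = cas_strategies()
for strategy in strategies:
    print(strategy)

# Hypothetical ranked result list for a query whose target element is "sec".
ranked = [{"tag": "sec", "id": 1}, {"tag": "p", "id": 2}, {"tag": "sec", "id": 3}]
strict_results = apply_target_constraint(ranked, "sec", "strict")
vague_results = apply_target_constraint(ranked, "sec", "vague")
```

A real system would more plausibly use a vague constraint to adjust scores rather than ignore it entirely; the sketch keeps only the distinction between filtering and hinting.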

With regard to the evaluation methodology for the ad-hoc track, three different strategies were defined and used in INEX 2005, depending on what a user is assumed to want from the output of an XML retrieval system. In a focussed strategy, we assume that a user prefers a single element that most exhaustively discusses the topic of the query (most exhaustive element) while being most specific to that topic only (most specific element). In a thorough strategy, we assume that a user prefers all highly exhaustive and specific elements. In a fetch and browse strategy, we assume that a user is interested in highly exhaustive and specific elements that are contained only within highly relevant articles.
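The three strategies can be contrasted with a small sketch over a list of scored elements. This is one simplified reading, not the official task definition: a single hypothetical `score` field stands in for exhaustivity and specificity, the focussed strategy is approximated by dropping any element that overlaps (contains or is contained in) a higher-scored one, and the `article_scores` argument is an assumed per-article relevance estimate.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if both elements are in the same article and one contains the other."""
    if a["article"] != b["article"]:
        return False
    pa, pb = a["path"], b["path"]
    return pa == pb or pa.startswith(pb + "/") or pb.startswith(pa + "/")


def thorough(elements):
    """Return all elements, best score first."""
    return sorted(elements, key=lambda e: e["score"], reverse=True)


def focussed(elements):
    """Keep the best-scored elements while never returning overlapping ones."""
    kept = []
    for element in thorough(elements):
        if not any(overlaps(element, k) for k in kept):
            kept.append(element)
    return kept


def fetch_and_browse(elements, article_scores):
    """Rank articles first, then rank the elements inside each article."""
    by_article = defaultdict(list)
    for element in elements:
        by_article[element["article"]].append(element)
    ranked = sorted(by_article, key=lambda a: article_scores[a], reverse=True)
    return [(article, thorough(by_article[article])) for article in ranked]


# Hypothetical scored elements from two articles.
elements = [
    {"article": "a1", "path": "/article[1]", "score": 0.4},
    {"article": "a1", "path": "/article[1]/sec[1]", "score": 0.9},
    {"article": "a1", "path": "/article[1]/sec[1]/p[2]", "score": 0.7},
    {"article": "a2", "path": "/article[1]/sec[3]", "score": 0.6},
]
focussed_run = focussed(elements)
browse_run = fetch_and_browse(elements, {"a1": 0.5, "a2": 0.8})
```

Here the focussed run keeps only the section from `a1` and the section from `a2`, because the whole article and the paragraph both overlap the higher-scored section.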

It is expected that the INEX 2006 ad hoc retrieval task will be based on a combination/selection of the above sub-tasks and strategies, and newly defined ones.

Relevance assessments

Relevance assessments will be provided by the participating groups using INEX's on-line assessment system. Each participating organisation will judge around two topics: either the topics that they originally created or, if these were removed from the final set of topics, topics that are similar to their original queries or within their expertise. Please note that assessments take about one person-week per topic! Participating groups will gain access to the completed INEX test collection only after they have completed their assessment task.

Evaluation

The evaluation of the retrieval effectiveness of the XML retrieval engines used by the participants will be based on the constructed INEX test collection and uniform scoring techniques. Since its launch in 2002, INEX has been challenged by the issue of how to measure an XML information access system's effectiveness. In 2005, INEX adopted a new set of metrics, the eXtended Cumulated Gain (XCG) metrics, to support the evaluation of XML retrieval engines; these will also be used in INEX 2006. The new metrics aim to provide an evaluation framework that takes into account the dependency that exists among XML document components and, in particular, incorporates mechanisms to reward the retrieval of so-called near-misses and to address issues of overlap.
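The general idea behind cumulated-gain metrics can be sketched as follows. This is a simplified illustration, not the official XCG definition: the gain values are hypothetical, partial gain is used only to suggest how a near-miss might earn some credit, and the overlap handling mentioned above is omitted entirely.

```python
from itertools import accumulate


def cumulated_gain(gains):
    """Running sum of the gain values down a ranked list."""
    return list(accumulate(gains))


def normalised_cumulated_gain(run_gains, ideal_gains):
    """Ratio of the run's cumulated gain to the ideal cumulated gain per rank.

    Assumption: the ideal ranking is the available gains sorted in
    decreasing order, padded with zeros to the length of the run.
    """
    n = len(run_gains)
    ideal = (sorted(ideal_gains, reverse=True) + [0.0] * n)[:n]
    run_cg = cumulated_gain(run_gains)
    ideal_cg = cumulated_gain(ideal)
    return [r / i if i > 0 else 0.0 for r, i in zip(run_cg, ideal_cg)]


# Hypothetical run: a near-miss (gain 0.5), a fully relevant element (1.0),
# and a non-relevant element (0.0), scored against three ideal elements.
scores = normalised_cumulated_gain([0.5, 1.0, 0.0], [1.0, 1.0, 0.5])
print(scores)
```

A value of 1.0 at a given rank would mean the run has accumulated as much gain as the ideal ranking up to that point.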

Workshop and proceedings

Participants will be able to present their approaches and final results at the INEX 2006 workshop, to be held in December in Dagstuhl. All descriptions of the approaches and results will be published in the INEX workshop pre-proceedings and on the Web. Revised papers will be published in the INEX post-workshop final proceedings. As for INEX 2004 and 2005, we expect the INEX final proceedings to be published in Springer's Lecture Notes in Computer Science (LNCS) series.