Introduction
The continuous growth in XML information repositories has been matched by increasing efforts in the development of XML retrieval
systems, in large part aiming at supporting content-oriented XML retrieval. These systems exploit the available structural
information in documents, as marked up in XML, in order to implement a more focused retrieval strategy and return document
components (so-called XML elements) instead of complete documents in response to a user query. This focused retrieval
approach is of particular benefit for information repositories containing long documents, or documents covering a wide variety
of topics (e.g. books, user manuals, legal documents), where users' effort to locate relevant content can be reduced by directing
them to the most relevant parts of these documents. For example, in response to a user's query on a collection of scientific
articles marked up in XML, an XML retrieval system may return a mixture of paragraph, section, article, and other elements that have
been estimated as best answers to the user's query. As the number of XML retrieval systems increases, so does the need to evaluate
their effectiveness.
The predominant approach to evaluating system retrieval effectiveness is the use of test collections constructed specifically
for that purpose. A test collection usually consists of a set of documents, user requests referred to as topics, and relevance
assessments which specify the set of "right answers" for the user requests.
Traditional IR test collections and methodology cannot directly be applied to the evaluation of content-oriented XML retrieval
as they do not consider structure. This is because they focus mainly on the evaluation of IR systems that treat documents as
independent and well-distinguishable separate units of approximately equal size. Since content-oriented XML retrieval allows for
document components to be retrieved, multiple elements from the same document can hardly be viewed as independent units. When
allowing for the retrieval of arbitrary elements, we must also consider the overlap of elements; e.g. retrieving a complete
section consisting of several paragraphs as one element and then a paragraph within the section as a second element. This means
that retrieved elements cannot always be regarded as separate units. The size of the retrieved elements should also be considered,
since it depends on the task definition; e.g. retrieve the minimum or maximum units answering the query, or retrieve a component from
which we can access (or browse to) a maximum number of units answering the query. Finally, when multiple elements from the
same document are retrieved, a linear ordering of the result items may not be appropriate (if elements from the same document
are interspersed with elements of other documents). Single elements typically are not completely independent from their context
(the document). Thus, frequent context switches would unnecessarily confuse the user. It would therefore be more
appropriate to cluster together the result elements from the same document.
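The overlap and clustering issues described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each retrieved element is identified by a document id and an XPath-like location string; the function names and the sample paths are hypothetical and are not part of any INEX submission format.

```python
from collections import defaultdict

def is_ancestor(path_a, path_b):
    """Return True if the element at path_a strictly contains the element at path_b."""
    # Appending "/" avoids treating sec[2] as a prefix of sec[21].
    return path_b.startswith(path_a + "/")

def overlaps(a, b):
    """Two results overlap if they come from the same document and
    one element is identical to, or contains, the other."""
    doc_a, path_a = a
    doc_b, path_b = b
    if doc_a != doc_b:
        return False
    return (path_a == path_b
            or is_ancestor(path_a, path_b)
            or is_ancestor(path_b, path_a))

def cluster_by_document(ranked):
    """Group a ranked list by document, ordering documents by the rank
    of their first (best-ranked) element."""
    groups = defaultdict(list)  # insertion order follows first appearance
    for doc_id, path in ranked:
        groups[doc_id].append(path)
    return dict(groups)

# Hypothetical ranked result list: (document id, element path)
ranked = [
    ("doc1", "/article[1]/sec[2]"),
    ("doc2", "/article[1]/sec[1]/p[3]"),
    ("doc1", "/article[1]/sec[2]/p[1]"),
]

clusters = cluster_by_document(ranked)
```

Here the first and third results overlap (a section and a paragraph inside it), and clustering places both `doc1` elements together instead of interleaving them with the `doc2` element.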
The evaluation of XML retrieval systems thus makes it necessary to build test collections where the evaluation paradigms are
provided according to criteria that take into account the imposed structural aspects. The INitiative for the Evaluation of XML
retrieval (INEX), which was set up in 2002, established an infrastructure and provided means, in the form of large test collections
and appropriate scoring methods, for evaluating how effective content-oriented XML search systems are.
Participating organizations will compare the retrieval effectiveness of their XML retrieval systems to others, and in doing so
will contribute to the construction of the XML test collection. The test collection will provide participants with a means for future
comparative experiments.
INEX test collection
The test collection consists of a set of XML documents, topics and relevance assessments. The topics and the relevance
judgments are obtained through a collaborative effort from the participants. Detailed guidelines on the on-line topic submission,
retrieval result submission, relevance assessment task, and evaluation metrics are provided by INEX each year.
Only participating organizations gain access to the test collection.
Documents
INEX 2007 uses a document collection made from English Wikipedia documents. The collection consists of the XML full-texts of
659,388 articles, totaling more than 60 GB (4.6 GB without images) and 30 million elements. On average, an article contains
161.35 XML nodes, and the average depth of an element is 6.72.
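Per-article statistics of this kind can be computed with a short script. The following is only a sketch using Python's standard `xml.etree.ElementTree`; it counts element nodes only and assumes the root element has depth 1, which may differ from how the figures above were obtained. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def element_stats(xml_text):
    """Return (number of elements, average element depth) for one document.

    Assumption: only element nodes are counted, and the root has depth 1.
    """
    root = ET.fromstring(xml_text)
    depths = []
    stack = [(root, 1)]  # iterative traversal avoids recursion limits
    while stack:
        elem, depth = stack.pop()
        depths.append(depth)
        for child in elem:
            stack.append((child, depth + 1))
    return len(depths), sum(depths) / len(depths)

# Hypothetical miniature article, not taken from the collection
sample = (
    "<article><title>t</title>"
    "<body><section><p>a</p><p>b</p></section></body>"
    "</article>"
)
count, avg_depth = element_stats(sample)
```

Averaging these values over all articles would give collection-level figures comparable in kind to those quoted above.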
Topics
Each participating group is asked to create a set of candidate topics, which are representative of a range of real
user needs over the XML collection. The queries may be content-only (CO) or content-and-structure (CAS) queries, and
broad or narrow in topic. CO queries are free text queries, like those used in TREC, for which the retrieval system should
retrieve relevant XML elements of varying granularity. CAS queries contain explicit structural constraints (or hints), such
as containment conditions or a preferred retrieval structure. From the pooled set of candidate topics INEX selects a final set of
topics to form part of the INEX test collection.
Tasks
The main retrieval task performed at INEX is the ad hoc retrieval of XML documents. In information retrieval literature,
ad hoc retrieval is described as a simulation of how a library might be used, and it involves the searching of a static set
of documents using a new set of topics. While the principle is the same, the difference for INEX is that the library consists
of XML documents, the queries may contain both content and structural conditions and, in response to a query, arbitrary
XML elements may be retrieved from the library.
In addition to the ad hoc task, the following tasks are also defined:
- Document mining
- Multimedia
- Entity Ranking
- Book searching
- Document collection interlinking (Link the Wiki)
The interactive track and heterogeneous track at INEX 2007 are ongoing from INEX 2006. Those interested in these tracks
should contact the track organizers; details can be found at
http://inex.is.informatik.uni-duisburg.de/2006/itrack.html and http://inex.is.informatik.uni-duisburg.de/2006/het.html, respectively.
Relevance assessments
Relevance assessment will be conducted by participating groups using the INEX XRAI on-line assessment system. Each participating
organization will judge about three topics. Where possible, these are the topics originally submitted by the participating group.
Assessment takes one person about two days per topic. Access to INEX assessments is only granted to groups that completed their
assessment task.
Evaluation
The evaluation of the retrieval effectiveness of the XML retrieval engines used by the participants will be based on the
constructed INEX test collection and uniform scoring techniques. Since its launch in 2002, INEX has been challenged by the
issue of how to measure an XML information access system's effectiveness.
Workshop and proceedings
Participants may present their work (including approach and final results) at the December INEX workshop in Dagstuhl, Germany.
Submitted papers will be published in the workshop pre-proceedings and made available on the Web. Final papers will be fully
peer-reviewed and the best will be published in the INEX 2007 proceedings. The proceedings of INEX 2004, 2005, and 2006 were
published in the Springer series: Lecture Notes in Computer Science (LNCS).
Data Handling Agreement
Access to this and previous years' assessments is available to participating groups that complete their assessment load.
There are three sets of assessments. Between 2002 and 2005 the INEX corpus was composed of the full-texts of
12,107 articles (494 MB) from the IEEE Computer Society's publications. An additional 4,712 articles (241 MB)
were added in 2005. Since 2006 the Wikipedia collection has been used. For access to the IEEE collection please contact
Saadia Malik to obtain a data release form.
INEX is an activity of the DELOS Network of Excellence for Digital Libraries.