Initiative for the Evaluation of XML Retrieval

April 2003 - December 2003


(Last update: Jan 19, 2004)
The INEX evaluation initiative is part of a large-scale effort to encourage research in information retrieval and digital libraries. The main goal of INEX is to promote the evaluation of content-oriented XML retrieval by providing a large test collection of XML documents, uniform scoring procedures, and a forum for organisations to compare their results.
In INEX 2003, participating organisations will be able to compare the retrieval effectiveness of their XML document retrieval systems and will contribute to the construction of a large XML test collection. The test collection will also provide participants a means for future comparative and quantitative experiments. Due to copyright issues, only participating organisations will have access to the constructed test collection.

INEX test collection

The test collection will consist of a set of XML documents, topics and relevance assessments. We plan a collaborative effort to derive the topics and the relevance judgments. Detailed guides and on-line topic submission, retrieval result submission, relevance assessment, and evaluation systems will be provided by INEX.

Documents

The documents in the INEX test collection are scientific articles, marked up in XML, from publications of the IEEE Computer Society covering a range of topics in the field of computer science. The collection, approximately 500 megabytes, contains over twelve thousand articles from 18 magazines/transactions from the period of 1995-2002, where an article on average consists of 1500 XML nodes.

Topics

Each group will be asked to create a set of candidate topics, which are representative of the range of real user needs over the XML collection. The queries may be content-only (CO) or content-and-structure (CAS) queries, and broad or narrow topic queries. CO queries are free text queries, like those used in TREC, for which the retrieval system should retrieve relevant XML elements of varying granularity, while CAS queries contain explicit structural constraints, such as containment conditions. From the pooled set of candidate topics a final 50 topics will be selected to form part of the INEX test collection.
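To make the difference between the two query types concrete, the following is a minimal sketch and not the INEX topic format. It assumes a toy document with illustrative element names (article, fm, ti, bdy, sec, p) and treats "relevant" as simple term containment. The function name matching_elements and its tag parameter are hypothetical. A CO-style query may return elements of any granularity, while a CAS-style query restricts the answer to a given element type.

```python
import xml.etree.ElementTree as ET

# Toy document; the element names are illustrative assumptions only.
DOC = """<article><fm><ti>XML retrieval</ti></fm>
<bdy><sec><p>Indexing XML elements.</p></sec>
<sec><p>Evaluation of ranked lists.</p></sec></bdy></article>"""


def matching_elements(root, term, tag=None):
    """Return the tags of elements whose text content contains `term`.

    tag=None mimics a CO query: any element, at any granularity, may match.
    A tag name mimics a CAS query with a structural constraint on the
    type of element to return.
    """
    term = term.lower()
    found = []
    # iter(None) walks every element in document order; iter("sec") only <sec>.
    for elem in root.iter(tag):
        text = "".join(elem.itertext()).lower()
        if term in text:
            found.append(elem.tag)
    return found


root = ET.fromstring(DOC)
co_answers = matching_elements(root, "indexing")              # CO-style
cas_answers = matching_elements(root, "indexing", tag="sec")  # CAS-style
```

In this sketch the CO-style call returns the matching paragraph together with all of its ancestors, which illustrates why a retrieval system has to decide which granularity to rank highest.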

Ad-hoc retrieval

The general task, to be performed with the data and the final 50 topics, will be the ad-hoc retrieval of XML documents. Participants will be able to submit up to 3 runs, each containing the top 1000 retrieval results for each of the 50 topics.
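The submission limits above can be checked mechanically before upload. The sketch below is only an illustration: the in-memory layout (a dictionary mapping a run name to a list of (topic, element path) pairs) and the function name check_submission are assumptions, not the official INEX submission format.

```python
from collections import Counter

MAX_RUNS = 3         # from the text: up to 3 runs per participant
MAX_RESULTS = 1000   # from the text: top 1000 results per topic


def check_submission(runs, topic_ids):
    """Return a list of human-readable problems found in a submission.

    `runs` maps a run name to a list of (topic_id, element_path) pairs.
    This layout is hypothetical and chosen only for the example.
    """
    problems = []
    if len(runs) > MAX_RUNS:
        problems.append(f"{len(runs)} runs submitted, limit is {MAX_RUNS}")
    for name, results in runs.items():
        # Count how many results each topic has within this run.
        per_topic = Counter(topic for topic, _ in results)
        for topic, count in sorted(per_topic.items()):
            if topic not in topic_ids:
                problems.append(f"{name}: unknown topic {topic}")
            elif count > MAX_RESULTS:
                problems.append(
                    f"{name}: topic {topic} has {count} results, "
                    f"limit is {MAX_RESULTS}"
                )
    return problems
```

An empty list means the submission respects both limits; the set of valid topic identifiers is passed in by the caller because the final 50 topics are only known after topic selection.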

Relevance assessments

Relevance assessments will be provided by the participating groups using INEX's on-line assessment system. Each assessor will judge 1-2 topics: either the topics they originally created or, if those were removed from the final set, topics similar to their original queries. Please note that assessments will take about one person-week per topic. Participating groups will gain access to the completed INEX test collection only after they have completed their assessment task.

Evaluation

The retrieval effectiveness of the participants' XML retrieval engines will be evaluated on the constructed INEX test collection using uniform scoring techniques. These include recall/precision measures that take into account the structural nature of XML documents, including possible overlap of answers.
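As a rough illustration of why overlap matters for recall/precision, the sketch below computes plain set-based precision and recall over a ranked list of element paths, optionally skipping any element that is an ancestor or descendant of one already kept. This is not the official INEX metric. The path notation, the function names overlaps and precision_recall, and the skip_overlap flag are all assumptions made for the example.

```python
def overlaps(path_a, path_b):
    """True if one element path is an ancestor of (or equal to) the other."""
    a, b = path_a.split("/"), path_b.split("/")
    shorter = min(len(a), len(b))
    # Compare whole path steps, so "sec[1]" never matches "sec[10]".
    return a[:shorter] == b[:shorter]


def precision_recall(ranked, relevant, skip_overlap=False):
    """Set-based precision and recall over a ranked list of element paths.

    With skip_overlap=True, a result is dropped when it overlaps an
    element kept earlier in the ranking (a simplifying assumption).
    """
    kept = []
    for path in ranked:
        if skip_overlap and any(overlaps(path, k) for k in kept):
            continue
        kept.append(path)
    hits = sum(1 for path in kept if path in relevant)
    precision = hits / len(kept) if kept else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For a ranking that returns both a section and a paragraph inside it, the two settings give different scores, which is the kind of effect a structure-aware measure has to account for.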
Participants will be able to present their approaches and final results at the INEX 2003 workshop in December. All results will be published in the INEX workshop proceedings and on the Web.

Data Handling Agreement

In order to have access to the data designated as the IEEE Computer Society XML Retrieval Research Collection, organizations participating in the INEX initiative that did not sign the agreement last year must first fill in a data release Application Form. The signed form must be sent by express mail to Saadia Malik at the address above; only original copies of the forms are accepted, not electronic or fax versions. On receipt of the forms, you will be sent information on how to download the data.
Access to the data by an individual person is to be controlled by that person's organization. The organization may only grant access to people working under its control, i.e. its own members, consultants to the organization, or individuals providing service to the organization. All application forms by individuals to access the data must be signed by a person authorized by the organization for such signatures. The organization must keep the individual forms for all persons involved at its site.