The INEX evaluation initiative is part of a large-scale effort to
encourage research in information retrieval and digital libraries. The
main goal of INEX is to promote the evaluation of content-oriented XML
retrieval by providing a large test collection of XML documents,
uniform scoring procedures, and a forum for organisations to compare
their results.
In INEX 2003, participating organisations will be able to compare the
retrieval effectiveness of their XML document retrieval systems and
will contribute to the construction of a large XML test
collection. The test collection will also provide participants with a
means for future comparative and quantitative experiments. Due to copyright
issues, only participating organisations will have access to the
constructed test collection.
INEX test collection
The test collection will consist of a set of XML documents, topics and
relevance assessments. We plan a collaborative effort to derive the
topics and the relevance judgments. Detailed guides and on-line topic
submission, retrieval result submission, relevance assessment, and
evaluation systems will be provided by INEX.
Documents
The documents in the INEX test collection are scientific articles,
marked up in XML, from publications of the IEEE Computer Society
covering a range of topics in the field of computer science. The
collection, approximately 500 megabytes, contains over twelve thousand
articles from 18 magazines/transactions published between 1995 and
2002; an article consists of 1500 XML nodes on average.
Topics
Each group will be asked to create a set of candidate topics, which
are representative of the range of real user needs over the XML
collection. The queries may be content-only (CO) or
content-and-structure (CAS) queries, and broad or narrow topic
queries. CO queries are free text queries, like those used in TREC,
for which the retrieval system should retrieve relevant XML elements
of varying granularity, while CAS queries contain explicit structural
constraints, such as containment conditions. From the pooled set of
candidate topics a final 50 topics will be selected to form part of
the INEX test collection.
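The difference between the two query types can be illustrated with a
minimal sketch. It assumes a toy document whose element names
(article, sec, p, title) are chosen for illustration only, and the
helper functions co_query and cas_query are hypothetical; this is not
the INEX topic format or any participant's retrieval method.

```python
import xml.etree.ElementTree as ET

# Toy document; element names are illustrative assumptions.
DOC = """
<article>
  <title>XML retrieval</title>
  <sec><title>Indexing</title><p>Element-level indexing of XML text.</p></sec>
  <sec><title>Evaluation</title><p>Scoring with recall and precision.</p></sec>
</article>
"""


def text_of(elem):
    """Lower-cased text of an element and all of its descendants."""
    return " ".join(" ".join(elem.itertext()).split()).lower()


def co_query(root, term):
    """Content-only: any element, at any granularity, containing the term."""
    return [e.tag for e in root.iter() if term.lower() in text_of(e)]


def cas_query(root, container_tag, term):
    """Content-and-structure: only elements with the given tag that contain the term."""
    return [e for e in root.iter(container_tag) if term.lower() in text_of(e)]


root = ET.fromstring(DOC)
co_hits = co_query(root, "indexing")           # elements of varying granularity
cas_hits = cas_query(root, "sec", "precision")  # restricted to <sec> elements
```

The CO query returns the whole article, the section, and the nested
title and paragraph, while the CAS query returns only the sections
that satisfy the containment condition.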
Ad-hoc retrieval
The general task, to be performed with the data and the final 50
topics, will be the ad-hoc retrieval of XML documents. Participants
will be able to submit up to 3 runs, each containing the top 1000
retrieval results for each of the 50 topics.
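The submission limits above can be checked with a short sketch. The
in-memory layout (one dictionary per run, mapping a topic identifier
to a ranked list of result identifiers) and the function
check_submission are assumptions made for illustration, as is reading
1000 as an upper bound; the actual submission format is defined by the
INEX guides.

```python
MAX_RUNS = 3                  # runs per participant, from the text above
MAX_RESULTS_PER_TOPIC = 1000  # assumed here to be an upper bound
NUM_TOPICS = 50               # size of the final topic set


def check_submission(runs):
    """Return a list of problems; an empty list means the limits are respected."""
    problems = []
    if len(runs) > MAX_RUNS:
        problems.append(f"{len(runs)} runs submitted, limit is {MAX_RUNS}")
    for i, run in enumerate(runs, start=1):
        if len(run) != NUM_TOPICS:
            problems.append(f"run {i}: covers {len(run)} topics, expected {NUM_TOPICS}")
        for topic, results in run.items():
            if len(results) > MAX_RESULTS_PER_TOPIC:
                problems.append(f"run {i}, topic {topic}: {len(results)} results")
    return problems


# Hypothetical run: 50 topics, each with 1000 placeholder result identifiers.
example_run = {t: [f"result-{k}" for k in range(1000)] for t in range(1, 51)}
problems = check_submission([example_run])
```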
Relevance assessments
Relevance assessments will be provided by the participating groups
using INEX's on-line assessment system. Each assessor will judge 1-2
topics: either the topics they originally created or, if these were
removed from the final set of topics, topics similar to their original
queries. Please note that assessments will take about one person-week
per topic. Participating groups will gain
access to the completed INEX test collection only after they have
completed their assessment task.
Evaluation
The evaluation of the retrieval effectiveness of the XML retrieval
engines used by the participants will be based on the constructed INEX
test collection and uniform scoring techniques, including
recall/precision measures that take into account the structural
nature of XML documents, such as possible overlap of answers.
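To make the notions of recall, precision and overlap concrete, here is
a minimal sketch. It uses plain set-based precision and recall over
element paths and flags result pairs in which one element contains
another. The path strings, the sample data and all function names are
assumptions for illustration; this is not the INEX scoring procedure.

```python
def is_ancestor(a, b):
    """True if path a is a proper ancestor of path b."""
    return b.startswith(a + "/")


def precision_recall(retrieved, relevant):
    """Set-based precision and recall of retrieved elements against relevant ones."""
    hits = sum(1 for r in retrieved if r in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def overlapping_pairs(retrieved):
    """Pairs (a, b) of retrieved elements where a contains b."""
    return [(a, b) for a in retrieved for b in retrieved if is_ancestor(a, b)]


# Hypothetical result list and assessments for one topic.
retrieved = [
    "/article[1]",
    "/article[1]/sec[2]",
    "/article[1]/sec[2]/p[1]",
    "/article[1]/sec[3]",
]
relevant = {
    "/article[1]/sec[2]",
    "/article[1]/sec[2]/p[1]",
    "/article[1]/sec[4]",
}

precision, recall = precision_recall(retrieved, relevant)
overlaps = overlapping_pairs(retrieved)
```

In this example two of the four retrieved elements are relevant, and
four retrieved pairs overlap because the article and one section are
returned together with elements nested inside them. A measure that
ignores such nesting would count the same text more than once.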
Participants will be able to present their approaches and final
results at the INEX 2003 workshop in December. All results
will be published in the INEX workshop proceedings and on the
Web.
To gain access to the data, designated the IEEE Computer Society XML
Retrieval Research Collection, organizations participating in the INEX
initiative that did not sign the agreement last year must first fill
in a data release Application Form.
The signed form must be sent (by express mail) to Saadia Malik at the
address above (only the original copies of the forms are accepted, no
electronic or fax versions).
On receipt of the forms, you will be sent information on how to
download the data.
Access to the data by an individual person is to be controlled by that
person's organization. The organization may only grant access to
people working under its control, i.e. its own members, consultants to
the organization, or individuals providing service to the
organization. All application forms by individuals to access the data
must be signed by a person authorized by your organization for such
signatures. The organization must keep the individual's form for each
person involved at its site.