Relevance feedback task
The process of information retrieval is an uncertain one. Searchers may
have less than well developed ideas of what they are searching for and the
types of information available for retrieval; they may be unable to
express a conceptual need for information in terms of a suitable query.
Early in the development of IR, researchers recognized that although users
often had difficulty expressing their informational needs precisely, they
could recognize useful information when they saw it. That is, although
searchers may be unable to readily convert informational needs into
requests, once the system presents them with an initial set of documents,
they can easily differentiate between those documents that do contain
useful information and those that do not.
This recognition led to the notion of relevance feedback (RF): users
evaluating (marking or selecting) a small set of documents as relevant or
irrelevant with respect to an informational need. RF techniques use data
from the selected documents (i.e., those returned by the system in
response to the user's original query and then evaluated by the user for
relevance) to automatically reformulate that query. They modify the
initial query and produce a revised query - the feedback query - to be
processed by the retrieval system. RF algorithms can also be used for
Automatic Query Refinement (AQR) by applying an automatic process that
marks the top results returned by the search engine as relevant and the
tail results as non-relevant for use by subsequent iterations.
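The automatic marking step used for AQR can be sketched as follows. This is only an illustration: the function name `pseudo_judgments` and the cutoffs `top_k` and `tail_k` are assumptions made for the example, not values prescribed by the track.

```python
def pseudo_judgments(ranking, top_k=10, tail_k=10):
    """Label the head of a ranking as relevant and its tail as non-relevant.

    `ranking` is a list of result ids, best first. The cutoffs are
    illustrative assumptions; the guidelines do not fix them.
    """
    head = ranking[:top_k]
    # Take the tail only from results that are not already in the head,
    # so that a short ranking never receives contradictory labels.
    rest = ranking[top_k:]
    tail = rest[-tail_k:] if tail_k > 0 else []
    labels = {doc_id: True for doc_id in head}
    labels.update({doc_id: False for doc_id in tail})
    return labels
```

The resulting labels can be passed to any RF algorithm in place of user judgments.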
The aim of this track is to investigate relevance feedback in the context
of XML retrieval. In standard full text search engines, RF has been
translated into detecting a "bag of words" that are good (or bad) at
retrieving relevant information. These terms are then added to (or removed
from) the query and weighted according to their power in retrieving
relevant information. With XML documents, a more sophisticated approach -
one that can exploit the characteristics of XML - is necessary. The
approach should ideally consider not only content but also the structural
features of XML documents. The query reformulation process must therefore
infer which content and structural elements are important for effectively
retrieving relevant data.
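For reference, the "bag of words" reformulation used in full text search can be sketched with a Rocchio-style update. Rocchio is chosen here only as a familiar example; the track does not prescribe it, the coefficients `alpha`, `beta` and `gamma` are conventional illustrative values, and the sketch ignores XML structure entirely.

```python
from collections import Counter

def reweight_query(query, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update of a bag-of-words query.

    `query` maps terms to weights; each document is a list of terms.
    alpha, beta and gamma are illustrative assumptions.
    """
    new_query = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for docs, sign, coeff in ((relevant_docs, 1, beta),
                              (nonrelevant_docs, -1, gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, freq in Counter(doc).items():
                # Add (or subtract) the average term frequency
                # over the judged set.
                new_query[term] += sign * coeff * freq / len(docs)
    return dict(new_query)
```

Terms that occur mostly in relevant documents gain weight, while terms from non-relevant documents can end up with negative weights. An XML-aware method would additionally have to decide which structural constraints to add or relax.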
Please note that participants in the track must register for the INEX
initiative. To have access to the test collection and in particular the
relevance assessments, participants must perform the relevance assessment
task. Participants in the RF track are also required to submit retrieval
runs to the ad-hoc task, since the ad-hoc runs will serve as baselines for
the RF task.
By 14 July 2006, participants in the Relevance Feedback (RF) task should submit
their retrieval runs (search results) as per the Ad Hoc task guidelines. Participants'
runs will serve as the baselines upon which RF will be performed. Participants should
refer to the Ad Hoc retrieval task guidelines for detailed information on the formatting
requirements of search results.
On 22 October 2006, relevance assessment data will be distributed to the participants.
RF runs can then be performed using relevance information from the assessments.
To limit the number of RF
submissions, we chose a subset of the common ad-hoc tasks on which participants can test
their RF algorithms. Participants may submit up to 3 RF runs for each of their original
submitted Ad Hoc runs for the CO.Thorough and CAS.Thorough tasks.
In total, there can be at most 9 CO submissions (3 RF runs * 3 original runs) and 9 CAS
submissions (3 RF runs * 3 original runs).
Please note that some topics may not be used in the RF track if they are judged
inadequate for that purpose (e.g., if they do not retrieve enough relevant elements).
There are no restrictions on the number of iterations of relevance feedback for a given
query. Participants must submit their RF runs by 30 Nov 2006.
An RF run is built as follows: The relevance of up to 20 elements is checked against
the relevance assessment data and is used as input for the relevance feedback algorithm.
For most algorithms, these elements will be taken from the top-ranked elements
in the baseline run. A participant may apply several iterations
of RF where in each round, feedback for up to 20 new elements is received and a new
set of results is computed.
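The construction of an RF run described above can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `refine` is a hypothetical callback standing in for the participant's own RF algorithm, and treating elements that are missing from the assessments as non-relevant is an assumption of this example, not a rule of the track.

```python
def run_feedback(baseline, assessments, refine, iterations=1, batch=20):
    """Iterative relevance feedback over a ranked list of element ids.

    baseline    -- ranked list of element ids, best first
    assessments -- dict mapping element id to True/False
    refine      -- hypothetical callback(judged: dict) -> new ranked list
    """
    ranking = list(baseline)
    judged = {}
    for _ in range(iterations):
        # Pick up to `batch` top-ranked elements that are not yet judged.
        new = [e for e in ranking if e not in judged][:batch]
        if not new:
            break
        for e in new:
            # Assumption: unassessed elements count as non-relevant.
            judged[e] = assessments.get(e, False)
        ranking = refine(judged)
    return ranking, judged
```

The function returns both the final ranking and the judged elements, since the latter are needed later for postprocessing such as freezing.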
The submission format for the runs is derived from the Ad Hoc submission format, with
two new attributes of the inex-submission element:
- base_run_id - the id of the original ad-hoc run
- iterations - number of iterations used for the RF submission
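As a rough illustration, the two new attributes could be set as shown below. Only `base_run_id` and `iterations` are taken from the description above; the run id value is a made-up placeholder, and the remaining Ad Hoc attributes and child elements are deliberately left out, so the output is not a complete submission.

```python
import xml.etree.ElementTree as ET

# Only base_run_id and iterations come from the guidelines; the run id
# is a placeholder, and the other Ad Hoc attributes are omitted here.
root = ET.Element("inex-submission", {
    "base_run_id": "my_adhoc_run_1",  # placeholder id of the baseline run
    "iterations": "1",                # number of RF iterations used
})
header = ET.tostring(root, encoding="unicode")
print(header)
```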
Unlike in previous years, the submission must reflect *exactly* the results of the
RF run without any postprocessing such as freezing. All postprocessing will be done when
evaluating the runs. We want to experiment with different postprocessing strategies that eliminate
the influence of elements with known relevance on the results, among them freezing
of the top-20 results used for feedback and several variants of the residual collection
method.
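The two families of postprocessing mentioned above can be sketched as follows. These are simplified illustrations written for this page, not the evaluation code used by the organisers; in particular, only the most basic form of the residual collection method is shown, and the function names are assumptions.

```python
def freeze(baseline, rf_ranking, k=20):
    """Keep the top-k baseline results in place, then append the RF
    results that are not among them."""
    frozen = baseline[:k]
    seen = set(frozen)
    return frozen + [e for e in rf_ranking if e not in seen]

def residual(ranking, feedback_elements):
    """Drop every element whose relevance was used for feedback."""
    used = set(feedback_elements)
    return [e for e in ranking if e not in used]
```

With freezing, the RF run can only differ from the baseline below rank k. With the residual collection method, the feedback elements are removed before scoring, and the same removal would be applied to the baseline so that the two runs remain comparable.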
The XML file must follow submission.dtd (there is also a corresponding XML Schema definition).
This year we introduce some optional features for RF runs. The first is that we allow the submission of runs generated with non-standard methods that go beyond top-20 feedback; for example, a participant may select some other elements as the feedback elements. Such a run has to list, for each topic, the results whose relevance
was looked up for feedback (the feedback elements), the iteration in which each was used,
and the rank in the result list at which it was found. This is defined by the element as described in the DTD. Note that if the run used the default top-20 elements, there is no need to specify the feedback elements.
Another optional feature is to specify the expanded query used for the topics in the RF run.
To compare the expanded queries generated by different feedback algorithms, participants
can store, for each topic, the expanded query with the results whenever
it is appropriate for their algorithm. This can be specified by the element as
described in the DTD. The format for this expanded query is NEXI with additional, optional
weights for the terms, e.g.,
//article[about(., 0.5*XML 0.75*database -0.3*index)]
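A weighted query of this form can be produced from a list of term weights. The helper below is a hypothetical sketch (the name `weighted_about` is not part of NEXI or of any INEX tool) and handles only a single path with one about() clause.

```python
def weighted_about(path, weights):
    """Format term weights as a NEXI about() clause on `path`.

    `weights` is a list of (term, weight) pairs in output order.
    """
    terms = " ".join(f"{w:g}*{t}" for t, w in weights)
    return f"//{path}[about(., {terms})]"

query = weighted_about("article",
                       [("XML", 0.5), ("database", 0.75), ("index", -0.3)])
```

Here `query` reproduces the example given above.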
The comparison of different algorithms will be made with the following standard setting
for CO.Thorough and CO+S.Thorough:
a single iteration of feedback for the top-20 elements of the baseline run, using freezing
of the top-20 results.
Each participant is required to submit at least one run for a CO.Thorough baseline.
The reported evaluation scores for each RF submission will measure the improvement
of the RF run over the original base run.
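The guidelines do not state which measure or formula will be used to report the improvement. Purely as an illustration, the absolute and relative gain of an RF run over its base run, given one evaluation score for each, could be computed like this (the function name is an assumption):

```python
def improvement(base_score, rf_score):
    """Return (absolute, relative) gain of an RF run over its base run.

    The relative gain is None when the base score is zero.
    """
    absolute = rf_score - base_score
    relative = absolute / base_score if base_score else None
    return absolute, relative
```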
Schedule
| Jul 14: | Submission deadline of search results in the ad-hoc track |
| Sep 15: | Submission deadline for relevance assessments of the ad-hoc runs |
| Oct 15: | Distribution of assessment pool to participants in the RF track |
| Nov 30: | Submission deadline for relevance feedback runs |
| Dec 12: | Distribution of evaluation scores to participants in the RF track |
| Dec 18-20: | Workshop in Schloss Dagstuhl (http://www.dagstuhl.de/) |
Organisers
Yosi Mass
Information Retrieval Group
IBM Research Lab,
Haifa 31905, Israel
Email: yosimass@il.ibm.com
Ralf Schenkel
Max-Planck-Institut für Informatik
Stuhlsatzenhausweg 85
66123 Saarbrücken
Email: schenkel@mpi-inf.mpg.de
Phone: (+49) 681 9325 504
Fax: (+49) 681 9325 599