XML Entity Ranking track

Motivation

The Expert Search task in the 2005 TREC Enterprise Track evaluated systems that return a list of entities (people's names) who are knowledgeable about a certain topic (e.g., "information retrieval").

The idea of the entity ranking track is to generalise this setting to arbitrary entity types. Consider for example a Famous Actor task. Given the topic "1930s", a system should return Astaire, Chaplin, Gable and Garbo, whereas given the topic "action" it should return Schwarzenegger, Stallone and Van Damme.

A setting with semi-structured data seems particularly suited as a basis for such a system, which could use not only the text elements but also structural and linking information. Note that our primary interest is not the entity extraction part of the problem, but how to associate entities with a topic text.

Task description

The track's goal is to evaluate two tasks: list completion and associative ranking (task names may still be revised).

List completion aims at extending a given list of entities with more entities of the same type (cf. Google Sets). For example, a list containing SIGIR and ECIR, with query context "information retrieval", should be extended with CIKM.
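
To make the list completion task concrete, here is a minimal sketch. It assumes each entity comes with a set of category labels and scores candidates by how many labels they share with the seed list. The function name complete_list, the category labels and the toy data are all hypothetical; the track does not prescribe any particular method.

```python
from collections import Counter


def complete_list(seeds, entity_categories, top_k=3):
    """Rank non-seed entities by the category labels they share with the seeds."""
    # Count how often each category label occurs among the seed entities.
    seed_cats = Counter()
    for seed in seeds:
        seed_cats.update(entity_categories.get(seed, ()))

    # Score every other entity by the summed seed counts of its labels.
    scores = {}
    for entity, cats in entity_categories.items():
        if entity in seeds:
            continue
        score = sum(seed_cats[c] for c in cats)
        if score > 0:
            scores[entity] = score

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_k]


# Hypothetical toy data, for illustration only.
entity_categories = {
    "SIGIR": {"conference", "information retrieval"},
    "ECIR": {"conference", "information retrieval"},
    "CIKM": {"conference", "information retrieval", "databases"},
    "SIGMOD": {"conference", "databases"},
    "Amsterdam": {"city"},
}

result = complete_list({"SIGIR", "ECIR"}, entity_categories)
print(result)
```

With this toy data CIKM ranks first because it shares both labels with the seeds, which matches the example in the text. A real system would derive such evidence from the collection itself rather than from hand-assigned labels.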

The goal of associative ranking is to learn the relationship between two such lists. Here, given the list of "information retrieval" workshops, we would give a text query "databases", and the goal is to return a similar list for the field of databases, e.g., SIGMOD and VLDB.

Systems could use a variety of XML information to learn how to associate the two lists, e.g., where the entities of each list (Xs and Ys) appear in the document structure, which co-occurrences are of particular importance, and what the value of repeated co-occurrence is.
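
As an illustration of the structural evidence mentioned above, the following sketch counts, per XML element type, how often an entity name and a topic term occur within the same element. It is only one possible reading of "co-occurrence in the document structure". The function name, the naive substring matching and the toy document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def cooccurrence_by_element(xml_text, entities, topic):
    """Count, per element tag, how often each entity shares an element with the topic term."""
    root = ET.fromstring(xml_text)
    counts = defaultdict(lambda: defaultdict(int))
    for elem in root.iter():
        # Full text of the element, including the text of its descendants.
        text = "".join(elem.itertext()).lower()
        if topic.lower() not in text:
            continue
        for entity in entities:
            # Naive case-insensitive substring match (an assumption of this sketch).
            if entity.lower() in text:
                counts[entity][elem.tag] += 1
    return {entity: dict(tags) for entity, tags in counts.items()}


# Hypothetical toy document, for illustration only.
doc = """<article>
  <title>Database conferences</title>
  <section>
    <p>SIGMOD is a venue for research on databases.</p>
    <p>VLDB publishes work on very large databases.</p>
    <p>SIGIR focuses on information retrieval.</p>
  </section>
</article>"""

counts = cooccurrence_by_element(doc, ["SIGMOD", "VLDB", "SIGIR"], "databases")
for entity, tags in counts.items():
    print(entity, tags)
```

In this toy document all three entities co-occur with "databases" at the article and section level, but only SIGMOD and VLDB do so within a paragraph. A system could therefore weight co-occurrences in smaller elements more heavily, and treat repeated co-occurrence as additional evidence.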

Document Collection

We plan to use the Wikipedia collection.

Evaluation Methodology

The evaluation methodology is still under discussion. We plan to involve the track participants in a light-weight electronic voting process to assess the identified entity lists. Instead of a ground truth with binary relevance information, we could label the answers with a probability of relevance based on the assessors' votes.
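
Since the methodology is not yet fixed, the following is only a sketch of how vote-based labels might be used. It assumes that the probability of relevance of an entity is the fraction of assessors who voted it relevant, and that a ranked list is scored by the average probability over its top k entries. Both functions and the vote data are hypothetical and not part of any agreed procedure.

```python
def relevance_probabilities(votes):
    """Map each entity to the fraction of assessors who voted it relevant (1) rather than not (0)."""
    return {entity: sum(v) / len(v) for entity, v in votes.items() if v}


def expected_precision_at_k(ranking, probs, k):
    """Average probability of relevance over the top-k ranked entities.

    Entities without any votes count as probability 0.0 (an assumption of this sketch).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(probs.get(entity, 0.0) for entity in ranking[:k]) / k


# Hypothetical votes from four assessors for the query "databases".
votes = {
    "SIGMOD": [1, 1, 1, 1],
    "VLDB": [1, 1, 1, 0],
    "SIGIR": [0, 0, 1, 0],
}

probs = relevance_probabilities(votes)
score = expected_precision_at_k(["SIGMOD", "SIGIR"], probs, k=2)
print(probs, score)
```

With binary ground truth this measure reduces to ordinary precision at k, so graded vote-based labels can be seen as a generalisation of the usual setting.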

Schedule

The schedule of this pilot is still under discussion.

 

Organisers

Arjen de Vries
Centre for Mathematics and Computer Science
CWI, room C0.11
Kruislaan 413
1098 SJ Amsterdam
The Netherlands
http://www.cwi.nl/~arjen/
Email: Arjen.de.Vries@cwi.nl
Phone: +31-(0)20-5924306
Fax: +31-(0)20-5924312
 
Nick Craswell
Microsoft Research Cambridge
JJ Thomson Avenue
Cambridge, CB3 0FB
http://research.microsoft.com/users/nickcr/
Email: nickcr@microsoft.com
Phone: +44 1223 479 794
Fax: +44 1223 479 999