Document mining track
Many domains and applications are concerned with complex data composed of
elementary components linked according to some structural or logical
organization. In biology, for example, the 3D structure of a protein results
from the interaction of its different components, phylogenetic trees are
used to model the evolution of species and the relations between them, and
RNA structures are often compared via tree edit distances. In the text domain,
the spread of new data formats such as XML and HTML has considerably changed
the fields of Information Retrieval (IR) and Information Extraction (IE). For structured IR
tasks, both the logical structure and the content information have to be
considered simultaneously. In information extraction applications,
structure plays a major role in identifying the relations between the
different elements to be extracted. For building and querying heterogeneous
XML databases, a key problem is to learn automatically, from XML collections,
the relations between different formats and the transformations between
different structured document representations. Other application
domains concerned with structured data include image processing,
multimedia (video), natural language processing, and social networks.
Handling structured data has become a major challenge for these domains, and
different communities have for some years been developing their own methods
for dealing with it. The Machine Learning community should be a major actor in
this area.
Among the many open problems in handling structured data, this challenge
focuses on the two generic tasks of classification and clustering and on
one structure-specific task, structure mapping. The goal of the
challenge is therefore to explore algorithmic, theoretical and practical
issues regarding the classification, clustering and structure mapping of
structured data.
Organisers