Critical Assessment of Information Extraction in Biology. Data sets are available from Resources/Corpora and require registration.


The BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenge evaluation is a community-wide effort to evaluate text mining and information extraction systems applied to the biological domain.

BioCreAtIvE was organized in response to the growing number of groups working on text mining. Despite the increased activity in this area, there were no common standards or shared evaluation criteria for comparing the different approaches. The groups were addressing different problems, often using private data sets. As a result, it was impossible to determine how good the existing systems were, whether they would scale to real applications, and what performance could be expected.

The main emphasis of BioCreAtIvE is on comparing methods and on community assessment of scientific progress, rather than on purely competitive aspects.

Constructing suitable "gold standard" data for training and testing new information extraction systems that handle life science literature is considerably difficult. Because the data sets derived from the BioCreAtIvE challenge have been examined by biological database curators and domain experts, they serve as useful resources for developing new applications as well as for improving existing ones.

BioCreAtIvE addresses two main issues, both concerned with extracting biologically relevant and useful information from the literature. The first is the detection of biologically significant entities (names), such as gene and protein names, and their association with existing database entries. The second is the detection of entity-fact associations (e.g. associations between proteins and functional terms).

The first BioCreAtIvE challenge evaluation, held in 2003-2004, attracted considerable attention within the bioinformatics and biomedical text mining community. Overall, 27 groups from some 10 countries participated in the evaluation. It was organized through collaborations between text mining and NLP groups, biological database curators, and bioinformatics researchers, and it has served as the driving force behind the organization of the second BioCreAtIvE challenge.