Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative II

Workshop [2006-08-01]

The biomedical literature contains functional characterizations of genes and proteins and serves as the main information source for biological database annotations. The growth of scientific literature databases such as PubMed, together with the increasing interest in more efficient information access demanded by the biology community, has resulted in methods that can automatically process collections of biological texts.

Text mining aims to efficiently retrieve and classify documents in response to complex user queries and to perform a deeper analysis of the literature to extract specific associations, such as protein-protein interactions and protein annotations. The goal of this workshop is to drive the development and evaluation of text mining tools applied to biologically relevant tasks in the context of the BioCreAtIvE challenge.

BioCreAtIvE is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. BioCreAtIvE arose out of the needs of working biologists, biological curators and bioinformaticians to access the wealth of information in the literature, and to link this information to biological databases and ontologies. BioCreAtIvE focuses on the comparison of methods and community assessment of scientific progress, rather than on the purely competitive aspects. BioCreAtIvE is organized through collaborations between text mining groups, biological database curators and bioinformatics researchers.

The first BioCreAtIvE Challenge in 2004 addressed the detection of gene and protein names from text, their association to existing database entries and the extraction of protein annotations (i.e. protein - Gene Ontology concept associations).

The Second BioCreAtIvE Challenge was held during October of 2006; the evaluation workshop will be held in Spring 2007. The challenge will address three tracks. The first will focus on tools for finding the mentions of genes and proteins in sentences drawn from MEDLINE abstracts and is essentially similar to Task 1A (Tanabe, Xie et al. 2005) from the first BioCreAtIvE. The second track will involve producing a list of the EntrezGene identifiers for all the human genes/proteins mentioned in a collection of MEDLINE abstracts and is similar to BioCreAtIvE I Task 1B (Hirschman, Colosimo et al. 2005).

The third track of BioCreAtIvE II is a new advanced task on protein interaction detection, coordinated by the CNIO in collaboration with two of the main protein interaction databases (MINT and IntAct). The complexity of the first large-scale proteomics experiments makes this an important application for text mining, which can support the extraction of experimentally validated interactions from full-text articles. The BioCreAtIvE protein interaction challenge (Protein-Protein Interaction task) will include the detection of articles containing information relevant to experimentally characterized protein interactions, the detection of actual confirmed protein interaction pairs, the extraction of the experimental methods used to characterize the interactions, and the extraction of the corresponding text evidence.

The BioCreAtIvE methodology of evaluating text mining systems on real biological problems has attracted the interest of other groups developing curated databases. For example, we supported the ORegAnno database developers in organizing a RegCreative annotation jamboree.

Workshop registration

January 18th - March 20th

Paper and poster submission

January 18th - February 15th

Reviews completed

March 16th, decisions and notifications on selection for talks during the BioCreAtIvE Workshop

Revision deadline

March 23rd, submission deadline for revisions


April 23rd-25th, 2007 BioCreAtIvE II Workshop, at CNIO, Madrid, Spain

Second BioCreAtIvE Workshop Sponsors

This workshop is sponsored by the European Science Foundation (ESF) Functional Genomics Programme and the European Network of Excellence (ENFIN) network for integration of computational approaches in systems biology (contract number LSHG-CT-2005-518254). MITRE's work on BioCreAtIvE has been funded in part by National Science Foundation Grant No. IIS-0640153.