Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative II

Announcement (Events) [2006-01-02]

The Second BioCreAtIvE - Critical Assessment for Information Extraction in Biology challenge (2006-2007) is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain.

The second BioCreAtIvE challenge will focus on: 

  1. Gene mention tagging [GM]
  2. Gene normalization [GN]
  3. Extraction of protein-protein interactions from text [PPI] 


BioCreAtIvE is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. BioCreAtIvE arose out of the needs of working biologists, biological curators and bioinformaticians to access the wealth of information in the literature and to link this information to biological databases and ontologies. BioCreAtIvE focuses on the comparison of methods and community assessment of scientific progress rather than on purely competitive aspects. The first BioCreAtIvE challenge evaluation in 2003-2004 [1] attracted broad interest within the bioinformatics and biomedical text mining community, with participation from 27 groups from 10 countries. BioCreAtIvE is organized through collaborations between text mining groups, biological database curators and bioinformatics researchers.


BioCreAtIvE II will be held during October 2006, with the workshop to follow in Spring 2007. It will consist of three tracks. The first will focus on finding mentions of genes and proteins in sentences drawn from MEDLINE abstracts and is the same as Task 1A (Tanabe, Xie et al. 2005) from BioCreAtIvE I [2]. The second track will involve producing a list of the EntrezGene identifiers for all the human genes/proteins mentioned in a collection of MEDLINE abstracts and is similar to BioCreAtIvE I Task 1B (Hirschman, Colosimo et al. 2005) [1]. The third track of BioCreAtIvE II is new and will involve identifying protein-protein interactions from full text papers, including extraction of excerpts from those papers that describe experimentally derived interactions, for curation into one of two interaction databases: IntAct (Hermjakob, Montecchi-Palazzi et al. 2004) [3] and MINT (Zanzoni, Montecchi-Palazzi et al. 2002).


  1. Hirschman L., Colosimo M., et al. (2005). "Overview of BioCreAtIvE task 1B: normalized gene lists." BMC Bioinformatics 6 Suppl 1: S11.
  2. Tanabe L., Xie N., et al. (2005). "GENETAG: a tagged corpus for gene/protein named entity recognition." BMC Bioinformatics 6 Suppl 1: S3.
  3. Hermjakob H., Montecchi-Palazzi L., et al. (2004). "IntAct: an open source molecular interaction database." Nucleic Acids Res 32(Database issue): D452-5.