Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative VI

BioCreative VI Challenge and Workshop (Events) [2017-02-06]

October 18-20, Washington DC, USA

Team registration for tracks is now open!

BioCreative: Critical Assessment of Information Extraction in Biology is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. BioCreative V.5 is currently underway, with text mining and language processing researchers submitting annotations to the BeCalm (Biomedical Annotation Metaserver) platform. In the meantime, we wanted to give you a heads-up on what is coming. We are planning for BioCreative VI, to be held in October in Washington DC, with the following tracks:

  • Track 1: Interactive Bio-ID Assignment (IAT-ID) Track on innovations in Biomedical Digital Curation
    Organizers: Lynette Hirschman, Cecilia Arighi and Cathy Wu
    The Bio-ID track will explore ID assignment to selected bioentities at both the pre- and post-publication stages, with the aim of facilitating downstream article curation. To do this we are bringing together the various stakeholders to discuss functional requirements and develop interoperable digital curation tools. Building on previous BioCreative experiments, including the interactive tracks, BioC, the gene/protein/chemical extraction tracks, and the BeCalm framework, the task is designed to foster the development of an integrated and interoperable workflow of multiple text mining tools for real-world testing in pilot publishing frameworks.
    More information about this track can be found under Tasks.

  • Track 2: Text-mining services for Kinome Curation
    Organizers: Julien Gobeill, Patrick Ruch and Pascale Gaudet
    Literature triage (selection of relevant articles for curation) is a basic task performed by virtually all curated molecular biology databases. This task will focus on triage for both Protein-Disease and Protein-GO annotations related to human kinases. The full data set covers a significant fraction of the Human Kinome (300 proteins out of ~500 kinases), with 30,000 annotations from 13,000 articles ready to be integrated in the neXtProt database by 2017. It contains comprehensive annotations about kinase substrates, GO Biological Processes and Diseases. Each annotation is provided with a PMID.
    The first two tasks deal with triage of abstracts or full-texts. The third task deals with passage selection: given a kinase, an axis, and a full-text regarded as relevant after SIB curation, the systems will return a snippet of max. 500 characters containing enough information to make an annotation.
    More information about this track can be found under Tasks.

  • Track 3: Extraction of causal network information using the Biological Expression Language (BEL)
    Organizers: Juliane Fluck and Sumit Madan
    Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining.
    In BioCreative V, we tackled this complexity by extracting causal relationships represented in Biological Expression Language (BEL). BEL is an advanced knowledge representation format which has been designed to be both human readable and machine processable. The smallest unit is a BEL statement or BEL nanopub, expressing a single causal relationship. In the last BioCreative, participants had only limited time to train on the data and, in addition, the evaluation environment became available only for the test phase. Furthermore, no training data was available for the second subtask, the sentence classification. Therefore, we decided to present the same task based on new test data. This time, training data is available for both subtasks, and the evaluation environment can be used during the training period. As before, the challenge is organized into two tasks which will evaluate the complementary aspects of the problem:
      1-Given selected textual evidence, construct the corresponding BEL statement
      2-Given a BEL statement, detect all available textual evidence
    The description of the task, the training data, and links to the papers and to the evaluation website can be found under Tasks.

  • Track 4: Mining protein interactions and mutations for precision medicine (PM)
    Organizers: Rezarta Islamaj Dogan, Andrew Chatr-aryamontri, Sun Kim, Donald C. Comeau, Zhiyong Lu
    We aim to bring together the biomedical text mining community in a new challenge for precision medicine, focusing on identifying and extracting protein-protein interactions affected by mutations described in the biomedical literature. Two subtasks are proposed:
      1-Document Triage: Identifying relevant PubMed citations describing genetic mutations affecting protein-protein interactions
      2-Relation Extraction: Extracting PPI pairs experimentally verified to be affected by the presence of a genetic mutation
    Task datasets will be available in multiple formats (e.g. BioC) and consist of PubMed articles curated for BioGRID and other PPI databases.
    More information about this track can be found under Tasks.

  • Track 5: Text mining chemical-protein interactions
    Organizers: Martin Krallinger, Alfonso Valencia, Analia Lourenço
    Considerable work has been done on the detection of gene/protein mentions and of chemical compound mentions. Despite the relevance of the relations between them for biological as well as pharmacological and clinical research, only a limited number of strategies have been published to detect interactions between them. A range of different types of chemical-protein/gene interactions are of key relevance for biology, including metabolic relations (e.g. substrates, products), inhibition, binding or induction associations. Our aim is to promote research in this field, and to focus on chemical-protein interactions that might be of relevance for precision medicine as well as for drug discovery and basic biomedical research. This task will consist of two subtasks:
      1- Chemical-protein interaction pair detection task: Extracting relations between chemical entities and proteins/genes belonging to at least one of a pre-defined set of relation types.
      2- Chemical-protein interaction type detection task: Providing the corresponding relation type qualifier for previously detected interaction pairs (of task 1).

    Task training and test datasets will be prepared and will consist of abstracts curated for chemical entity and protein/gene mentions (including mention offsets) as well as relationships between them according to a predefined set of interaction types.
    More information about this track can be found under Tasks.
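
    The passage selection subtask of Track 2 caps each returned snippet at 500 characters. As a rough illustration of that length constraint only, the sketch below cuts a window around the first occurrence of a kinase name. The function name `select_snippet`, the keyword-matching strategy, and the sample text are all assumptions made for this example; they are not part of the task definition or its evaluation.

```python
def select_snippet(full_text: str, keyword: str, max_chars: int = 500) -> str:
    """Return a window of at most max_chars characters around the first keyword hit.

    Hypothetical helper: real systems would rank candidate passages rather
    than take the first match.
    """
    pos = full_text.lower().find(keyword.lower())
    if pos == -1:
        # No match: fall back to the opening of the document.
        return full_text[:max_chars]
    half = max_chars // 2
    start = max(0, pos - half)
    end = min(len(full_text), start + max_chars)
    # If the window was cut off at the end of the text, extend it leftwards.
    start = max(0, end - max_chars)
    return full_text[start:end]


# Made-up full text, for illustration only.
example_text = "x" * 1000 + " CDK2 phosphorylates RB1. " + "y" * 1000
snippet = select_snippet(example_text, "CDK2")
```

    A simple window like this guarantees the length limit but says nothing about whether the passage supports an annotation; that judgment is what the task is meant to evaluate.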


    Teams can participate in one or more of these tracks. Team registration will continue until final commitment is requested by the individual tracks.
    To register a team, go to the team registration page.


  • Cecilia Arighi, University of Delaware, USA
  • Andrew Chatr-aryamontri, Institute for Research in Immunology and Cancer, Université de Montréal, Canada
  • Donald Comeau, National Center for Biotechnology Information (NCBI), NIH, USA
  • Kevin Cohen, University of Colorado, USA
  • Juliane Fluck, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Germany
  • Sumit Madan, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Germany
  • Rezarta Islamaj Dogan, National Center for Biotechnology Information (NCBI), NIH, USA
  • Pascale Gaudet, Swiss Bioinformatics Institute, Switzerland
  • Julien Gobeill, Swiss Bioinformatics Institute, Switzerland
  • Lynette Hirschman, MITRE Corporation, USA
  • Sun Kim, National Center for Biotechnology Information (NCBI), NIH, USA
  • Martin Krallinger, Spanish National Cancer Centre, CNIO, Spain
  • Zhiyong Lu, National Center for Biotechnology Information (NCBI), NIH, USA
  • Fabio Rinaldi, Swiss Bioinformatics Institute, Switzerland
  • Patrick Ruch, Swiss Bioinformatics Institute, Switzerland
  • Alfonso Valencia, Spanish National Cancer Centre, CNIO, Spain
  • Cathy Wu, University of Delaware and Georgetown University, USA

    BioCreative V.5

    BioCreative V.5 CFP (Events) [2016-11-14]

    We are happy to announce the BioCreative V.5 Challenge.
    Following the impact and success of previous BioCreative-related initiatives, text mining and language
    processing researchers world-wide are invited to use the BeCalm (Biomedical Annotation Metaserver) platform
    in order to implement text Annotation Servers and to provide their results for the following tasks:
  • TIPS (Technical interoperability and performance of annotation servers), covering the technical interoperability of named entity recognition. This task will focus on the technical aspects of the evaluation of continuous text Annotation Servers (ASs) for named entity recognition. This is an open task, in the sense that it is not restricted to a particular named entity recognition task. Moreover, the Annotation Servers may be fully developed in-house or integrate third-party recognition software as building-block components. An AS will consist of a REST (Representational State Transfer) API application that responds to requests by returning NER annotation results as structured data (represented in particular formats: XML/BioC, JSON and TXT).
  • CEMP (Chemical Entity Mention recognition). This task requires the recognition of chemical named entity mentions in text. This means returning the start and end indices corresponding to all the chemical entity mentions.
  • GPRO (Gene and Protein Related Object recognition). For this task teams have to recognize mentions of gene and protein related objects (named as GPROs) in running text.
    Participating teams do not need to send results for all three sub-tasks; they can send results for individual sub-tasks only. In line with previous BioCreative challenges, the results of participating teams will be presented at the BioCreative V.5 evaluation workshop, which will take place in Madrid (Spain) in April 2017. See the call for participation for details.

    Task Organizers:
  • Martin Krallinger, Spanish National Cancer Research Centre
  • Anália Lourenço, University of Vigo
  • Martin Pérez-Pérez, University of Vigo
  • Gael Pérez-Rodríguez, University of Vigo
  • Florentino Fdez-Riverola, University of Vigo
  • Alfonso Valencia, Spanish National Cancer Research Centre
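
    The CEMP task above asks for the start and end indices of every chemical mention. The sketch below shows what such output could look like with a toy dictionary lookup. The function name `find_mentions`, the two-term lexicon, and the offset convention (0-based, end-exclusive character indices) are assumptions for illustration; the official task guidelines define the actual convention and format.

```python
import re
from typing import List, Tuple


def find_mentions(text: str, lexicon: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, mention) for each lexicon term found in text.

    Offsets are 0-based character indices with an exclusive end
    (an assumption made for this sketch).
    """
    mentions = []
    for term in lexicon:
        # Whole-word, case-insensitive match of the literal term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            mentions.append((match.start(), match.end(), match.group(0)))
    return sorted(mentions)


# Made-up sentence and lexicon, for illustration only.
text = "Aspirin inhibits COX-1; ibuprofen does too."
mentions = find_mentions(text, ["aspirin", "ibuprofen"])
for start, end, mention in mentions:
    print(start, end, mention)
```

    Slicing the input with the returned indices, as in `text[start:end]`, reproduces each mention, which is a quick consistency check before submitting annotations.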


    BioCreative 2016


    BioCreative Workshop 2016 (Events) [2016-03-02]

    Call for Participation

    August 1-2, 2016 Corvallis, Oregon, USA

    BioCreative: Critical Assessment of Information Extraction in Biology is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. We are happy to announce that we will host BioCreative Workshop 2016 in Corvallis, Oregon, USA on August 1-2, 2016, as a satellite to the International Conference on Biological Ontology (ICBO) 2016. The goal of this workshop is to provide a discussion forum for topics of interest to the BioCreative community and to encourage synergistic interactions with the ontology community.
    The second-day session of BioCreative Workshop 2016 will run jointly with the ICBO meeting.

    BioCreative 2016 will consist of sessions selected based on input from the community via a recently conducted user survey. A draft agenda is available on the agenda page.

    Keynote Speaker
    William Hersh
    Professor and Chair
    Department of Medical Informatics & Clinical Epidemiology
    School of Medicine, Oregon Health & Science University
    Title: Information Retrieval and Text Mining Evaluation Must Go Beyond “Users”: Incorporating Real-World Context and Outcomes

    Meeting Topics

  • 1-Text mining-facilitated models of curation. Application of text mining methods in areas such as crowdsourcing, database curation, publication process, and metagenomics.
    Chair: Lynette Hirschman
  • 2-Text mining in precision medicine. Methods for annotations such as disease, phenotype, and adverse reactions in different text sources: literature, clinical records and social media.
    Chair: Donald Comeau
  • 3-Domain portability or generalizability across medical literature. Methods to achieve interoperability, generalizability and scalability in text mining: BioC, RDF and semantic web, among others.
    Chairs: Kevin Bretonnel Cohen and Donald Comeau
  • 4-Text mining and ontologies. Application of ontologies in text mining, and text mining as ontology builder. (with ICBO).
    Chairs: Cecilia Arighi and Pankaj Jaiswal
    We are soliciting submissions of your work in the form of a 2-page abstract (see instructions to authors). Abstracts will be selected for oral presentation during the corresponding session or for a poster.
    All information about this event is available at the ICBO conference website.

    Funds of up to $700 are available for US participants to attend BioCreative Workshop 2016. To apply, complete the application by May 15. Women, under-represented minorities, students, and post-doctoral fellows are encouraged to apply.

    Posters should be at most 36" (inches) wide x 44" (inches) high.
    Works selected for talks are also encouraged to bring a poster to foster further discussion.
    Posters should be set up on August 1 before the poster session time and be available throughout the meeting.