Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative VI

PM-task Training Data (Tasks) [2017-05-31]

Please download the training data for the Precision Medicine Task (available in JSON and XML formats).

  • The Triage training dataset consists of 4,082 PubMed documents annotated as relevant or not relevant. The file contains a BioC collection of documents. Each document contains two passages, the title and the abstract, and has a document ID corresponding to the article's PubMed ID. Each document carries an infon tag marking it as relevant or not.
  • The Relations training dataset consists of 597 PubMed articles marked relevant in the Triage set. These articles are annotated with PPI relations that are affected by a mutation. Mentions of the interacting genes/proteins in the PubMed title and abstract are also annotated and linked to their corresponding Gene ID. Non-relevant interactions are not annotated.
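
As a rough illustration of the triage file layout described above, the following sketch reads a BioC XML collection with Python's standard library and collects each document's PubMed ID, relevance label, and passages. The infon key name `relevant`, its `yes`/`no` values, and the sample PubMed IDs are assumptions made for illustration; check the downloaded file for the actual key and values.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature BioC collection; IDs, texts, and the infon key
# "relevant" are illustrative assumptions, not taken from the real data.
SAMPLE = """<collection>
  <source>PubMed</source>
  <document>
    <id>10000001</id>
    <infon key="relevant">yes</infon>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>Example title.</text>
    </passage>
    <passage>
      <infon key="type">abstract</infon>
      <offset>15</offset>
      <text>Example abstract.</text>
    </passage>
  </document>
  <document>
    <id>10000002</id>
    <infon key="relevant">no</infon>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>Another title.</text>
    </passage>
  </document>
</collection>"""


def read_triage(xml_text, label_key="relevant"):
    """Return one dict per document: PubMed ID, label, passage texts."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter("document"):
        label = None
        # Only document-level infons are direct children of <document>;
        # passage-level infons (e.g. "type") are skipped by findall.
        for infon in doc.findall("infon"):
            if infon.get("key") == label_key:
                label = infon.text
        passages = [p.findtext("text") or "" for p in doc.findall("passage")]
        rows.append({"pmid": doc.findtext("id"), "label": label,
                     "passages": passages})
    return rows


rows = read_triage(SAMPLE)
```

For the real file, the same function could be applied to the file contents, for example `read_triage(open(path, encoding="utf-8").read())`, where `path` points to the downloaded XML.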

    You can use this link to visualize the relations training dataset.
Feel free to contact the task organizers with any questions.

Task organizers:

Rezarta Islamaj Dogan (NCBI)
Andrew Chatr-aryamontri (BioGrid)
Sun Kim (NCBI)
Don Comeau (NCBI)
Zhiyong Lu (NCBI)

Downloads

Publications

BC V.5 - Workshop Proceedings (Resources) [2017-05-19]

Co-located: BioCreative V.5 Challenge Evaluation Workshop and ELIXIR-EXCELERATE Workshop Text-mining infrastructure requirements
Table of Contents: BC V.5 - Workshop Proceedings

    Editorial and track overview papers

  1. The BioCreative V.5 evaluation workshop: tasks, organization, sessions and topics. Krallinger et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 8-10
  2. Evaluation of chemical and gene/protein entity recognition systems at BioCreative V.5: the CEMP and GPRO patents tracks. Pérez-Pérez et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 11-18
  3. Benchmarking biomedical text mining web servers at BioCreative V.5: the technical Interoperability and Performance of annotation Servers - TIPS track. Pérez-Pérez et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 19-27

    Chemical/gene entity recognition tracks: CEMP & GPRO

  4. DUTIR at the BioCreative V.5.BeCalm Tasks: A BLSTM-CRF Approach for Biomedical Entity Recognition in Patents. Luo et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 28-39
  5. HITextracter System for Chemical and Gene/Protein Entity Mention Recognition in Patents. Liu et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 40-46
  6. CRFVoter: Chemical Entity Mention, Gene and Protein Related Object recognition using a conglomerate of CRF based tools. Liu et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 47-53
  7. Neji: Recognition of Chemical and Gene Mentions in Patent Texts. Santos and Matos. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 54-60
  8. Chemlistem - chemical named entity recognition using recurrent neural networks. Corbett and Boyle. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 61-68
  9. Recognition of Chemical Entity Mention in Patents Using Feature-rich CRF. Guo et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 69-72
  10. Towards Robust Chemical Recognition with TaggerOne at the BioCreative V.5 CEMP Task. Leaman and Lu. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 73-80
  11. A hybrid text mining system for chemical entity recognition and classification using dictionary look-up and pattern matching @ BeCalm challenge evaluation workshop. Raja et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 81-88
  12. IBEnt: Chemical Entity Mentions in Patents using ChEBI. Lamurias et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 89-95
  13. ChemGrab: Identification of Chemical Names Using a Combined Negative-Dictionary and Rule-Based Approach. Sharma and Sarkar. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 96-103
  14. Combining the BANNER tool with the DINTO ontology for the CEMP task of BioCreative V.5. Colon-Ruiz et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 104-107
  15. An Ensemble Algorithm for Sequential Labelling: A Case Study in Chemical Named Entity Recognition. Wang et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 108-114
  16. Statistical Principle-based Approach for Gene and Protein Related Object Recognition. Lai et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 115-121

    TIPS (Technical interoperability and performance of annotation servers) Track

  17. Tagger: BeCalm API for rapid named entity recognition. Jensen. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 122-129
  18. MER: a Minimal Named-Entity Recognition Tagger and Annotation Server. Couto et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 130-137
  19. SIA: Scalable Interoperable Annotation Server. Kirschnick and Thomas. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 138-145
  20. High-throughput, interoperability and benchmarking of text-mining with BeCalm biomedical metaserver. Madrid and Valencia. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 146-155
  21. Performance and interoperability assessment of Disease Extract Annotation Server (DEAS). Jonnagaddala et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 156-162
  22. TextImager as an interface to BeCalm. Hemati et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 163-166
  23. Olelo’s named-entity recognition web service in the BeCalm TIPS task. Folkerts and Neves. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 167-174
  24. OGER: OntoGene’s Entity Recogniser in the BeCalm TIPS Task. Furrer and Rinaldi. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 175-182
  25. READ-Biomed-Server: A Scalable Annotation Server Using the UIMA Concept Mapper. Teng and Verspoor. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 183-190
  26. Neji: DIY web services for biomedical concept recognition. Santos and Matos. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 191-195
  27. NTTMU-SCHEMA BeCalm API in BioCreative V.5. Dai et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 196-204
  28. Micro-RNA Recognition in Patents in BioCreative V.5. Wang et al. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, 205-209

Downloads

BioCreative VI

BioCreative VI Challenge and Workshop (Events) [2017-02-06]


October 18-20, DoubleTree by Hilton Hotel, Bethesda, Maryland USA

Team registration for tracks is now open!

Workshop registration will start in mid-July

BioCreative: Critical Assessment of Information Extraction in Biology (http://www.biocreative.org/) is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. BioCreative VI will run the following tracks:

  • Track 1: Interactive Bio-ID Assignment (IAT-ID) Track on innovations in Biomedical Digital Curation
    Organizers: Lynette Hirschman, Cecilia Arighi, Thomas Lemberger, Robin Liechti and Cathy Wu
    The Bio-ID track will explore the ID assignment to selected bioentities both at the pre- and post-publication stages, with the aim of facilitating downstream article curation. To do this we are bringing together the various stakeholders to discuss functional requirements and develop interoperable digital curation tools. Built on previous BioCreative experiments, including the interactive tracks, the BioC, gene/protein/chemical extraction tracks, and BeCalm framework, the task is designed to foster the development of an integrated and interoperable workflow of multiple text mining tools for real-world testing in pilot publishing frameworks.
    More information about this track can be found under Tasks or at http://www.biocreative.org/tasks/biocreative-vi/track-1/

  • Track 2: Text-mining services for Kinome Curation
    Organizers: Julien Gobeill, Patrick Ruch and Pascale Gaudet
    Literature triage (selection of relevant articles for curation) is a basic task performed by virtually all curated molecular biology databases. This task will focus on triage for both Protein-Disease and Protein-GO annotations related to human kinases. The full data set covers a significant fraction of the Human Kinome (300 proteins out of ~500 kinases), with 30,000 annotations from 13,000 articles ready to be integrated in the neXtProt database by 2017. It contains comprehensive annotations about kinase substrates, GO Biological Processes and Diseases. Each annotation is provided with a PMID.
    The first two tasks deal with triage of abstracts or full-texts. The third task deals with passage selection: given a kinase, an axis, and a full-text regarded as relevant after SIB curation, the systems will return a snippet of max. 500 characters containing enough information to make an annotation.
    More information about this track can be found under Tasks or at http://www.biocreative.org/tasks/biocreative-vi/track-2/
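
To make the 500-character limit of the passage selection task concrete, here is a naive baseline sketch that returns a window of at most 500 characters centred on the first mention of a kinase name. It is not the organizers' method; the function name `select_snippet`, the case-insensitive string match, and the example text are assumptions for illustration only.

```python
MAX_SNIPPET_CHARS = 500  # limit stated in the track description


def select_snippet(full_text, kinase_name, max_chars=MAX_SNIPPET_CHARS):
    """Return at most max_chars characters around the first mention.

    Hypothetical baseline: a plain case-insensitive substring search,
    with no handling of synonyms, the annotation axis, or sentence
    boundaries.
    """
    idx = full_text.lower().find(kinase_name.lower())
    if idx < 0:
        # No mention found: fall back to the start of the document.
        return full_text[:max_chars]
    # Centre the window on the mention, then clamp it to the text bounds.
    half = (max_chars - len(kinase_name)) // 2
    start = max(0, idx - half)
    end = min(len(full_text), start + max_chars)
    start = max(0, end - max_chars)
    return full_text[start:end]


example_text = "x" * 1000 + " AKT1 phosphorylates its substrate. " + "y" * 1000
snippet = select_snippet(example_text, "akt1")
```

A real system would presumably rank candidate sentences by how well they support the annotation rather than take the first mention.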

  • Track 3: Extraction of causal network information using the Biological Expression Language (BEL)
    Organizers: Juliane Fluck, Sumit Madan and Justyna Szostak
    Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining.
    In BioCreative V, we tackled this complexity by extracting causal relationships represented in Biological Expression Language (BEL). BEL is an advanced knowledge representation format designed to be both human readable and machine processable. The smallest unit is a BEL statement or BEL nanopub, expressing a single causal relationship. In the last BioCreative, participants had only limited time to train on the data and, in addition, the evaluation environment became available only for the test phase. Furthermore, no training data was available for the second subtask, sentence classification. Therefore, we decided to present the same task based on new test data. This time, training data for both subtasks is available, and the evaluation environment can be used during the training period. As before, the challenge is organized into two tasks that evaluate complementary aspects of the problem:
      1-Given selected textual evidence, construct the corresponding BEL statement
      2-Given a BEL statement, detect all available textual evidence
    The description of the task, the training data and links to the papers and to the evaluation website can be found under the following URL:
    https://wiki.openbel.org/display/BIOC/BioCreative+VI+Track+3+%28BEL+Task+2017%29+Home
    More information about this track can be found under Tasks or at http://www.biocreative.org/tasks/biocreative-vi/track-3/
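
As a small illustration of what "a single causal relationship" looks like in BEL, the sketch below splits a simple statement into subject, relation, and object. The example statement is hypothetical and not taken from the task data, the relation list covers only four common causal relations, and the splitting is deliberately simplistic (it does not handle nested statements or relation words inside quoted names).

```python
from typing import NamedTuple

# A small subset of BEL causal relations, for illustration only.
RELATIONS = ("directlyIncreases", "directlyDecreases", "increases", "decreases")


class BelTriple(NamedTuple):
    subject: str
    relation: str
    obj: str


def split_statement(statement):
    """Split a simple, non-nested BEL statement into a triple."""
    for rel in RELATIONS:
        token = f" {rel} "
        if token in statement:
            subject, obj = statement.split(token, 1)
            return BelTriple(subject.strip(), rel, obj.strip())
    raise ValueError(f"no supported relation found in: {statement!r}")


# Hypothetical example statement: a protein increasing a biological process.
triple = split_statement('p(HGNC:TP53) increases bp(GO:"apoptotic process")')
```

Here `triple.subject` is the protein term, `triple.relation` is `increases`, and `triple.obj` is the biological process term. Task 1 asks systems to produce such statements from text, and task 2 asks for the reverse direction.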

  • Track 4: Mining protein interactions and mutations for precision medicine (PM)
    Organizers: Rezarta Islamaj Dogan, Andrew Chatr-aryamontri, Sun Kim, Donald C. Comeau, Zhiyong Lu
    We aim to bring together the biomedical text mining community in a new challenge for precision medicine, focusing on identifying and extracting protein-protein interactions affected by mutations described in the biomedical literature. Two subtasks are proposed:
      1-Document Triage: Identifying relevant PubMed citations describing genetic mutations affecting protein-protein interactions
      2-Relation Extraction: Extracting PPI pairs experimentally verified to be affected by the presence of a genetic mutation
    Task datasets will be available in multiple formats (e.g. BioC) and consist of PubMed articles curated for BioGRID and other PPI databases.
    More information about this track can be found under Tasks or at http://www.biocreative.org/tasks/biocreative-vi/track-4/

  • Track 5: Text mining chemical-protein interactions
    Organizers: Martin Krallinger, Alfonso Valencia, Analia Lourenço
    Considerable work has been done on the detection of gene/protein and chemical compound mentions. However, despite the relevance of the relations between them for biological as well as pharmacological and clinical research, only a limited number of strategies have been published to detect interactions between them. A range of different types of chemical-protein/gene interactions are of key relevance for biology, including metabolic relations (e.g. substrates, products), inhibition, binding or induction associations. Our aim is to promote research in this field, and to focus on chemical-protein interactions that might be of relevance for precision medicine as well as for drug discovery and basic biomedical research. This task will consist of two subtasks:
      1- Chemical-protein interaction pair detection task: Extracting relations between chemical entities and proteins/genes belonging to at least one of a pre-defined set of relation types.
      2- Chemical-protein interaction type detection task: Providing the corresponding relation type qualifier for previously detected interaction pairs (of task 1).

    Task training and test datasets will be prepared and will consist of abstracts curated for chemical entity and protein/gene mentions (including mention offsets) as well as relationships between them according to a predefined set of interaction types.
    More information about this track can be found under Tasks or at http://www.biocreative.org/tasks/biocreative-vi/track-5/

    TEAM REGISTRATION

    Teams can participate in one or more of these tracks. Team registration will continue until final commitment is requested by the individual tracks.
    To register a team, go to the team registration page.

    BIOCREATIVE ORGANIZING COMMITTEE

  • Cecilia Arighi, University of Delaware, USA
  • Andrew Chatr-aryamontri, Institute for Research in Immunology and Cancer, Université de Montréal, Canada
  • Donald Comeau, National Center for Biotechnology Information (NCBI), NIH, USA
  • Kevin Cohen, University of Colorado, USA
  • Juliane Fluck, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Germany
  • Sumit Madan, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Germany
  • Rezarta Islamaj Dogan, National Center for Biotechnology Information (NCBI), NIH, USA
  • Pascale Gaudet, Swiss Bioinformatics Institute, Switzerland
  • Julien Gobeill, Swiss Bioinformatics Institute, Switzerland
  • Lynette Hirschman, MITRE Corporation, USA
  • Sun Kim, National Center for Biotechnology Information (NCBI), NIH, USA
  • Martin Krallinger, Spanish National Cancer Centre, CNIO, Spain
  • Zhiyong Lu, National Center for Biotechnology Information (NCBI), NIH, USA
  • Fabio Rinaldi, Swiss Bioinformatics Institute, Switzerland
  • Patrick Ruch, Swiss Bioinformatics Institute, Switzerland
  • Alfonso Valencia, Spanish National Cancer Centre, CNIO, Spain
  • Cathy Wu, University of Delaware and Georgetown University, USA