Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative VI

Track 4: Mining protein interactions and mutations for precision medicine (PM) [2017-03-03]

Precision Medicine and Biomedical Information:

The precision medicine initiative (PMI) promises to identify individualized treatments based on a patient's genetic profile and related responses. To help health professionals and researchers in the precision medicine endeavor, one goal is to leverage the knowledge available in the published scientific literature and extract clinically useful information that links genes, mutations, and diseases to specialized treatments (1).

Proteins and their interactions are the building blocks of the metabolic and signaling pathways that regulate cellular homeostasis (2). Understanding how allelic variation and genetic background influence the functionality of these pathways is crucial for predicting disease phenotypes and designing personalized therapeutic approaches. A key step is mapping the functional regions of gene products by identifying and studying mutations (naturally occurring or synthetically induced) that affect the stability and affinity of molecular interactions.

* Please see here for more details on how mutational analysis can reveal crucial regions for protein-protein interaction.

Overview of the PM task:

Despite previous studies in protein-protein interaction (e.g. (3, 4)) and mutation extraction (e.g. (5)), no one has investigated how to combine these efforts to help assess and curate the clinical significance of genetic variants, an essential step towards precision medicine. Thus, the PM task in BioCreative VI aims to bring together the biomedical text mining community in a new BioCreative challenge task (6) focused on identifying and extracting, from the biomedical literature, protein-protein interactions changed by genetic mutations. This challenge consists of two subtasks:

  • Document Triage: Identify relevant PubMed citations describing genetic mutations affecting protein-protein interactions, and
  • Relation Extraction: Extract experimentally verified protein-protein interactions (PPIs) affected by the presence of a genetic mutation.
    Document Triage Task:

    The training dataset will consist of a set of ~10K PubMed articles. Many of these articles have been manually labelled as relevant/not relevant by BioGRID database curators, while the rest are unlabelled. Participants in this subtask will be expected to build automatic methods capable of receiving a list of PMIDs and returning a relevance-ranked judgement of the test set for triage purposes.
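
As a rough illustration of the input/output shape of this subtask, the sketch below ranks PMIDs by a simple cue-term score over their abstracts. The cue terms, their weights, and the function names are hypothetical and are not part of the task definition; a competitive system would instead learn a model from the labelled training articles.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical cue terms and weights, chosen only for illustration.
# A real system would learn these from the curator-labelled training set.
CUE_TERMS = {
    "mutation": 1.0,
    "mutant": 1.0,
    "interaction": 1.0,
    "binding": 0.5,
    "abolish": 1.5,
    "disrupt": 1.5,
}


def score_abstract(text: str) -> float:
    """Sum the weights of cue terms that prefix any token in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(
        weight
        for token in tokens
        for term, weight in CUE_TERMS.items()
        if token.startswith(term)
    )


def rank_pmids(abstracts: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (PMID, score) pairs, highest score first; ties broken by PMID."""
    scored = [(pmid, score_abstract(text)) for pmid, text in abstracts.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up PMIDs and abstracts, for demonstration only.
    demo = {
        "111": "The L858R mutation disrupts binding to the adaptor.",
        "222": "We review clinical trial design.",
    }
    for pmid, score in rank_pmids(demo):
        print(pmid, score)
```

The output is a ranked list rather than a yes/no label because the subtask asks for a relevance-ranked judgement of the test set.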

    Relation Extraction Task:

    A subset of the relevant articles in the Document Triage Task has been manually annotated with relevant interacting protein pairs. Each PubMed article in this set has at least one interacting pair, listed with the UniProt ID and Entrez Gene ID of the two interactors. These protein-protein interactions have been experimentally verified, and the analysis of naturally occurring or synthetic mutations has identified protein residues crucial for the interaction. Participants in this subtask will be expected to build automated methods capable of receiving a set of PMID documents and returning the set of interacting protein pairs (and their corresponding UniProt/Entrez Gene IDs) mentioned in the text that are affected by a genetic mutation.
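
One way to hold the expected output of this subtask in memory is as a set of identifier pairs per PMID. The sketch below assumes that a pair is order-insensitive, so (A, B) and (B, A) count as the same interaction; that assumption, the function names, and the ID strings are illustrative and are not taken from the task's submission format.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A pair is stored as a frozenset so that interactor order does not matter.
# This is an assumption of the sketch, not a documented rule of the task.
Pair = FrozenSet[str]


def normalize_pair(id_a: str, id_b: str) -> Pair:
    """Build an order-insensitive pair from two gene/protein identifiers.

    A self-interaction (same ID twice) collapses to a one-element set.
    """
    return frozenset((id_a.strip(), id_b.strip()))


def pairs_for_document(raw_pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Deduplicate the pairs reported for one document."""
    return {normalize_pair(a, b) for a, b in raw_pairs}


if __name__ == "__main__":
    # Illustrative identifier strings only.
    gold = pairs_for_document([("1956", "2885")])
    predicted = pairs_for_document([("2885", "1956"), ("1956", "1956")])
    print(len(gold & predicted))  # pairs found in both sets
```

Storing pairs as sets makes the overlap between predicted and curated pairs a plain set intersection.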

    The validity of the text mining methods will be evaluated using standard metrics such as average precision and F-measure. Additionally, the utility of participating systems will be assessed by a group of database curators from BioGRID.
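
For readers unfamiliar with the two metrics named above, the following is a minimal sketch of their textbook definitions: average precision over a ranked list, and the balanced F-measure (F1) over a predicted set. The text here does not give the exact variants used by the official evaluation, so the official scoring may differ in detail, and the function names are hypothetical.

```python
from typing import List, Set


def average_precision(ranked_ids: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def f_measure(predicted: Set[str], gold: Set[str]) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Relevant items at ranks 1 and 3: AP = (1/1 + 2/3) / 2
    print(average_precision(["a", "b", "c", "d"], {"a", "c"}))
    # One of two predictions is correct, one of two gold items is found.
    print(f_measure({"a", "b"}, {"a", "c"}))
```

Average precision suits the ranked output of document triage, while the F-measure suits set-valued output such as extracted protein pairs.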

    Data Format and Pre-annotations:

    The PM task organizers will provide the training dataset in multiple formats such as BioC (7). Task organizers will also provide several pre-computed annotations for all articles in the training set with automatically generated labels for diseases, genes/proteins, species, mutations, and other labels (8, 9). The BioC format is discussed in the BioC Webinar series. Organizers will also prepare a Precision Medicine Task Q&A Webinar in April 2017.
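
To give a feel for working with BioC-style pre-annotations, the sketch below reads annotations from a small XML string using Python's standard library. The sample document is made up, and the element names (collection, document, passage, annotation, infon, location) follow the BioC XML layout as commonly described; the infon keys and values in the actual task files are an assumption here and should be checked against the released data.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Made-up sample in a BioC-like layout; not taken from the task data.
SAMPLE_BIOC = """<collection>
  <source>example</source>
  <document>
    <id>12345</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>A mutation in protein A disrupts binding to protein B.</text>
      <annotation id="0">
        <infon key="type">Mutation</infon>
        <location offset="2" length="8"/>
        <text>mutation</text>
      </annotation>
    </passage>
  </document>
</collection>"""

# (document id, annotation type, surface text, offset, length)
Row = Tuple[str, str, str, int, int]


def read_annotations(xml_text: str) -> List[Row]:
    """Collect one row per annotation found in the collection."""
    root = ET.fromstring(xml_text)
    rows: List[Row] = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for annotation in document.iter("annotation"):
            # Infons are key/value pairs attached to the annotation.
            infons = {i.get("key"): i.text for i in annotation.findall("infon")}
            location = annotation.find("location")
            rows.append((
                doc_id,
                infons.get("type"),
                annotation.findtext("text"),
                int(location.get("offset")),
                int(location.get("length")),
            ))
    return rows


if __name__ == "__main__":
    for row in read_annotations(SAMPLE_BIOC):
        print(row)
```

For large collections, a dedicated BioC library or a streaming parser may be a better fit than loading the whole file at once.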

    Important dates:

    Mar/April 2017: Release of training dataset
    April 2017: PM Task Q&A webinar
    July/August 2017: Release of test dataset
    August 2017: Team results submission & evaluation
    September 2017: Workshop paper due
    October 18-20, 2017: BioCreative Workshop in Washington DC

    Task organizers:

    Rezarta Islamaj Dogan (NCBI)
    Andrew Chatr-aryamontri (BioGRID)
    Sun Kim (NCBI)
    Don Comeau (NCBI)
    Zhiyong Lu (NCBI)


    1. Singhal A, Simmons M, Lu Z. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine. PLoS Comput Biol. 2016;12(11):e1005017.

    2. Chatr-Aryamontri A, Oughtred R, Boucher L, Rust J, Chang C, Kolas NK, et al. The BioGRID interaction database: 2017 update. Nucleic Acids Res. 2017;45(D1):D369-D79.

    3. Kim S, Islamaj Dogan R, Chatr-Aryamontri A, Chang CS, Oughtred R, Rust J, et al. BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID. Database : the journal of biological databases and curation. 2016;2016.

    4. Krallinger M, Vazquez M, Leitner F, Salgado D, Chatr-Aryamontri A, Winter A, et al. The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. BMC bioinformatics. 2011;12 Suppl 8:S3.

    5. Wei CH, Harris BR, Kao HY, Lu Z. tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics. 2013;29(11):1433-9.

    6. Huang CC, Lu Z. Community challenges in biomedical text mining over 10 years: success, failure and the future. Brief Bioinform. 2016;17(1):132-44.

    7. Comeau DC, Islamaj Dogan R, Ciccarese P, Cohen KB, Krallinger M, Leitner F, et al. BioC: a minimalist approach to interoperability for biomedical text processing. Database : the journal of biological databases and curation. 2013;2013:bat064.

    8. Wei CH, Kao HY, Lu Z. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res. 2013;41(Web Server issue):W518-22.

    9. Wei CH, Leaman R, Lu Z. Beyond accuracy: creating interoperable and scalable text-mining web services. Bioinformatics. 2016;32(12):1907-10.