Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative VI

PM-task Training Data [2017-05-31]

Please download the training data for the Precision Medicine Task (available in JSON and XML formats).

  • The Triage training dataset consists of 4082 PubMed documents, each annotated as relevant or not relevant. The file contains a BioC collection of documents. Each document contains two passages, the title and the abstract, and has a document ID corresponding to the article's PubMed ID. Each document carries an infon tag marking it as relevant or not.
  • The Relations training dataset consists of 597 PubMed articles marked relevant in the Triage set. These articles are annotated with protein-protein interaction (PPI) relations that are affected by a mutation. The mentions of the interacting genes/proteins in the PubMed title and abstract are also annotated and linked to their corresponding Gene ID. Non-relevant interactions are not annotated.

    You can use this link to visualize the relations training dataset.
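
As a rough illustration of the Triage layout described above, the following Python sketch reads a BioC JSON collection and pulls out the PubMed ID, title, abstract and relevance label of each document. It assumes the usual BioC JSON shape (a top-level "documents" list, each document with "id", "infons" and "passages"), and it assumes the relevance infon is stored under the key "relevant"; check the downloaded file and adjust label_key if it differs. The sample records are invented placeholders, not entries from the training set.

```python
import json

# Hand-made sample in the assumed BioC JSON layout.
# IDs, text and labels are placeholders, not real training records.
SAMPLE = """
{
  "documents": [
    {"id": "1", "infons": {"relevant": "yes"},
     "passages": [{"offset": 0, "text": "Example title A"},
                  {"offset": 16, "text": "Example abstract A"}]},
    {"id": "2", "infons": {"relevant": "no"},
     "passages": [{"offset": 0, "text": "Example title B"},
                  {"offset": 16, "text": "Example abstract B"}]}
  ]
}
"""


def load_triage(collection, label_key="relevant"):
    """Flatten a BioC collection into one dict per document.

    label_key is an assumption about the infon name; change it if the
    downloaded file uses a different key.
    """
    rows = []
    for doc in collection["documents"]:
        passages = doc.get("passages", [])
        # Per the task description, the first passage is the title
        # and the second is the abstract.
        title = passages[0]["text"] if len(passages) > 0 else ""
        abstract = passages[1]["text"] if len(passages) > 1 else ""
        rows.append({
            "pmid": doc["id"],
            "title": title,
            "abstract": abstract,
            "label": doc.get("infons", {}).get(label_key),
        })
    return rows


rows = load_triage(json.loads(SAMPLE))
for row in rows:
    print(row["pmid"], row["label"], row["title"])
```

To run this on the real data, replace json.loads(SAMPLE) with json.load() on the opened training file; the file name depends on the download and is not given here.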
Feel free to contact the task organizers with any questions.

Task organizers:

Rezarta Islamaj Dogan (NCBI)
Andrew Chatr-aryamontri (BioGrid)
Sun Kim (NCBI)
Don Comeau (NCBI)
Zhiyong Lu (NCBI)

Downloads