Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative IV

Track 1 - BioC: The BioCreative Interoperability Initiative [2012-11-15]

The BioC Project: In a world where everything is becoming digital, text mining is a crucial research area. The development of natural language processing and information retrieval tools for various text mining purposes has become an indispensable first step for many research problems. However, while the capabilities and quality of these tools continue to grow, there are few options for making them work together easily, quickly and efficiently. Consequently, every new generation of researchers creates its own software specific to its research, its environment, and the format of the data it studies. It is, after all, the path requiring the least labor.

The bottleneck of text mining research is processing data in various formats, writing software to explore the data in various formats, and implementing algorithms that use data in various formats. Naturally, most of these efforts are limited in their use and are not readily adaptable or reusable. This is the reuse problem, and the only way to promote sophistication is to promote reusability. The reuse problem exists because of the difficulty of achieving interoperability and because of the cognitive burden of learning new systems and languages.

The goals of this project are simplicity, interoperability, broad use and reuse. The most significant difference between this proposal and previous efforts is the emphasis on simplicity of use. Little investment should be required to learn to use data in a given format or a software module that processes that data. Since we are interested in reuse, the focus will be on common tasks in natural language processing that are broadly useful for text mining. Use of the BioC approach and tools will be strongly encouraged, if not required, for the CTD and IAT tasks of BioCreative IV.

Committee promoting BioC:

  • Paolo Ciccarese, MIND Informatics, Massachusetts General Hospital, Harvard Medical School
  • Kevin Cohen, University of Colorado School of Medicine
  • Donald C. Comeau, National Center for Biotechnology Information
  • Martin Krallinger, Spanish National Cancer Research Center (CNIO)
  • Lynette Hirschman, The MITRE Corporation
  • Rezarta Islamaj Dogan, National Center for Biotechnology Information
  • Zhiyong Lu, National Center for Biotechnology Information
  • Fabio Rinaldi, University of Zurich
  • Manabu Torii, University of Delaware
  • Thomas Wiegers, North Carolina State University
  • W. John Wilbur, National Center for Biotechnology Information

BioCreative IV Interoperability Track:

    For the BioCreative IV Workshop, we invite teams to participate in the BioC initiative and contribute to this effort by:

      a) Preparing a BioC module that can be seamlessly coupled with the rest of the BioC code and definitions, and that performs an important NLP or BioNLP task. The task is left to participating teams to choose, implement and validate for the purposes of this challenge. If you are participating in any other BioCreative track and are producing a BioC-compliant module, you are welcome to submit your module to Track 1. If the module you wish to produce is independent of the other tracks, we request that you submit a proposal to the program committee for approval by the end of July 2013. The program committee wishes to approve all proposed independent projects at this stage to avoid overlapping tasks. Such a proposal can consist of a couple of paragraphs and needs to be a high-level description of the module you wish to develop and contribute to the repository.
      b) Where necessary, preparing a corpus or otherwise making data available in the BioC format, which will allow the challenge committee to test and judge the performance of the module produced in part a).
      c) Writing a paper that describes the BioC module produced in part a), the data provided in part b), an evaluation of the module, and its proposed uses. The paper will be published as part of the BioCreative IV proceedings, and a selection of papers will also be considered for publication in a special journal issue. An accepted module, along with the accompanying paper and data where appropriate, is understood to be a contribution to the BioC public repository. The final products must be submitted to the repository by September 8, 2013 to give the organizers sufficient time to judge the acceptability of a product.

    All entries will be judged for the significance of their contribution to the initiative and for their compliance with the guidelines:

    BioC packages in both C++ and Java are available for download. The code includes basic classes for working with data in BioC format, as well as a couple of simple applications and examples. For any proposal, a successful module for the BioC Interoperability Track should work seamlessly with the BioC code and perform an important NLP or BioNLP task.
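
    As an illustration of what working with data in BioC format can look like, the following is a minimal Python sketch, not the official C++ or Java classes. It assumes the XML layout commonly used for BioC files (a collection of document elements, each holding passage elements with an offset, a text, and annotation elements carrying an infon and a location). The sample data, the function name read_annotations, and the output fields are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The element names are an assumption about the
# BioC XML layout, not something defined in this announcement.
SAMPLE = """<collection>
  <source>example</source>
  <date>20130101</date>
  <key>example.key</key>
  <document>
    <id>doc-1</id>
    <passage>
      <offset>0</offset>
      <text>BRCA1 mutations and breast cancer risk</text>
      <annotation id="A1">
        <infon key="type">gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return one dict per annotation found in a BioC-style collection."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for passage in doc.iter("passage"):
            passage_text = passage.findtext("text", default="")
            passage_offset = int(passage.findtext("offset", default="0"))
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                # Location offsets are assumed to be document-level, so
                # shift them by the passage offset before slicing.
                start = int(loc.get("offset")) - passage_offset
                length = int(loc.get("length"))
                rows.append({
                    "document": doc_id,
                    "id": ann.get("id"),
                    "type": ann.findtext("infon[@key='type']"),
                    "span": passage_text[start:start + length],
                })
    return rows


if __name__ == "__main__":
    for row in read_annotations(SAMPLE):
        print(row)
```

    A module written against a shared layout like this only needs to read and write the same elements, which is the kind of low-investment reuse the initiative aims for.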

    What makes a successful entry for the Interoperability Track?

    It helps to think about why the proposed module is useful from the NLP or BioNLP point of view, and which particular NLP applications will benefit from it. The description of the module should elaborate on what kind of data it processes, what kind of output it produces, and what general method it uses. If the method is novel, explain how it differs from and compares with other approaches, at both the algorithmic level and the performance level. If, on the other hand, it is a known method, then that also needs to be stated and explained accordingly. The description that you submit, in particular your final write-up, should clearly explain these things.

    In addition, participants in other BioCreative tracks are eligible to submit a description of their system for consideration as a BioCreative Track 1 task.

    Selected teams will be invited to demonstrate their software in a special demo session to be held during the BioCreative IV Workshop. For the latest information and guidelines, see BioC Home.