Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative III

BioCreative Annotation Server examples (News) [2010-08-06]

Sample implementations of Annotation Servers in Java, Perl, Python, and Ruby, plus a script that simulates the BCMS by sending texts to your local Annotation Server and checking the format of the results. This should speed up development: you do not need to develop the Annotation Server online against the real Meta-Server, and you can spot problems easily.

The content consists of two distinct parts: the server itself (provided as variants in Java, Perl, Python, and Ruby in the respective directories) and a test client that can be called with one or more full-text or MEDLINE files (for either the IMT or the ACT task). The text is sent to a running Annotation Server that you specify (i.e., your Annotation Server script has to be running when you start the test script), and once the Annotation Server responds, the test client asserts the integrity of the received data. Additionally, the result data for BC III can be written to output files ready for use with the evaluation library if these files are given as optional arguments to the BCMS test client script. The Perl, Python, and Ruby Annotation Server samples are all set up so you can try them immediately without much fuss - all you need to do is run one of those sample Annotation Server scripts in the background and give the BCMS test client script a text file to annotate.
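The round trip described above (a stand-in server running locally, a test client sending text and asserting the integrity of the reply) can be sketched roughly as follows. This is not the distributed sample code: the use of XML-RPC, the method name `annotate`, and the result fields (`term`, `start`, `end`, `confidence`) are all assumptions made for illustration, and the real input/output structure is the one described in the tutorial article.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def annotate(text):
    """Hypothetical annotation method: returns one dummy annotation
    spanning the whole text (assumed result format, not the BCMS one)."""
    return [{"term": "example", "start": 0, "end": len(text), "confidence": 0.5}]


def start_server():
    """Start the stand-in Annotation Server on a free local port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(annotate, "annotate")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def check_result(text, result):
    """Assert basic integrity of the returned annotations (assumed format)."""
    assert isinstance(result, list), "result must be a list"
    for item in result:
        assert set(item) == {"term", "start", "end", "confidence"}
        assert 0 <= item["start"] <= item["end"] <= len(text)
        assert 0.0 <= item["confidence"] <= 1.0
    return True


def run_test_client(url, text):
    """Send one text to the server at `url` and validate what comes back."""
    with ServerProxy(url) as proxy:
        result = proxy.annotate(text)
    check_result(text, result)
    return result


if __name__ == "__main__":
    server = start_server()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    print(run_test_client(url, "We used a two hybrid assay."))
    server.shutdown()
    server.server_close()
```

As with the real test script, the server has to be running before the client is called; here both live in one process only to keep the sketch short.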

The Ruby sample Annotation Server even has a nice bonus: a baseline matcher, written by Miguel Vázquez, our newest post-doc here at the CNIO, that does simple RegEx matching of all detection method names in PSI:MI against the text. In about 130 lines, including comments and the baseline "text-mining pipeline" (more correctly, a regex matcher setup), it demonstrates how easy it is to implement a complete, working Annotation Server that even returns some sensible results. It is extremely fast, too - for the whole training set of ACT and IMT together, it needs no more than a few minutes when run as a server. Since the server itself adds so little time, the main overhead of a real system is to be found in the text-mining pipelines (as expected).
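To give an idea of what such a baseline looks like, here is a minimal Python sketch of regex matching for detection method names. It is not the Ruby code shipped with the samples, only an analogous illustration. The three terms in `DETECTION_METHODS` are a small illustrative subset, and their identifiers should be checked against the PSI-MI ontology before use; the function names are made up for this example.

```python
import re

# Illustrative subset only; verify names and identifiers against PSI-MI.
DETECTION_METHODS = {
    "MI:0018": "two hybrid",
    "MI:0019": "coimmunoprecipitation",
    "MI:0096": "pull down",
}


def build_matchers(methods):
    """Compile one case-insensitive regex per detection method name."""
    matchers = {}
    for term_id, name in methods.items():
        # Allow whitespace or hyphens between the words of a name,
        # so "two hybrid" also matches "two-hybrid".
        words = [re.escape(word) for word in name.split()]
        pattern = r"\b" + r"[\s-]+".join(words) + r"\b"
        matchers[term_id] = re.compile(pattern, re.IGNORECASE)
    return matchers


def match_methods(text, matchers):
    """Return (term_id, start, end, matched_text) tuples sorted by offset."""
    hits = []
    for term_id, regex in matchers.items():
        for match in regex.finditer(text):
            hits.append((term_id, match.start(), match.end(), match.group(0)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = "Binding was confirmed by yeast two-hybrid and GST pull down assays."
    for hit in match_methods(sample, build_matchers(DETECTION_METHODS)):
        print(hit)
```

Compiling the patterns once and reusing them for every incoming text is what keeps a matcher like this cheap when it runs as a long-lived server.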

For more information about input/output structure of the PPI online challenge and a tutorial, please read the article "PPI Online Participation via the BCMS".