dc.contributor.author: Comeau, Donald C. [en_US]
dc.contributor.author: Islamaj Doğan, Rezarta [en_US]
dc.contributor.author: Ciccarese, Paolo [en_US]
dc.contributor.author: Cohen, Kevin Bretonnel [en_US]
dc.contributor.author: Krallinger, Martin [en_US]
dc.contributor.author: Leitner, Florian [en_US]
dc.contributor.author: Lu, Zhiyong [en_US]
dc.contributor.author: Peng, Yifan [en_US]
dc.contributor.author: Rinaldi, Fabio [en_US]
dc.contributor.author: Torii, Manabu [en_US]
dc.contributor.author: Valencia, Alfonso [en_US]
dc.contributor.author: Verspoor, Karin [en_US]
dc.contributor.author: Wiegers, Thomas C. [en_US]
dc.contributor.author: Wu, Cathy H. [en_US]
dc.contributor.author: Wilbur, W. John [en_US]
dc.date.accessioned: 2014-03-11T13:26:45Z
dc.date.issued: 2013 [en_US]
dc.identifier.citation: Comeau, D. C., R. Islamaj Doğan, P. Ciccarese, K. B. Cohen, M. Krallinger, F. Leitner, Z. Lu, et al. 2013. “BioC: a minimalist approach to interoperability for biomedical text processing.” Database: The Journal of Biological Databases and Curation 2013 (1): bat064. doi:10.1093/database/bat064. http://dx.doi.org/10.1093/database/bat064. [en]
dc.identifier.issn: 1758-0463 [en]
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:11879601
dc.description.abstract: A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/ [en]
dc.language.iso: en_US [en]
dc.publisher: Oxford University Press [en]
dc.relation.isversionof: doi:10.1093/database/bat064 [en]
dc.relation.hasversion: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3889917/pdf/ [en]
dash.license: LAA [en_US]
dc.title: BioC: a minimalist approach to interoperability for biomedical text processing [en]
dc.type: Journal Article [en_US]
dc.description.version: Version of Record [en]
dc.relation.journal: Database: The Journal of Biological Databases and Curation [en]
dash.depositing.author: Ciccarese, Paolo [en_US]
dc.date.available: 2014-03-11T13:26:45Z
dc.identifier.doi: 10.1093/database/bat064 [*]
dash.authorsordered: false
dash.contributor.affiliated: Ciccarese, Paolo Nunzio

