Polibits
On-line version ISSN 1870-9044
Polibits no. 52, México, Jul./Dec. 2015
https://doi.org/10.17562/PB-52-3
An Approach towards Semi-automated Biomedical Literature Curation and Enrichment for a Major Biological Database
Fabio Rinaldi1, Oscar Lithgow-Serrano2, Alejandra López-Fuentes2, Socorro Gama-Castro2, Yalbi I. Balderas-Martínez2, Hilda Solano-Lira2, and Julio Collado-Vides2
1 Institute of Computational Linguistics, University of Zurich, Switzerland (e-mail: fabio.rinaldi@uzh.ch).
2 Computational Genomics Program, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México.
Manuscript received on June 18, 2015
Accepted for publication on August 12, 2015
Published on October 15, 2015
Abstract
As part of a large-scale biocuration project, we are developing innovative techniques to process the biomedical literature and extract information relevant to specific biological investigations. Biological experts routinely extract core information from the scientific literature through a manual process known as scientific curation. Our aim is to improve the efficiency of this process by leveraging natural language processing technologies in a text mining system. We pursue two lines of investigation: (1) finding information relevant for curation and presenting it in an adaptive interface, and (2) using sentence-similarity techniques to create interlinks across articles in order to support a process of knowledge discovery.
Key words: Text mining, natural language processing, biocuration.
ACKNOWLEDGMENT
The development of OntoGene/ODIN has been supported by the Swiss National Science Foundation (grant 105315-130558/1, PI: Fabio Rinaldi) and by the Data Science Group at Hoffmann-La Roche, Basel, Switzerland. The development of RegulonDB is supported by NIH grant 1R01GM110597 to Julio Collado Vides.