Polibits
Online version ISSN 1870-9044
Polibits no. 40, Mexico, Jul./Dec. 2009
Special section: Information Retrieval and Natural Language Processing
Improving Named Entity Extraction Accuracy using Unlabeled Data and Several Extractors
Tomoya Iwakura and Seishi Okamoto
Fujitsu Laboratories Ltd., 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki 211-8588, Japan. (iwakura.tomoya@jp.fujitsu.com, seishi@jp.fujitsu.com)
Manuscript received November 4, 2008.
Manuscript accepted for publication August 25, 2009.
Abstract
This paper proposes feature augmentation methods that use unlabeled data and several Named Entity (NE) extractors. We apply NE extractors to unlabeled data to collect NE-related information about each word, which we call NE-related labels. The NE-related labels we collect include the candidate NE class labels of each word and the NE class labels of co-occurring words. To collect NE-related labels from unlabeled data accurately, we consider methods that combine the outputs of several NE extractors. We then use the NE-related labels as additional features for training new NE extractors. We apply our NE extraction methods using the NE-related labels to the IREX Japanese NE extraction task. The experimental results show better accuracy than previous results obtained with NE extractors that use handcrafted resources.
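The abstract outlines a pipeline: run several NE extractors over unlabeled text, combine their outputs, accumulate per-word NE-related labels (the word's own candidate NE classes and the NE classes of co-occurring words), and turn those labels into extra features. The sketch below illustrates that pipeline under stated assumptions. The extractor interface, the voting rule used to combine outputs (`min_votes`), the co-occurrence window, the feature string format, and the toy dictionary extractors are all hypothetical choices for illustration, not the authors' exact method.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical extractor interface: one NE class label per token, "O" = not an NE.
Extractor = Callable[[Sequence[str]], List[str]]


def combine_labels(outputs: List[List[str]], min_votes: int) -> List[str]:
    """Keep a token's most frequent label only if at least `min_votes` extractors agree.

    Assumption: simple voting stands in for the paper's combination method.
    """
    combined = []
    for token_labels in zip(*outputs):
        label, votes = Counter(token_labels).most_common(1)[0]
        combined.append(label if votes >= min_votes else "O")
    return combined


def collect_ne_related_labels(corpus: Sequence[Sequence[str]],
                              extractors: Sequence[Extractor],
                              min_votes: int = 2,
                              window: int = 2) -> Dict[str, Dict[str, Counter]]:
    """Count, for each word, its candidate NE classes and the NE classes of nearby words."""
    candidate: Dict[str, Counter] = defaultdict(Counter)  # word -> classes assigned to it
    cooccur: Dict[str, Counter] = defaultdict(Counter)    # word -> classes within `window`
    for sentence in corpus:
        labels = combine_labels([ex(sentence) for ex in extractors], min_votes)
        for i, word in enumerate(sentence):
            if labels[i] != "O":
                candidate[word][labels[i]] += 1
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i and labels[j] != "O":
                    cooccur[word][labels[j]] += 1
    return {"candidate": candidate, "cooccur": cooccur}


def ne_related_features(word: str, stats: Dict[str, Dict[str, Counter]],
                        min_count: int = 1) -> List[str]:
    """Turn collected counts into feature strings for training a new extractor."""
    feats = [f"cand={c}" for c, n in sorted(stats["candidate"].get(word, {}).items())
             if n >= min_count]
    feats += [f"cooc={c}" for c, n in sorted(stats["cooccur"].get(word, {}).items())
              if n >= min_count]
    return feats


def make_dict_extractor(lexicon: Dict[str, str]) -> Extractor:
    """Toy stand-in for a trained NE extractor: a lexicon lookup."""
    return lambda sent: [lexicon.get(w, "O") for w in sent]


# Usage with three hypothetical extractors that partly disagree.
ex1 = make_dict_extractor({"Tokyo": "LOCATION", "Fujitsu": "ORGANIZATION"})
ex2 = make_dict_extractor({"Tokyo": "LOCATION", "Kawasaki": "LOCATION"})
ex3 = make_dict_extractor({"Tokyo": "PERSON", "Fujitsu": "ORGANIZATION"})
corpus = [["Fujitsu", "is", "near", "Tokyo"], ["Kawasaki", "and", "Tokyo"]]
stats = collect_ne_related_labels(corpus, [ex1, ex2, ex3], min_votes=2, window=2)
print(ne_related_features("Kawasaki", stats))  # → ['cooc=LOCATION']
```

In this toy run, "Kawasaki" gets no candidate label because only one of the three extractors tagged it, yet it still receives a co-occurrence feature from the nearby "Tokyo". Raising `min_votes` keeps fewer but more reliable collected labels; lowering it does the opposite.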
Key words: Named entity recognition, unlabeled data, combination of extractors.