Improving Named Entity Extraction Accuracy using Unlabeled Data and Several Extractors

Iwakura, Tomoya; Okamoto, Seishi

Polibits

On-line version ISSN 1870-9044

Polibits no. 40, México, Jul./Dec. 2009

Special section: Information Retrieval and Natural Language Processing

Tomoya Iwakura and Seishi Okamoto

Fujitsu Laboratories Ltd., 1–1, Kamikodanaka 4–chome, Nakahara–ku, Kawasaki 211–8588, Japan. (iwakura.tomoya@jp.fujitsu.com, seishi@jp.fujitsu.com)

Manuscript received November 4, 2008.
Manuscript accepted for publication August 25, 2009.

Abstract

This paper proposes feature augmentation methods that use unlabeled data and several Named Entity (NE) extractors. We apply NE extractors to unlabeled data to collect NE-related information about each word, which we call NE-related labels. The NE-related labels include candidate NE class labels of each word and NE class labels of co-occurring words. To collect these labels accurately, we consider methods that combine the outputs of several NE extractors. We then use the NE-related labels as additional features for building new NE extractors. We apply the resulting NE extraction methods to the IREX Japanese NE extraction task. The experimental results show better accuracy than previous results obtained with NE extractors that use handcrafted resources.
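
The label-collection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the outputs of several extractors are combined, so the agreement rule used here (keep a label only when at least `min_votes` extractors assign it) is an assumption, as are the function names, the toy lexicon extractors, and the feature strings.

```python
from collections import Counter, defaultdict


def collect_ne_related_labels(sentences, extractors, min_votes=2):
    """Collect NE-related labels from unlabeled, tokenized sentences.

    Assumption: a token keeps an NE label only if at least `min_votes`
    extractors agree on it; the paper's actual combination rule may differ.
    """
    candidate = defaultdict(Counter)    # word -> counts of its own NE labels
    cooccurring = defaultdict(Counter)  # word -> counts of NE labels of other words
    for tokens in sentences:
        outputs = [extract(tokens) for extract in extractors]
        agreed = []
        for labels_at_position in zip(*outputs):
            label, votes = Counter(labels_at_position).most_common(1)[0]
            agreed.append(label if votes >= min_votes else "O")
        for i, word in enumerate(tokens):
            if agreed[i] != "O":
                candidate[word][agreed[i]] += 1
            for j, other_label in enumerate(agreed):
                if j != i and other_label != "O":
                    cooccurring[word][other_label] += 1
    return candidate, cooccurring


def augment_features(tokens, candidate, cooccurring):
    """Add the collected NE-related labels to each token's feature set."""
    features = []
    for word in tokens:
        feats = {"word=" + word}
        feats.update("cand=" + label for label in candidate.get(word, ()))
        feats.update("cooc=" + label for label in cooccurring.get(word, ()))
        features.append(feats)
    return features


def make_lexicon_extractor(lexicon):
    """Hypothetical stand-in for a trained NE extractor: a plain lookup."""
    def extract(tokens):
        return [lexicon.get(token, "O") for token in tokens]
    return extract


# Three toy extractors; the second one wrongly labels "joined" as PERSON.
extractors = [
    make_lexicon_extractor({"Tanaka": "PERSON", "Fujitsu": "ORGANIZATION"}),
    make_lexicon_extractor({"Tanaka": "PERSON", "Fujitsu": "ORGANIZATION",
                            "joined": "PERSON"}),
    make_lexicon_extractor({"Tanaka": "PERSON", "Kawasaki": "LOCATION"}),
]
unlabeled = [["Tanaka", "joined", "Fujitsu"],
             ["Fujitsu", "is", "in", "Kawasaki"]]

candidate, cooccurring = collect_ne_related_labels(unlabeled, extractors)
features = augment_features(["Tanaka", "visited", "Kawasaki"],
                            candidate, cooccurring)
```

In this toy run the label that only one extractor assigns to "joined" is discarded, while "Tanaka" and "Fujitsu" keep the labels that at least two extractors agree on. The resulting feature sets would then be passed, together with the usual features, to whatever learner is used to build the new extractor.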

Key words: Named entity recognition, unlabeled data, combination of extractors.
