Stemming: text retrieval tools for digital libraries
Abstract
The ability to search documents by content, i.e., to look for documents dealing with a certain subject, is one of the most valuable services a Digital Library can offer. To offer such services, digital libraries need text retrieval resources and tools (such as corpora, electronic dictionaries, stemmers, or morphological analyzers), which must be developed for the language in which the library's documents are written. The quantity and quality of the available resources and tools depend on the language in question. English has always had a great advantage in this field. By contrast, in the Iberian Peninsula, Digital Libraries devoted to texts written in Galician have difficulty developing content search services, since there are not yet enough tools and resources to implement them. This paper presents a Text Retrieval tool for the Galician language, built through a collaboration between Galician–Portuguese Philology and Computer Science researchers from the University of A Corunna. The tool is a stemmer that was first introduced in 2002 and has been optimized, completed and tested over recent years. We tested it on several different corpora, so that content search services can be accurately incorporated into Digital Libraries.