Building bilingual dictionaries from parallel web documents
McEwan, C.J.A. and Ounis, I. and Ruthven, I.; Crestani, F. and Lalmas, M., eds. (2002) Building bilingual dictionaries from parallel web documents. In: Advances in Information Retrieval. Lecture Notes in Computer Science, 2291. Springer, Germany, pp. 303-323. ISBN 978-3-540-43343-9 (https://doi.org/10.1007/3-540-45886-7_20)
Abstract
In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the EMIM technique and a dictionary of translation terms is created. We evaluate our dictionary using human experts. The evaluation showed that the system performs well. In addition, the results obtained from automatically created corpora are comparable to those obtained from manually created corpora of parallel documents. Compared to other available techniques, our approach has the advantage of being simple, uniform, and easy to implement while providing encouraging results.
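The abstract does not spell out the authors' adaptation of EMIM, so the following is only a minimal sketch of the general idea, assuming the textbook four-cell form of the expected mutual information measure computed over sentence-level co-occurrence. The function names `emim_scores` and `best_translations`, the toy sentence pairs, and the choice to keep only the single highest-scoring target term per source term are all illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def emim_scores(pairs):
    """Score (source term, target term) pairs by EMIM.

    `pairs` is a list of (source_tokens, target_tokens) aligned sentences.
    Assumption: presence/absence of a term in a sentence is the event, and
    probabilities are estimated as simple relative frequencies.
    """
    n = len(pairs)
    src_df, tgt_df, co_df = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_set, t_set = set(src), set(tgt)
        src_df.update(s_set)                      # sentences containing s
        tgt_df.update(t_set)                      # sentences containing t
        co_df.update(product(s_set, t_set))       # sentences containing both

    scores = {}
    for (s, t), n11 in co_df.items():
        n10 = src_df[s] - n11                     # s present, t absent
        n01 = tgt_df[t] - n11                     # s absent, t present
        n00 = n - n11 - n10 - n01                 # both absent
        cells = (
            (n11, src_df[s], tgt_df[t]),
            (n10, src_df[s], n - tgt_df[t]),
            (n01, n - src_df[s], tgt_df[t]),
            (n00, n - src_df[s], n - tgt_df[t]),
        )
        total = 0.0
        for nxy, nx, ny in cells:
            if nxy > 0:                           # 0 * log(0) is taken as 0
                total += (nxy / n) * math.log((nxy * n) / (nx * ny))
        scores[(s, t)] = total
    return scores


def best_translations(pairs):
    """Keep the highest-scoring target term for each source term (assumption)."""
    best = {}
    for (s, t), score in emim_scores(pairs).items():
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


if __name__ == "__main__":
    toy_pairs = [
        ("the cat".split(), "le chat".split()),
        ("the dog".split(), "le chien".split()),
        ("a cat".split(), "un chat".split()),
        ("a dog".split(), "un chien".split()),
    ]
    print(best_translations(toy_pairs))
```

Note that the four-cell form is symmetric in positive and negative association, so a practical system would probably also need to check that a pair co-occurs more often than chance before accepting it as a translation candidate.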
ORCID iDs
McEwan, C.J.A., Ounis, I. and Ruthven, I. ORCID: https://orcid.org/0000-0001-6669-5376; Crestani, F. and Lalmas, M.
Item type: Book Section
ID code: 2464
Dates: 27 March 2002 (Published)
Subjects: Science > Mathematics > Electronic computers. Computer science
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Strathprints Administrator
Date deposited: 23 Jan 2007
Last modified: 11 Nov 2024 14:30
URI: https://strathprints.strath.ac.uk/id/eprint/2464