Building bilingual dictionaries from parallel web documents

McEwan, C.J.A. and Ounis, I. and Ruthven, I. (2002) Building bilingual dictionaries from parallel web documents. In: Advances in Information Retrieval. Lecture Notes in Computer Science, 2291. Springer, Germany, pp. 303-323. ISBN 978-3-540-43343-9

Accepted Author Manuscript

Abstract

In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents, and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the EMIM technique, and a dictionary of translation terms is created. We evaluated the dictionary using human experts, and the evaluation showed that the system performs well. In addition, the results obtained from automatically created corpora are comparable to those obtained from manually created corpora of parallel documents. Compared to other available techniques, our approach has the advantage of being simple, uniform, and easy to implement while providing encouraging results.
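
To make the translation step more concrete, the following is a minimal sketch of how term pairs might be scored with the textbook expected mutual information measure (EMIM) over aligned sentence pairs. It is not the adaptation used in the paper, whose details are not given in the abstract. The function names `emim_scores` and `build_dictionary`, the whitespace tokenisation, and the rule of keeping only positively associated pairs are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def emim_scores(sentence_pairs):
    """Score (source term, target term) pairs by EMIM over aligned sentences.

    Assumption: terms are lower-cased whitespace tokens, and each term is
    counted at most once per sentence (presence/absence events).
    """
    n = len(sentence_pairs)
    src_df, tgt_df, joint = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        s_terms = set(src.lower().split())
        t_terms = set(tgt.lower().split())
        src_df.update(s_terms)
        tgt_df.update(t_terms)
        joint.update(product(s_terms, t_terms))

    def cell(n_xy, n_x, n_y):
        # One cell of the 2x2 contingency table: P(x,y) * log(P(x,y) / (P(x)P(y))).
        if n_xy == 0:
            return 0.0
        return (n_xy / n) * math.log((n_xy * n) / (n_x * n_y))

    scores = {}
    for (s, t), n11 in joint.items():
        n_s, n_t = src_df[s], tgt_df[t]
        # Assumption: discard pairs that co-occur no more often than chance.
        if n11 * n <= n_s * n_t:
            continue
        n10 = n_s - n11            # source term present, target term absent
        n01 = n_t - n11            # source term absent, target term present
        n00 = n - n_s - n_t + n11  # both absent
        scores[(s, t)] = (
            cell(n11, n_s, n_t)
            + cell(n10, n_s, n - n_t)
            + cell(n01, n - n_s, n_t)
            + cell(n00, n - n_s, n - n_t)
        )
    return scores


def build_dictionary(sentence_pairs):
    """Keep, for each source term, the target term with the highest EMIM score."""
    best = {}
    for (s, t), score in emim_scores(sentence_pairs).items():
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}
```

Calling `build_dictionary` on a list of `(source_sentence, target_sentence)` tuples returns a mapping from each source term to its best-scoring candidate translation. A real system would need proper tokenisation, stop-word handling, and a score threshold before the output could be used for retrieval.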