Using Monolingual Clickthrough Data to Build Crosslingual Search Systems


Abstract:

A major portion of the World Wide Web (WWW) is still dominated by a few languages, with English at the top. Monolingual information retrieval systems have been set up for these languages and are widely used. To serve a wider and more linguistically diverse population of web users, we need Cross-Lingual Information Retrieval (CLIR) systems that can receive a query in one language and return results in a different language. To our knowledge, not much work has been done on creating CLIR systems for the WWW. This is partly due to the unavailability of the required bilingual resources for many languages that remain minority languages in terms of the documents present on the WWW. Another important reason is the time and effort required to create a practical and useful CLIR system. In this paper, we address the problem of creating CLIR systems for language pairs in which the source language is a minority language and the target language is a majority language with existing search engines. We use clickthrough data from a monolingual search engine to learn translation models that can be used to perform cross-lingual search. This approach has enabled us to generate practical CLIR systems on a large scale with less effort and fewer bilingual resources. We evaluate our approach by creating CLIR systems for an Indian language and a few European languages.