Revue de l'Information Scientifique et Technique
Volume 24, Number 1, Pages 66-83
2019-05-22

A Pivot Language Based Approach To Multilingual Document Representation And Information Retrieval Including Arabic

Author: Boucham Souhila

Abstract

The Arabic language has attracted increasing interest in the field of Multilingual Information Retrieval (MIR). In this work, we address the problem of information retrieval in a trilingual corpus containing documents in Arabic, French and English. We propose a language-independent approach based on a pivot language. The approach combines surface analysis with the Latent Semantic Analysis (LSA) statistical algorithm in a new way, breaking the terms of LSA down into units that correspond more closely to morphemes. These units are variable-length character n-gram candidates extracted from text fragments delimited by borders. The results obtained are encouraging and competitive with state-of-the-art results in the multilingual field.
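As a rough illustration of the fragment and n-gram step described above, the following sketch splits a text into fragments and enumerates variable-length character n-grams within each fragment. It is not the author's implementation: the abstract does not specify how borders are defined or which n-gram lengths are used, so the border rule (any run of non-letter, non-digit characters) and the length range (2 to 4) are assumptions, and the function names `fragments`, `char_ngrams` and `ngram_counts` are hypothetical.

```python
import re
from collections import Counter


def fragments(text):
    # Assumption: a "border" is any run of characters that are not letters
    # or digits (whitespace, punctuation). The paper's rule may differ.
    return [f for f in re.split(r"[\W_]+", text.lower()) if f]


def char_ngrams(fragment, n_min=2, n_max=4):
    # Enumerate every character n-gram of length n_min..n_max that lies
    # entirely inside one fragment (n-grams never cross a border).
    return [
        fragment[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(fragment) - n + 1)
    ]


def ngram_counts(text, n_min=2, n_max=4):
    # Count n-gram candidates for one document. Stacking such counts over
    # all documents gives a term-document matrix of the kind LSA factorizes.
    counts = Counter()
    for frag in fragments(text):
        counts.update(char_ngrams(frag, n_min, n_max))
    return counts
```

Because Python's `re` module is Unicode-aware for `str` patterns, the same splitting applies to Arabic, French and English text without language-specific rules. In an LSA pipeline, the per-document counts would form the columns of a matrix that is then reduced by a truncated singular value decomposition.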

Keywords

multilingual document representation ; multilingual information retrieval including Arabic ; virtual document ; principle of border ; fragments and variable length character n-grams ; parallel corpus ; surface analysis and the LSA statistical algorithm ; concept types ; pivot language