AL-Lisaniyyat
Volume 17, Issue 2, Pages 1-10
2011-12-17

Regional Corpus Of Modern Standard Arabic

Authors: Abdelali Ahmed, Cowie Jim

Abstract

Until recently, only two Arabic corpora were commonly available to researchers: the Agence France-Presse (AFP) Arabic newswire from the Linguistic Data Consortium (LDC) and the Al-Hayat newspaper collection from the European Language Resources Distribution Agency (ELDA). The availability of a suitable corpus is key to much objective research in language engineering or any other natural language-related field. This paper presents experimental results from comparing corpora of Modern Standard Arabic (MSA) collected from samples of online newspapers published in different Arab countries. The results of the experiments show significant differences in vocabulary and style across regions. Comprehensive studies of these differences will allow a better understanding of the language and have implications for various areas of computational and linguistic research. Developing adequate resources is more crucial than ever to carry this task further.

Keywords

Modern Standard Arabic (MSA), Language variation.