AL-Lisaniyyat
Volume 17, Number 2, Pages 1-10
2011-12-17
Authors: Abdelali Ahmed, Cowie Jim
Until recently, only two Arabic corpora were commonly available to researchers: the Agence France-Presse (AFP) Arabic newswire from the Linguistic Data Consortium (LDC) and the Al-Hayat newspaper collection from the European Language Resources Distribution Agency (ELDA). The availability of a suitable corpus is key to much objective research in language engineering and in any other natural language-related field. This paper presents experimental results from comparing corpora for Modern Standard Arabic (MSA) collected from samples of online newspapers published in different Arab countries. The results of the experiments show significant differences in vocabulary and style across regions. Comprehensive studies of these differences will allow a better understanding of the language and have implications for a range of computational and linguistic research. Developing adequate resources is more crucial than ever to carry this task further.
Modern Standard Arabic (MSA), Language variation.