What is it about?

The corpus contains about 56K tokens. Every word in the corpus was manually annotated with a set of metadata attributes describing its orthographic, morphological, and semantic features, such as part of speech, prefixes, stem, suffixes, dialect lemma, MSA (Modern Standard Arabic) lemma, CODA surface form, gender, number, mode, and an English gloss. Each word was annotated in context.
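To make the annotation scheme more concrete, here is a minimal Python sketch of how one annotated token might be represented, and how such records could be grouped into a dialect-MSA-English lexicon. The class name `AnnotatedToken`, the function `build_lexicon`, and all field names are hypothetical illustrations of the attributes listed above. They are not the actual Curras file format or the LDC tag names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedToken:
    """One token with its in-context annotations (hypothetical field names)."""
    surface: str        # the word as it appears in the text
    pos: str            # part-of-speech tag
    prefixes: str
    stem: str
    suffixes: str
    dialect_lemma: str  # Palestinian dialect lemma
    msa_lemma: str      # Modern Standard Arabic lemma
    coda: str           # CODA surface form
    gender: str
    number: str
    mode: str
    gloss: str          # English gloss


def build_lexicon(tokens):
    """Map each dialect lemma to the set of (MSA lemma, English gloss) pairs seen with it."""
    lexicon = defaultdict(set)
    for tok in tokens:
        lexicon[tok.dialect_lemma].add((tok.msa_lemma, tok.gloss))
    return dict(lexicon)
```

Because every token is annotated in context, the same dialect lemma can appear with several MSA lemmas or glosses. Collecting them into a set keeps each distinct pairing once instead of overwriting earlier ones.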

Why is it important?

The corpus has three main uses: (i) language learners can use it as a trilingual Palestinian-Standard Arabic-English lexicon; (ii) linguists can use it for research; and (iii) developers can use it to build IT applications. Dialectal content on the web, especially on social media, is growing rapidly, yet no computer applications are currently available to process and understand this content, e.g., for automatic translation, effective search and retrieval, spell checking, speech recognition, and many other tasks.

Perspectives

The annotation tags are compatible with the LDC Arabic tags, the annotation was done with very high accuracy, and the corpus can be searched and downloaded from http://portal.sina.birzeit.edu/curras/

Dr Mustafa Jarrar
Birzeit University

Read the Original

This page is a summary of: Curras: an annotated corpus for the Palestinian Arabic dialect, Language Resources and Evaluation, December 2016, Springer Science + Business Media,
DOI: 10.1007/s10579-016-9370-7.