What is it about?
The corpus contains about 56K words (tokens). Every word was manually annotated in context with a set of metadata attributes describing its orthographic, morphological, and semantic features: part of speech, prefixes, stem, suffixes, dialect lemma, MSA lemma, CODA surface form, gender, number, mode, and an English gloss.
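As a rough illustration of what one annotated word might look like as a record, here is a minimal Python sketch. The class name, field names, and placeholder values are assumptions made for this example; they mirror the attributes listed above but are not the corpus's actual file format or column names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TokenAnnotation:
    """One annotated token. Field names are hypothetical, chosen to
    mirror the attributes listed in the summary above."""
    surface: str        # word as it appears in the text
    coda: str           # CODA (conventional orthography) surface form
    pos: str            # part of speech
    prefixes: str
    stem: str
    suffixes: str
    dialect_lemma: str  # Palestinian Arabic lemma
    msa_lemma: str      # Modern Standard Arabic lemma
    gender: str
    number: str
    mode: str
    gloss: str          # English gloss


def lexicon_entry(ann: TokenAnnotation) -> tuple[str, str, str]:
    """Return the (dialect lemma, MSA lemma, English gloss) triple
    that a trilingual lexicon lookup would use."""
    return (ann.dialect_lemma, ann.msa_lemma, ann.gloss)


# Placeholder values only; no real corpus data is reproduced here.
example = TokenAnnotation(
    surface="<token>", coda="<coda form>", pos="<pos>",
    prefixes="<prefixes>", stem="<stem>", suffixes="<suffixes>",
    dialect_lemma="<dialect lemma>", msa_lemma="<msa lemma>",
    gender="<gender>", number="<number>", mode="<mode>",
    gloss="<english gloss>",
)

print(lexicon_entry(example))
```

Grouping the two lemmas and the gloss in one helper reflects the trilingual use of the annotations: each token links a dialect form to its Standard Arabic and English counterparts.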
Why is it important?
(i) Language learners can use it as a trilingual Palestinian-Standard Arabic-English lexicon. (ii) Linguists can use it for research purposes. (iii) Developers can use it to build IT applications. Dialectal content is growing rapidly on the web, especially on social media, yet there are no computer applications currently available to process and understand this content, e.g., for automatic translation, effective search and retrieval, spell checking, speech recognition, and many other tasks.
Read the Original
This page is a summary of: Curras: an annotated corpus for the Palestinian Arabic dialect, Language Resources and Evaluation, December 2016, Springer Science + Business Media,
DOI: 10.1007/s10579-016-9370-7.