Abstract
Modern Standard Arabic (MSA) is typically written without short vowels, even though these vowels help clarify the sense and meaning of a word. They are omitted because experienced Arabic readers can infer the meaning from context. However, there are cases of ambiguity that even native Arabic speakers cannot resolve. The process of restoring the diacritical marks (short vowels) is known as diacritization. Most existing diacritization algorithms fully restore all the markings, many of which are trivial or unnecessary. In this paper, we present a system that restores diacritical marks where they are most needed to resolve ambiguity. This is a more challenging problem than fully restoring all the diacritics. The system combines morphological analyzers with context similarities: the morphological analyzers generate all diacritized candidates for a word, and the model then eliminates word ambiguity through a statistical approach and context similarities. Out of 80 paragraphs, our system resolved 57 cases.
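The two-stage idea in the abstract (generate diacritized candidates, then choose among them by context) can be illustrated with a minimal sketch. This is not the authors' implementation: the `CANDIDATES` lexicon is a hypothetical stand-in for a morphological analyzer, and bag-of-words cosine similarity is an assumed stand-in for the paper's statistical approach and context similarities, which the abstract does not specify. All function names are illustrative.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon standing in for a morphological analyzer:
# undiacritized form -> {diacritized candidate: words typical of its context}.
CANDIDATES = {
    "علم": {
        "عِلْم": ["طلب", "جامعة", "كتاب"],  # "knowledge"
        "عَلَم": ["رفع", "دولة", "ساحة"],   # "flag"
    },
}


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(tokens, index):
    """Return a diacritized form for tokens[index] only if context supports one."""
    word = tokens[index]
    candidates = CANDIDATES.get(word)
    if not candidates:
        return word  # not ambiguous in the toy lexicon: leave as written
    context = Counter(tokens[:index] + tokens[index + 1:])
    scores = {cand: cosine(context, Counter(profile))
              for cand, profile in candidates.items()}
    best = max(scores, key=scores.get)
    # With no contextual evidence, keep the word undiacritized.
    return best if scores[best] > 0 else word


def diacritize_ambiguous(sentence):
    """Diacritize only the ambiguous words of a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return " ".join(disambiguate(tokens, i) for i in range(len(tokens)))


if __name__ == "__main__":
    print(diacritize_ambiguous("رفع الجندي علم الدولة"))
```

Only the ambiguous word is marked, and only when the context overlaps with one candidate's profile; every other token passes through unchanged, which mirrors the abstract's goal of restoring diacritics where they are needed rather than everywhere.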
| Original language | English |
|---|---|
| Pages (from-to) | 169-174 |
| Number of pages | 6 |
| Journal | Procedia Computer Science |
| Volume | 117 |
| State | Published - 2017 |
| Externally published | Yes |
| Event | 3rd International Conference on Arabic Computational Linguistics, ACLing 2017, Dubai, United Arab Emirates (5-6 Nov 2017) |
Keywords
- Ambiguity
- Arabic language
- Automatic vowelization
- Diacritization
- Morphological analysis
- Statistical methods