Arabic named entity operational recognition system

Shiekha Ali Karam; Ali Jaoua; Samir Elloumi

doi:10.5339/qfarf.2012.CSP37

Abstract

Extracting named entities is an important step in extracting information from text based on a given ontology. Arabic raises additional challenges compared with English, French and other languages in similar families. The main difficulties are a complex morphological system, the absence of capitalization, and the lack of standardized Arabic writing. Because Arabic is highly inflected, its morphology is rich and complex: a single lemma can appear in many word forms built from different internal structures, prefixes and suffixes. In addition, Arabic writing is not standardized, so the same word is often spelled inconsistently. In this work, we propose an operational hybrid approach that combines dictionary-based and rule-based detection to extract seven categories of named entities: organization name, date, interval, price/value, percentage, currency and unit. The dictionary-based component performs exact or approximate matching of words against a prepared list of Arabic organization names; when no exact match exists, approximate matching is an efficient way to handle morphological variation. The rule-based component also handles Arabic-specific phenomena by capturing entity patterns as regular expressions or as patterns provided by experts. We evaluated our Arabic named entity recognition system on financial news articles and obtained a recognition rate of around 80%.
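The abstract describes the hybrid approach only at a high level. The Python sketch below illustrates its two components under stated assumptions. The organization dictionary, the spelling normalization (a common Arabic preprocessing step, assumed here rather than taken from the paper), the use of `difflib` similarity as the approximate-matching measure, the 0.8 cutoff, and the regular expressions are all hypothetical. None of them are the authors' published resources.

```python
import difflib
import re

# Hypothetical sample dictionary of Arabic organization names (not the authors' data).
ORG_DICTIONARY = ["بنك قطر الوطني", "شركة الاتصالات", "بورصة قطر"]

# Illustrative regex rules for three of the seven categories.
# These patterns are assumptions for demonstration, not the published rules.
RULES = {
    "percentage": re.compile(r"\d+(?:[.,]\d+)?\s?(?:%|في المائة)"),
    "currency": re.compile(r"(?:دولار|ريال|يورو)"),
    "date": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
}

# Arabic short-vowel diacritics (fathatan through sukun).
DIACRITICS = re.compile(r"[\u064B-\u0652]")


def normalize(text):
    """Collapse common spelling variants to one form (assumed preprocessing)."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[أإآ]", "ا", text)  # unify alef variants
    text = text.replace("ة", "ه").replace("ى", "ي")  # ta marbuta, alef maqsura
    return text


def match_organization(candidate, dictionary=ORG_DICTIONARY, cutoff=0.8):
    """Return (dictionary_name, 'exact' | 'approximate'), or None if no match."""
    normalized = {normalize(name): name for name in dictionary}
    key = normalize(candidate)
    if key in normalized:
        return normalized[key], "exact"
    # Fall back to approximate matching, e.g. for an attached prefix such as "و".
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    if close:
        return normalized[close[0]], "approximate"
    return None


def apply_rules(text):
    """Return (category, matched_text) pairs found by the regex rules."""
    hits = []
    for category, pattern in RULES.items():
        for match in pattern.finditer(text):
            hits.append((category, match.group()))
    return hits


if __name__ == "__main__":
    print(match_organization("وبنك قطر الوطني"))
    print(apply_rules("ارتفع السهم 5.2% وبلغ السعر 30 ريال في 12/03/2012"))
```

In this sketch, a spelling variant such as a final "ى" is resolved by normalization before the exact lookup. A form with an attached conjunction such as "وبنك قطر الوطني" falls through to the approximate step. A real system would likely use a morphology-aware similarity measure and expert-written rules instead.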
