Enhancing Amharic news classification through ontology-based feature fusion and logistic regression.
Sci Rep 2026 May 20. [Online ahead of print]

Abstract

Amharic, a major Semitic language of Ethiopia, remains underrepresented in natural language processing research due to limited linguistic resources. This study addresses accurate text classification for low-resource languages by proposing a hybrid framework that integrates a semantically structured Amharic News Ontology (ANO) with traditional TF-IDF features. The ANO was developed through a systematic four-phase methodology to capture hierarchical relationships between news-domain concepts. We formalize a feature fusion technique that combines lexical (TF-IDF) and ontological features into enriched document representations, which are then used to train a Logistic Regression classifier. Evaluated on a public dataset of 61,915 Amharic news articles across six categories, our ontology-enhanced model achieved 97.0% accuracy. Statistical analysis showed a 3.2-percentage-point improvement over a TF-IDF-only baseline (McNemar's test ([Formula: see text], [Formula: see text])), with ontology integration particularly effective at disambiguating semantically related categories (Politics-Business confusion reduced by 38%). These results suggest that, for Amharic news classification on the evaluated dataset, integrating structured semantic knowledge through a domain-specific ontology improves accuracy by 3.2 percentage points over the TF-IDF baseline. While these findings demonstrate the potential of ontology-based feature fusion for low-resource languages, they are limited to the news domain and the specific dataset used.
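The abstract describes the fusion pipeline only at a high level. The following is a minimal, self-contained sketch under stated assumptions, not the authors' implementation: the toy ontology (`TERM_TO_CONCEPT`, `PARENT`), the English placeholder tokens standing in for tokenized Amharic text, the ancestor-crediting concept counts, the fusion weight `alpha`, and all helper names are hypothetical. TF-IDF uses the smoothed idf formula that scikit-learn's `TfidfVectorizer` applies by default, and a small gradient-descent multinomial logistic regression stands in for whatever classifier implementation the study used. The `mcnemar` helper shows the continuity-corrected form of McNemar's test, one common variant; the abstract does not say which variant was applied, and its test statistics are not recoverable from this record.

```python
import math
from collections import Counter

# Hypothetical toy ontology, NOT the authors' ANO: term -> concept, concept -> parent.
# English placeholder tokens stand in for tokenized Amharic text.
TERM_TO_CONCEPT = {
    "election": "Election", "parliament": "Government",
    "market": "Trade", "bank": "Finance", "tax": "Finance", "budget": "Finance",
    "goal": "Football", "league": "Football",
}
PARENT = {"Election": "Politics", "Government": "Politics", "Trade": "Business",
          "Finance": "Business", "Football": "Sport"}
CONCEPTS = sorted(set(TERM_TO_CONCEPT.values()) | set(PARENT.values()))

def fit_tfidf(docs):
    """Learn the vocabulary and smoothed idf weights from training documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return sorted(df), idf

def tfidf(doc, vocab, idf):
    """L2-normalised TF-IDF vector; out-of-vocabulary tokens are ignored."""
    tf = Counter(doc)
    v = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def ontology_features(doc):
    """Share of concept hits per concept; a matched term also credits all ancestors."""
    counts = Counter()
    for tok in doc:
        concept = TERM_TO_CONCEPT.get(tok)
        while concept is not None:
            counts[concept] += 1
            concept = PARENT.get(concept)
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in CONCEPTS]

def fuse(doc, vocab, idf, alpha=1.0):
    """Feature fusion by concatenation: TF-IDF block + alpha-weighted ontology block."""
    return tfidf(doc, vocab, idf) + [alpha * f for f in ontology_features(doc)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_lr(X, y, classes, lr=0.5, epochs=300, l2=1e-3):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    k, d, m = len(classes), len(X[0]), len(X)
    W, b = [[0.0] * d for _ in range(k)], [0.0] * k
    for _ in range(epochs):
        gW, gb = [[0.0] * d for _ in range(k)], [0.0] * k
        for x, label in zip(X, y):
            p = softmax([sum(w * xi for w, xi in zip(W[j], x)) + b[j] for j in range(k)])
            for j in range(k):
                err = p[j] - (1.0 if classes[j] == label else 0.0)
                gb[j] += err
                for i in range(d):
                    gW[j][i] += err * x[i]
        for j in range(k):
            b[j] -= lr * gb[j] / m
            for i in range(d):
                W[j][i] -= lr * (gW[j][i] / m + l2 * W[j][i])
    return W, b

def predict(W, b, classes, x):
    scores = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    return classes[scores.index(max(scores))]

def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-square (1 df) and its p-value."""
    n01 = sum(a == t and c != t for t, a, c in zip(y_true, pred_a, pred_b))  # A right, B wrong
    n10 = sum(a != t and c == t for t, a, c in zip(y_true, pred_a, pred_b))  # A wrong, B right
    if n01 + n10 == 0:
        return 0.0, 1.0
    chi2 = max(0, abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    return chi2, math.erfc(math.sqrt(chi2 / 2))

if __name__ == "__main__":
    train = [(["election", "parliament", "vote"], "Politics"),
             (["parliament", "minister", "election"], "Politics"),
             (["market", "bank", "price"], "Business"),
             (["tax", "bank", "investment"], "Business"),
             (["goal", "league", "match"], "Sport"),
             (["league", "coach", "goal"], "Sport")]
    docs, labels = [d for d, _ in train], [c for _, c in train]
    classes = sorted(set(labels))
    vocab, idf = fit_tfidf(docs)
    W, b = train_lr([fuse(d, vocab, idf) for d in docs], labels, classes)
    # "budget" never occurs in the training text; only its ontology concepts carry signal.
    print(predict(W, b, classes, fuse(["budget"], vocab, idf)))
```

Because "budget" is absent from the training vocabulary, its TF-IDF block is all zeros, so a TF-IDF-only model would fall back on its bias terms alone; in the fused representation its Finance and Business concept features still steer the prediction toward Business. This only mirrors, on toy data, the abstract's point that ontological features help with semantically related categories; it is not evidence for the reported figures. To compare two classifiers on the same test documents, `mcnemar(y_true, preds_tfidf, preds_fused)` (hypothetical variable names) returns the statistic and p-value.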

Authors

Taye BG: Department of Information Technology and Artificial Intelligence, Debark University, Debark, Amhara, Ethiopia. bayile.getu@dku.edu.et
Desta AB: Department of Information Technology and Artificial Intelligence, Debark University, Debark, Amhara, Ethiopia.
Alene AY: Department of Information Technology and Artificial Intelligence, Debark University, Debark, Amhara, Ethiopia.

Pub Type(s)

Journal Article

Language

English

PubMed ID

42162112