Amharic, a major Semitic language of Ethiopia, remains underrepresented in natural language processing research due to limited linguistic resources. This study addresses the challenge of accurate text classification for low-resource languages by proposing a hybrid framework that integrates a semantically structured Amharic News Ontology (ANO) with traditional TF-IDF features. The ANO was developed systematically through a four-phase methodology to capture hierarchical relationships between news-domain concepts. We formalize a feature fusion technique that combines lexical (TF-IDF) and ontological features into enriched document representations, which are then used to train a logistic regression classifier. Evaluated on a public dataset of 61,915 Amharic news articles across six categories, our ontology-enhanced model achieved 97.0% accuracy. Statistical analysis revealed a 3.2 percentage point improvement over a TF-IDF-only baseline (McNemar's test ([Formula: see text], [Formula: see text])), with ontology integration particularly effective in disambiguating semantically related categories (Politics-Business confusion reduced by 38%). The experimental results suggest that, for Amharic news classification on the evaluated dataset, integrating structured semantic knowledge through a domain-specific ontology improves classification accuracy over a TF-IDF baseline. While these findings demonstrate the potential of ontology-based feature fusion for low-resource languages, they are limited to the news domain and the specific dataset used.
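The abstract does not specify the ANO's structure or the exact fusion formula, so the following is only a minimal sketch under stated assumptions: ontological features are taken to be concept-hit counts propagated up the concept hierarchy, and fusion is assumed to be a weighted concatenation of the L2-normalized TF-IDF and ontology blocks. The toy ontology (`PARENT`, `LEXICON`), the English placeholder tokens standing in for Amharic words, the weight `alpha`, and every function name are hypothetical illustrations, not the authors' ANO or code. The sketch also includes McNemar's test with continuity correction, the kind of paired comparison the abstract reports between the fused model and the TF-IDF baseline, where `b` and `c` are the two discordant counts.

```python
import math
from collections import Counter

# Hypothetical toy ontology (NOT the authors' ANO): child concept -> parent.
PARENT = {
    "Election": "Politics",
    "Parliament": "Politics",
    "Market": "Business",
    "Trade": "Business",
    "Politics": "News",
    "Business": "News",
}
# Hypothetical lexicon mapping tokens to concepts; English placeholders stand
# in for Amharic surface forms.
LEXICON = {
    "vote": "Election",
    "ballot": "Election",
    "mp": "Parliament",
    "export": "Trade",
    "price": "Market",
}


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency times a smoothed idf, log((1+n)/(1+df)) + 1;
    this is one common variant, assumed here for illustration.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def ontology_features(doc):
    """Count concept hits, crediting each hit to every ancestor concept too."""
    feats = Counter()
    for token in doc:
        concept = LEXICON.get(token)
        while concept is not None:
            feats["onto:" + concept] += 1
            concept = PARENT.get(concept)
    return dict(feats)


def l2_normalize(vec):
    """Scale a sparse vector to unit length (empty/zero vectors unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def fuse(lexical, ontological, alpha=0.5):
    """Weighted concatenation of the normalized lexical and ontology blocks.

    The 'onto:' prefix keeps the two blocks' keys disjoint, assuming no raw
    token itself starts with 'onto:'.
    """
    fused = {k: (1 - alpha) * v for k, v in l2_normalize(lexical).items()}
    fused.update({k: alpha * v for k, v in l2_normalize(ontological).items()})
    return fused


def mcnemar(b, c):
    """McNemar's test with continuity correction for two paired classifiers.

    b = test items only model A classified correctly, c = items only model B
    classified correctly. Returns (chi-square statistic, p-value), with the
    p-value taken from the chi-square distribution with 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    docs = [["vote", "ballot", "mp", "price"], ["export", "price", "price"]]
    for doc, lexical in zip(docs, tfidf(docs)):
        print(sorted(fuse(lexical, ontology_features(doc))))
    print(mcnemar(10, 2))
```

In practice, the fused dictionaries could be converted to a sparse matrix (for example with scikit-learn's `DictVectorizer`) and passed to a logistic regression implementation such as scikit-learn's `LogisticRegression`; the abstract does not state which implementation, fusion weight, or hyperparameters the authors used.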
Publication type: Journal Article
Language: English
PMID: 42162112
Taye, Bayile Getu, et al. "Enhancing Amharic News Classification Through Ontology-based Feature Fusion and Logistic Regression." Scientific Reports, 2026.
Taye BG, Desta AB, Alene AY. Enhancing Amharic news classification through ontology-based feature fusion and logistic regression. Sci Rep. 2026.
Taye, B. G., Desta, A. B., & Alene, A. Y. (2026). Enhancing Amharic news classification through ontology-based feature fusion and logistic regression. Scientific Reports. https://doi.org/10.1038/s41598-026-53541-0
Taye BG, Desta AB, Alene AY. Enhancing Amharic news classification through ontology-based feature fusion and logistic regression. Sci Rep. 2026 May 20; PubMed PMID: 42162112.
TY - JOUR
T1 - Enhancing Amharic news classification through ontology-based feature fusion and logistic regression.
AU - Taye, Bayile Getu
AU - Desta, Abinet Bizuayehu
AU - Alene, Abrham Yaregal
Y1 - 2026/05/20/
PY - 2025/11/28/received
PY - 2026/05/12/accepted
PY - 2026/05/21/medline
PY - 2026/05/21/pubmed
PY - 2026/05/20/entrez
KW - Amharic NLP
KW - Feature fusion
KW - Logistic regression
KW - Low-resource languages
KW - Ontology
KW - Semantic integration
KW - Text classification
JF - Scientific reports
JO - Sci Rep
SN - 2045-2322
UR - https://www.unboundmedicine.com/prime/citation/42162112/Enhancing_amharic_news_classification_through_ontology-based_feature_fusion_and_logistic_regression.
DB - PRIME
DP - Unbound Medicine
ER -


