2nd International Conference on Computers and Information, Menoufia University, Egypt
Intersected Consecutive Words: A Proposed Method for Enhancing Arabic Text Classification
Paper ID : 1026-ICCI2021 (R1)
Authors:
Mostafa Sayed Abdelhameed*1, Hatem Abdelkader2, Rashed Salem3
1Mostafa Sayed, Faculty of Computers and Artificial Intelligence, Beni-Suef University, Egypt. mostafasayed@fcis.bsu.edu.eg
2Information Systems, Faculty of Computers and Information, Menoufia University
3Faculty of Computers and Information, Menoufia University
Abstract:
Classifying text written in Arabic is a significant challenge. This research aims to improve a framework for Arabic text classification. The framework treats the word, in its original form, as the basic unit of the modern Arabic sentence, and also operates at different N-gram levels. In addition, we propose a novel intersected consecutive words method, based on Arabic sentence structure, and compare it with N-grams. We used a new dataset of real canonical prosecutors' requests written in a modern Arabic context. The dataset has two classes within the field of canonical requests. We adopted term frequency (TF) and term frequency-inverse document frequency (TF-IDF) as mutual information for the feature vector representation. In addition, k-fold cross-validation was applied, splitting the dataset into k folds (groups) to further improve the trained model. Classification performance is evaluated using the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms. The different results are discussed and concluded in the experimental analysis section.
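The baseline pipeline described in the abstract (word-level N-grams, TF or TF-IDF weighting, k-fold splitting, KNN classification) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: tokenization is plain whitespace splitting (no Arabic normalization or stemming), IDF is taken as log(N / df) (one of several common variants), KNN uses cosine similarity with majority voting, and folds are contiguous and unshuffled. All function names are hypothetical. The proposed intersected consecutive words method is not implemented here because its exact definition is not given in this abstract, and the SVM classifier is left out.

```python
import math
from collections import Counter

def word_ngrams(text, n):
    """Word-level n-grams from whitespace-tokenized text (no stemming, by assumption)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fit_idf(docs_terms):
    """Assumed IDF variant: idf(t) = log(N / df(t)), fitted on training documents only."""
    n_docs = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))  # count each term once per document
    return {t: math.log(n_docs / df[t]) for t in df}

def vectorize(terms, idf=None):
    """Raw TF vector when idf is None, otherwise TF-IDF (unseen terms get weight 0)."""
    tf = Counter(terms)
    if idf is None:
        return {t: float(c) for t, c in tf.items()}
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    scored = sorted(
        ((cosine(query, v), y) for v, y in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(y for _, y in scored[:k])
    return votes.most_common(1)[0][0]

def k_fold_indices(n, folds):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds over range(n)."""
    sizes = [n // folds + (1 if i < n % folds else 0) for i in range(folds)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_validate_knn(texts, labels, n=1, folds=2, k=3):
    """Mean KNN accuracy over k folds using TF-IDF on word n-grams."""
    docs = [word_ngrams(t, n) for t in texts]
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(docs), folds):
        idf = fit_idf([docs[i] for i in train_idx])
        train_vecs = [vectorize(docs[i], idf) for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            knn_predict(train_vecs, train_y, vectorize(docs[i], idf), k) == labels[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Fitting the IDF weights inside each fold keeps test documents from leaking into the training statistics. Because the folds here are contiguous, a dataset sorted by class would produce misleading splits; in practice the indices would usually be shuffled or stratified by class first, and a library such as scikit-learn (for example `LinearSVC`) could supply the SVM side of the comparison.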
Keywords:
Arabic Text Classification, Machine Learning Techniques, Arabic Text Mining, KNN, SVM
Status : Paper Accepted