Intersected Consecutive Words proposed method for enhancing Arabic Text Classification |
Paper ID : 1026-ICCI2021 (R1) |
Authors: |
Mostafa Sayed Abdelhameed *1, Hatem Abdelkader2, Rashed Salem3
1 Mostafa Sayed, Faculty of Computers and Artificial Intelligence, Beni-Suef University, Egypt. mostafasayed@fcis.bsu.edu.eg
2 Information Systems, Faculty of Computers and Information, Menoufia University
3 Faculty of Computers and Information, Menoufia University |
Abstract: |
Classifying text written in Arabic is a significant challenge. This research aims to improve a framework for Arabic text classification. It works directly with the word in its original form, as the basic unit of the modern Arabic sentence, and at different levels of N-grams. In addition, we propose novel intersected consecutive words, which depend on Arabic sentence structure, for comparison with N-grams. We used a new dataset of real canonical prosecutors' requests in a modern Arabic context. This dataset has two classes in the field of canonical requests. We adopted term frequency (TF) and term frequency-inverse document frequency (TF-IDF) as mutual information for the feature vector representation method. In addition, k-fold validation was applied to split the dataset into k folds (groups) to further improve the trained model. This research evaluates the classification outcomes by implementing the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms. The results are discussed and conclusions are drawn in the experiment analysis section. |
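The feature-extraction and validation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the `log(N / df)` form of IDF, and the contiguous fold layout are all assumptions made for illustration, and the intersected consecutive words method is not shown because the abstract does not define it.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Word-level n-grams of a whitespace-tokenized text (words kept unstemmed)."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tf_vectors(docs, n):
    """One term-frequency Counter per document, counted over word n-grams."""
    return [Counter(word_ngrams(doc, n)) for doc in docs]


def tfidf_vectors(docs, n):
    """TF-IDF weights with idf = log(N / df); this IDF variant is an assumption."""
    tfs = tf_vectors(docs, n)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tf in tfs for term in tf)
    total = len(docs)
    return [{t: c * math.log(total / df[t]) for t, c in tf.items()} for tf in tfs]


def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical toy sentences, not taken from the paper's dataset.
docs = ["طلب تأجيل الجلسة", "طلب استخراج صورة رسمية"]
bigram_weights = tfidf_vectors(docs, 2)
folds = list(kfold_indices(len(docs), 2))
```

In a full pipeline, the vectors for each training fold would be passed to a KNN or SVM classifier and scored on the held-out fold; that step is left out here to keep the sketch dependency-free.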
Keywords: |
Arabic Text Classification, Machine Learning techniques, Arabic text mining, KNN, SVM |
Status : Paper Accepted |