Arabic NLP

In this project, we used data scraped from https://www.arab-books.com/

First, we scraped book data from the website: book name, author, category, number of pages, publisher (Dar El-Nashr), book size, and description.

Second, we cleaned the data by removing English words, numbers, and tags.
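
A minimal sketch of what such a cleaning step could look like, using only the standard library. The regular expressions and the `clean_text` name are assumptions for illustration, not taken from the notebooks; in particular, treating Arabic-Indic digits as numbers to remove is a guess.

```python
import re

# Assumed patterns (hypothetical, not copied from the project notebooks).
TAG_RE = re.compile(r"<[^>]+>")                 # markup tags such as <p> or <br/>
LATIN_RE = re.compile(r"[A-Za-z]+")             # English words
DIGIT_RE = re.compile(r"[0-9\u0660-\u0669]+")   # Western and Arabic-Indic digits
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip tags, English words, and numbers, then collapse whitespace."""
    # Tags go first, because tag names themselves contain Latin letters.
    text = TAG_RE.sub(" ", text)
    text = LATIN_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


print(clean_text("<p>كتاب رائع 2020 by Author</p>"))
```

Only the Arabic words of the sample string survive the call.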

Third, we used the autocorrect library "Hunspell-ar" to correct misspelled words.
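
The idea of dictionary-based correction can be sketched without Hunspell itself. The example below is a stand-in that uses Python's `difflib` against a tiny hypothetical word list; it is not the Hunspell API, and the `DICTIONARY` contents, the `cutoff` value, and the function names are all assumptions.

```python
from difflib import get_close_matches

# Hypothetical mini word list; a real run would load a full Arabic dictionary.
DICTIONARY = ["كتاب", "كاتب", "مكتبة", "رواية"]


def correct_word(word, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word, or the word unchanged if none is close."""
    if word in dictionary:
        return word
    matches = get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text):
    """Correct a whitespace-separated string word by word."""
    return " ".join(correct_word(w) for w in text.split())


print(correct_text("كتااب رواية"))
```

A fairly high cutoff keeps unknown but valid words from being replaced by an unrelated dictionary entry.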

Fourth, we tokenized the data, removed the stop words, and then lemmatized and stemmed it using Farasa.
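
The tokenization and stop-word removal part of this step can be sketched in plain Python. The stop-word list below is deliberately tiny and hypothetical, and the lemmatization and stemming stage is not reproduced here because it depends on an external tool.

```python
# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"في", "من", "على", "عن", "إلى", "هذا", "هذه"}


def tokenize(text):
    """Whitespace tokenization; assumes punctuation was removed during cleaning."""
    return text.split()


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = remove_stop_words(tokenize("هذا كتاب عن تاريخ العرب في الأندلس"))
print(tokens)
```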

Note: We tried several other lemmatizers and stemmers (e.g. Qalsadi, the Snowball stemmer, the ISRI stemmer, and MADAMIRA), but found that Farasa outperformed all of them on Arabic words.

Fifth, we trained a RandomForestClassifier (RFC) model, using the description column (after applying a TF-IDF vectorizer) as features and the category column as labels. We then evaluated the model on a held-out 30% of the dataset, which it had not seen during training, and achieved an accuracy of 73%.
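
A rough sketch of this training setup with scikit-learn follows. The toy descriptions and categories are made up for illustration, and the hyperparameters, `random_state`, and stratified split are assumptions; the notebooks may configure these differently, and the accuracy on this toy data says nothing about the reported 73%.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in data; the project used the scraped description and category columns.
descriptions = [
    "رواية رومانسية عن الحب والفراق",
    "رواية اجتماعية عن الحب والأسرة",
    "رواية بوليسية عن جريمة غامضة",
    "رواية خيالية عن رحلة في الفضاء",
    "رواية تحكي قصة الحب والحرب",
    "كتاب في تاريخ العرب والإسلام",
    "كتاب عن تاريخ مصر القديمة",
    "كتاب يتناول تاريخ الأندلس والعرب",
    "كتاب في تاريخ الحضارة الإسلامية",
    "كتاب عن تاريخ الدولة العثمانية",
]
categories = ["روايات"] * 5 + ["تاريخ"] * 5

# Hold out 30% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    descriptions, categories, test_size=0.3, random_state=42, stratify=categories
)

# TF-IDF features feed directly into the random forest.
model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=42))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy on the held-out 30%: {accuracy:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline means the TF-IDF vocabulary is fitted on the training rows only, so the held-out rows stay unseen.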

Sixth, we created two question-answer datasets. Given an input query, we match it against the questions in both datasets and retrieve the most relevant answer.
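
One simple way to implement this kind of matching is token overlap. The sketch below scores each stored question by Jaccard similarity with the query and returns the answer of the best match. The similarity measure, the threshold, and the sample question-answer pairs are all assumptions for illustration; the project may use a different matching method.

```python
def jaccard(a, b):
    """Share of tokens that two token sets have in common."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical question-answer pairs standing in for the two datasets.
DATASET_ONE = [("من مؤلف كتاب الأيام", "طه حسين")]
DATASET_TWO = [("ما هو تصنيف كتاب الأيام", "سيرة ذاتية")]


def best_answer(query, qa_pairs, threshold=0.2):
    """Return the answer whose question overlaps most with the query, or None."""
    query_tokens = set(query.split())
    scored = [(jaccard(query_tokens, set(q.split())), a) for q, a in qa_pairs]
    score, answer = max(scored, key=lambda pair: pair[0])
    return answer if score >= threshold else None


all_pairs = DATASET_ONE + DATASET_TWO
print(best_answer("من هو مؤلف الأيام", all_pairs))
```

The threshold lets the function return `None` when no stored question is close enough, instead of forcing an unrelated answer.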

Seventh, we weighted the words across all book descriptions by calculating a TF-IDF score for each one, so that important keywords receive higher weights. Note: We removed all stop words from the descriptions.
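
To make the weighting concrete, here is a small pure-Python sketch of the textbook TF-IDF formula, where tf is the word's share of the description and idf is log(N / number of descriptions containing the word). The function name and sample descriptions are hypothetical, and this unsmoothed variant differs from the smoothed default in scikit-learn's `TfidfVectorizer`, so absolute values will not match that library.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """Return one {word: weight} dict per document."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical descriptions, already cleaned and without stop words.
docs = ["كتاب تاريخ العرب", "كتاب رواية الحب", "كتاب شعر العرب"]
weights = tfidf_weights(docs)
top_keyword = max(weights[0], key=weights[0].get)
print(top_keyword, weights[0][top_keyword])
```

A word that appears in every description gets a weight of zero, while a word unique to one description gets the highest weight there.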


aehabV/Arabic-Book-Recommendation-and-Question-Answering-System
