Commit

Datasets Link Updated
JUSTSUJAY committed Aug 30, 2024
1 parent 2d9832e commit 9a00240
Showing 4 changed files with 14 additions and 35 deletions.
7 changes: 4 additions & 3 deletions Notebooks/06_LDA_TopicModelling.ipynb
@@ -30,8 +30,9 @@
"2. [Preprocessing](./02_Pre_Processing.ipynb)\n",
"3. [Bag of Words and Similarity](./03_BOW_Similarity.ipynb)\n",
"4. [TF-IDF and Document Search](./04_TFIDF_DocSearch.ipynb)\n",
"5. [Naive Bayes Text Classification](./05_NaiveBayes_TextClf.ipynb)\n",
"5. [Naive Bayes Text Classification](./05_NaiveBayes_TextClf.ipynb) \n",
"6. LDA Topic Modelling\n",
"[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/JUSTSUJAY/NLP_One_Shot/blob/2d9832ee4c75997f0f13fab528d81f9b2164804b/Notebooks/07_Word_Embeddings.ipynb)\n",
"\n",
"## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">1.2 Outline</p>\n",
"\n",
@@ -315,10 +316,10 @@
"## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">3.1 Topic Modelling</p>\n",
"\n",
"We're going to use a library called **Gensim** and its implementation of LDA for topic modelling. \n",
"\n",
"<br>\n",
"<br>\n",
"\n",
"* Spotify App Reviews - [Link](https://www.kaggle.com/datasets/mfaaris/spotify-app-reviews-2022)\n",
"\n",
"**Import libraries**"
]
},
4 changes: 3 additions & 1 deletion Notebooks/07_Word_Embeddings.ipynb
@@ -254,9 +254,11 @@
"## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">4.1 Word vectors</p>\n",
"\n",
"We're going to be using **pre-trained word vectors** via the **Gensim** library, which we came across last time. These particular vectors are known as the **Google News vectors** because they were trained on part of the Google News corpus (about 100 billion words) and released in 2013. In total, there are **3 million 300-dimensional vectors**. \n",
"\n",
"<br>\n",
"\n",
"* GoogleNews-vectors-negative300 - [Link](https://www.kaggle.com/datasets/leadbest/googlenewsvectorsnegative300)\n",
"* Spotify App Reviews - [Link](https://www.kaggle.com/datasets/mfaaris/spotify-app-reviews-2022)\n",
"\n",
"**Import libraries**"
]
},
4 changes: 3 additions & 1 deletion Notebooks/08_RNNs_LMs.ipynb
@@ -295,9 +295,11 @@
"## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">5.1 Part-of-Speech Tagging</p>\n",
"\n",
"We're going to build a **bidirectional LSTM** to perform **Part-of-Speech (PoS) tagging**. We'll use datasets from NLTK to train our model.\n",
"\n",
"<br>\n",
"\n",
"* Shakespeare Text - [Link](https://www.kaggle.com/datasets/adarshpathak/shakespeare-text)\n",
"\n",
"\n",
"**Import libraries**"
]
},
34 changes: 4 additions & 30 deletions Notebooks/10_Transformers.ipynb
@@ -237,7 +237,9 @@
"\n",
"## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">3.1 Transformers from scratch</p>\n",
"\n",
"To reinforce our understanding of transformers, let's write the code to **implement one from scratch**. We won't be able to train it on our own, but we will later see how we can use **transfer learning** to fine-tune a pre-trained transformer for our applications. "
"To reinforce our understanding of transformers, let's write the code to **implement one from scratch**. We won't be able to train it on our own, but we will later see how we can use **transfer learning** to fine-tune a pre-trained transformer for our applications. \n",
"\n",
"* Sentiment Analysis Company Reviews - [Link](https://www.kaggle.com/competitions/sentiment-analysis-company-reviews/code)"
]
},
{
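For readers following the "Transformers from scratch" cell above, here is a minimal sketch of scaled dot-product self-attention, the operation at the core of a transformer layer. It is not taken from the notebook: the helper names (`softmax`, `matmul`, `transpose`, `scaled_dot_product_attention`) and the toy input `x` are hypothetical, and the sketch assumes queries, keys, and values are all the raw input, with no learned projection matrices, masking, or multiple heads.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(value - m) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scaled_dot_product_attention(q, k, v):
    """Return (output, weights) for softmax(q @ k^T / sqrt(d_k)) @ v."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]  # one distribution per query
    return matmul(weights, v), weights


# Toy sequence of three 2-dimensional token vectors (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: queries, keys, and values all come from the same sequence.
output, weights = scaled_dot_product_attention(x, x, x)

for token_weights in weights:
    print([round(w, 3) for w in token_weights])
```

Each row of `weights` is a probability distribution over the input tokens, so every output vector is a weighted average of the value vectors. A full transformer layer would add learned projections for the queries, keys, and values, multiple heads, residual connections, and a feed-forward block on top of this.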
@@ -832,8 +834,7 @@
"**Load data**\n",
"\n",
"<br>\n",
"\n",
"The dataset is from my beginner-friendly [NLP competition on sentiment analysis](https://www.kaggle.com/competitions/sentiment-analysis-company-reviews), which I am currently hosting. "
"\n"
]
},
{
@@ -2034,33 +2035,6 @@
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "bb0a61c3",
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-21T11:19:48.826027Z",
"iopub.status.busy": "2023-02-21T11:19:48.825082Z",
"iopub.status.idle": "2023-02-21T11:19:48.864550Z",
"shell.execute_reply": "2023-02-21T11:19:48.863684Z"
},
"papermill": {
"duration": 0.055434,
"end_time": "2023-02-21T11:19:48.866682",
"exception": false,
"start_time": "2023-02-21T11:19:48.811248",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"# Save predictions to csv\n",
"sub[\"Rating\"] = test_preds+1\n",
"sub.to_csv(\"submission.csv\", index=False)"
]
},
{
"cell_type": "markdown",
"id": "b6f6798c",
