Datasets Link Updated

JUSTSUJAY · Aug 30, 2024 · 9a00240
1 parent 2d9832e
commit 9a00240
Showing 4 changed files with 14 additions and 35 deletions.
diff --git a/Notebooks/06_LDA_TopicModelling.ipynb b/Notebooks/06_LDA_TopicModelling.ipynb
@@ -30,8 +30,9 @@
     "2. [Preprocessing](./02_Pre_Processing.ipynb)\n",
     "3. [Bag of Words and Similarity](./03_BOW_Similarity.ipynb)\n",
     "4. [TF-IDF and Document Search](./04_TFIDF_DocSearch.ipynb)\n",
-    "5. [Naive Bayes Text Classification](./05_NaiveBayes_TextClf.ipynb)\n",
+    "5. [Naive Bayes Text Classification](./05_NaiveBayes_TextClf.ipynb) \n",
     "6. LDA Topic Modelling\n",
+    "[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/JUSTSUJAY/NLP_One_Shot/blob/2d9832ee4c75997f0f13fab528d81f9b2164804b/Notebooks/07_Word_Embeddings.ipynb)\n",
     "\n",
     "## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">1.2 Outline</p>\n",
     "\n",
@@ -315,10 +316,10 @@
     "## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">3.1 Topic Modelling</p>\n",
     "\n",
     "We're going to use a library called **Gensim** and its implementation of LDA for topic modelling. \n",
-    "\n",
-    "<br>\n",
     "<br>\n",
     "\n",
+    "* Spotify App Reviews - [Link](https://www.kaggle.com/datasets/mfaaris/spotify-app-reviews-2022)\n",
+    "\n",
     "**Import libraries**"
    ]
   },

diff --git a/Notebooks/07_Word_Embeddings.ipynb b/Notebooks/07_Word_Embeddings.ipynb
@@ -254,9 +254,11 @@
     "## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">4.1 Word vectors</p>\n",
     "\n",
     "We're going to be using **pre-trained word vectors** via the **Gensim** library, which we came across last time. These particular vectors are known as the **Google News vectors** as they were trained on a 3 billion word Google News corpus in 2015. In total, there are **3 million, 300-dimension vectors**. \n",
-    "\n",
     "<br>\n",
     "\n",
+    "* GoogleNews-vectors-negative300 - [Link](https://www.kaggle.com/datasets/leadbest/googlenewsvectorsnegative300)\n",
+    "* Spotify App Reviews - [Link](https://www.kaggle.com/datasets/mfaaris/spotify-app-reviews-2022)\n",
+    "\n",
     "**Import libraries**"
    ]
   },

diff --git a/Notebooks/08_RNNs_LMs.ipynb b/Notebooks/08_RNNs_LMs.ipynb
@@ -295,9 +295,11 @@
     "## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">5.1 Part-of-Speech Tagging</p>\n",
     "\n",
     "We're going to build a **bidirectional LSTM** to perform **Part-of-Speech (PoS) tagging**. We'll use datasets from nltk to train our model\n",
-    "\n",
     "<br>\n",
     "\n",
+    "* Shakespeare Text - [Link](https://www.kaggle.com/datasets/adarshpathak/shakespeare-text)\n",
+    "\n",
+    "\n",
     "**Import libraries**"
    ]
   },

diff --git a/Notebooks/10_Transformers.ipynb b/Notebooks/10_Transformers.ipynb
@@ -237,7 +237,9 @@
     "\n",
     "## <p style=\"font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;\">3.1 Transformers from scratch</p>\n",
     "\n",
-    "To reinforce our understanding of transformers, lets write the code to **implement one from scratch**. We wont't be able to train it on our own but we will later see how was can use **transfer learning** to fine tuning a pre-trained transformer for our applications. "
+    "To reinforce our understanding of transformers, lets write the code to **implement one from scratch**. We wont't be able to train it on our own but we will later see how was can use **transfer learning** to fine tuning a pre-trained transformer for our applications. \n",
+    "\n",
+    "* Sentiment Analysis Company Reviews - [Link](https://www.kaggle.com/competitions/sentiment-analysis-company-reviews/code)"
    ]
   },
   {
@@ -832,8 +834,7 @@
     "**Load data**\n",
     "\n",
     "<br>\n",
-    "\n",
-    "The dataset is from my beginner friendly [NLP competition on sentiment analysis](https://www.kaggle.com/competitions/sentiment-analysis-company-reviews) I am currently hosting. "
+    "\n"
    ]
   },
   {
@@ -2034,33 +2035,6 @@
     "plt.show()"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 23,
-   "id": "bb0a61c3",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2023-02-21T11:19:48.826027Z",
-     "iopub.status.busy": "2023-02-21T11:19:48.825082Z",
-     "iopub.status.idle": "2023-02-21T11:19:48.864550Z",
-     "shell.execute_reply": "2023-02-21T11:19:48.863684Z"
-    },
-    "papermill": {
-     "duration": 0.055434,
-     "end_time": "2023-02-21T11:19:48.866682",
-     "exception": false,
-     "start_time": "2023-02-21T11:19:48.811248",
-     "status": "completed"
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "# Save predictions to csv\n",
-    "sub[\"Rating\"] = test_preds+1\n",
-    "sub.to_csv(\"submission.csv\", index=False)"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "b6f6798c",