Section 3 improvements

huggingface · Nov 20, 2024 · 5b03fdd
1 parent 4d47d89
commit 5b03fdd
Showing 1 changed file with 17 additions and 5 deletions.
diff --git a/chapters/en/chapter10/3.mdx b/chapters/en/chapter10/3.mdx
@@ -2,6 +2,13 @@
 
 Depending on the NLP task that you're working with and the specific use case or application, your data and the annotation task will look different. For this section of the course, we'll use [a dataset collecting news](https://huggingface.co/datasets/SetFit/ag_news) to complete two tasks: a text classification on the topic of each text and a token classification to identify the named entities mentioned.
 
+<iframe
+  src="https://huggingface.co/datasets/SetFit/ag_news/embed/viewer/default/train"
+  frameborder="0"
+  width="100%"
+  height="560px"
+></iframe>
+
 It is possible to import datasets from the Hub using the Argilla UI directly, but we'll be using the SDK to learn how we can make further edits to the data if needed.
 
 ## Configure your dataset
@@ -25,15 +32,17 @@ We can now think about the settings of our dataset in Argilla. These represent t
 ```python
 data = load_dataset("SetFit/ag_news", split="train")
 data.features
-````
+```
+
+These are the features of our dataset:
 
 ```python out
 {'text': Value(dtype='string', id=None),
  'label': Value(dtype='int64', id=None),
  'label_text': Value(dtype='string', id=None)}
 ```
 
-Our dataset contains a `text` and also some initial labels for the text classification. We'll add those to our dataset settings together with a `spans` question for the named entities:
+It contains a `text` and also some initial labels for the text classification. We'll add those to our dataset settings together with a `spans` question for the named entities:
 
 ```python
 settings = rg.Settings(
@@ -58,8 +67,9 @@ settings = rg.Settings(
 
 Let's dive a bit deeper into what these settings mean. First, we've defined **fields**, which include the information that we'll be annotating. In this case, we only have one field and it comes in the form of a text, so we've chosen a `TextField`.
 
-Then, we define **questions** that represent the tasks that we want to perform on our data: 
-- For the text classification task we've chosen a `LabelQuestion` and we used the unique values of the `label_text` column as our labels, to make sure that the question is compatible with the labels that already exist in the dataset. 
+Then, we define **questions** that represent the tasks that we want to perform on our data:
+
+- For the text classification task we've chosen a `LabelQuestion` and we used the unique values of the `label_text` column as our labels, to make sure that the question is compatible with the labels that already exist in the dataset.
 - For the token classification task, we'll need a `SpanQuestion`. We've defined a set of labels that we'll be using for that task, plus the field on which we'll be drawing the spans.
 
 To learn more about all the available types of fields and questions and other advanced settings, like metadata and vectors, go to the [Argilla docs](https://docs.argilla.io/latest/how_to_guides/dataset/#define-dataset-settings).
@@ -83,4 +93,6 @@ The dataset now appears in our Argilla instance, but you will see that it's empt
 dataset.records.log(data, mapping={"label_text": "label"})
 ```
 
-Now your dataset is ready to start annotating!
+In our mapping, we've specified that the `label_text` column in the dataset should be mapped to the question with the name `label`. In this way, we'll use the existing labels in the dataset as pre-annotations so we can annotate faster.
+
+Now our dataset is ready to start annotating!
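
The mapping described in the last hunk can be pictured with a small plain-Python sketch. It does not call the Argilla SDK: `apply_mapping` is a hypothetical helper, the example row is made up to match the dataset's three columns, and the rule that a mapped column takes precedence over an existing column with the same name is an assumption made for illustration, not verified SDK behavior.

```python
# Hypothetical helper (not part of the Argilla SDK): shows the effect of a
# column-to-question mapping such as {"label_text": "label"} on one row.
def apply_mapping(row: dict, mapping: dict) -> dict:
    # Values from mapped source columns, stored under their target names.
    renamed = {target: row[source] for source, target in mapping.items()}
    # Columns that are neither a mapping source nor shadowed by a target name.
    untouched = {
        column: value
        for column, value in row.items()
        if column not in mapping and column not in renamed
    }
    return {**untouched, **renamed}


# Made-up row with the same columns as the dataset features shown in the diff.
row = {"text": "Stocks rally as oil prices fall", "label": 2, "label_text": "Business"}

record = apply_mapping(row, {"label_text": "label"})
print(record)
```

In this sketch the string in `label_text` ends up under the name `label`, which is the name of the question that the pre-annotation is meant to fill.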