---
copyright:
lastupdated: "2022-09-06"
subcollection: natural-language-understanding
---
{{site.data.keyword.attribute-definition-list}}
{: #classifications}
The custom classifications feature lets you train a multi-label text classifier on your own labeled data. After training completes, the model is automatically deployed in {{site.data.keyword.nlufull}} and is available for analyze calls.
{: #create-classification-training-data}
Create and train a custom classifications model using the Natural Language Understanding training API. You can use this example Python notebook that shows how to create a classifications model, or the more advanced notebook that shows how to train and fine-tune your classifications model.
Classifications accepts training data in the following JSON format:
[
  {
    "text": "Example 1",
    "labels": ["label1"]
  },
  {
    "text": "Example 2",
    "labels": ["label1", "label2"]
  }
]
{: codeblock}
You can also provide training data in comma-separated value (CSV) format.
Example 1,label1
Example 2,label1,label2
In CSV format, a row in the file represents an example record. Each record has two or more columns. The first column is the representative text to classify. The additional columns are classes that apply to that text.
Headers are not expected for the CSV file.
{: note}
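The mapping between the two formats can be sketched in Python. The following is a minimal illustration, not part of the service API: the function name `csv_to_records` is hypothetical, and skipping blank rows and empty trailing cells is an assumption about how exported spreadsheets often look.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Convert headerless CSV rows (text, label, label, ...) into JSON-style records."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        text, *labels = row
        # Assumption: empty trailing cells are export noise, not real labels.
        labels = [label for label in labels if label]
        records.append({"text": text, "labels": labels})
    return records


sample_csv = "Example 1,label1\nExample 2,label1,label2\n"
print(json.dumps(csv_to_records(sample_csv), indent=2))
```

Because the `csv` module handles quoting, a text value that itself contains a comma can be written in double quotes and still ends up in the `text` field.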
{: #classification-training-data-requirements}
- Classifications training data consists of an array containing multiple JSON objects.
- Each of these JSON objects needs to contain one `text` field and one `labels` field. `text` consists of the training examples and `labels` consists of one or more labels associated with an example. `labels` are case-sensitive.
- Minimum number of unique labels required: `2`
- Maximum number of unique labels allowed: `3000`
- Minimum number of examples required per label: `5`
- Maximum size of each example (training and predict): `2000` codepoints
- Maximum number of examples: `20000`
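These limits can be checked locally before uploading. The sketch below is one possible pre-flight check, not something the service provides: the function name `validate_training_data` and the wording of the messages are hypothetical, and it assumes the records are already loaded as Python dictionaries with string labels.

```python
from collections import Counter

# Limits taken from the requirements listed above.
MIN_UNIQUE_LABELS = 2
MAX_UNIQUE_LABELS = 3000
MIN_EXAMPLES_PER_LABEL = 5
MAX_EXAMPLE_CODEPOINTS = 2000
MAX_EXAMPLES = 20000


def validate_training_data(records):
    """Return a list of problems; an empty list means no limit was violated."""
    problems = []
    if len(records) > MAX_EXAMPLES:
        problems.append(f"{len(records)} examples exceeds the maximum of {MAX_EXAMPLES}")

    label_counts = Counter()
    for index, record in enumerate(records):
        text = record.get("text")
        labels = record.get("labels")
        if not isinstance(text, str):
            problems.append(f"record {index}: missing 'text' field")
        elif len(text) > MAX_EXAMPLE_CODEPOINTS:
            # len() of a Python str counts Unicode code points.
            problems.append(
                f"record {index}: text has {len(text)} codepoints "
                f"(max {MAX_EXAMPLE_CODEPOINTS})"
            )
        if not isinstance(labels, list) or not labels:
            problems.append(f"record {index}: 'labels' must contain at least one label")
            continue
        label_counts.update(set(labels))  # count each label once per example

    if len(label_counts) < MIN_UNIQUE_LABELS:
        problems.append(f"only {len(label_counts)} unique labels (min {MIN_UNIQUE_LABELS})")
    if len(label_counts) > MAX_UNIQUE_LABELS:
        problems.append(f"{len(label_counts)} unique labels (max {MAX_UNIQUE_LABELS})")
    for label, count in sorted(label_counts.items()):
        if count < MIN_EXAMPLES_PER_LABEL:
            problems.append(
                f"label '{label}' has {count} examples (min {MIN_EXAMPLES_PER_LABEL})"
            )
    return problems
```

Because labels are case-sensitive, `Sports` and `sports` are counted as two different labels here, which is a quick way to surface accidental duplicates.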
{: #classification-training-parameters}
Passing in the optional `training_parameters` object allows you to specify characteristics of your classifier. If you omit the object or pass an empty object, the model is trained using default values.
Supported training parameters:
| Keys | Default Value | Optional Values |
|---|---|---|
| `model_type` | `multi_label` | `single_label` |
Description:

- `model_type`: Passing the `single_label` value results in a single-label classifier, capable of handling training datasets with only one label per example. The single-label classifier outputs normalized confidence scores, so the scores sum to one. Passing the `multi_label` value results in a multi-label classifier, capable of handling training datasets with multiple labels per example. The multi-label classifier does not output normalized confidence scores, to account for the added flexibility of associating multiple labels with examples.
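As an illustration of how the `model_type` value might be chosen from the data, the sketch below suggests `single_label` only when every example carries exactly one label and otherwise falls back to the default `multi_label`. The helper names are hypothetical and the heuristic is an assumption, not a rule enforced by the service.

```python
import json


def choose_model_type(records):
    """Suggest 'single_label' only when every example has exactly one label."""
    if all(len(record["labels"]) == 1 for record in records):
        return "single_label"
    return "multi_label"


def build_training_parameters(records):
    """Serialize a training_parameters object as a JSON string for the request."""
    return json.dumps({"model_type": choose_model_type(records)})


records = [
    {"text": "Example 1", "labels": ["label1"]},
    {"text": "Example 2", "labels": ["label1", "label2"]},
]
print(build_training_parameters(records))
```

Whether normalized scores are desirable depends on the use case, so the suggested value is a starting point rather than a recommendation.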
{: #training-a-custom-classification}
When your training data is ready, use the Create classifications model method to create your custom classifications model. Make sure to substitute your credentials for `{apikey}` and `{url}`, and use the path to your training data file in the `training_data` parameter. Optionally, you can also specify characteristics of your classifier using `training_parameters`.
curl -X POST -u "apikey:{apikey}" \
-H "Content-Type: multipart/form-data" \
-F "name=MyClassificationsModel" \
-F "language=en" \
-F "model_version=1.0.1" \
-F 'training_parameters={"model_type": "multi_label"}' \
-F "training_data=@classifications_data.json;type=application/json" \
"{url}/v1/models/classifications?version=2021-03-23"
{: pre}
Use the `model_id` in the response to check the status of your model.
{: #checking-status-of-classifications}
The following sample request for the Get classifications model method checks the status for the classifications model with ID `cb3755ad-d226-4587-b956-43a4a7202202`.
curl -X GET -u "apikey:{apikey}" \
"{url}/v1/models/classifications/cb3755ad-d226-4587-b956-43a4a7202202?version=2021-03-23"
{: pre}
To get information for all classifications models deployed to your instance, use the List classifications models method.
curl -X GET -u "apikey:{apikey}" \
"{url}/v1/models/classifications?version=2021-03-23"
{: pre}
When the status is `available`, the classifications model is ready to use.
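Waiting for that status can be automated with a small polling loop. The sketch below is deliberately independent of any HTTP client: `get_status` is a hypothetical callable that you supply, which issues the Get classifications model request and returns the status string from the response. Only the `available` value comes from this page; the interval and timeout defaults are arbitrary assumptions.

```python
import time


def wait_until_available(get_status, interval_seconds=30.0, timeout_seconds=3600.0,
                         sleep=time.sleep):
    """Poll get_status() until it returns 'available' or the timeout elapses.

    Returns True when the model became available, False on timeout.
    """
    waited = 0.0
    while True:
        if get_status() == "available":
            return True
        if waited >= timeout_seconds:
            return False
        sleep(interval_seconds)
        waited += interval_seconds


# Demonstration with placeholder status values instead of real API calls.
fake_statuses = iter(["not_ready", "not_ready", "available"])
print(wait_until_available(lambda: next(fake_statuses), interval_seconds=0))
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.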
{: #analyzing-text-with-custom-classifications-models}
To use your classifications model, specify the `model` that you deployed in the `classifications` options of your API request:

- Example parameters.json file:

  {
    "url": "www.url.example",
    "features": {
      "classifications": {
        "model": "your-model-id-here"
      }
    }
  }

- Example cURL request:

  curl --request POST \
  --header "Content-Type: application/json" \
  --user "apikey:{apikey}" \
  "{url}/v1/analyze?version=2021-03-23" \
  --data @parameters.json
  {: pre}
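If the model ID is only known at run time, the parameters file can be generated rather than edited by hand. This is a minimal sketch; the function name `build_analyze_parameters` is hypothetical, and the model ID and URL are the same placeholders used in the example file.

```python
import json


def build_analyze_parameters(model_id, url):
    """Build the analyze request body that selects a custom classifications model."""
    return {
        "url": url,
        "features": {
            "classifications": {"model": model_id},
        },
    }


parameters = build_analyze_parameters("your-model-id-here", "www.url.example")
print(json.dumps(parameters, indent=2))
```

Writing the printed JSON to `parameters.json` produces a file that the cURL request can pass with `--data @parameters.json`.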
{: #deleting-a-custom-classifications-model}
To delete a classifications model from your service instance, use the Delete classifications model method. Replace `{url}` and `{apikey}` with your service URL and API key, and replace `{model_id}` with the model ID of the classifications model you want to delete.
- The following example deletes a classifications model.

  curl --user "apikey:{apikey}" \
  "{url}/v1/models/classifications/{model_id}?version=2021-03-23" \
  --request DELETE
  {: pre}
{: #migrating-natural-language-classifier}
On 9 August 2021, IBM announced the deprecation of the {{site.data.keyword.nlclassifierfull}} service. The service will no longer be available from 8 August 2022. As of 9 September 2021, you can't create new instances, and access to free instances will be removed. Existing premium plan instances are supported until 8 August 2022. Any instance that still exists on that date will be deleted. As an alternative, we encourage {{site.data.keyword.nlclassifiershort}} users to consider migrating to the {{site.data.keyword.nlushort}} service.
You can directly use the available training data to train `classifications` in {{site.data.keyword.nlushort}}. {{site.data.keyword.nlushort}} accepts the same CSV file format.
You can fetch the data you used to train {{site.data.keyword.nlclassifiershort}} from the service. Refer to this tutorial.