---
copyright:
  years: 2015, 2022
lastupdated: "2022-09-06"
subcollection: natural-language-understanding
---

{{site.data.keyword.attribute-definition-list}}

Creating custom classification models

{: #classifications}

The custom classifications feature lets you train a multi-label text classifier on your own labeled data. After training completes, the model is automatically deployed in {{site.data.keyword.nlufull}} and is available for analyze calls.

Creating classifications model training data

{: #create-classification-training-data}

Create and train a custom classifications model using the Natural Language Understanding training API. You can use this example Python notebook that shows how to create a classifications model, or the more advanced notebook that shows how to train and fine-tune your classifications model.

Training data in JSON format

Classifications accepts training data in the following JSON format:

[
  {
    "text": "Example 1",
    "labels": ["label1"]
  },
  {
    "text": "Example 2",
    "labels": ["label1", "label2"]
  }
]

{: codeblock}

Training data in CSV format

You can also provide training data in comma-separated value (CSV) format.

Example 1,label1
Example 2,label1,label2

In CSV format, a row in the file represents an example record. Each record has two or more columns. The first column is the representative text to classify. The additional columns are classes that apply to that text.

Do not include a header row in the CSV file. {: note}
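The relationship between the two formats can be sketched in Python. This is a minimal illustration, not part of the service: the function name `csv_to_training_records` is hypothetical, and it assumes each row follows the layout described above (text in the first column, labels in the remaining columns, no header row).

```python
import csv
import io
import json


def csv_to_training_records(csv_text):
    """Convert headerless CSV training rows into {"text", "labels"} records."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        # csv.reader yields an empty list for a blank line; skip those.
        if not row:
            continue
        # First column is the example text; remaining columns are labels.
        text, *labels = row
        # Ignore empty cells left by trailing commas.
        labels = [label for label in labels if label]
        records.append({"text": text, "labels": labels})
    return records


sample_csv = "Example 1,label1\nExample 2,label1,label2\n"
records = csv_to_training_records(sample_csv)
print(json.dumps(records, indent=2))
```

Labels are passed through unchanged because they are case-sensitive.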

Classifications training data requirements

{: #classification-training-data-requirements}

  • Classifications training data consists of an array containing multiple JSON objects.
  • Each JSON object must contain exactly one text field and one labels field.
  • text holds the training example, and labels holds one or more labels associated with that example.
  • Labels are case-sensitive.
  • Minimum number of unique labels required: 2
  • Maximum number of unique labels allowed: 3000
  • Minimum number of examples required per label: 5
  • Maximum size of each example (training and prediction): 2000 code points
  • Maximum number of examples: 20000
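The requirements above can be checked locally before you upload a file. The following Python sketch is only an illustration: `validate_training_data` and its messages are hypothetical, the limits are copied from the list above, and the service may enforce further rules that are not reproduced here.

```python
from collections import Counter

# Limits copied from the requirements list in this section.
MIN_UNIQUE_LABELS = 2
MAX_UNIQUE_LABELS = 3000
MIN_EXAMPLES_PER_LABEL = 5
MAX_EXAMPLE_CODEPOINTS = 2000
MAX_EXAMPLES = 20000


def validate_training_data(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(records) > MAX_EXAMPLES:
        problems.append(f"too many examples: {len(records)} > {MAX_EXAMPLES}")

    label_counts = Counter()
    for index, record in enumerate(records):
        text = record.get("text")
        labels = record.get("labels")
        if not isinstance(text, str) or not text:
            problems.append(f"record {index}: missing or empty 'text'")
        elif len(text) > MAX_EXAMPLE_CODEPOINTS:
            # len() of a Python str counts Unicode code points.
            problems.append(
                f"record {index}: text longer than {MAX_EXAMPLE_CODEPOINTS} code points"
            )
        if not isinstance(labels, list) or not labels:
            problems.append(f"record {index}: 'labels' must be a non-empty list")
            continue
        # Labels are case-sensitive, so count them exactly as written.
        label_counts.update(set(labels))

    if len(label_counts) < MIN_UNIQUE_LABELS:
        problems.append(
            f"fewer than {MIN_UNIQUE_LABELS} unique labels: found {len(label_counts)}"
        )
    if len(label_counts) > MAX_UNIQUE_LABELS:
        problems.append(
            f"more than {MAX_UNIQUE_LABELS} unique labels: found {len(label_counts)}"
        )
    for label, count in sorted(label_counts.items()):
        if count < MIN_EXAMPLES_PER_LABEL:
            problems.append(
                f"label {label!r} has {count} example(s); "
                f"at least {MIN_EXAMPLES_PER_LABEL} required"
            )
    return problems


records = [
    {"text": "Example 1", "labels": ["label1"]},
    {"text": "Example 2", "labels": ["label1", "label2"]},
]
for problem in validate_training_data(records):
    print(problem)
```

Run against the two-record sample from this page, the check reports that both labels fall short of the five-example minimum, which is expected for a toy data set.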

Classifications training parameters

{: #classification-training-parameters}

Pass the optional training_parameters object to specify characteristics of your classifier. If you omit the object or pass an empty object, the model is trained with the default values.

Supported training parameters:

| Key | Default value | Optional values |
|---|---|---|
| model_type | multi_label | single_label |

Description:

  • model_type: Passing the single_label value will result in a single-label classifier, capable of handling training datasets with only one label per example. The single-label classifier will output normalized confidence scores so that the scores sum up to one. Passing the multi_label value will result in a multi-label classifier, capable of handling training datasets with multiple labels per example. The multi-label classifier will not output normalized confidence scores, in order to account for the added flexibility of associating multiple labels with examples.
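One way to pick a value is to inspect the training data itself. The sketch below is a hypothetical helper, not service behavior: it suggests single_label only when every example carries exactly one label, and otherwise falls back to the default multi_label. Whether normalized scores are desirable for your use case is a separate decision that this check does not make for you.

```python
import json


def suggest_model_type(records):
    """Return "single_label" if every example has exactly one label, else "multi_label"."""
    if all(len(record["labels"]) == 1 for record in records):
        return "single_label"
    return "multi_label"


def training_parameters_field(records):
    """Build the JSON string to send as the training_parameters form field."""
    return json.dumps({"model_type": suggest_model_type(records)})


records = [
    {"text": "Example 1", "labels": ["label1"]},
    {"text": "Example 2", "labels": ["label1", "label2"]},
]
print(training_parameters_field(records))
```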

Training a custom classifications model

{: #training-a-custom-classification}

When your training data is ready, use the Create classifications model method to create your custom classifications model. Make sure to substitute your credentials for {apikey} and {url}, and use the path to your training data file in the training_data parameter. Optionally, you can also specify characteristics of your classifier using training_parameters.

curl -X POST -u "apikey:{apikey}" \
-H "Content-Type: multipart/form-data" \
-F "name=MyClassificationsModel" \
-F "language=en" \
-F "model_version=1.0.1" \
-F 'training_parameters={"model_type": "multi_label"}' \
-F "training_data=@classifications_data.json;type=application/json" \
"{url}/v1/models/classifications?version=2021-03-23"

{: pre}

Use the model_id in the response to check the status of your model.

Checking the status of a classifications model

{: #checking-status-of-classifications}

The following sample request for the Get classifications model method checks the status for the classifications model with ID cb3755ad-d226-4587-b956-43a4a7202202.

curl -X GET -u "apikey:{apikey}" \
"{url}/v1/models/classifications/cb3755ad-d226-4587-b956-43a4a7202202?version=2021-03-23"

{: pre}

To get information for all classifications models deployed to your instance, use the List classifications models method.

curl -X GET -u "apikey:{apikey}" \
"{url}/v1/models/classifications?version=2021-03-23"

{: pre}

When the status is available, the classifications model is ready to use.
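Because training takes time, a client typically repeats the status request until the model is available. The following Python sketch shows one possible polling loop. It makes several assumptions: `fetch_model` is a hypothetical callable that you supply (for example, a wrapper around the Get classifications model request), the response is assumed to parse to a dictionary with a `status` key, and the `training` value in the demo is a placeholder for any status other than available.

```python
import time


def wait_until_available(fetch_model, max_attempts=30, delay_seconds=10.0, sleep=time.sleep):
    """Call fetch_model() until its "status" is "available" and return that response.

    Raises TimeoutError if the model is still not available after max_attempts calls.
    """
    for attempt in range(max_attempts):
        model = fetch_model()
        if model.get("status") == "available":
            return model
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"model not available after {max_attempts} attempts")


# Demo with canned responses instead of real HTTP calls.
fake_responses = iter([{"status": "training"}, {"status": "available"}])
model = wait_until_available(
    lambda: next(fake_responses),
    delay_seconds=0,
    sleep=lambda seconds: None,
)
print(model["status"])
```

Injecting `fetch_model` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access. A production version would likely also stop early on a failure status, which this sketch does not attempt to enumerate.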

Analyzing text with a custom classifications model

{: #analyzing-text-with-custom-classifications-models}

To use your classifications model, specify the model that you deployed in the classifications{: external} options of your API request:

  • Example parameters.json file:

    {
      "url": "www.url.example",
      "features": {
        "classifications": {
          "model": "your-model-id-here"
        }
      }
    }
  • Example cURL request:

    curl --request POST \
    --header "Content-Type: application/json" \
    --user "apikey:{apikey}" \
    "{url}/v1/analyze?version=2021-03-23" \
    --data @parameters.json

    {: pre}
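The parameters file can also be generated programmatically. The Python sketch below builds the same request body for inline text rather than a URL. It assumes that the analyze request accepts a `text` field in place of `url`, and it applies the 2000 code point limit listed in the training data requirements; the function name `build_analyze_parameters` is hypothetical.

```python
import json

# Limit taken from the training data requirements (applies to prediction too).
MAX_PREDICT_CODEPOINTS = 2000


def build_analyze_parameters(model_id, text):
    """Return an analyze request body that runs a custom classifications model on text."""
    if len(text) > MAX_PREDICT_CODEPOINTS:
        # len() of a Python str counts Unicode code points.
        raise ValueError(
            f"text has {len(text)} code points; the limit is {MAX_PREDICT_CODEPOINTS}"
        )
    return {
        "text": text,
        "features": {"classifications": {"model": model_id}},
    }


params = build_analyze_parameters("your-model-id-here", "Example text to classify")
print(json.dumps(params, indent=2))
```

Saving the printed JSON as parameters.json lets you reuse the cURL request shown above unchanged.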

Deleting a custom classifications model

{: #deleting-a-custom-classifications-model}

To delete a classifications model from your service instance, use the Delete classifications model method. Replace {url} and {apikey} with your service URL and API key, and replace {model_id} with the model ID of the classifications model you want to delete.

  • The following example deletes a classifications model.

    curl --user "apikey:{apikey}" \
    "{url}/v1/models/classifications/{model_id}?version=2021-03-23" \
    --request DELETE

    {: pre}

Migrating from {{site.data.keyword.nlclassifiershort}} to {{site.data.keyword.nlushort}}

{: #migrating-natural-language-classifier}

On 9 August 2021, IBM announced the deprecation of the {{site.data.keyword.nlclassifierfull}} service. The service will no longer be available from 8 August 2022. As of 9 September 2021, you can't create new instances, and access to free instances will be removed. Existing premium plan instances are supported until 8 August 2022. Any instance that still exists on that date will be deleted. As an alternative, we encourage {{site.data.keyword.nlclassifiershort}} users to consider migrating to the {{site.data.keyword.nlushort}} service.

When training data is available

You can directly use the available training data to train classifications in {{site.data.keyword.nlushort}}. {{site.data.keyword.nlushort}} accepts the same CSV file format.

When training data is not available

You can fetch the data you used to train {{site.data.keyword.nlclassifiershort}} from the service. Refer to this tutorial{: external}.