Batches
A batch represents a collection of recipes that should be scheduled for re-processing. The batch definition is a JSON document that defines exactly which recipes will be included in the batch. It describes things like a date range and specific jobs within a recipe type to queue for execution. It also allows new recipes to be created for existing source files that have never been queued for the recipe type before, using either the current recipe type trigger rule or a custom batch-specific trigger rule.
Consider the following example, which is a request to re-process all recipes that were originally created within the 2016 calendar year and have since gone through a revision. Additionally, the jobs named `Job 1` and `Job 2` should be included even if they have not actually changed since the original recipe type revision. The priority is used to override the original priority of each job type. Lastly, it will create new recipes for all source files that have not already been queued, are `text/plain`, and have the data tags `foo` and `bar`.
Example batch definition:
```json
{
  "version": "1.0",
  "date_range": {
    "type": "created",
    "started": "2016-01-01T00:00:00.000Z",
    "ended": "2016-12-31T00:00:00.000Z"
  },
  "job_names": [
    "Job 1",
    "Job 2"
  ],
  "priority": 1000,
  "trigger_rule": {
    "condition": {
      "media_type": "text/plain",
      "data_types": [
        "foo",
        "bar"
      ]
    },
    "data": {
      "input_data_name": "my_file",
      "workspace_name": "my_workspace"
    }
  }
}
```
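As a rough illustration only, the sketch below POSTs such a definition to a Scale deployment. The host, endpoint path, recipe type ID, title, and description are all assumptions made up for this example, not values taken from this page; consult the REST API documentation for the authoritative interface.

```python
import json
import urllib.request

# All of these values are hypothetical; adjust them for a real deployment.
SCALE_HOST = "http://scale.example.com/api"

with open("batch_definition.json") as f:  # the example document shown above
    definition = json.load(f)

payload = json.dumps({
    "recipe_type_id": 1,  # assumed ID of the recipe type to re-process
    "title": "2016 re-process",
    "description": "Re-process all recipes created in 2016",
    "definition": definition,
}).encode("utf-8")

request = urllib.request.Request(
    SCALE_HOST + "/batches/",  # assumed endpoint path
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode("utf-8"))
```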
A valid batch definition is a JSON document with the following structure:
```json
{
  "version": STRING,
  "date_range": {
    "type": "created"|"data",
    "started": STRING,
    "ended": STRING
  },
  "job_names": [
    STRING,
    STRING
  ],
  "all_jobs": BOOLEAN,
  "priority": INTEGER,
  "trigger_rule": {
    "condition": {
      "media_type": STRING,
      "data_types": [
        STRING,
        STRING
      ]
    },
    "data": {
      "input_data_name": STRING,
      "workspace_name": STRING
    }
  }
}
```
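As a loose illustration of the structural rules described on this page, a minimal check of a definition might look like the following sketch. It covers only the constraints documented here; Scale performs its own, more thorough validation.

```python
def validate_batch_definition(definition):
    """Return a list of problems found in a batch definition.

    A minimal structural check covering only the rules described on this
    page; Scale's own validation is authoritative and more thorough.
    """
    problems = []

    date_range = definition.get("date_range")
    if date_range is not None:
        if date_range.get("type", "created") not in ("created", "data"):
            problems.append("date_range.type must be 'created' or 'data'")
        if "started" not in date_range and "ended" not in date_range:
            problems.append("date_range needs 'started', 'ended', or both")

    trigger_rule = definition.get("trigger_rule")
    if isinstance(trigger_rule, dict):
        data = trigger_rule.get("data", {})
        for field in ("input_data_name", "workspace_name"):
            if field not in data:
                problems.append("trigger_rule.data.%s is required" % field)

    return problems

# The example definition above passes; an empty date_range does not.
print(validate_batch_definition({"date_range": {}}))
```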
**version**

Type: String
Required: No

Defines the version of the batch specification used. This allows updates to be made to the specification while maintaining backwards compatibility, by allowing Scale to recognize an older version and convert it to the current version. The default value, if not included, is the latest version (currently `1.0`).
**date_range**

Type: JSON Object
Required: No

Defines a date range of existing recipes to include in a batch. If not provided, `date_range` defaults to `null` (no date range limit). The `started` and `ended` parameters are each optional on their own, but at least one of them must be included in a `date_range` declaration (a matching sketch follows the field list below). The JSON object has the following fields:
- **type**

  Type: String
  Required: No

  Defines the type of the date range. If this parameter is not included, it defaults to the `created` value. The valid types are:

  - `created`: Matches recipes based on the timestamp of when they were originally created.
  - `data`: Matches recipes based on the timestamp of when their input files were collected.
- **started**

  Type: String
  Required: No

  Defines the minimum value of the date range filter. The value should follow the ISO-8601 datetime standard.
- **ended**

  Type: String
  Required: No

  Defines the maximum value of the date range filter. The value should follow the ISO-8601 datetime standard.
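To make the bounds concrete, here is a small sketch of how a recipe timestamp could be checked against a `date_range`. The helper is hypothetical, not part of Scale; the timestamp being compared is the recipe's creation time for type `created` or its input files' collection time for type `data`.

```python
from datetime import datetime, timezone

def in_date_range(timestamp, date_range):
    """Check an aware datetime against optional started/ended bounds."""
    started = date_range.get("started")
    ended = date_range.get("ended")
    # fromisoformat() in older Pythons rejects a trailing "Z", so normalize it.
    if started and timestamp < datetime.fromisoformat(started.replace("Z", "+00:00")):
        return False
    if ended and timestamp > datetime.fromisoformat(ended.replace("Z", "+00:00")):
        return False
    return True

# A recipe created mid-2016 matches the range from the example above.
created = datetime(2016, 6, 15, tzinfo=timezone.utc)
print(in_date_range(created, {"started": "2016-01-01T00:00:00.000Z",
                              "ended": "2016-12-31T00:00:00.000Z"}))  # True
```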
**job_names**

Type: Array
Required: No

Defines specific jobs that will be re-processed as part of each batch recipe. Any job that has changed between the original recipe type revision and the current revision is automatically included in the batch; this parameter can be used to include additional jobs that did not have a revision change. If a job is selected for re-processing, all of its dependent jobs are automatically re-processed as well (see the sketch below).
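As a sketch of that dependent-job expansion, the hypothetical helper below walks a dependency map to collect everything downstream of the selected jobs. The job names and the map are made up for this example and do not come from any real recipe type.

```python
def expand_with_dependents(selected, dependents):
    """Return the selected job names plus everything downstream of them.

    `dependents` maps each job name to the jobs that consume its output.
    """
    result = set()
    stack = list(selected)
    while stack:
        name = stack.pop()
        if name in result:
            continue
        result.add(name)
        stack.extend(dependents.get(name, []))
    return result

# "Job 2" feeds "Job 3", so selecting it pulls "Job 3" in as well.
print(expand_with_dependents(["Job 1", "Job 2"], {"Job 2": ["Job 3"]}))
```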
**all_jobs**

Type: Boolean
Required: No

Indicates that every job in the recipe should be re-processed, regardless of whether the recipe type revision actually changed. This parameter overrides any values included in the `job_names` parameter.
**priority**

Type: Integer
Required: No

Indicates that every job in the recipe should be queued with an override priority instead of the default priority defined by its job type. This option allows large batches to be executed with a lower priority to avoid impacting real-time processing, or with a higher priority to fix products as quickly as possible.
**trigger_rule**

Type: JSON Object | Boolean
Required: No

Defines rules used to determine whether new recipes should be queued for existing source files that have never been run with the associated recipe type before. Omitting the field indicates that old source files will be ignored. Setting the field to `true` is a special case that causes old source files to be evaluated using the current trigger rule defined by the recipe type. Lastly, a completely custom trigger rule can be included using the nested fields below, which will evaluate old source files in the context of this batch (a sketch of the matching logic follows the field list). The supported fields are similar to the ingest trigger definition.
- **condition**

  Type: JSON Object
  Required: No

  Contains other fields that specify the conditions under which this rule is triggered. If not provided, the rule is triggered by EVERY source file.

  - **media_type**

    Type: String
    Required: No

    Defines a media type. A source file must have the identical media type defined here in order to match this trigger rule. If not provided, the field defaults to `""` and all file media types are accepted by the rule.

  - **data_types**

    Type: Array
    Required: No

    A source file must be tagged with all of the data types listed here in order to match this trigger rule. If not provided, the field defaults to `[]` and no data types are required.
- **data**

  Type: JSON Object
  Required: Yes

  Contains other fields that specify the details for creating the job/recipe linked to this trigger rule.

  - **input_data_name**

    Type: String
    Required: Yes

    Specifies the input parameter name of the triggered job/recipe that the source file should be passed to when the job/recipe is created and placed on the queue.

  - **workspace_name**

    Type: String
    Required: Yes

    Contains the unique system name of the workspace that should store the products created by the triggered job/recipe.
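As a sketch of the matching logic referenced above, the hypothetical helper below accepts a source file only when the media type matches exactly and every required data type tag is present. The source file dict is an assumption made for this example.

```python
def matches_condition(source_file, condition):
    """Decide whether a source file satisfies a trigger rule condition.

    `source_file` is a dict with 'media_type' and 'data_types' keys; this
    mirrors the matching described above but is a sketch, not Scale's code.
    """
    media_type = condition.get("media_type", "")
    if media_type and source_file["media_type"] != media_type:
        return False  # an empty media_type accepts every file type
    required = set(condition.get("data_types", []))
    return required.issubset(source_file["data_types"])

source_file = {"media_type": "text/plain", "data_types": {"foo", "bar", "baz"}}
print(matches_condition(source_file, {"media_type": "text/plain",
                                      "data_types": ["foo", "bar"]}))  # True
```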