Batches
A batch represents a collection of recipes that should be scheduled for re-processing. The batch definition is a JSON document that defines exactly which recipes will be included in the batch. It describes things like a date range and specific jobs within a recipe type to queue for execution. It also allows new recipes to be created for existing source files that have never been queued for the recipe type before, using either the current recipe type trigger rule or a custom batch-specific trigger rule.
Consider the following example, which is a request to re-process all recipes that were originally created within the 2016 calendar year and have since gone through a revision. Additionally, the jobs named `Job 1` and `Job 2` should be included even if they have not actually changed since the original recipe type revision. The priority is used to override the original priority of each job type. Lastly, it will create new recipes for all source files that have not already been queued, are `text/plain`, and have the data tags `foo` and `bar`.
Example batch definition:
```json
{
  "version": "1.0",
  "date_range": {
    "type": "created",
    "started": "2016-01-01T00:00:00.000Z",
    "ended": "2016-12-31T00:00:00.000Z"
  },
  "job_names": [
    "Job 1",
    "Job 2"
  ],
  "priority": 1000,
  "trigger_rule": {
    "condition": {
      "media_type": "text/plain",
      "data_types": [
        "foo",
        "bar"
      ]
    },
    "data": {
      "input_data_name": "my_file",
      "workspace_name": "my_workspace"
    }
  }
}
```
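As a rough illustration only, the sketch below POSTs such a definition to a Scale deployment. The host, endpoint path, recipe type ID, title, and description are all assumptions made up for this example, not values taken from this page; consult the REST API documentation for the authoritative interface.

```python
import json
import urllib.request

# All of these values are hypothetical; adjust them for a real deployment.
SCALE_HOST = "http://scale.example.com/api"

with open("batch_definition.json") as f:  # the example document shown above
    definition = json.load(f)

payload = json.dumps({
    "recipe_type_id": 1,  # assumed ID of the recipe type to re-process
    "title": "2016 re-process",
    "description": "Re-process all recipes created in 2016",
    "definition": definition,
}).encode("utf-8")

request = urllib.request.Request(
    SCALE_HOST + "/batches/",  # assumed endpoint path
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode("utf-8"))
```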
A valid batch definition is a JSON document with the following structure:
```json
{
  "version": STRING,
  "date_range": {
    "type": "created"|"data",
    "started": STRING,
    "ended": STRING
  },
  "job_names": [
    STRING,
    STRING
  ],
  "all_jobs": BOOLEAN,
  "priority": INTEGER,
  "trigger_rule": {
    "condition": {
      "media_type": STRING,
      "data_types": [
        STRING,
        STRING
      ]
    },
    "data": {
      "input_data_name": STRING,
      "workspace_name": STRING
    }
  }
}
```
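As a loose illustration of the structural rules described on this page, a minimal check of a definition might look like the following sketch. It covers only the constraints documented here; Scale performs its own, more thorough validation.

```python
def validate_batch_definition(definition):
    """Return a list of problems found in a batch definition.

    A minimal structural check covering only the rules described on this
    page; Scale's own validation is authoritative and more thorough.
    """
    problems = []

    date_range = definition.get("date_range")
    if date_range is not None:
        if date_range.get("type", "created") not in ("created", "data"):
            problems.append("date_range.type must be 'created' or 'data'")
        if "started" not in date_range and "ended" not in date_range:
            problems.append("date_range needs 'started', 'ended', or both")

    trigger_rule = definition.get("trigger_rule")
    if isinstance(trigger_rule, dict):
        data = trigger_rule.get("data", {})
        for field in ("input_data_name", "workspace_name"):
            if field not in data:
                problems.append("trigger_rule.data.%s is required" % field)

    return problems

# The example definition above passes; an empty date_range does not.
print(validate_batch_definition({"date_range": {}}))
```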
**version**

Type: String
Required: No

Defines the version of the batch specification used. This allows updates to be made to the specification while maintaining backwards compatibility, by allowing Scale to recognize an older version and convert it to the current version. The default value, if not included, is the latest version (currently `1.0`).
**date_range**

Type: JSON Object
Required: No

Defines a date range of existing recipes to include in a batch. If not provided, `date_range` defaults to `null` (no date range limit). The `started` and `ended` parameters are each optional on their own, but at least one of them must be included in a `date_range` declaration (a matching sketch follows the field list below). The JSON object has the following fields:
- **type**

  Type: String
  Required: No

  Defines the type of the date range. If this parameter is not included, it defaults to the `created` value. The valid types are:

  - `created`: Matches recipes based on the timestamp of when they were originally created.
  - `data`: Matches recipes based on the timestamp of when their input files were collected.
- **started**

  Type: String
  Required: No

  Defines the minimum value of the date range filter. The value should follow the ISO-8601 datetime standard.
- **ended**

  Type: String
  Required: No

  Defines the maximum value of the date range filter. The value should follow the ISO-8601 datetime standard.
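To make the bounds concrete, here is a small sketch of how a recipe timestamp could be checked against a `date_range`. The helper is hypothetical, not part of Scale; the timestamp being compared is the recipe's creation time for type `created` or its input files' collection time for type `data`.

```python
from datetime import datetime, timezone

def in_date_range(timestamp, date_range):
    """Check an aware datetime against optional started/ended bounds."""
    started = date_range.get("started")
    ended = date_range.get("ended")
    # fromisoformat() in older Pythons rejects a trailing "Z", so normalize it.
    if started and timestamp < datetime.fromisoformat(started.replace("Z", "+00:00")):
        return False
    if ended and timestamp > datetime.fromisoformat(ended.replace("Z", "+00:00")):
        return False
    return True

# A recipe created mid-2016 matches the range from the example above.
created = datetime(2016, 6, 15, tzinfo=timezone.utc)
print(in_date_range(created, {"started": "2016-01-01T00:00:00.000Z",
                              "ended": "2016-12-31T00:00:00.000Z"}))  # True
```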
**job_names**

Type: Array
Required: No

Defines specific jobs that will be re-processed as part of each batch recipe. Any job that has changed between the original recipe type revision and the current revision is automatically included in the batch; this parameter can be used to include additional jobs that did not have a revision change. If a job is selected for re-processing, all of its dependent jobs are automatically re-processed as well (see the sketch below).
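As a sketch of that dependent-job expansion, the hypothetical helper below walks a dependency map to collect everything downstream of the selected jobs. The job names and the map are made up for this example and do not come from any real recipe type.

```python
def expand_with_dependents(selected, dependents):
    """Return the selected job names plus everything downstream of them.

    `dependents` maps each job name to the jobs that consume its output.
    """
    result = set()
    stack = list(selected)
    while stack:
        name = stack.pop()
        if name in result:
            continue
        result.add(name)
        stack.extend(dependents.get(name, []))
    return result

# "Job 2" feeds "Job 3", so selecting it pulls "Job 3" in as well.
print(expand_with_dependents(["Job 1", "Job 2"], {"Job 2": ["Job 3"]}))
```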
**all_jobs**

Type: Boolean
Required: No

Indicates that every job in the recipe should be re-processed, regardless of whether the recipe type revision actually changed. This parameter overrides any values included in the `job_names` parameter.
**priority**

Type: Integer
Required: No

Indicates that every job in the recipe should be queued with an override priority instead of the default priority defined by its job type. This option allows large batches to be executed with a lower priority to avoid impacting real-time processing, or with a higher priority to fix products as quickly as possible.
**trigger_rule**

Type: JSON Object | Boolean
Required: No

Defines rules used to determine whether new recipes should be queued for existing source files that have never been run with the associated recipe type before. Omitting the field indicates that old source files will be ignored. Setting the field to `true` is a special case that causes old source files to be evaluated using the current trigger rule defined by the recipe type. Lastly, a completely custom trigger rule can be included using the nested fields below, which will evaluate old source files in the context of this batch (a sketch of the matching logic follows the field list). The supported fields are similar to the ingest trigger definition.
- **condition**

  Type: JSON Object
  Required: No

  Contains other fields that specify the conditions under which this rule is triggered. If not provided, the rule is triggered by EVERY source file.

  - **media_type**

    Type: String
    Required: No

    Defines a media type. A source file must have the identical media type defined here in order to match this trigger rule. If not provided, the field defaults to `""` and all file media types are accepted by the rule.

  - **data_types**

    Type: Array
    Required: No

    A source file must be tagged with all of the data types listed here in order to match this trigger rule. If not provided, the field defaults to `[]` and no data types are required.
- **data**

  Type: JSON Object
  Required: Yes

  Contains other fields that specify the details for creating the job/recipe linked to this trigger rule.

  - **input_data_name**

    Type: String
    Required: Yes

    Specifies the input parameter name of the triggered job/recipe that the source file should be passed to when the job/recipe is created and placed on the queue.

  - **workspace_name**

    Type: String
    Required: Yes

    Contains the unique system name of the workspace that should store the products created by the triggered job/recipe.
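As a sketch of the matching logic referenced above, the hypothetical helper below accepts a source file only when the media type matches exactly and every required data type tag is present. The source file dict is an assumption made for this example.

```python
def matches_condition(source_file, condition):
    """Decide whether a source file satisfies a trigger rule condition.

    `source_file` is a dict with 'media_type' and 'data_types' keys; this
    mirrors the matching described above but is a sketch, not Scale's code.
    """
    media_type = condition.get("media_type", "")
    if media_type and source_file["media_type"] != media_type:
        return False  # an empty media_type accepts every file type
    required = set(condition.get("data_types", []))
    return required.issubset(source_file["data_types"])

source_file = {"media_type": "text/plain", "data_types": {"foo", "bar", "baz"}}
print(matches_condition(source_file, {"media_type": "text/plain",
                                      "data_types": ["foo", "bar"]}))  # True
```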