Write a Rodan job package
Jobs are modules that do a specific task in a workflow. This can be as simple as converting an arbitrary image into PNG format, or as complex as performing shape analysis and recognition on an image. This section will serve as an introduction on how to write Jobs for Rodan so that they can be used in a workflow.
All job code should be contained in the `rodan/jobs` directory. Its sub-directories, such as `gamera` and `neon`, are separate job packages. Each job package has its own directory under this folder, in which multiple Rodan jobs can be defined. A job package can also define the resource types that its jobs require.
A Rodan job is defined by a class that inherits from `rodan.jobs.base.RodanTask`. The class should define the following attributes to describe itself:

attribute | description |
---|---|
`name` | string. A unique name within all the jobs provided by the vendor. |
`author` | string. The author of the job. |
`description` | string |
`settings` | [JSON Schema](http://json-schema.org/)<sup>1</sup>. The validation schema that describes the requirements of the job settings. |
`enabled` | boolean |
`category` | string |
`interactive` | boolean. Indicates whether the job will pause at some point and wait for manual input.<sup>2</sup> |
`input_port_types` | list of Python dictionaries |
`output_port_types` | list of Python dictionaries |

<sup>1</sup> At present, Rodan only supports a JSON object as the topmost structure of settings.

<sup>2</sup> This attribute is only informative for users. It does not affect whether the job will pause. The behaviour of the job is determined by the return value of its execution code.
For `input_port_types` and `output_port_types`, the following keys should be defined:

key | description |
---|---|
`name` | string |
`resource_types` | list of strings, or a function `string -> boolean` (for example a lambda). Describes all possible resource MIME-types. If a function is provided, Rodan automatically selects the matching resource types from its registry. |
`minimum` | number. Minimum requirement of the job. 0 indicates no minimum requirement. |
`maximum` | number. Maximum requirement of the job. 0 indicates no maximum requirement. |
`is_list` | boolean. Whether the port takes a `Resource` (false) or a `ResourceList` (true). |
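
The attributes and port-type keys described above can be put together roughly as follows. This is only a sketch: `RodanTask` is replaced by an empty stand-in so the snippet runs outside Rodan (a real package would import it from `rodan.jobs.base`), and the job name, author, category, settings schema, and port names are all hypothetical.

```python
# Stand-in base class so the sketch runs outside Rodan (assumption).
# In a real job package, use: from rodan.jobs.base import RodanTask
class RodanTask(object):
    pass


class ThresholdImage(RodanTask):
    # All values below are hypothetical and only illustrate the attributes.
    name = "Threshold image (example)"
    author = "Jane Doe"
    description = "Converts a greyscale image to black and white."
    # The topmost structure of the settings schema is a JSON object.
    settings = {
        "type": "object",
        "properties": {
            "threshold": {"type": "integer", "minimum": 0, "maximum": 255}
        },
    }
    enabled = True
    category = "Example"
    interactive = False

    input_port_types = [{
        "name": "image",
        # A function instead of a list: matches every MIME-type
        # that starts with "image/".
        "resource_types": lambda mime: mime.startswith("image/"),
        "minimum": 1,
        "maximum": 1,
        "is_list": False,
    }]
    output_port_types = [{
        "name": "result",
        "resource_types": ["image/png"],
        "minimum": 1,
        "maximum": 1,
        "is_list": False,
    }]
```

Using a function for `resource_types` avoids enumerating every image MIME-type by hand, while the output port states exactly which type it produces.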
The execution of a job can have two kinds of phases: automatic and manual. In an automatic phase, the job is sent to background workers that are distributed over the network; in a manual phase, the job communicates with a human user through a web interface over HTTP.
A job always starts and ends with an automatic phase. In between, it may switch back and forth between automatic and manual phases.
The automatic phases are implemented in the method `run_my_task` (and `my_error_information`). The manual phases are implemented in the methods `get_my_interface` and `validate_my_user_input`.
The signature of the method `run_my_task` should be:

```python
run_my_task(self, inputs, settings, outputs)
```

This method is expected to read the resource files described in `inputs`, process them according to the configuration in `settings`, and produce the result files at the paths described in `outputs`.
The parameter `inputs` is a Python dictionary. Each key is the name of an input port type, and its value is a list with one entry per input of that type. Each entry is a Python dictionary that includes:

key | value |
---|---|
`resource_path` | string. The path to the input resource file. |
`resource_type` | string. The MIME-type of the input resource. |

If the input port is list-typed (`is_list == True`), the entry is instead a list of such dictionaries, one per resource in the resource list.
For example, if a job is executed with 2 inputs typed "image" (not list-typed) and 2 inputs typed "mask" (list-typed), `inputs` will be structured like:

```python
{
    "image": [{
        "resource_path": "/some/path/file1",
        "resource_type": "image/jpeg"
    }, {
        "resource_path": "/some/path/file2",
        "resource_type": "image/png"
    }],
    "mask": [
        [{
            "resource_path": "/some/path/file3",
            "resource_type": "image/bmp"
        }, {
            "resource_path": "/some/path/file4",
            "resource_type": "image/bmp"
        }, {
            "resource_path": "/some/path/file5",
            "resource_type": "image/bmp"
        }], [{
            "resource_path": "/some/path/file6",
            "resource_type": "image/bmp"
        }, {
            "resource_path": "/some/path/file7",
            "resource_type": "image/bmp"
        }, {
            "resource_path": "/some/path/file8",
            "resource_type": "image/bmp"
        }]
    ]
}
```
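
Because regular and list-typed ports nest differently, job code usually has to tell the two shapes apart. The helper below is a minimal sketch of one way to do that; the function name `input_paths` is hypothetical and not part of Rodan.

```python
def input_paths(inputs, port_name):
    """Collect resource paths for one input port type.

    A regular port contributes one path; a list-typed port contributes
    a list of paths (one per resource in its resource list).
    """
    result = []
    for entry in inputs.get(port_name, []):
        if isinstance(entry, list):
            # List-typed port: the entry is a list of resource dictionaries.
            result.append([res["resource_path"] for res in entry])
        else:
            # Regular port: the entry is a single resource dictionary.
            result.append(entry["resource_path"])
    return result


inputs = {
    "image": [{"resource_path": "/some/path/file1", "resource_type": "image/jpeg"}],
    "mask": [[
        {"resource_path": "/some/path/file3", "resource_type": "image/bmp"},
        {"resource_path": "/some/path/file4", "resource_type": "image/bmp"},
    ]],
}
print(input_paths(inputs, "image"))  # ['/some/path/file1']
print(input_paths(inputs, "mask"))   # [['/some/path/file3', '/some/path/file4']]
```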
The parameter `outputs` is structured like the parameter `inputs`, but the resource details differ slightly:

key | value |
---|---|
`resource_path` | string. The path that the output file should be written to (only for non-list-typed outputs). |
`resource_folder` | string. The folder that all files of the resource list should be written into (only for list-typed outputs). |
`resource_type` | string. The MIME-type of the output resource. |

For example, if a job is executed with 2 outputs typed "result" (not list-typed) and 2 outputs typed "aux files" (list-typed), `outputs` will be structured like:

```python
{
    "result": [{
        "resource_path": "/some/path/file1",
        "resource_type": "image/jpeg"
    }, {
        "resource_path": "/some/path/file2",
        "resource_type": "image/png"
    }],
    "aux files": [{
        "resource_folder": "/some/path/folder1",
        "resource_type": "image/jpeg"
    }, {
        "resource_folder": "/some/path/folder2",
        "resource_type": "image/png"
    }]
}
```
Note that in the `outputs` object, `resource_path` points to a file that does NOT exist yet, and `resource_folder` is an empty folder. The job code should create the output files there.
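
As an illustration of filling in both kinds of outputs, the sketch below copies one input file to every "result" path and into every "aux files" folder. The class is a plain stand-in rather than a real `RodanTask` subclass, the file name `copy_0` is hypothetical, and the scratch directory only simulates the paths Rodan would normally supply.

```python
import os
import shutil
import tempfile


class CopyJob(object):  # stand-in; a real job inherits rodan.jobs.base.RodanTask
    def run_my_task(self, inputs, settings, outputs):
        source = inputs["image"][0]["resource_path"]
        # Non-list outputs: create the file at resource_path.
        for out in outputs.get("result", []):
            shutil.copyfile(source, out["resource_path"])
        # List-typed outputs: place files inside the empty resource_folder.
        for out in outputs.get("aux files", []):
            target = os.path.join(out["resource_folder"], "copy_0")
            shutil.copyfile(source, target)


# Simulate the arguments Rodan would pass in, using a scratch directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.txt")
with open(src, "w") as f:
    f.write("data")
folder = os.path.join(workdir, "folder1")
os.mkdir(folder)

inputs = {"image": [{"resource_path": src, "resource_type": "text/plain"}]}
outputs = {
    "result": [{"resource_path": os.path.join(workdir, "out.txt"),
                "resource_type": "text/plain"}],
    "aux files": [{"resource_folder": folder, "resource_type": "text/plain"}],
}
CopyJob().run_my_task(inputs, {}, outputs)
```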
The parameter `settings` is a Python dictionary that is validated against the JSON schema that the job has defined.
The job can raise any exception in an automatic phase. By default, the exception message and traceback are used as the error summary and details, respectively. This behaviour can be changed by defining the method `my_error_information(self, exc, traceback)`, where `exc` is the exception object and `traceback` is a traceback object. The method should return a Python dictionary that includes `error_summary` and `error_details`.
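
A custom `my_error_information` might look like the sketch below. It takes the description above at its word and assumes `traceback` is a standard Python traceback object; the class is a plain stand-in for a `RodanTask` subclass, and the summary format is just one possible choice.

```python
import sys
import traceback as tb_module  # aliased because the parameter is also named "traceback"


class MyJob(object):  # stand-in for a RodanTask subclass
    def my_error_information(self, exc, traceback):
        return {
            # Short, human-readable summary including the exception type.
            "error_summary": "{0}: {1}".format(type(exc).__name__, exc),
            # Full stack frames as text (assumes a traceback object).
            "error_details": "".join(tb_module.format_tb(traceback)),
        }


# Simulate a failure to see what the method returns.
try:
    raise ValueError("bad pixel depth")
except ValueError as err:
    info = MyJob().my_error_information(err, sys.exc_info()[2])

print(info["error_summary"])
```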
If the job needs a temporary directory to work with, the recommended way is the following, which avoids leaving filesystem garbage behind when an exception occurs (including exceptions raised by the Celery environment):

```python
with self.tempdir() as tempdir:
    # do things inside tempdir
    pass
```
The `run_my_task` method can return an instance of `self.WAITING_FOR_INPUT` to indicate that it requires a manual phase (see the discussion of switching between phases below). Any other return value is ignored and treated as a signal that the job has completed.
In manual phases, the job receives and responds to HTTP requests. Upon a GET request, the job needs to provide its web interface; upon a POST request, the job validates the input data and updates its settings accordingly.
The `get_my_interface` method returns the web interface. Its signature is:

```python
get_my_interface(self, inputs, settings)
```
The data structure of the argument `inputs` is like its counterpart in automatic phases. In manual phases, however, `inputs` provides more details so that the interface can locate resources remotely by URL:

key | value |
---|---|
`resource_path` | string. The path to the input resource file. |
`resource_type` | string. The MIME-type of the input resource. |
`resource_url` | string. The URL to the original resource file. |
`small_thumb_url` | string. The URL to the small thumbnail. |
`medium_thumb_url` | string. The URL to the medium thumbnail. |
`large_thumb_url` | string. The URL to the large thumbnail. |

The argument `settings` is structured the same as its automatic counterpart.
The `get_my_interface` method is expected to return a tuple `(t, c)`, where `t` is the relative path to the template HTML file. The path should be relative to the vendor's package, and the template HTML file should be written in the Django template language. `c` is a Python dictionary that defines the variables and their values to be rendered in the HTML template.
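
A minimal sketch of `get_my_interface` under these rules follows. The class is a plain stand-in for a `RodanTask` subclass, and the template path `templates/crop.html`, the port name `image`, and the context variable names are hypothetical.

```python
class CropJob(object):  # stand-in for a RodanTask subclass
    def get_my_interface(self, inputs, settings):
        image = inputs["image"][0]
        # Hypothetical template path, relative to the vendor's package.
        template = "templates/crop.html"
        # Variables made available to the Django template.
        context = {
            "image_url": image["resource_url"],
            "thumb_url": image["medium_thumb_url"],
        }
        return (template, context)


# Simulated manual-phase inputs with the extra URL fields.
inputs = {"image": [{
    "resource_path": "/some/path/file1",
    "resource_type": "image/png",
    "resource_url": "https://example.org/resource/1",
    "small_thumb_url": "https://example.org/thumb/1/small",
    "medium_thumb_url": "https://example.org/thumb/1/medium",
    "large_thumb_url": "https://example.org/thumb/1/large",
}]}
template, context = CropJob().get_my_interface(inputs, {})
```

Passing URLs rather than file paths into the context matters because the template is rendered for a browser, which cannot read the worker's filesystem.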
The interface can reference resource files, like CSS, JS, and images. Resource files need to be placed in the `static` folder inside the job vendor directory. For example, if a CSS file is placed at `static/css/mystyle.css`, the HTML template can provide the following link to it:

```html
<link href="static/css/mystyle.css" rel="stylesheet">
```
Note: If you link external stylesheets and JavaScript files provided by a CDN, make sure that the CDN can serve these resources via HTTPS. Otherwise, if Rodan is served via HTTPS, the user's browser will refuse to load the HTTP resources (a "Mixed Content" error). A good practice is to link external resources without specifying the protocol:

```html
<script src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.0.3/jquery.min.js"></script>
```
The signature of `validate_my_user_input` is:

```python
validate_my_user_input(self, inputs, settings, user_input)
```
This method validates the user input received through an HTTP POST request. The user input is provided as JSON data in `user_input`. If validation fails, the method is expected to raise an instance of `self.ManualPhaseException`, which results in an HTTP 400 response (with the error message) being sent back to the interface.
If validation passes, the method should return a Python dictionary containing the settings updates. All updated keys must start with `@`; other keys are discarded (the reason is given in the discussion of switching between phases below). The dictionary can be wrapped in an instance of `self.WAITING_FOR_INPUT` to keep the job in the manual phase.
The `inputs` and `settings` arguments are structured in the same way as in automatic phases.
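
The validation flow can be sketched as follows. The class and its nested `ManualPhaseException` are stand-ins so the snippet runs outside Rodan, `user_input` is assumed to arrive as a decoded JSON object (a Python dictionary), and the `box` field is hypothetical.

```python
class CropJob(object):  # stand-in for a RodanTask subclass
    class ManualPhaseException(Exception):
        """Stand-in for self.ManualPhaseException provided by Rodan."""

    def validate_my_user_input(self, inputs, settings, user_input):
        box = user_input.get("box")
        valid = (
            isinstance(box, list)
            and len(box) == 4
            and all(isinstance(v, int) and v >= 0 for v in box)
        )
        if not valid:
            # In Rodan this results in an HTTP 400 response with this message.
            raise self.ManualPhaseException(
                "box must be a list of four non-negative integers")
        # Keys must start with "@" or they are discarded.
        return {"@box": box}


job = CropJob()
update = job.validate_my_user_input({}, {}, {"box": [0, 0, 10, 10]})
print(update)  # {'@box': [0, 0, 10, 10]}
```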
A job can have multiple automatic phases and manual phases, but there is only one method `run_my_task` for all automatic phases and one set of methods `get_my_interface` and `validate_my_user_input` for all manual phases. However, `run_my_task` and `validate_my_user_input` can modify the `settings` of the job, so the job can determine which phase it is in from the values in `settings`.
As stated above, `run_my_task` can return an instance of `self.WAITING_FOR_INPUT` to launch a manual phase. The settings can be updated at this point, like:

```python
# in run_my_task
return self.WAITING_FOR_INPUT({'@field1': newVal1, '@field2': newVal2})
```
Notice that the fields of the updated settings must be prefixed with `@`, so that they do not overwrite the original settings. Fields not starting with `@` will be removed.
The job methods should read the `@`-prefixed settings to determine which exact phase to perform.
Similarly, `validate_my_user_input` can return an unwrapped Python dictionary of setting updates, like:

```python
# in validate_my_user_input
return {'@field1': newVal1, '@field2': newVal2}
```
If `validate_my_user_input` needs to keep the job in a manual phase, it can also return an instance of `self.WAITING_FOR_INPUT`. Additionally, it can provide an HTTP response to the interface, like:

```python
# in validate_my_user_input
return self.WAITING_FOR_INPUT({'@field1': newVal1, '@field2': newVal2}, response="Please continue working on this manual phase.")
```
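
Putting the pieces together, a job with one manual phase between two automatic phases could decide what to do from its `@`-prefixed settings, roughly as sketched here. `WaitingForInput` is a stand-in for `self.WAITING_FOR_INPUT`, the merging of updates into `settings` is done by Rodan and is only simulated below, and the `@stage` and `@box` fields are hypothetical.

```python
class WaitingForInput(object):
    """Stand-in for self.WAITING_FOR_INPUT (assumed to carry a settings update)."""
    def __init__(self, settings_update=None, response=None):
        self.settings_update = settings_update or {}
        self.response = response


class CropJob(object):  # stand-in for a RodanTask subclass
    WAITING_FOR_INPUT = WaitingForInput

    def run_my_task(self, inputs, settings, outputs):
        if "@box" not in settings:
            # First automatic phase: no user choice yet, request a manual phase.
            return self.WAITING_FOR_INPUT({"@stage": "awaiting_box"})
        # Second automatic phase: "@box" was stored by the manual phase.
        # ... use settings["@box"] to produce the output files here ...
        return None  # any other return value signals completion

    def validate_my_user_input(self, inputs, settings, user_input):
        # Unwrapped dictionary: the job leaves the manual phase.
        return {"@box": user_input["box"]}


# Simulate the sequence automatic -> manual -> automatic.
job = CropJob()
settings = {}
first = job.run_my_task({}, settings, {})
settings.update(first.settings_update)  # done by Rodan in practice
settings.update(job.validate_my_user_input({}, settings, {"box": [0, 0, 10, 10]}))
second = job.run_my_task({}, settings, {})
```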
`test_my_task(self, testcase)`
This method is called when Rodan's unit tests run.
It should call `run_my_task()`, `get_my_interface()`, and/or `validate_my_user_input()`. Before calling the job code, it needs to construct the `inputs`, `settings`, and `outputs` objects to pass to these methods.
Its own parameter `testcase` refers to the Python `TestCase` object. Aside from assertion methods like `assertEqual()` and `assertRaises()`, it provides `new_available_path()`, which returns a path to a nonexistent file in the temporary filesystem. The `test_my_task` method can thus create input files at these paths and feed them into the job methods.
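
A `test_my_task` implementation might follow the pattern below. Everything here is a sketch: `FakeRodanTestCase` imitates the `TestCase` that Rodan passes in (only `new_available_path()` is taken from the description above, and its implementation is an assumption), and `UppercaseJob` with its `text` and `result` ports is hypothetical.

```python
import os
import tempfile
import unittest
import uuid


class UppercaseJob(object):  # stand-in for a RodanTask subclass
    def run_my_task(self, inputs, settings, outputs):
        with open(inputs["text"][0]["resource_path"]) as src:
            data = src.read()
        with open(outputs["result"][0]["resource_path"], "w") as dst:
            dst.write(data.upper())

    def test_my_task(self, testcase):
        # Build the input file and the argument structures by hand.
        in_path = testcase.new_available_path()
        out_path = testcase.new_available_path()
        with open(in_path, "w") as f:
            f.write("neume")
        inputs = {"text": [{"resource_path": in_path, "resource_type": "text/plain"}]}
        outputs = {"result": [{"resource_path": out_path, "resource_type": "text/plain"}]}

        self.run_my_task(inputs, {}, outputs)

        with open(out_path) as f:
            testcase.assertEqual(f.read(), "NEUME")


class FakeRodanTestCase(unittest.TestCase):
    """Imitates the TestCase object supplied by Rodan (assumption)."""
    def new_available_path(self):
        # A path inside the temp directory that does not exist yet.
        return os.path.join(tempfile.gettempdir(), uuid.uuid4().hex)

    def runTest(self):
        UppercaseJob().test_my_task(self)


result = unittest.TestResult()
FakeRodanTestCase().run(result)
print(result.wasSuccessful())
```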
The resource MIME-types should be defined for Rodan to recognize them. A vendor can describe the required resource MIME-types through a file `resource_types.yaml` in the vendor directory. It is a list of mappings, which include:

name | description |
---|---|
`mimetype` | string |
`description` | (optional) string |
`extension` | (optional) string. The suggested extension of this resource type. |

Rodan imports the vendor module according to `RODAN_JOB_PACKAGES` in `settings_production.py`. Therefore, it is the vendor's responsibility to import the jobs in the outermost `__init__.py`. It is not necessary to import every class, though: importing the Python file that contains the job classes is enough, and Rodan will find the job classes and register them.
It is safer to use the `rodan.jobs.module_loader` function to import the job modules. `module_loader` catches the `ImportError` and writes it to the log file instead of raising an exception that terminates Rodan.