Skip to content

Write a Rodan job package

Ling-Xiao Yang edited this page Feb 13, 2015 · 24 revisions

The job packages reside in the folder /rodan/jobs/. A job vendor provides its directory under this folder, where multiple Rodan jobs are defined. A job vendor can define the resource types that are required for its jobs as well.

1. Describe a Rodan Job

A Rodan job is defined by a class that inherits rodan.jobs.base.RodanTask. The class should define the following attributes as its description:

attribute description
name string a unique name within all the jobs provided by the vendor.
author string the author of the job.
description string
settings [JSON Schema](http://json-schema.org/)1 the validation schema that describes the requirements of the job settings.
enabled boolean
category string
interactive boolean indicates whether the job will pause at some point and wait for manual input.2
input_port_types list of Python dictionary
output_port_types list of Python dictionary

1 - At present, Rodan only supports a JSON object as the topmost structure of settings.

2 - It is only informative for the users. It does not affect whether the job will pause. The behaviour of the job is determined by the return value of its execution code.

For input_port_types and output_port_types, the following keys should be defined:

key description
name string
resource_types list of string OR lambda: string -> boolean describes all possible resource MIME-types. If provided with a lambda function, Rodan will automatically filter the matched resource types in its registry.
minimum number minimum requirement of the job. 0 indicates no minimum requirement.
maximum number maximum requirement of the job. 0 indicates no maximum requirement.

2. Implement the Job

The execution of a job can have two possible phases: automatic phase and manual phase. In automatic phase, the job is sent to background workers that are distributed on the network; in manual phase, the job communicates with human through a web interface via HTTP protocol.

A job always starts and ends with an automatic phase. It is allowed to go back and fro between automatic phases and manual phases:

The automatic phases are implemented in the method run_my_task (and my_error_information). The manual phases are implemented in the methods get_my_interface and validate_my_user_input.

2.1 Implement Automatic Phases

The signature of method run_my_task should be:

run_my_task(self, inputs, settings, outputs)

This method is expected to read the resource files as described in inputs, process them according to the configuration in settings, and produce the result files at the paths as described in outputs.

The parameter inputs is a Python dictionary. Every key-value pair maps a type of input ports to the list of details of the input resources. The details are Python dictionaries that include:

key value
resource_path string the path to the input resource file
resource_type string the MIME-type of the input resource

Therefore, inputs is a dictionary, which maps a string (the name of the input port type), to a list of dictionaries (the details of the input resources). For example, if a job is executed with 2 inputs typed "image" and 1 input typed "mask", the inputs will be structured like:

{
    "image": [{
        "resource_path": "/some/path/file1",
        "resource_type": "image/jpeg"
    }, {
        "resource_path": "/some/path/file2",
        "resource_type": "image/png"
    }],
    "mask": [{
        "resource_path": "/some/path/file3",
        "resource_type": "image/bmp"
    }]
}

The parameter outputs is alike the parameter inputs, but the detail of resource is a little bit different:

key value
resource_path string the path that is supposed to been written into
resource_type string the MIME-type of the output resource

The parameter settings is a Python dictionary that is validated against the JSON schema that the job has defined.

The job can raise any exceptions in automatic phases. By default, the exception message and traceback are as the error summary and details, respectively. This behaviour can be changed by defining the method my_error_information(self, exc, traceback), where exc is the exception object and traceback is a traceback object. The method should return a Python dictionary that includes error_summary and error_details.

If the job needs a temporary directory to work with, the recommended way is:

with self.tempdir() as tempdir:
    # do things inside tempdir

... to avoid producing filesystem garbage upon an exception.

run_my_task method can return an instance of self.WAITING_FOR_INPUT to indicate its requirement of a manual phase (see section 2.3). Other types of return value will be ignored and treated as a signal of job completion.

2.2 Implement Manual Phases

In manual phases, the job is put forward to receive and response HTTP requests. Upon a GET request, the job needs to provide its web interface; upon a POST request, the job validates the input data and updates its settings accordingly.

2.2.1 get_my_interface

get_my_interface method returns the web interface. Its signature is:

get_my_interface(self, inputs, settings)

The data structure of argument inputs is alike the counterpart in automatic phases. But in manual phases, inputs provides more details for the interface to locate resource URLs remotely:

key value
resource_path string the path to the input resource file
resource_type string the MIME-type of the input resource
resource_url string the URL to the original resource file
small_thumb_url string the URL to the small thumbnail
medium_thumb_url string the URL to the medium thumbnail
large_thumb_url string the URL to the large thumbnail

The argument settings is structured the same as its automatic counterpart.

get_my_interface method is expected to return a tuple (t, c), where t is the relative path to the template HTML file. The path should be relative to the vendor's package, and the template HTML file should be written in Django template language.

c is a Python dictionary that defines the variables and their values to be rendered in the HTML template.

2.2.2 validate_my_user_input

Signature:

validate_my_user_input(self, inputs, settings, user_input)

This method validates the user input through HTTP POST request. The user input is provided as JSON data in user_input. If validation fails, it is expected to raise an instance of self.ManualPhaseException that incurs an HTTP 400 response (with error message) back to the interface.

If validation passes, the method should return a Python dictionary of the update of the settings. All updated keys should start with '@' or they will be discarded (reason see section 2.3).

The inputs and settings arguments are structured in the same way as in automatic phases.

2.3 State Transform between Automatic Phases and Manual Phases

3. Test the Job

test_my_task(self, testcase)

This method is called during the unit test of Rodan.

This method should call run_my_task() and/or get_my_interface() and/or validate_my_user_input. Before calling the job code, this method needs to construct inputs, settings, and outputs objects as parameters to feed the methods.

Its own parameter testcase refers to the Python TestCase object. Aside from assertion methods like assertEqual() and assertRaises(), it provides new_available_path() which returns a path to a nonexist file in the temporary filesystem. test_my_task method can thus create input files in these paths and feed them into the job methods.

4. Describe Resource Types

The resource MIME-types should be defined for Rodan to recognize them. A vendor can describe the required resource MIME-types through a file resource_types.yaml in the vendor directory. It is a list of mappings, which include:

name description
mimetype string
description (optional) string
extension (optional) string the suggested extension of this resource type.

5. Import the Job

Rodan imports the vendor module. Therefore, it is the vendor's responsibility to import the jobs in outermost __init__.py. It is not necessary to import every class, though -- import the Python file that contains the job classes, and Rodan will find the job classes and register them.

It is safer to use rodan.jobs.module_loader function to import the job modules. module_loader will catch the ImportError and write it into the log file instead of throwing an exception that terminates Rodan.

Clone this wiki locally