Framework Configuration

juliencruz edited this page Jan 18, 2015 · 17 revisions

In the previous section you set up a Maven project capable of compiling a StreamFlow Framework JAR. In order to identify the Spouts and Bolts that are available in your Framework JAR, StreamFlow utilizes a single Framework configuration file. The framework configuration is integral to building Components that can be used within the StreamFlow UI to dynamically build topologies. Although the project you created in the last section will compile, the lack of a framework.yml configuration file will prevent StreamFlow from registering any Spouts or Bolts within your JAR.

Important: Even if your framework project contains their source code, Spouts and Bolts that have not been registered in framework.yml will not be visible in the StreamFlow UI.

The framework configuration file is used to register all of the Spouts, Bolts, Resources, and Serializations that are present within the framework JAR project. This configuration can be defined in either YAML or JSON format. The choice is solely a matter of personal preference, as all settings are available in both formats. YAML is recommended because it is less verbose than JSON.

The framework.yml and framework.json configuration files must be located in a STREAMFLOW-INF folder at the root of the class path (e.g. src/main/resources/STREAMFLOW-INF/framework.yml or src/main/resources/STREAMFLOW-INF/framework.json). The following sample framework.yml and framework.json files outline the format of these configuration files. Following the examples, each major section of the configuration will be covered in detail.

Note: The following YAML and JSON configurations are equivalent and you should only define either framework.yml OR framework.json in your project.

Sample framework.yml

# Framework Properties
name: sample-framework
label: Sample Framework
version: 1.0.0-SNAPSHOT
description: Spouts and Bolts implemented for demonstration purposes

# Framework Components
components: 
  - name: sample-bolt
    label: Sample Bolt
    type: storm-bolt
    description: Sample bolt used for demonstration purposes
    mainClass: streamflow.bolt.SampleBolt
    properties: 
      - name: field-one
        label: Field One
        description: Description of field one
        defaultValue: 10
        required: true
        type: number
      - name: field-two
        label: Field Two
        description: Description of field two
        defaultValue: second
        required: false
        type: select
        options:
          listItems:
            - first
            - second
            - third
    inputs: 
      - key: default
        description: Takes any tuple as input
    outputs:
      - key: default
        description: Processed activity content

# Framework Resources      
resources:
  - name: file-resource
    label: File Resource
    description: File Resource for Testing
    resourceClass: streamflow.resource.FileResource
    properties:
      - name: file-resource
        label: File Resource
        description: Use this resource to upload and save files
        defaultValue: 
        required: true
        type: file

# Framework Serializations
serializations:
  - typeClass: streamflow.serializer.SampleType
    serializerClass: streamflow.serializer.SampleTypeSerializer
  - typeClass: streamflow.serializer.AnotherSampleType

Sample framework.json

{
    "name": "sample-framework",
    "label": "Sample Framework",
    "version": "1.0.0-SNAPSHOT",
    "description": "Spouts and Bolts implemented for demonstration purposes",
    "components": [
        {
            "name": "sample-bolt",
            "label": "Sample Bolt",
            "type": "storm-bolt",
            "description": "Sample bolt used for demonstration purposes",
            "mainClass": "streamflow.bolt.SampleBolt",
            "properties": [
                {
                    "name": "field-one",
                    "label": "Field One",
                    "description": "Description of field one",
                    "defaultValue": "10",
                    "required": true,
                    "type": "number"
                },
                {
                    "name": "field-two",
                    "label": "Field Two",
                    "description": "Description of field two",
                    "defaultValue": "second",
                    "required": false,
                    "type": "select",
                    "options": {
                        "listItems": [
                            "first",
                            "second",
                            "third"
                        ]
                    }
                }
            ],
            "inputs": [
                {
                    "key": "default",
                    "description": "Takes any tuple as input"
                }
            ],
            "outputs": [
                {
                    "key": "default",
                    "description": "Processed activity content"
                }
            ]
        }
    ],    
    "resources": [
        {
            "name": "file-resource",
            "label": "File Resource",
            "description": "File Resource for Testing",
            "resourceClass": "streamflow.resource.FileResource",
            "properties": [
                {
                    "name": "file-resource",
                    "label": "File Resource",
                    "description": "Use this resource to upload and save files",
                    "defaultValue": "",
                    "required": true,
                    "type": "file"
                }
            ]
        }
    ],
    "serializations": [
        {
            "typeClass": "streamflow.serializer.SampleType",
            "serializerClass": "streamflow.serializer.SampleTypeSerializer"
        },
        {
            "typeClass": "streamflow.serializer.AnotherSampleType"
        }
    ]
}

General Configuration

Framework properties define general information used to identify a framework. These properties are typically listed at the top of the framework configuration for clarity, although they can be located anywhere in the configuration.

Let's look at a snippet of the framework properties from the above example and walk through each property in detail.

name: sample-framework
label: Sample Framework
version: 1.0.0-SNAPSHOT
description: Spouts and Bolts implemented for demonstration purposes

name

  • Description: Globally unique identifier for the framework. You must ensure that no other framework uses the same name; otherwise the two frameworks will collide during framework upload. To help protect against this, it is common to prefix your framework name with namespace-style information (e.g. streamflow.core.sample.framework).
  • Default: None
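
As a small illustration of the namespacing advice for name, the framework properties from the sample could be rewritten as follows. The namespaced value is the one given as an example in the description above; the other properties are unchanged from the sample.

```yaml
# Framework Properties with a namespaced, globally unique name
name: streamflow.core.sample.framework
label: Sample Framework
version: 1.0.0-SNAPSHOT
description: Spouts and Bolts implemented for demonstration purposes
```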

label

  • Description: User friendly name that users will see in the UI to identify your framework.
  • Default: None

version

  • Description: Identifies the version number of your framework to users. Although you are free to enter any value for this field, we recommend incrementing your version using Semantic Versioning.
  • Default: None

description

  • Description: Helps users understand the purpose of your framework.
  • Default: None

Component Configuration

The components section of the configuration is used to register Storm Spouts and Bolts in the framework. The components property is an array which can be used to register multiple components in the configuration. The components that are registered in this section will appear in the palette of the topology editor when the framework is uploaded to the StreamFlow server.

Let's look at a snippet of the components section from the above example and walk through each property in detail.

components: 
  - name: sample-bolt
    label: Sample Bolt
    type: storm-bolt
    description: Sample bolt used for demonstration purposes
    mainClass: streamflow.bolt.SampleBolt
    properties: [] 
    inputs: 
      - key: default
        description: Takes any tuple as input
    outputs:
      - key: default
        description: Processed activity content

components.name

  • Description: Key used to uniquely identify the component within the framework. The name does not have to be globally unique across different frameworks, only within the same framework.
  • Default: None

components.label

  • Description: User friendly name that users will see in the UI to identify the component.
  • Default: None

components.type

  • Description: Special value to indicate what type of component is being implemented. It is used to help organize your components by type in the component palette of the topology builder. Currently, the only valid type values are as follows:

    • storm-spout
    • storm-bolt
    • trident-spout
    • trident-batch-spout
    • trident-partitioned-spout
    • trident-opaque-partitioned-spout
    • trident-function
    • trident-filter
    • trident-aggregator
    • trident-combiner-aggregator
    • trident-reducer-aggregator
  • Default: None
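
To illustrate how type distinguishes components, here is a minimal sketch of a spout registered next to the sample bolt. The sample-spout entry, its label and descriptions, and the streamflow.spout.SampleSpout class are hypothetical names that do not appear in the sample framework. The spout declares only outputs because a spout emits tuples but does not consume them.

```yaml
components:
  # Hypothetical spout; the name and class are illustrative only
  - name: sample-spout
    label: Sample Spout
    type: storm-spout
    description: Hypothetical spout that emits tuples for demonstration purposes
    mainClass: streamflow.spout.SampleSpout
    properties: []
    outputs:
      - key: default
        description: Emitted sample tuples
  # Bolt taken from the sample framework.yml
  - name: sample-bolt
    label: Sample Bolt
    type: storm-bolt
    description: Sample bolt used for demonstration purposes
    mainClass: streamflow.bolt.SampleBolt
    properties: []
    inputs:
      - key: default
        description: Takes any tuple as input
    outputs:
      - key: default
        description: Processed activity content
```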

components.description

  • Description: Short description of the component's purpose. This value is visible on the framework details page and as a link in the component palette.
  • Default: None

components.mainClass

  • Description: Fully qualified class name of the component implementation (e.g. streamflow.bolt.SampleBolt)
  • Default: None

components.properties

  • Description: Array of configurable properties to associate with the component. When the component is added to a topology, these properties are available for modification and are injected into the component implementation. Properties are a concept shared between Components and Resources and are covered in detail in the Property Configuration section, so the properties array in the example above is intentionally left empty.
  • Default: Empty Array

components.inputs

  • Description: Array of objects which define the input settings for the component. When used for Storm, only one input is allowed as Storm only accepts a single input for each component.
  • Default: Empty Array

components.inputs.key

  • Description: Unique key to identify the input. This value should map to the Storm streamId for your component implementation. If no streamId is specified in the implementation, Storm uses a streamId of "default". If you did not configure a custom streamId, it is best to use the default value of "default".
  • Default: default

components.inputs.description

  • Description: Short description of the purpose of the input. It is often helpful to indicate any limitations on data types or formats in the description.
  • Default: None

components.outputs

  • Description: Array of objects which define the output settings for the component. You can specify multiple output objects if your implementation supports this.
  • Default: Empty Array

components.outputs.key

  • Description: Unique key to identify the output. This value should map to the Storm streamId for your component implementation. If no streamId is specified in the implementation, Storm uses a streamId of "default". If you did not configure a custom streamId, it is best to use the default value of "default".
  • Default: default

components.outputs.description

  • Description: Short description of the purpose of the output. It is often helpful to indicate any limitations on data types or formats in the description.
  • Default: None
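
A component with more than one output could be sketched as follows. This assumes the bolt implementation declares a second stream; the errors key and its description are hypothetical and only illustrate the shape of the configuration.

```yaml
components:
  - name: sample-bolt
    label: Sample Bolt
    type: storm-bolt
    description: Sample bolt used for demonstration purposes
    mainClass: streamflow.bolt.SampleBolt
    inputs:
      - key: default
        description: Takes any tuple as input
    outputs:
      # Matches Storm's default streamId
      - key: default
        description: Processed activity content
      # Hypothetical second stream; the key should match a streamId
      # declared by the bolt implementation
      - key: errors
        description: Tuples that could not be processed
```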

Resource Configuration

The resources section is used to register new Resources with StreamFlow. Resources encapsulate frequently used logic such as establishing a database connection. StreamFlow Resources are implemented as Guice modules which can accept user-configurable properties, similar to StreamFlow Components. Because the property feature is reused, you will find many similarities between the Resource and Component configuration formats.

Let's look at a snippet of the resources section from the above example and walk through each property in detail.

resources:
  - name: file-resource
    label: File Resource
    description: File Resource for Testing
    resourceClass: streamflow.resource.FileResource
    properties: []

resources.name

  • Description: Key used to uniquely identify the resource within the framework. The name does not have to be globally unique across different frameworks, only within the same framework.
  • Default: None

resources.label

  • Description: User friendly name that users will see in the UI to identify the resource.
  • Default: None

resources.description

  • Description: Short description of the resource's purpose.
  • Default: None

resources.resourceClass

  • Description: Fully qualified class name of the Guice module class that supports the resource (e.g. streamflow.resource.SampleResource)
  • Default: None

resources.properties

  • Description: Array of configurable properties to associate with the resource. Properties are a concept shared between Components and Resources and are covered in detail in the Property Configuration section, so the properties array in the example above is intentionally left empty.
  • Default: Empty Array
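
As a rough sketch of the database connection use case mentioned at the start of this section, a resource entry might look like the following. The database-resource name, the streamflow.resource.DatabaseResource class, and both properties are hypothetical. The property fields and the number and select types are limited to those that appear in the sample framework.yml.

```yaml
resources:
  # Hypothetical resource; the name and class are illustrative only
  - name: database-resource
    label: Database Resource
    description: Hypothetical resource that establishes a database connection
    resourceClass: streamflow.resource.DatabaseResource
    properties:
      - name: port
        label: Port
        description: Port the database listens on
        defaultValue: 5432
        required: true
        type: number
      - name: engine
        label: Engine
        description: Database engine to connect to
        defaultValue: postgresql
        required: false
        type: select
        options:
          listItems:
            - postgresql
            - mysql
```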

Serialization Configuration

The serializations section is used to register classes within your framework that are required for serialization. The Storm implementation uses the classes configured in the serializations section to initialize the Kryo implementation used to serialize Tuple data between Spouts and Bolts. Any top-level types and serializer classes which would typically be registered in Storm using the topology.kryo.register property should be registered in this section.

Let's look at a snippet of the serializations section from the above example and walk through each property in detail.

serializations:
  - typeClass: streamflow.serializer.SampleType
    serializerClass: streamflow.serializer.SampleTypeSerializer
  - typeClass: streamflow.serializer.AnotherSampleType

serializations.typeClass

  • Description: The fully qualified class name of a class to register with Kryo.
  • Default: None

serializations.serializerClass

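  • Description: The fully qualified class name of a com.esotericsoftware.kryo.Serializer implementation used to serialize the class named in typeClass. The second entry in the example above omits this property, which registers the class without a custom serializer.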
  • Default: None
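
For comparison, the same registrations expressed directly in a Storm configuration would use the topology.kryo.register property mentioned above. This is a sketch of the equivalent storm.yaml fragment based on Storm's documented list format, not something StreamFlow requires you to write. The class names are those from the sample.

```yaml
# Plain Storm configuration (storm.yaml), shown for comparison only
topology.kryo.register:
  # Class registered together with a custom Kryo serializer
  - streamflow.serializer.SampleType: streamflow.serializer.SampleTypeSerializer
  # Class registered without a custom serializer
  - streamflow.serializer.AnotherSampleType
```

In plain Storm, an entry without a serializer is handled by Kryo's FieldsSerializer. That StreamFlow behaves the same way for an entry without serializerClass is an assumption here, not something the sample confirms.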

Property Configuration