Available transformers

SPADE includes a set of transformers to rewrite the responses of provenance queries. They are described below.

DropKeys

Since provenance query responses may have more detail than is needed, it may be preferable to eliminate some of the annotations by specifying their keys. This transformer supports such functionality. By default, the keys are assumed to be listed in the SPADE configuration file cfg/spade.transformer.DropKeys.config. If another file is used, it can be specified with the config argument when adding the transformer:

-> add transformer DropKeys position=1 config=/tmp/DropKeys.config
Adding transformer DropKeys... done

Alternatively, keys to be eliminated can be directly specified with the keys argument when adding the transformer. For example, to eliminate the storageId and size annotations, use:

-> add transformer DropKeys position=1 keys=storageId,size
Adding transformer DropKeys... done
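
The effect on a response can be pictured with a short Python sketch. It is only an illustration, not SPADE's implementation: the function name drop_keys and the representation of vertices as dictionaries of annotations are assumptions made for this example.

```python
def drop_keys(vertices, keys):
    """Return copies of the annotation dictionaries without the listed keys."""
    dropped = set(keys)
    return [
        {key: value for key, value in annotations.items() if key not in dropped}
        for annotations in vertices
    ]


# Hypothetical vertices; the annotation values are made up for illustration.
vertices = [
    {"type": "Artifact", "path": "/tmp/a.txt", "storageId": "17", "size": "2048"},
    {"type": "Process", "name": "cat", "pid": "4242"},
]
cleaned = drop_keys(vertices, ["storageId", "size"])
```

The original dictionaries are left untouched; only the copies in cleaned lose the storageId and size annotations.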

TemporalTraversal

The ancestral lineage of a vertex is constructed by backward traversal of the provenance graph. If the edges have ordered event identifiers or timestamps, this information can be used to temporally scope the traversal. In particular, the time of an edge emerging from a process is used to eliminate all later edges entering the process.

Similarly, during a forward traversal to identify descendants, the time of an edge entering a process vertex can be used to eliminate all earlier edges emerging from that process.

The annotation key used to determine the temporal ordering can be specified by passing it as an argument. If none is provided, the default is to use event identifiers -- that is, an argument of order="event id" is implicit. Ordering using timestamps can be specified with:

-> add transformer TemporalTraversal position=1 order=timestamp
Adding transformer TemporalTraversal... done
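
The backward case can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not SPADE's code: edges are modeled as (source, destination, time) tuples in the direction of data flow, the set of process vertices is passed in explicitly, and the function name temporal_ancestors is hypothetical.

```python
def temporal_ancestors(edges, processes, start):
    """Collect the edges in the temporally scoped ancestry of `start`.

    edges: iterable of (source, destination, time), data flowing source -> destination.
    processes: set of vertex names that are processes.
    """
    infinity = float("inf")
    # bound[v] is the latest time at which an edge entering v is still relevant.
    bound = {start: infinity}
    kept = set()
    worklist = [start]
    while worklist:
        node = worklist.pop()
        for edge in edges:
            source, destination, time = edge
            if destination != node or time > bound[node]:
                continue  # not an incoming edge, or it happened too late to matter
            kept.add(edge)
            # Only processes are temporally scoped in this sketch.
            new_bound = time if source in processes else infinity
            if new_bound > bound.get(source, float("-inf")):
                bound[source] = new_bound
                worklist.append(source)
    return kept


# Hypothetical trace: proc reads in1, writes out, and only then reads in2.
edges = [("in1", "proc", 1), ("proc", "out", 2), ("in2", "proc", 3)]
lineage = temporal_ancestors(edges, {"proc"}, "out")
```

In this example the read of in2 happens after out was written, so it is excluded from the lineage of out.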

NoVersions

When a file is repeatedly written by a process, a corresponding number of artifact vertices (with different version numbers) appear in the provenance graph. This transformer combines all versions of the file into a single one and removes the version annotation.
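
A minimal Python sketch of this merging is shown below. It is not SPADE's implementation: the annotation keys type, path, and version, the dictionary layout, and the function name merge_versions are assumptions for illustration. Edges that pointed at any version are redirected to the merged vertex and otherwise left as they are.

```python
def merge_versions(vertices, edges):
    """vertices: {vertex_id: annotations}; edges: list of (source_id, destination_id, type)."""
    representative = {}  # annotations without "version" -> id of the merged vertex
    remap = {}           # original id -> id of the merged vertex
    merged = {}
    for vertex_id, annotations in vertices.items():
        if annotations.get("type") != "Artifact":
            remap[vertex_id] = vertex_id
            merged[vertex_id] = dict(annotations)
            continue
        stripped = {k: v for k, v in annotations.items() if k != "version"}
        key = tuple(sorted(stripped.items()))
        target = representative.setdefault(key, vertex_id)
        remap[vertex_id] = target
        merged[target] = stripped
    new_edges = [(remap[src], remap[dst], kind) for src, dst, kind in edges]
    return merged, new_edges


# Hypothetical graph: an editor writes /tmp/out twice, producing two versions.
vertices = {
    "p": {"type": "Process", "name": "editor"},
    "f1": {"type": "Artifact", "path": "/tmp/out", "version": "1"},
    "f2": {"type": "Artifact", "path": "/tmp/out", "version": "2"},
}
edges = [("f1", "p", "WasGeneratedBy"), ("f2", "p", "WasGeneratedBy")]
merged, new_edges = merge_versions(vertices, edges)
```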

MergeIO

When a process repeatedly reads (or writes, respectively) a file, a corresponding number of edges are created. In the context of dependency analysis, a single edge suffices. This transformer merges all read (or write, respectively) edges into a single one representing the flow of data from (or to, respectively) the file.
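
As a rough Python sketch (not SPADE's code), the merge amounts to keeping one edge per combination of source, destination, and type. The tuple representation, the edge type names, and the function name merge_io are assumptions for this example.

```python
def merge_io(edges):
    """Collapse repeated (source, destination, type) edges, keeping first-seen order."""
    # dict.fromkeys keeps one entry per distinct tuple and preserves insertion order.
    return list(dict.fromkeys(edges))


# Hypothetical response: cat reads /etc/hosts three times and writes /tmp/out twice.
edges = [
    ("cat", "/etc/hosts", "Used"),
    ("cat", "/etc/hosts", "Used"),
    ("/tmp/out", "cat", "WasGeneratedBy"),
    ("cat", "/etc/hosts", "Used"),
    ("/tmp/out", "cat", "WasGeneratedBy"),
]
merged_edges = merge_io(edges)
```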

SimpleForks

When a child process (after a fork or clone call) is replaced by another process (via an execve call), this transformer eliminates the intermediate process from the graph. In particular, "parent ---fork/clone---> intermediate ---execve---> child" is replaced by "parent ---fork/clone---> child".
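
The rewrite can be sketched in Python as below. This is an illustration under assumptions rather than SPADE's implementation: edges are (source, destination, operation) tuples that follow the arrows in the pattern above, the function name simplify_forks is hypothetical, and any other edge touching an eliminated intermediate process is simply dropped.

```python
FORK_OPERATIONS = {"fork", "clone"}


def simplify_forks(edges):
    """Replace parent -fork/clone-> intermediate -execve-> child with parent -fork/clone-> child."""
    forked = {dst for _, dst, op in edges if op in FORK_OPERATIONS}
    # Map each intermediate process (forked, then replaced via execve) to its replacement.
    replacement = {
        src: dst for src, dst, op in edges if op == "execve" and src in forked
    }
    result = []
    for src, dst, op in edges:
        if op in FORK_OPERATIONS and dst in replacement:
            result.append((src, replacement[dst], op))  # bypass the intermediate
        elif src in replacement or dst in replacement:
            continue  # edge touches an eliminated intermediate process
        else:
            result.append((src, dst, op))
    return result


# Hypothetical trace: bash forks a copy of itself, which then executes ls.
edges = [
    ("bash", "bash-copy", "fork"),
    ("bash-copy", "ls", "execve"),
    ("ls", "/tmp", "read"),
]
simplified = simplify_forks(edges)
```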

Blacklist

In some cases, it may be preferable to eliminate some of the file artifacts from the provenance graph. For example, particular files, extensions, or subtrees in the filesystem may be deemed of no interest. In such cases, a blacklist expression can be specified in the SPADE configuration file cfg/spade.transformer.Blacklist.config. Any artifact with a filename that matches the expression will be removed from the graph (along with all incident edges).
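
The removal step can be illustrated with the Python sketch below. It makes several assumptions that the text above does not confirm: that the expression is a regular expression, that it may match anywhere in the filename, and that the filename is stored in a path annotation. The function name apply_blacklist is hypothetical.

```python
import re


def apply_blacklist(vertices, edges, pattern):
    """Remove artifacts whose path matches `pattern`, along with their incident edges."""
    blocked = re.compile(pattern)
    removed = {
        vertex_id
        for vertex_id, annotations in vertices.items()
        if annotations.get("type") == "Artifact"
        and blocked.search(annotations.get("path", ""))
    }
    kept_vertices = {v: a for v, a in vertices.items() if v not in removed}
    kept_edges = [(s, d) for s, d in edges if s not in removed and d not in removed]
    return kept_vertices, kept_edges


# Hypothetical graph and expression: ignore the /proc subtree and *.tmp files.
vertices = {
    "p": {"type": "Process", "name": "firefox"},
    "a": {"type": "Artifact", "path": "/proc/self/maps"},
    "b": {"type": "Artifact", "path": "/home/u/report.txt"},
    "c": {"type": "Artifact", "path": "/var/cache/page.tmp"},
}
edges = [("p", "a"), ("p", "b"), ("c", "p")]
kept_vertices, kept_edges = apply_blacklist(vertices, edges, r"^/proc/|\.tmp$")
```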

NoEphemeralWrites

If a file is only modified by a single process and never read by any other process, the writes are deemed ephemeral. This transformer eliminates all such ephemeral writes from the provenance graph.
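
The definition can be made concrete with a small Python sketch. It is not SPADE's implementation: the flat list of (process, operation, path) events and the function name drop_ephemeral_writes are assumptions chosen to keep the example short.

```python
from collections import defaultdict


def drop_ephemeral_writes(events):
    """events: list of (process, operation, path), operation being 'read' or 'write'."""
    writers = defaultdict(set)
    readers = defaultdict(set)
    for process, operation, path in events:
        if operation == "write":
            writers[path].add(process)
        elif operation == "read":
            readers[path].add(process)
    # Ephemeral: exactly one writer, and no reader other than that writer.
    ephemeral = {
        path
        for path, processes in writers.items()
        if len(processes) == 1 and readers.get(path, set()) <= processes
    }
    return [e for e in events if not (e[1] == "write" and e[2] in ephemeral)]


# Hypothetical trace: gcc's scratch file is private, but a.out is later read by sh.
events = [
    ("gcc", "write", "/tmp/cc1.s"),
    ("gcc", "read", "/tmp/cc1.s"),
    ("gcc", "write", "a.out"),
    ("sh", "read", "a.out"),
]
remaining = drop_ephemeral_writes(events)
```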

NoEphemeralReads

If a file is only read by a single process and never modified by any other process, the reads are deemed ephemeral. In general, ephemeral reads are of interest. In the special case that the reads are from "garbage" files (such as applications' predefined temporary files), it may be preferable to eliminate them from the graph. This transformer performs that elimination, using a list of garbage files specified in the SPADE configuration file cfg/spade.transformer.NoEphemeralReads.config. If the optional argument limited=false is specified, ephemeral read elimination is not limited to the files specified in the configuration.

-> add transformer NoEphemeralReads position=1 limited=false
Adding transformer NoEphemeralReads... done

Prune

A query response graph may contain portions that are not of interest. This transformer takes an expression framed over the annotations on vertices. It will prune the subgraphs that flow to or from all matching vertices (with the direction automatically determined by the query that gave rise to the response graph). For example, it may be preferable to ignore the provenance of the sudo command when returning the provenance of a file created by the program that was executed via sudo. This can be effected with:

-> add transformer Prune position=1 expression=name:sudo
Adding transformer Prune... done
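
One way to picture the pruning for an ancestor query is the Python sketch below. It rests on assumptions that the description above leaves open: the expression is a single key:value pair, edges point from a vertex to its ancestor, and a matching vertex is itself kept while everything reachable only through it is cut. The function name prune is hypothetical, and SPADE's actual behavior may differ.

```python
def prune(vertices, edges, root, expression):
    """Traverse from `root`, but do not continue past vertices matching `expression`."""
    key, _, value = expression.partition(":")
    kept_vertices = {root}
    kept_edges = []
    worklist = [root]
    while worklist:
        node = worklist.pop()
        if node != root and vertices[node].get(key) == value:
            continue  # matching vertex: keep it, but drop what lies beyond it
        for source, destination in edges:
            if source == node:
                kept_edges.append((source, destination))
                if destination not in kept_vertices:
                    kept_vertices.add(destination)
                    worklist.append(destination)
    return kept_vertices, kept_edges


# Hypothetical lineage: a file written by an installer that was started via sudo from bash.
vertices = {
    "file": {"type": "Artifact", "path": "/etc/app.conf"},
    "installer": {"type": "Process", "name": "installer"},
    "sudo": {"type": "Process", "name": "sudo"},
    "bash": {"type": "Process", "name": "bash"},
}
edges = [("file", "installer"), ("installer", "sudo"), ("sudo", "bash")]
kept_vertices, kept_edges = prune(vertices, edges, "file", "name:sudo")
```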

LastName

A file may be renamed or linked to, allowing it to subsequently be referred to by a new name. This transformer can be used to retain the write edge from the process that performed the rename or link operation to the new artifact, while eliminating the analogous read edge from the old artifact and the edge between the old and new artifacts. This simplifies the provenance to reflect only the last name of an artifact.

NoUnits

When a program is instrumented with BEEP¹, internal loop execution can be interpreted as unit vertices. In the context of workflow analysis, it may be preferable to abstract away the units. This transformer does so by merging all unit vertices into the vertex of the containing process.

NoMemory

When BEEP¹ is used, inter-unit communication may occur through memory addresses that are depicted as artifact vertices in the provenance graph. If this level of detail is not needed, this transformer can be used to abstract away the flows through memory addresses. In particular, memory artifact vertices and the edges representing reads from and writes to them are eliminated.

BEEP

This transformer composes several others in a specific order. It can be used to provide results that match those produced by BEEP¹. Different transformations must be performed, depending on whether an ancestor or descendant lineage query was executed. The specific transformers, arguments, and order used for each type of query are defined in the SPADE configuration file cfg/spade.transformer.BEEP.config, which has sections for both ancestor and descendant queries. This transformer automatically determines which configuration to use based on the query that gave rise to the response graph being processed.

¹Kyu Hyung Lee, Xiangyu Zhang, and Dongyan Xu, High accuracy attack provenance via binary-based execution partition, 20th Network and Distributed System Security Symposium, 2013.

This material is based upon work supported by the National Science Foundation under Grants OCI-0722068, IIS-1116414, and ACI-1547467. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
