diff --git a/.gitignore b/.gitignore
index 8577387..ea148fb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,5 @@
 .settings
 bin/
 target/
+logs/
+pipeline-output/
diff --git a/README.md b/README.md
index ea37c61..3cadffd 100644
--- a/README.md
+++ b/README.md
@@ -1,394 +1,193 @@
-AI2 Pipeline Framework
-=========================
-
-
-Design Goals
-============
-
-A common pain point in analysis-driven software development is the
-management of data sets and experimental results. In the absence of an
-organizing framework, the tendency is for individuals to write
-standalone executables that read and parse data from disk, transform the
-data, and write back to disk. A pipeline consisting of such steps is
-difficult to manage by hand for the following reasons:
-
-1. No validation checks on the compatibility of data written as the
-   output of one step with the input needed for a following step
-2. No record in the code of the upstream steps needed to produce a
-   particular intermediate step
-3. Code is difficult to reuse and expensive to migrate to other (e.g.
-   cloud-based) storage systems.
-
-These problems can be alleviated by appropriately chosen and enforced
-conventions, but even better is a framework that solves them for
-developers in a consistent way. Such a framework should:
-
-1. Be a standalone library (i.e. not a hosted solution)
-2. Be as convenient to use as writing typical standalone executables
-3. Have the ability to express an end-to-end pipeline in compiled code
-   (i.e. not specified in config files)
-4. Enforce consistency between connected outputs and inputs
-5. Cache datasets that are re-used by multiple consumers
-6. Support streaming calculations on out-of-RAM datasets
-7. Support easy swapping of storage implementations
-
-Pipeline Abstractions
-=====================
-
-There are three central abstractions in the data pipeline framework.
-
-Data Transformation
--------------------
-
-The most essential abstraction is the logic transforming one data
-structure into another. This is represented in the framework by the
-Producer[T] trait. A Producer[T] provides a lazily-computed value of
-type T returned by the get method. Only the output type is
-parameterized, because different producers may require different inputs.
-An implementation may specify a default for whether the result is
-cached in memory, but this can be overridden when using is in a
-pipeline.
-
-Data Storage
-------------
+#Allen-AI Pipeline Framework
-A data structure saved in persistent storage is represented by the
-Artifact trait. An artifact may represent a flat file, a directory, a
-zip archive, or an S3 blob. Future implementations could represent an
-HDFS dataset or other mechanisms. If a data structure has been saved in
-an Artifact, then it will be read from that Artifact when needed, rather
-than recomputing the value from the underlying Producer. In this way,
-expensive calculations are transparently cached to disk when necessary.
-The author of a pipeline specifies which data structures should be
-persisted and select the desired persistence mechanism and path names.
+The Allen-AI Pipeline (AIP) framework is a library that facilitates collaborative experimentation
+by allowing users to define workflows that share data resources transparently while maintaining
+complete freedom over the environment in which those workflows execute.
-Data Serialization
-------------------
+Send questions to *rodneyk@allenai.org*
-Serialization of a data structure of type T into an artifact of type
-A is represented by the ArtifactIo[T,A] trait. Because common cases,
-such as serialization to JSON and delimited-columns, are implemented by the framework,
-many pipelines can be implemented end-to-end without any code that
-performs I/O. Different serialization formats, i.e. different
-implementations of ArtifactIo, can be specified when the pipeline is
-constructed, while the Artifact instance specifies the physical location
-where the data will be stored.
+#Design Goals
-Example Pipeline
-================
+Collaboration among data scientists is often complicated by the fact that individuals tend to
+experiment with algorithms in isolated environments (typically their individual workstations). Sharing
+data is difficult because there is no record of the code that was used to produce a given dataset
+and there are no validation checks on the compatibility of code with the data format. Workflows of
+any significant complexity quickly become unmanageable once more than one scientist is involved.
-The complete code for this example can be found in
-src/test/scala/org/allenai/pipeline/SamplePipeline.scala
+Many workflow management systems are designed for production data pipelines. They solve the problem of sharing data
+by providing a centralized execution environment, but users sacrifice the ability
+to rapidly develop code that runs on their local machine while accessing production data. To solve
+these problems, AIP:
-The Basic Pipeline
------------------
+1. Is a library that can be run locally or within a cloud environment
+1. Caches datasets for sharing between multiple users, even if they are running in separate environments
+1. Enforces compatibility of inputs/outputs at compile time
+1. Supports streaming calculations on out-of-RAM datasets
+1. Supports execution of arbitrary executables
+1. Supports easy swapping of storage implementations
-As an example let us take the familiar case of training and measuring a
-classification model. Our pipeline consists of the following steps:
+#Core Concepts
-1. Read a collection of labels from TSV
-2. Read a collection of feature vectors from TSV
-3. Join the features with the labels and split into train/test sets
-4. Train a classifier
-5. Measure the classifier accuracy on the test set
+##Producer
-First, we specify the persistence implementation to use for I/O. These
-have methods for providing artifact representing a flat file or
-structured dataset (zip file or directory)
+A Producer[T] represents a calculation that produces an output of type T. It can have
+arbitrary inputs, and any Producer of the same output type T is interchangeable.
-
-    import IoHelpers._
-    val input = new FileSystem(inputDir)
-    val output = new FileSystem(outputDir)
-
-We read the labels using the framework’s built-in delimited-column parsing methods:
-
+##Artifact
-
-    val labelData: Producer[Iterable[Boolean]]
-      = Read.Collection.fromText[Boolean](input.flatArtifact(labelFile))
-
-Similarly for features
-
-    val featureData: Producer[Iterable[Array[Double]]]
-      = Read.arrayCollection.fromText[Double](input.flatArtifact(featureFile))
-
-Step 3 takes steps 1 and 2 as input, as well as a parameter determining
-the relative size of the test set.
It produces a pair of datasets with -both features and labels - - - - class JoinAndSplitData(features: Producer[Features] - labels: Producer[Labels], - testSizeRatio: Double) - extends Producer[(Iterable[(Boolean, Array[Double])], Iterable[(Boolean, Array[Double])])] - -Step 4 takes a Iterable[(Boolean, Array[Double])] producer as input and produces a -TrainedModel object - - class TrainModel(trainingData: Producer[Iterable[(Boolean, Array[Double])]]) - extends Producer[TrainedModel] - -Step 5 takes producers of Iterable[(Boolean, Array[Double])] and TrainedModel and -produces a P/R measurement - - // Threshold, precision, recall. - type PRMeasurement = Iterable[(Double, Double, Double)] - - class MeasureModel(model: Producer[TrainedModel], testData: Producer[Iterable[(Boolean, Array[Double])]]) - extends Producer[PrecisionRecallMeasurement] - -The pipeline is defined by simply chaining the producers together - - val Producer2(trainData: Producer[Iterable[(Boolean, Array[Double])]], - testData: Producer[Iterable[(Boolean, Array[Double])]]) - = new JoinAndSplitData(featureData, labelData, 0.2) - val model: Producer[TrainedModel] = new TrainModel(trainData) - val measure: Producer[Iterable[(Double, Double, Double)]] = new MeasureModel(model, testData) - -Note the use of Producer2.unapply, which converts a Producer of a Tuple -to a Tuple of Producers. To run the pipeline, we invoke the get method -of the final step - - val result = measure.get - -Persisting the Output ---------------------- - -At this point, the result of the calculation has been created in memory, -but is not being persisted. We would like to persist not only the final -Iterable[(Double, Double, Double)] object, but the intermediate TrainedModel instance. The -earlier import of IoHelpers adds PersistedXXX methods to -Producer instances that persist their data before passing it on to -downstream consumers. To use them, we must also provide an implicit -persistence implementation. - - implicit val location = output - val model: Producer[TrainedModel] - = Persist.Singleton.asJson(new TrainModel(trainData), "model.json") - val measure: Producer[Iterable[(Double, Double, Double)]] - = Persist.Collection.asText(new MeasureModel(model, testData), "PR.txt") - -We have opted not to persist the Iterable[(Boolean, Array[Double])] data, but we could -do so in the same way. Note that we have written no code that performs -I/O directly. Instead, we need to define the transformation between our -data objects and JSON or column format - - import spray.json.DefaultJsonProtocol._ - implicit val modelFormat = jsonFormat1(TrainedModel) - implicit val prMeasurementFormat - = tuple3ColumnFormat[Double, Double, Double](',') - -Furthermore, all that is required to have our pipeline persist data to -S3 is to set the persistence implementation differently - - val s3Config = S3Config("ai2-pipeline-sample") - implicit val location = new S3(s3Config) - -An important point is that when a Producer is persisted, its serialized -output acts as a cached result. That is, if the pipeline is rerun, even -in a subsequent process, and that Producer’s output it found in the -expected location, the result will be deserialized from the store rather -than re-computed from its inputs. In the "Tracking Overlapping Pipelines" section we -will see how this is used for pipelines that have some shared computations. 
-
-Out-of-Core Datasets
--------------------
-
-Instead of reading feature data from disk, suppose now that we compute
-it on the fly by processing XML documents from a source directory,
-producing a feature vector for each document. Suppose further that the
-entire set of documents is too large to fit in memory. In this case, we
-must implement a different Producer instance that will process
-an input stream of ParsedDocument objects
-
-    class FeaturizeDocuments(documents:Producer[Iterator[ParsedDocument]]) extends Producer[Features]
-
-Because this class has an Iterator as its input type, it will not hold
-the raw document dataset in memory. To produce the Iterator of parsed
-documents, we must implement an ArtifactIo class. Recall that an
-ArtifactIo class is parameterized with the output type (in this case,
-Iterator[ParsedDocument]) and the artifact type. We will define ours in
-terms of the more general StructuredArtifact rather than the narrow
-DirectoryArtifact. This will allow us to read from Zip archives on the
-local file system or in S3 with the same implementation class. The
-ArtifactIo interface includes both read and write operations, to ensure
-consistency of serialization/deserialization code throughout the
-pipeline. For this use case, however, we only need implement the read
-operation.
-
-    object ParseDocumentsFromXML
-      extends ArtifactIo[Iterator[ParsedDocument], StructuredArtifact] {
-      def read(a: StructuredArtifact): Iterator[ParsedDocument] = {
-        for ((entry, is) \<- a.reader.readAll) yield parse(is)
-      }
-      def parse(is: InputStream): ParsedDocument = ???
-      // Writing back to XML not supported
-      def write(data: Iterator[ParsedDocument], artifact: StructuredArtifact) = ???
-    }
+A data structure saved in persistent storage is represented by the
+Artifact trait. An artifact may represent a flat file, a directory, a
+zip archive, an S3 blob, an HDFS dataset, etc. Serialization/deserialization of an object of type T into an artifact of type
+A is represented by the ArtifactIo[T,A] trait. AIP provides implementations to serialize arbitrary
+objects as column-delimited text or JSON (via spray.json).
+
+##Persistence
+
+A Producer[T] will create an in-memory object of type T. This object can be passed to downstream consumers
+without storing it to disk. If desired, it can be written to disk before being passed downstream by
+using one of the `Pipeline.Persist.*` methods. These methods return a new Producer (actually an
+instance of PersistedProducer) with the same output type T. A PersistedProducer's `get` method
+first checks whether the data already exists on disk. If it does, the data is read from disk; it is
+computed only if it does not yet exist. This allows caching of intermediate steps to speed up calculations.
+
+AIP's persistence mechanism also allows users to reuse data from other users' pipelines. Normally this is not
+possible because file names are specified by users and they may collide. To avoid this, a Pipeline
+will choose a unique name for any Producer that it persists. The name includes a hash, which is based
+on the parameters of the Producer and the parameters of all its transitive upstream ancestors. By using
+this hashing mechanism, different users running compatible code on different machines can share data
+without fear of collision. The information that goes into determining the path name of a Producer's output is
+encapsulated in a Signature object.
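+
+For orientation, here is a minimal sketch of how a step is persisted and run. The names used
+(`Pipeline.saveToFileSystem`, `Pipeline.Persist.*`, `Pipeline.run`) are the ones referenced in this README,
+but the exact signatures and the helper `readCorpusLines` are assumptions made for illustration; see the
+example pipelines below for real, working usage.
+
+    import java.io.File
+
+    // A pipeline that persists its outputs under a local directory
+    val pipeline = Pipeline.saveToFileSystem(new File("pipeline-output"))
+
+    // Some upstream step (hypothetical) that yields the lines of a corpus
+    val lines: Producer[Iterable[String]] = readCorpusLines()
+
+    // Wrap the step in a PersistedProducer; its output path is derived from its Signature
+    val persistedLines = pipeline.Persist.Collection.asText(lines)
+
+    // Execute all persisted steps, reusing any outputs that already exist on disk
+    pipeline.run("Line count pipeline")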
+
+*AIP's persistence mechanism makes it 100% impossible to overwrite data.* Any data that exists on disk
+will be used in place of recalculating it. Only code that has not been executed before will result in
+new data being written, and the path name chosen will always be unique.
+This is true regardless of where the code was executed!
+
+##Pipeline
+
+A Pipeline has a `run` method that will calculate the result of any Producers that were persisted using
+one of its `persist` methods.
+When run, a Pipeline produces a static HTML visualization of the workflow, which includes hyperlinks
+to the locations of any input/output/intermediate data Artifacts. It also produces equivalent data in
+JSON format that can be parsed programmatically.
+
+#More Details
+
+## Anatomy of a Producer
+
+A Producer is conceptually a function that is decorated with enough metadata to build a Signature. Recall
+that a Signature is used to determine a unique path name where the output of this Producer will be written.
+The easiest way to define a Producer class is to make a case class that mixes in the Ai2StepInfo trait.
+The only method that needs to be implemented in that case is the `create` method, which builds the output object.
+For example:
+
+    case class CountLines(lines: Producer[Iterable[String]], countBlanks: Boolean = true) extends Producer[Int] with Ai2StepInfo {
+      override protected def create: Int =
+        if (countBlanks)
+          lines.get.size
+        else
+          lines.get.filter(_.length > 0).size
+    }
-Now we can use our document featurizer as a drop-in replacement for the
-feature data we had originally read from TSV
-
-    val docDir = new File("raw-xml")
-    val docs = readFromArtifact(ParseDocumentsFromXML,
-      new DirectoryArtifact(docDir))
-    val docFeatures = new FeaturizeDocuments(docs)
-    // use in place of featureData above
-
-Out-of-Process Computation
--------------------------
-
-Most data transformations are assumed to be implemented in Scala code.
-However, it is sometimes necessary for components in a pipeline to be
-implemented outside the JVM. For example, our TrainModel class might
-invoke a Python trainer via a shell command. The only appropriate input
-type for such Producer classes is an Artifact, since the JVM will only
-communicate with outside processes via some persistent store. In the
-constructor, we also supply an ArtifactIo instance to deserialize the
-output of the outside process. A Producer that does training via a
-shell command is
-
-    class TrainModelPython(data: Producer[FileArtifact],
-      io: ArtifactIo[TrainedModel, FileArtifact])
-      extends Producer[TrainedModel] {
-      def create: TrainedModel = {
-        val outputFile = File.createTempFile("model", ".json")
-        import sys.process.\_
-        import scala.language.postfixOps
-        val stdout = s"train.py -input ${data.get.file} -output $outputFile" !!
-        val model = io.read(new FileArtifact(outputFile))
-        model
-      }
-    }
+Notice how each Producer's `create` method calls the `get` method of its inputs. (`get` is simply an in-memory cache of
+the result of `create`.) This is the mechanism by
+which the end-to-end workflow is executed: the `Pipeline.run` method calls `get` on each persisted Producer.
+The workflow graph is isomorphic to the object graph of Producers with references to other Producers.
+
+The Signature of this Producer depends on the value of the `countBlanks` parameter, but also on the Signature of its
+input, the Producer[Iterable[String]] whose lines it is counting. That Producer's Signature depends likewise on
+its own parameters and inputs, etc.
+The outcome is that this Producer's output will be written to a
+different location depending on where in a workflow it is plugged in.
+
+Occasionally, it is necessary to change the logic of a Producer, such that its behavior will be different
+from previous versions of the code. The Signature includes a class-version field for this purpose. To indicate a change in the logic of a Producer, override the
+`versionHistory` method. For example:
+
+    case class CountLines(lines: Producer[Iterable[String]], countBlanks: Boolean = true) extends Producer[Int] with Ai2StepInfo {
+      override protected def create: Int =
+        if (countBlanks)
+          lines.get.size
+        else
+          lines.get.filter(_.trim.length > 0).size
+
+      override def versionHistory = List(
+        "v1.1" // Count whitespace-only lines as blank
+      )
+    }
-Any upstream Producer that persists its results via the standard mechanism can be converted to a
-Producer of the appropriate Artifact type, so that a a downstream out-of-JVM step can consume it.
-Otherwise, the structure of the pipeline is unchanged.
-
-    val labelData: Producer[Labels]
-      = Read.Collection.fromText[Boolean](input.flatArtifact(labelFile))
-
-    val Producer2(trainData: Producer[Iterable[(Boolean, Array[Double])]],
-      testData: Producer[Iterable[(Boolean, Array[Double])]])
-      = new JoinAndSplitData(docFeatures, labelData, 0.2)
-
-    val trainingDataFile = Persist.Collection.asText(trainData, "trainData.tsv").asArtifact
-    val model = Persist.Singleton.asJson(new TrainModelPython(trainingDataFile,
-      SingletonIo.json[TrainedModel]), "model.json")
-    val measure: Producer[PRMeasurement]
-      = Persist.Collection.asText(new MeasureModel(model, testData), "PR.txt")
-
-Tracking Overlapping Pipelines
------------------------------
-The source code for this example is found in src/test/scala/org/allenai/pipeline/SampleExperiment.scala
-
-For most projects, we would expect to run many variants of a core pipeline,
-specifying different parameters, different featurizations, etc., but all producing the same
-kind of final output, for example a trained model and measurement metrics. In the previous
-sections, the location of stored output was specified explicitly. It is possible
-to have multiple different pipelines storing data into the same directory,
-but it becomes difficult to make sure that the names of the output files do not conflict.
-Alternatively, one could specify a separate output directory for each variant,
-but then the variants cannot share intermediate calculations they may have in common. To help
-with the management of many different but closely related pipelines,
-the framework provides the PipelineRunner class and the PipelineRunnerSupport interface.
-
-The PipelineRunner automatically determines the location to which Producers will persist their
-results. If a PipelineRunner instance is implicitly in scope, no file name needs to be specified
-when persisting a Producer:
-
-    implicit val runner = PipelineRunner.writeToDirectory(outputDir)
-    val trainDataPersisted = Persist.Collection.asText(trainData)
-    val model = Persist.Singleton.asJson(new TrainModel(trainDataPersisted))
-
-If a second pipeline is defined using a PipelineRunner that saves to the same directory,
-even in a separate project and run on different days, the second pipeline will look for
-persisted data in the same location, and it will re-use any calculations that are shared
-with a previous run of a different pipeline. In this example, the second pipeline produces its
-training data in the same way as the first.
When the second pipeline is run, -it will read the training feature data from the persistent store, rather than duplicating the -(typically expensive) feature calculation. By contrast, the second pipeline uses different -logic to train the model, so the output of the model training will be stored in a different location. - - implicit val runner = PipelineRunner.writeToDirectory(outputDir) - val trainDataPersisted = Persist.Collection.asText(trainData) - val model = Persist.Singleton.asJson(new TrainModelPython(trainDataPersisted.asArtifact, - SingletonIo.json[TrainedModel])) - -The file name chosen by PipelineRunner is based on a hash of the parameters, inputs, -and code version of the Producer instance being persisted. These are provided by the -PipelineRunnerSupport class and represented by an instance of the Signature class. There are -various factory convenience methods for building Signature objects. If the Producer instance is -a case class, one can declare - - override def signature = Signature.fromObject(this) - -Alternatively, one can declare the names of the publicly-accessible fields that contain the -parameters and inputs: - - override def signature = Signature.fromFields(this, "features", "labels", "testSizeRatio") - -The code version is specified by an instance of the CodeInfo class. This is most conveniently -done by mixing in the Ai2CodeInfo trait, which uses information created by the sbt release plugin. -The PipelineRunner assumes by default that the logic of a particular Producer class does not change -between releases. In case the logic does differ from a previous release, -the updateVersionHistory field can be updated so that it contains a history of all the release -ids in which the logic of the class differs. - -The second purpose of the PipelineRunner is to produce a summary of a pipeline run in the form of - an HTML page. The page will be written to the same directory as the output data and contains a - visualization of the pipeline workflow with URL links to where output data is stored. The page is - produced automatically when using the PipelineRunner.run method instead of Producer.get - - runner.run(measure) - -Using the PipelineRunner class writing to S3 is a convenient way of managing projects with -many different contributors. Users running experiments can re-use data, even from calculations -run on different machines. The HTML pages stored into S3 are visible in a browser and serve as -a record of results of the group as a whole. - -Summary -======= - -The sample pipeline illustrates many of the benefits of the framework -for managing a pipeline. Here is a summary: - -- Guaranteed input/output location and format compatibility. The - persistence path of input/output data is specified in a single - place. There is no need to match a string specifying an upstream - step’s output with another string specifying a downstream step’s - input. Similarly, it is impossible for an upstream step to write - data in a format different from the format expected by a downstream - step. For example, if data is written using comma delimiters, - nothing will ever attempt to read it using tab delimiters. -- Guaranteed input/output type compatibility. The interfaces between - pipeline steps are defined in terms of Scala classes, and are - therefore subject to compile-time type checking. It is impossible, - for example, for the training to be run on a data set that uses - Booleans for labels, while the measurement is done on a data set - that uses 0/1 for labels. 
This can easily happen if the pipeline
-  steps are defined in terms of file paths. Using the framework, such
-  a pipeline would simply not compile.
-- Easy swapping of persistence implementations. A pipeline can be
-  developed and fully debugged using local filesystem persistence and
-  then trivially and transparently migrated to use S3 for production.
-  It is highly unlikely for this migration to introduce bugs because
-  the persistence implementation is hidden from the code implementing
-  the pipeline steps.
-- Highly modular and reusable code. Data transformation logic is
-  fully isolated from having to know where its data comes from or is
-  bound for. Only the top-level code that defines the pipeline has
-  control over which outputs are cached in memory, which are persisted
-  in storage, the location where they are stored, and the format used
-  to store them. A Producer instance is lightweight and easily used
-  even in code that does not otherwise interact with the framework.
-  Similarly, ArtifactIo instances are lightweight, self-contained,
-  and reusable outside the framework. While it is certainly possible
-  to write reusable code without the framework, using the framework
-  makes it impossible not to write modular code.
-- Distinct users running different (but related) pipelines can gain efficiency by sharing data
-  between pipelines and are automatically provided with a record of past pipeline runs and their
-  outputs.
+In this way, cached data produced by older versions of your code can coexist with more recent versions. Different
+users can share data without conflict despite possibly running different versions of the code. (The value of the
+version field can be any string, so long as it is unique.)
+
+What happens if you change the logic of a Producer but forget to update the `versionHistory` method?
+Even in this case, it is impossible to overwrite existing data. Instead, your Producer may end up reading stale cached
+data rather than recomputing it with the new logic. To force a recomputation, you must change the Signature by updating the
+`versionHistory` method.
+
+##Dry Runs
+
+Before running a pipeline, you can call the `Pipeline.dryRun` method. This will not perform any calculations,
+but will output the summary HTML, allowing you to visualize your workflow before executing it. The HTML
+will contain hyperlinks to the Signature-based path names where any output Artifacts will be written. Any outputs
+that do not yet exist will be highlighted in red. It is possible that all outputs exist already, created by previous
+runs by you or another user. In that case, `Pipeline.run` will return immediately without performing
+any calculations.
+
+##Configuration
+
+Use Typesafe Config to control persistence in a pipeline.
+
+##Out-of-core Datasets
+
+Have your Producers produce Iterators to stream data through a Pipeline.
+
+##Parallel Execution
+
+Have your Producers produce Futures to execute pipeline steps in parallel.
+
+##External Processes
+
+Use the ExternalProcess class.
+
+##Cloud Storage
+
+S3 is supported out of the box via the `Pipeline.saveToS3` factory method. Invoking this method
+in place of `Pipeline.saveToFileSystem` is all that is needed to store your pipeline data
+in the cloud. A team of people running pipelines that save to the same directory within
+S3 will be able to re-use each other's intermediate data as if they had computed it themselves.
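+
+For example, a pipeline that was developed against the local file system can be pointed at shared S3
+storage by swapping the factory method. The method names come from this README, but the parameters shown
+(an output directory, and a bucket name plus key prefix) are assumptions; check the actual factory
+signatures before relying on this sketch.
+
+    // Local development: artifacts are written under ./pipeline-output
+    val localPipeline = Pipeline.saveToFileSystem(new File("pipeline-output"))
+
+    // Shared storage: the same pipeline definition, with artifacts under s3://my-bucket/experiments
+    // ("my-bucket" and "experiments" are placeholder values)
+    val sharedPipeline = Pipeline.saveToS3("my-bucket", "experiments")
+
+Everything else (defining Producers, persisting them, calling `run`) stays the same in both cases.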
+
+##Alternate Storage
+
+To persist data to other storage systems, cloud-based or otherwise,
+implement an Artifact class and a corresponding ArtifactIo class. You must also override `Pipeline.tryCreateArtifact`
+to construct Artifacts of your custom type from relative paths. With these in place, any Producers
+persisted via `Pipeline.persist` will use your new storage mechanism. Furthermore, you can change
+the storage mechanism for an entire pipeline in one line of code by changing the implementation of
+`Pipeline.tryCreateArtifact`.
+
+#Example Pipelines
+
+Use `sbt "test:run-main org.allenai.pipeline.examples."` to run the examples.
+
+###CountWordsAndLinesPipeline
+A basic pipeline using the simplest components and basic JSON serialization.
+
+###TrainModelPipeline
+A more complex workflow. Demonstrates use of column-delimited serialization, multi-return-valued Producers,
+custom serialization, and streaming data.
+
+###TrainModelViaPythonPipeline
+Demonstrates use of external processes within a pipeline.
+
+#Summary
+
+The benefits provided by AIP are:
+
+- Intermediate data is cached and is sharable by different users on different systems.
+- A record of past runs is maintained, with navigable links to all inputs/outputs of the pipeline.
+- A pipeline can be visualized before running.
+- Output resource naming is managed to eliminate naming collisions.
+- Input/output data is always compatible with the code reading the data.
diff --git a/build.sbt b/build.sbt
index a9791f5..e894104 100644
--- a/build.sbt
+++ b/build.sbt
@@ -2,11 +2,21 @@ import Dependencies._
 import ReleaseKeys._
-val pipeline = Project(
-  id = "allenai-pipeline",
-  base = file(".")
+val core = Project(
+  id = "core",
+  base = file("core")
 )
+val s3 = Project(
+  id = "s3",
+  base = file("s3")
+).dependsOn(core)
+
+val spark = Project(
+  id = "spark",
+  base = file("spark")
+).dependsOn(core, s3)
+
 organization := "org.allenai"
 crossScalaVersions := Seq("2.11.5")
 scalaVersion <<= crossScalaVersions { (vs: Seq[String]) => vs.head }
@@ -33,12 +43,3 @@ enablePlugins(LibraryPlugin)
 PublishTo.ai2Public
 dependencyOverrides += "org.scala-lang" % "scala-reflect" % "2.11.5"
-
-libraryDependencies ++= Seq(
-  sprayJson,
-  awsJavaSdk,
-  commonsIO,
-  ai2Common,
-  allenAiTestkit % "test",
-  scalaReflection
-)
diff --git a/core/build.sbt b/core/build.sbt
new file mode 100644
index 0000000..b818caa
--- /dev/null
+++ b/core/build.sbt
@@ -0,0 +1,15 @@
+import Dependencies._
+
+name := "pipeline-core"
+organization := "org.allenai"
+
+StylePlugin.enableLineLimit := false
+
+dependencyOverrides += "org.scala-lang" % "scala-reflect" % "2.11.5"
+libraryDependencies ++= Seq(
+  sprayJson,
+  commonsIO,
+  ai2Common,
+  allenAiTestkit % "test",
+  scalaReflection
+)
diff --git a/src/main/resources/org/allenai/pipeline/pipelineSummary.html b/core/src/main/resources/org/allenai/pipeline/pipelineSummary.html
similarity index 90%
rename from src/main/resources/org/allenai/pipeline/pipelineSummary.html
rename to core/src/main/resources/org/allenai/pipeline/pipelineSummary.html
index 025a837..83925cd 100644
--- a/src/main/resources/org/allenai/pipeline/pipelineSummary.html
+++ b/core/src/main/resources/org/allenai/pipeline/pipelineSummary.html
@@ -61,7 +61,12 @@
         padding-top: 10px;
     }
-    .node rect {
+    .node .data {
+        border-top: 1px solid #2a75a1;
+        padding-top: 10px;
+    }
+
+    .node rect {
         stroke: #2a75a1;
         stroke-width: 2px;
         fill: #e2f0f8;
@@ -275,7 +280,12 @@
         fill: #C38888;
     }
-    #outputContainer {
+    .executionInfo {
+        font-size: 20%%;
+        float: right;
+    }
+
+    #outputContainer {
         position: absolute;
         top: 41px;
         left: 41px;
@@ -313,7 +323,7 @@
         Outputs
         %s
-
+