This repository has been archived by the owner on Jul 7, 2021. It is now read-only.
Named steps and other goodies #31
Closed
84 commits
89a1a1b Setting version to 1.2.1 (rodneykinney)
1de79fa Setting version to 1.2.2-SNAPSHOT (rodneykinney)
aa744e1 Add tmpOutput option to runOne() (rodneykinney)
57bf299 Merge branch 'master' of https://github.com/allenai/pipeline (rodneykinney)
58fc23f Change type of output override in runOne() (rodneykinney)
755baad Change runOne() signature (rodneykinney)
9d76549 Cleanup (rodneykinney)
a72721c Refactor ArtifactFactory (rodneykinney)
2bad5f5 Cleanup (rodneykinney)
3b7e599 Refactor ArtifactFactory (rodneykinney)
fe708de Refactory ArtifactFactory (rodneykinney)
bb8ba4b Add S3Config factory method (rodneykinney)
6f82a27 ExecuteShellCommand (rodneykinney)
1a6d2e2 Refactor ExecuteShellCommand (rodneykinney)
6525dfe Refactor ExecuteShellCommand (rodneykinney)
d61fdc5 Merge branch 'master' into artifact-factory (rodneykinney)
f6606ca Add implicit conversions, BasicPipeline example (rodneykinney)
9716ef9 Remove path parameter from persist(). Update README (rodneykinney)
730cf08 Improve HTML (rodneykinney)
98243e0 Reformat (rodneykinney)
aa92617 ExecutionInfo (rodneykinney)
aa228ce Rename ExecuteShellCommand -> ExternalProcess (rodneykinney)
023c157 TrainModel example pipeline (rodneykinney)
0d16b79 Add python-based model-training example pipeline (rodneykinney)
b72a392 Reorganize ExternalProcess (rodneykinney)
7caa3a0 Rename Serializer/Deserializer. Update README (rodneykinney)
5a985fd Merge branch 'master' of https://github.com/allenai/pipeline (rodneykinney)
6ade036 Rounded corners (rodneykinney)
1c7f08e Default AWS credentials (rodneykinney)
518d533 Merge branch 'master' of https://github.com/allenai/pipeline into art… (rodneykinney)
88a6fe9 Merge branch 'master' of https://github.com/rodneykinney/pipeline int… (rodneykinney)
8adfc14 Reformat (rodneykinney)
6b6dbe6 Merge branch 'master' into artifact-factory (rodneykinney)
d8ef83b Refactor ArtifactFactory (rodneykinney)
3685238 Refactor ArtifactFactory (rodneykinney)
52e2d56 Rename, add comments (rodneykinney)
1b66768 Clean up warnings (rodneykinney)
cfd8422 If ExternalProcess has OutputFileToken, check that it actually writes… (jefeweisen)
ad1abce Create core/s3 sub-projects (rodneykinney)
71da784 Move S3 classes to s3 project (rodneykinney)
949a309 rename ExternalProcess.apply to ExternalProcess.a (jefeweisen)
031e3a2 comments (jefeweisen)
28ea81e Move ExternalProcess.a to RunExternalProcess.a (jefeweisen)
dcde424 Moves: (jefeweisen)
8a4043d Make default constructor of RunExternalProcess private to conceal kno… (jefeweisen)
a69651d Rename RunExternalProcess.a -> RunExternalProcess.apply (jefeweisen)
27e3fbd Fix oops: forgot to add these files (jefeweisen)
fd14085 Implement versionHistory on RunExternalProcess (jefeweisen)
24d3f40 Add VersionedResource (rodneykinney)
7443f59 Simplify artifact creation (rodneykinney)
0f4999f PartitionedRddArtifact and Io (rodneykinney)
2b11617 Rdd Persistence working in unit tests (rodneykinney)
e6acdb3 Reformat (rodneykinney)
915a3a1 Cleanup (rodneykinney)
ca2aa49 Rdd Object initialization and sample pipeline (rodneykinney)
b3d0aea Clean up style, logging, format (rodneykinney)
36881f8 Add build.sbt (rodneykinney)
4b5ed61 Update .gitignore (rodneykinney)
591e5b6 Mix-in traits to define urlToArtifacte (rodneykinney)
3e489a6 Clean up Pipeline factory methods (rodneykinney)
79d9729 Merge pull request #2 from jefeweisen/check_for_external_process_output (rodneykinney)
a81e174 Merge pull request #3 from jefeweisen/external_process_abstraction (rodneykinney)
6d784f3 Merge pull request #4 from jefeweisen/external_process_versionHistory (rodneykinney)
74af917 Merge branch 'master' of https://github.com/rodneykinney/pipeline int… (rodneykinney)
4fac31a Fix test (rodneykinney)
50005fd Make Pipeline a trait after all. (rodneykinney)
557d780 Resolve relative paths within ArtifactFactory (rodneykinney)
412a550 Add Pipeline.createOutputArtifact (rodneykinney)
f93ea64 Name steps in Pipeline.persist() (rodneykinney)
cbd8c74 Bump version (rodneykinney)
23abeb9 autoGeneratedPath => hashId (rodneykinney)
799f09c Handle empty RDD on read (rodneykinney)
6f47a3c Fix tests (rodneykinney)
59ab7ff Remove type param from PartitionedRddArtifact (rodneykinney)
af898e6 Fix CreateRddArtifacts (rodneykinney)
d39cf00 Fix step name resolution (rodneykinney)
bce37c2 Remove race condition (rodneykinney)
9a601b7 LineCollectionIo className (rodneykinney)
c001a2a Disable caching for persisted RDDs (rodneykinney)
b90884d Deprecate Producer.persist() (rodneykinney)
efef9f2 Add runUntil. Custom wrapper for S3 credentials (rodneykinney)
ccd4a7d Pipeline.persistCustom (rodneykinney)
e63ea47 Merge branch 'master' of https://github.com/allenai/pipeline (rodneykinney)
cc5267d Fix persistRdd, fix addTarget (rodneykinney)
@@ -4,3 +4,5 @@
 .settings
 bin/
 target/
+logs/
+pipeline-output/
@@ -0,0 +1,15 @@
+import Dependencies._
+
+name := "pipeline-core"
+organization := "org.allenai"
+
+StylePlugin.enableLineLimit := false
+
+dependencyOverrides += "org.scala-lang" % "scala-reflect" % "2.11.5"
+libraryDependencies ++= Seq(
+  sprayJson,
+  commonsIO,
+  ai2Common,
+  allenAiTestkit % "test",
+  scalaReflection
+)
126 changes: 126 additions & 0 deletions
core/src/main/scala/org/allenai/pipeline/ArtifactFactory.scala
@@ -0,0 +1,126 @@
+package org.allenai.pipeline
+
+import scala.reflect.ClassTag
+
+import java.io.File
+import java.net.URI
+
+/** Creates an Artifact from a URL
+  */
+trait ArtifactFactory {
+  /** @param url The location of the Artifact. The scheme (protocol) is used to determine the
+    * specific implementation.
+    * @tparam A The type of the Artifact to create. May be an abstract or concrete type
+    * @return The artifact
+    */
+  def createArtifact[A <: Artifact: ClassTag](url: URI): A
+
+  /** If path is an absolute URL, create an Artifact at that location.
+    * If it is a relative path, create it relative to the given root URL
+    */
+  def createArtifact[A <: Artifact: ClassTag](rootUrl: URI, path: String): A = {
+    val parsed = new URI(path)
+    val url = parsed.getScheme match {
+      case null =>
+        val fullPath = s"${rootUrl.getPath.reverse.dropWhile(_ == '/').reverse}/${parsed.getPath.dropWhile(_ == '/')}"
+        new URI(
+          rootUrl.getScheme,
+          rootUrl.getHost,
+          fullPath,
+          rootUrl.getFragment
+        )
+      case _ => parsed
+    }
+    createArtifact[A](url)
+  }
+}
+
+object ArtifactFactory {
+  def apply(urlHandler: UrlToArtifact, fallbackUrlHandlers: UrlToArtifact*): ArtifactFactory =
+    new ArtifactFactory {
+      val urlHandlerChain =
+        if (fallbackUrlHandlers.isEmpty) {
+          urlHandler
+        } else {
+          UrlToArtifact.chain(urlHandler, fallbackUrlHandlers.head, fallbackUrlHandlers.tail: _*)
+        }
+
+      def createArtifact[A <: Artifact: ClassTag](url: URI): A = {
+        val fn = urlHandlerChain.urlToArtifact[A]
+        val clazz = implicitly[ClassTag[A]].runtimeClass.asInstanceOf[Class[A]]
+        require(fn.isDefinedAt(url), s"Cannot create $clazz from $url")
+        fn(url)
+      }
+    }
+
+}
+
+/** Supports creation of a particular type of Artifact from a URL.
+  * Allows chaining together of different implementations that recognize different input URLs
+  * and support creation of different Artifact types
+  */
+trait UrlToArtifact {
+  /** Return a PartialFunction indicating whether the given Artifact type can be created from an input URL
+    * @tparam A The Artifact type to be created
+    * @return A PartialFunction where isDefined will return true if an Artifact of type A can
+    * be created from the given URL
+    */
+  def urlToArtifact[A <: Artifact: ClassTag]: PartialFunction[URI, A]
+}
+
+object UrlToArtifact {
+  // Chain together a series of UrlToArtifact instances
+  // The result will be a UrlToArtifact that supports creation of the union of Artifact types and input URLs
+  // that are supported by the individual inputs
+  def chain(first: UrlToArtifact, second: UrlToArtifact, others: UrlToArtifact*) =
+    new UrlToArtifact {
+      override def urlToArtifact[A <: Artifact: ClassTag]: PartialFunction[URI, A] = {
+        var fn = first.urlToArtifact[A] orElse second.urlToArtifact[A]
+        for (o <- others) {
+          fn = fn orElse o.urlToArtifact[A]
+        }
+        fn
+      }
+    }
+
+  object Empty extends UrlToArtifact {
+    def urlToArtifact[A <: Artifact: ClassTag]: PartialFunction[URI, A] =
+      PartialFunction.empty[URI, A]
+  }
+
+}
+
+object CreateCoreArtifacts {
+  // Create a FlatArtifact or StructuredArtifact from an absolute file:// URL
+  val fromFileUrls: UrlToArtifact = new UrlToArtifact {
+    def urlToArtifact[A <: Artifact: ClassTag]: PartialFunction[URI, A] = {
+      val c = implicitly[ClassTag[A]].runtimeClass.asInstanceOf[Class[A]]
+      val fn: PartialFunction[URI, A] = {
+        case url if c.isAssignableFrom(classOf[FileArtifact])
+          && "file" == url.getScheme =>
+          new FileArtifact(new File(url)).asInstanceOf[A]
+        case url if c.isAssignableFrom(classOf[FileArtifact])
+          && null == url.getScheme =>
+          new FileArtifact(new File(url.getPath)).asInstanceOf[A]
+        case url if c.isAssignableFrom(classOf[DirectoryArtifact])
+          && "file" == url.getScheme
+          && new File(url).exists
+          && new File(url).isDirectory =>
+          new DirectoryArtifact(new File(url)).asInstanceOf[A]
+        case url if c.isAssignableFrom(classOf[DirectoryArtifact])
+          && null == url.getScheme
+          && new File(url.getPath).exists
+          && new File(url.getPath).isDirectory =>
+          new DirectoryArtifact(new File(url.getPath)).asInstanceOf[A]
+        case url if c.isAssignableFrom(classOf[ZipFileArtifact])
+          && "file" == url.getScheme =>
+          new ZipFileArtifact(new File(url)).asInstanceOf[A]
+        case url if c.isAssignableFrom(classOf[ZipFileArtifact])
+          && null == url.getScheme =>
+          new ZipFileArtifact(new File(url.getPath)).asInstanceOf[A]
+      }
+      fn
+    }
+  }
+}
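The relative-path branch of createArtifact(rootUrl, path) above can be tried in isolation. The following is a minimal sketch that reproduces only the URL arithmetic from the diff; the object name ResolvePathSketch and the method name resolve are hypothetical and not part of the library, and no Artifact is created. Like the diff, it carries over only the scheme, host, path, and fragment of the root URL.

```scala
import java.net.URI

// Hypothetical helper: isolates the URL resolution done in
// ArtifactFactory.createArtifact(rootUrl, path) so it can be run standalone.
object ResolvePathSketch {
  def resolve(rootUrl: URI, path: String): URI = {
    val parsed = new URI(path)
    parsed.getScheme match {
      // No scheme: treat the path as relative to the root URL.
      case null =>
        // Drop trailing slashes from the root and leading slashes from the path
        // so that exactly one '/' joins the two parts.
        val rootPath = rootUrl.getPath.reverse.dropWhile(_ == '/').reverse
        val relativePath = parsed.getPath.dropWhile(_ == '/')
        new URI(rootUrl.getScheme, rootUrl.getHost, s"$rootPath/$relativePath", rootUrl.getFragment)
      // A scheme is present: the path is already an absolute URL.
      case _ => parsed
    }
  }
}
```

For example, resolving "data/out.txt" against a root of s3://bucket/root/ gives a URL whose path is /root/data/out.txt, while a path such as file:///tmp/x.txt is returned unchanged.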
@@ -9,23 +9,23 @@ import scala.io.{ Codec, Source }
 import scala.reflect.ClassTag
 
 trait ArtifactIo[T, -A <: Artifact]
-  extends SerializeToArtifact[T, A] with DeserializeFromArtifact[T, A]
+  extends Serializer[T, A] with Deserializer[T, A]
 
 /** Interface for defining how to persist a data type.
   *
   * @tparam T the type of the data being serialized
-  * @tparam A the type of the artifact being written (i.e. FileArtifact)
+  * @tparam A the type of the artifact being written (e.g. FileArtifact)
   */
-trait SerializeToArtifact[-T, -A <: Artifact] extends PipelineStep {
+trait Serializer[-T, -A <: Artifact] extends PipelineStep {
   def write(data: T, artifact: A): Unit
 }
 
 /** Interface for defining how to persist a data type.
   *
   * @tparam T the type of the data being serialized
-  * @tparam A the type of the artifact being read (i.e. FileArtifact)
+  * @tparam A the type of the artifact being read (e.g. FileArtifact)
   */
-trait DeserializeFromArtifact[+T, -A <: Artifact] extends PipelineStep {
+trait Deserializer[+T, -A <: Artifact] extends PipelineStep {
   def read(artifact: A): T
 }
 
@@ -54,11 +54,14 @@ class SingletonIo[T: StringSerializable: ClassTag](implicit codec: Codec)
       _.write(implicitly[StringSerializable[T]].toString(data))
     }
 
-  override def stepInfo: PipelineStepInfo =
+  override def stepInfo: PipelineStepInfo = {
+    val className = scala.reflect.classTag[T].runtimeClass.getSimpleName
     super.stepInfo.copy(
-      className = s"SingletonIo[${scala.reflect.classTag[T].runtimeClass.getSimpleName}]",
-      parameters = Map("charSet" -> codec.charSet.toString)
+      className = s"ReadObject[$className]",
+      parameters = Map("charSet" -> codec.charSet.toString),
+      description = Some(s"Read [$className] into memory")
     )
+  }
 }
 
 object SingletonIo {
@@ -82,11 +85,14 @@ class LineCollectionIo[T: StringSerializable: ClassTag](implicit codec: Codec)
   override def write(data: Iterable[T], artifact: FlatArtifact): Unit =
     delegate.write(data.iterator, artifact)
 
-  override def stepInfo: PipelineStepInfo =
+  override def stepInfo: PipelineStepInfo = {
+    val className = scala.reflect.classTag[T].runtimeClass.getSimpleName
     super.stepInfo.copy(
-      className = s"LineCollectionIo[${scala.reflect.classTag[T].runtimeClass.getSimpleName}]",
-      parameters = Map("charSet" -> codec.charSet.toString)
+      className = s"ReadCollection[$className]",

Review comment: Friendlier names for source-data boxes on pipeline diagrams

+      parameters = Map("charSet" -> codec.charSet.toString),
+      description = Some(s"Read collection of [$className] into memory")
     )
+  }
 
 }
 
@@ -124,12 +130,15 @@ class LineIteratorIo[T: StringSerializable: ClassTag](implicit codec: Codec)
     }
   }
 
-  override def stepInfo: PipelineStepInfo =
+  override def stepInfo: PipelineStepInfo = {
+    val className = scala.reflect.classTag[T].runtimeClass.getSimpleName
     super.stepInfo.copy(
       className =
-        s"LineIteratorIo[${scala.reflect.classTag[T].runtimeClass.getSimpleName}]",
-      parameters = Map("charSet" -> codec.charSet.toString)
+        s"ReadIterator[$className]",
+      parameters = Map("charSet" -> codec.charSet.toString),
+      description = Some(s"Stream iterator of [$className]")
     )
+  }
 }
 
 object LineIteratorIo {
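The stepInfo overrides in this diff all follow one pattern: look up the simple name of the element type from its ClassTag, then copy the inherited step info with a friendlier className and a description. A minimal sketch of that pattern follows. StepInfoSketch, StepSketch, and CollectionReaderSketch are hypothetical, simplified stand-ins for the library's PipelineStepInfo, PipelineStep, and LineCollectionIo, and the real PipelineStepInfo may have more fields than the three shown here.

```scala
import scala.reflect.{ ClassTag, classTag }

// Hypothetical, simplified stand-in for PipelineStepInfo.
case class StepInfoSketch(
  className: String,
  parameters: Map[String, String] = Map.empty,
  description: Option[String] = None
)

// Hypothetical stand-in for PipelineStep: the default name is the class name.
trait StepSketch {
  def stepInfo: StepInfoSketch = StepInfoSketch(className = getClass.getSimpleName)
}

class CollectionReaderSketch[T: ClassTag](charSet: String) extends StepSketch {
  override def stepInfo: StepInfoSketch = {
    // Compute the element type's simple name once and reuse it below.
    val className = classTag[T].runtimeClass.getSimpleName
    super.stepInfo.copy(
      className = s"ReadCollection[$className]",
      parameters = Map("charSet" -> charSet),
      description = Some(s"Read collection of [$className] into memory")
    )
  }
}
```

With T = String, the resulting className is ReadCollection[String], which is the kind of label the review comment refers to for source-data boxes on pipeline diagrams.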
This is an extensible API for creating Artifacts from URLs. Previously, the Artifact class had a url method, but there was no way to go in the other direction.
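As a rough illustration of what "extensible" means here, the sketch below chains two URL handlers with PartialFunction.orElse, which is the mechanism UrlToArtifact.chain uses in the diff. Everything in it is a hypothetical stand-in: ArtifactSketch, LocalFile, RemoteObject, and the s3 handler are invented for the example, and the real handlers also dispatch on the requested Artifact type through a ClassTag, which this sketch leaves out.

```scala
import java.net.URI

object UrlHandlerSketch {
  // Hypothetical stand-ins for the library's Artifact types.
  sealed trait ArtifactSketch { def url: URI }
  case class LocalFile(url: URI) extends ArtifactSketch
  case class RemoteObject(url: URI) extends ArtifactSketch

  // Each handler is defined only for the URL schemes it recognizes.
  val fromFileUrls: PartialFunction[URI, ArtifactSketch] = {
    case url if url.getScheme == "file" || url.getScheme == null => LocalFile(url)
  }
  val fromS3Urls: PartialFunction[URI, ArtifactSketch] = {
    case url if url.getScheme == "s3" => RemoteObject(url)
  }

  // orElse tries the handlers in order; supporting a new scheme means
  // appending one more handler to the chain.
  val handlers: PartialFunction[URI, ArtifactSketch] = fromFileUrls orElse fromS3Urls

  // Fail with a readable message when no handler accepts the URL.
  def create(url: URI): ArtifactSketch = {
    require(handlers.isDefinedAt(url), s"Cannot create artifact from $url")
    handlers(url)
  }
}
```

In this sketch the round trip holds by construction: the artifact returned by create(url) reports the same url it was created from.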