Skip to content

Commit

Permalink
Merge pull request #562 from jvalue/jayvee-formatter
Browse files Browse the repository at this point in the history
Add custom Jayvee formatter
  • Loading branch information
georg-schwarz authored May 16, 2024
2 parents 2b4077f + b29c8e7 commit 8c770d5
Show file tree
Hide file tree
Showing 11 changed files with 730 additions and 463 deletions.
181 changes: 92 additions & 89 deletions example/cars.jv
Original file line number Diff line number Diff line change
Expand Up @@ -12,93 +12,96 @@
// to a SQLite file sink.
pipeline CarsPipeline {

// 2. We describe the structure of the pipeline,
// usually at the top of the pipeline.
// by connecting blocks via pipes.

// 3. Syntax of a pipe
// connecting the block CarsExtractor
// with the block CarsTextFileInterpreter.
CarsExtractor -> CarsTextFileInterpreter;

// 4. The output of the preceding block is hereby used
// as input for the succeeding block.

// 5. Pipes can be further chained,
// leading to an overview of the pipeline.
CarsTextFileInterpreter
-> CarsCSVInterpreter
-> NameHeaderWriter
-> CarsTableInterpreter
-> CarsLoader;


// 6. Below the pipes, we usually define the blocks
// that are connected by the pipes.

// 7. Blocks instantiate a block type by using the oftype keyword.
// The block type defines the available properties that the block
// can use to specify the intended behavior of the block
block CarsExtractor oftype HttpExtractor {

// 8. Properties are assigned to concrete values.
// Here, we specify the URL where the file shall be downloaded from.
url: "https://gist.githubusercontent.com/noamross/e5d3e859aa0c794be10b/raw/b999fb4425b54c63cab088c0ce2c0d6ce961a563/cars.csv";
}

// 9. The HttpExtractor requires no input and produces a binary file as output.
// This file has to be interpreted, e.g., as text file.
block CarsTextFileInterpreter oftype TextFileInterpreter { }

// 10. Next, we interpret the text file as sheet.
// A sheet only contains text cells and is useful for manipulating the shape of data before assigning more strict value types to cells.
block CarsCSVInterpreter oftype CSVInterpreter {
enclosing: '"';
}

// 11. We can write into cells of a sheet using the CellWriter block type.
block NameHeaderWriter oftype CellWriter {
// 12. We utilize a syntax similar to spreadsheet programs.
// Cell ranges can be described using the keywords "cell", "row", "column", or "range" that indicate which
// cells are selected for the write action.
at: cell A1;

// 13. For each cell we selected with the "at" property above,
// we can specify what value shall be written into the cell.
write: ["name"];
}

// 14. As a next step, we interpret the sheet as a table by adding structure.
// We define a value type per column that specifies the data type of the column.
// Rows that include values that are not valid according to the their value types are dropped automatically.
block CarsTableInterpreter oftype TableInterpreter {
header: true;
columns: [
"name" oftype text,
"mpg" oftype decimal,
"cyl" oftype integer,
"disp" oftype decimal,
"hp" oftype integer,
"drat" oftype decimal,
"wt" oftype decimal,
"qsec" oftype decimal,
"vs" oftype integer,
"am" oftype integer,
"gear" oftype integer,
"carb" oftype integer
];
}

// 15. As a last step, we load the table into a sink,
// here into a sqlite file.
// The structural information of the table is used
// to generate the correct table.
block CarsLoader oftype SQLiteLoader {
table: "Cars";
file: "./cars.sqlite";
}

// 16. Congratulations!
// You can now use the sink for your data analysis, app,
// or whatever you want to do with the cleaned data.
// 2. We describe the structure of the pipeline,
// usually at the top of the pipeline.
// by connecting blocks via pipes.

// 3. Syntax of a pipe
// connecting the block CarsExtractor
// with the block CarsTextFileInterpreter.
CarsExtractor
-> CarsTextFileInterpreter;

// 4. The output of the preceding block is hereby used
// as input for the succeeding block.

// 5. Pipes can be further chained,
// leading to an overview of the pipeline.
CarsTextFileInterpreter
-> CarsCSVInterpreter
-> NameHeaderWriter
-> CarsTableInterpreter
-> CarsLoader;


// 6. Below the pipes, we usually define the blocks
// that are connected by the pipes.

// 7. Blocks instantiate a block type by using the oftype keyword.
// The block type defines the available properties that the block
// can use to specify the intended behavior of the block
block CarsExtractor oftype HttpExtractor {

// 8. Properties are assigned to concrete values.
// Here, we specify the URL where the file shall be downloaded from.
url: "https://gist.githubusercontent.com/noamross/e5d3e859aa0c794be10b/raw/b999fb4425b54c63cab088c0ce2c0d6ce961a563/cars.csv";
}

// 9. The HttpExtractor requires no input and produces a binary file as output.
// This file has to be interpreted, e.g., as text file.
block CarsTextFileInterpreter oftype TextFileInterpreter { }

// 10. Next, we interpret the text file as sheet.
// A sheet only contains text cells and is useful for manipulating the shape of data before assigning more strict value types to cells.
block CarsCSVInterpreter oftype CSVInterpreter {
enclosing: '"';
}

// 11. We can write into cells of a sheet using the CellWriter block type.
block NameHeaderWriter oftype CellWriter {
// 12. We utilize a syntax similar to spreadsheet programs.
// Cell ranges can be described using the keywords "cell", "row", "column", or "range" that indicate which
// cells are selected for the write action.
at: cell A1;

// 13. For each cell we selected with the "at" property above,
// we can specify what value shall be written into the cell.
write: [
"name"
];
}

// 14. As a next step, we interpret the sheet as a table by adding structure.
// We define a value type per column that specifies the data type of the column.
// Rows that include values that are not valid according to the their value types are dropped automatically.
block CarsTableInterpreter oftype TableInterpreter {
header: true;
columns: [
"name" oftype text,
"mpg" oftype decimal,
"cyl" oftype integer,
"disp" oftype decimal,
"hp" oftype integer,
"drat" oftype decimal,
"wt" oftype decimal,
"qsec" oftype decimal,
"vs" oftype integer,
"am" oftype integer,
"gear" oftype integer,
"carb" oftype integer
];
}

// 15. As a last step, we load the table into a sink,
// here into a sqlite file.
// The structural information of the table is used
// to generate the correct table.
block CarsLoader oftype SQLiteLoader {
table: "Cars";
file: "./cars.sqlite";
}

// 16. Congratulations!
// You can now use the sink for your data analysis, app,
// or whatever you want to do with the cleaned data.
}
Loading

0 comments on commit 8c770d5

Please sign in to comment.