Skip to content

Latest commit

 

History

History
168 lines (130 loc) · 5.94 KB

README.md

File metadata and controls

168 lines (130 loc) · 5.94 KB

jSchema

logo

Data Modeling in JavaScript

jSchema is a framework for modeling data in JavaScript. By using fundamental data modeling principles you are able to pull multiple datasets into a common schema, define relationships, aggregate, join, and subset datasets to make data easier to work with in the browser.

entity relationship diagram - new page 1

jSchema is going to create an object called jSchema. This object is a metadata representation of all your datasets containing the table names, column names, and keys that define sets. The data itself is stored in a closure within the object and is retrieved via a getter function. Joining data, aggregating data, and filtering data will create a new dataset in your WORK namespace that will persist on the page until either you delete the table or you run a cleanUp method. By default jSchema will be case sensitive. This can be overwritten with:

var s = new jSchema({
  "caseSensitive": false
});

Demo

A complete working demo of how to load tables, join tables, and aggregate tables can be found in the repository in the demo folder. A live demo can be found on the libraries homepage: live demo

How to Use

NPM

If you use npm, npm install jschema.

requirejs

jschema.js uses requirejs to modularly load the library and is included in the package.json. requirejs can be moved to the lib directory of your project and included as:

<script data-main="main" src="lib/require.js"></script></body>

where main is main.js. You can then add jSchema to your application as such (in this case main.js):

requirejs(['lib/jschema'], function(jSchema){
  var s = new jSchema;

  s.add([{a:1, b:2}])
  s.add([{b:2, c:3}], {name:"named_table", primaryKey:"b"})
});

Or load data from external sources:

fetch("education.json")
  .then(response => response.json())
  .then(json => s.add(json, {name:"education", primaryKey:"Age_Group"}))
  .then(fetch("gender.json")
    .then(response => response.json())
    .then(json => s.add(json, {name:"gender", primaryKey:"Age_Group"}))
);

JOIN

Multiple datasets that have primary / foreign key relationships can be joined as such:

s.join("EDUCATION", "GENDER", {name: "joinTable"})

DROP

You can drop tables that are no longer needed in the schema with the drop method:

s.drop("GENDER")

SORT

If you want to sort a dataset by an attribute use the orderBy method:

s.orderBy("GENDER", {
  clause: "Count",
  order: "asc",
  name: "sortBy"
})

GROUP BY

To Group by you need to provide the dataset name, the name of the dimension to group by, and the metrics you wish to aggregate:

s.groupBy("GENDER", {
  dim: "Gender",
  metric: "Count",
  name: "groupBy",
  method: "sum", // supported methods are sum, count, average, min, max
  percision: 2, // default is 2
  dimName: "GenderPK" // use if you need to rename the dimension
})

Output:

  {
    "dim": "Male",
    "val": 37575
  },
  {
    "dim": "Female",
    "val": 44074
  }
]

FILTER

To filter a dataset you can call the filter method and pass three or more arguments. First argument is always the table name. The second and third are the field to filter by, and the value to filter it on. Additional pairs can be included as fourth, fifth parameters and so on. A filtered dataset will be created in the WORK namespace.

s.filter('GENDER', 'Gender', 'Female')

UPDATE

If a new version of the dataset is made available you can update the existing table in the schema by running the update method:

s.update("GENDER", data);

INSERT

Inserting data into a table will add additional rows to your table. This will not affect any of the existing data in the table. A single row can be passed as an object or an array of objects can be passed. Column names need to match in order for the new data to appear when retrieved.

s.insert("TABLE0", {
  "gender": "Female",
  "age_group": "16-24",
  "count": 10
});

REMOVE COLUMNS

If you want to completely remove a column use removeCol to create a clone of the table with the column removed. Pass the dataset as the first argument and the attributes as the second argument. Attributes are col for the column to be removed, and name for the name of the table to clone the table to. Name is optional and if it is not used it will copy the data to the WORK namespace.

s.removeCol("COMPLETE", {
  "col": "REFUGEES_BY_DEST.STATE",
  "name": "REFUGEES_CLEAN"
});

ADD COLUMNS

In order to add a new column to a dataset you need to call the addCol method. This will take an expression (two columns and an arithmetic operator) and combine them into a new field. The function will create a new dataset with the name passed in the attribute name , and will return the schema object. If name is not provided it will create a dataset in the WORK namespace.

s.addCol("REFUGEES_CLEAN", {
  "name": "REFUGEES_ARITHMETIC",
  "expression": "REFUGEES_BY_DEST.VAL / POPULATION.POP_EST_2014",
  "colName": "AVG_POP_DEST"
})

By default all temporary datasets (joins, group by, order by) added to the schema will be prefixed with the namespace WORK. (e.g. joining two tables NAMES and LOCATIONS will result in a table added to the schema called WORK.NAMES_LOCATIONS). Calling the cleanUp method will remove all datasets added to the WORK namespace

s.cleanUp();

Testing

To run mocha test scripts from the terminal type:

npm install
npm run test

Contact

Questions, comments, feature requests, etc are always welcome. I am @ignoreintuition on Twitter.