The command line is a powerful tool for data transformation. We've already discussed some CLI tools, such as grep, that can be used to transform data. Let's delve into a few more.
csvlook is part of csvkit, which we installed earlier. It lets you "pretty print" a CSV file on the command line.
Here is an example using data from an old FiveThirtyEight article on alcohol consumption.
curl -s 'https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv' | csvlook
For very long or very wide CSV files, you can pipe the output of csvlook into less.
curl -s 'https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv' | csvlook | less
Note how grep leaves out the header of the CSV. csvkit includes a CSV-specific version of grep, csvgrep, which lets you 1) search the contents of a single column and 2) keep the header in the output.
curl -s 'https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv' | csvgrep -c 'country' -m France
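To see the difference without csvkit, here is a minimal sketch using only standard tools. The sample rows are made up for illustration and are not taken from drinks.csv, and the awk one-liner is a rough stand-in for csvgrep, not how csvgrep itself works (it does not handle quoted commas, for example).

```shell
# Illustrative sample only: these rows are made up, not taken from drinks.csv.
sample_csv='country,beer_servings,wine_servings
France,100,300
Germany,200,150'

# Plain grep prints only the matching lines, so the header row is lost.
plain=$(printf '%s\n' "$sample_csv" | grep 'France')

# awk can keep line 1 (the header) and restrict the match to the first column.
with_header=$(printf '%s\n' "$sample_csv" | awk -F',' 'NR == 1 || $1 ~ /France/')

printf '%s\n' "$plain"
printf '%s\n' "$with_header"
```

The first command prints a single data row; the second prints the header followed by that row, which is the behavior csvgrep gives you with proper CSV parsing.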
in2csv is a data conversion tool built into csvkit.
Documentation: https://csvkit.readthedocs.io/en/1.0.2/scripts/in2csv.html
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | in2csv -f json -k members
Note how you have to specify a top-level key (here, members) with the -k option.
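The key is needed because the list of records sits inside an outer JSON object rather than at the top level. A minimal sketch, assuming csvkit is installed; the sample document below is hypothetical and only mimics the shape of superheroes.json:

```shell
# Hypothetical sample: an object whose "members" key holds the list of records.
sample_json='{"squadName":"Sample Squad","members":[{"name":"Hero A","secretIdentity":"Alice","age":30},{"name":"Hero B","secretIdentity":"Bob","age":41}]}'

if command -v in2csv >/dev/null 2>&1; then
  # -k tells in2csv which top-level key contains the list to convert.
  members_csv=$(printf '%s' "$sample_json" | in2csv -f json -k members)
  printf '%s\n' "$members_csv"
else
  echo "in2csv (csvkit) is not installed" >&2
fi
```

Each object in the members list becomes one CSV row, and the object's keys become the column names.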
jq is a command-line JSON processor. Here are a few examples using the superheroes.json dataset.
If you don't have jq installed, run brew install jq on macOS or sudo apt-get install jq on Ubuntu.
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq '.'
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq '.members'
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq -r '.members[].name'
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq -r '.members[].powers[]'
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq -r '.members[] | [.name, .secretIdentity, .age] | @csv'
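The @csv output above has no header row. One way to add one is to emit an array of column names before the data rows. This is a sketch, assuming jq is installed; the sample JSON is hypothetical and only mimics the shape of superheroes.json so the example runs offline.

```shell
# Hypothetical sample that mimics the shape of superheroes.json; values are made up.
sample_json='{"members":[{"name":"Hero A","secretIdentity":"Alice","age":30,"powers":["flight"]},{"name":"Hero B","secretIdentity":"Bob","age":41,"powers":["speed","strength"]}]}'

if command -v jq >/dev/null 2>&1; then
  # The comma operator emits the header array first, then one array per member;
  # every array is then passed through @csv.
  csv_out=$(printf '%s' "$sample_json" |
    jq -r '["name","secretIdentity","age"], (.members[] | [.name, .secretIdentity, .age]) | @csv')
  printf '%s\n' "$csv_out"
else
  echo "jq is not installed" >&2
fi
```

Because the result is ordinary CSV with a header, it can be piped straight into csvlook or csvgrep.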