---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = FALSE,
comment = "# "
)
```
# dplyr-cli
<!-- badges: start -->
![](https://img.shields.io/badge/cool-useless-green.svg)
<!-- badges: end -->
`dplyr-cli` uses the `Rscript` executable to
run dplyr commands on CSV files in the terminal.
`dplyr-cli` makes use of the terminal pipe `|` instead of the magrittr pipe (`%>%`)
to run sequences of commands.
```
cat mtcars.csv | dplyr group_by cyl | dplyr summarise "mpg = mean(mpg)" | dplyr kable
#> | cyl| mpg|
#> |---:|--------:|
#> | 4| 26.66364|
#> | 6| 19.74286|
#> | 8| 15.10000|
```
## Motivation
I wanted to be able to do quick hacks on CSV files on the command line using
dplyr syntax, but without actually starting a proper R session.
## What dplyr commands are supported?
Any command of the form:
* `dplyr::verb(.data, code)`
* `dplyr::*_join(.data, .rhs)`
Two extra commands, which are not part of `dplyr`, are also supported:
* `csv` runs no dplyr command; it only writes the input data to stdout as CSV
* `kable` runs no dplyr command; it only writes the input data to stdout as a
`knitr::kable()` formatted string
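
To make the `dplyr::verb(.data, code)` form above concrete, here is a minimal sketch of how a command line could correspond to an R call. The helper `to_r_call` is hypothetical and only prints a string; it illustrates the assumed mapping and is not a description of how the script is implemented internally.

```bash
# Hypothetical helper (not part of dplyr-cli): print the R call that a
# command line of the form `dplyr VERB "CODE"` is assumed to correspond to.
to_r_call() {
  verb="$1"   # e.g. filter, mutate, summarise
  code="$2"   # the quoted code argument, passed through unchanged
  printf 'dplyr::%s(.data, %s)\n' "$verb" "$code"
}

to_r_call filter "mpg < 11"
# dplyr::filter(.data, mpg < 11)
to_r_call mutate "cyl2 = 2 * cyl"
# dplyr::mutate(.data, cyl2 = 2 * cyl)
```

Here `.data` stands for the CSV data read from stdin or from a file.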
## Limitations
* Only tested under 'bash' on OSX. YMMV.
* Every command runs in a separate R session.
* When using special shell characters such as `()`, you'll have to quote
your code arguments. Some shells will require more quoting than others.
* "joins" (such as `left_join`) do not currently let you specify the `by` argument,
so the two datasets must share at least one column name
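
The quoting limitation above is easier to see with a small experiment. The sketch below uses a hypothetical helper, `show_args`, that prints each argument it receives, so you can check what the shell would hand to `dplyr` once quoting has been resolved. It does not call `dplyr-cli` itself.

```bash
# Hypothetical helper: print each received argument on its own line,
# wrapped in brackets so that embedded spaces are visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Quoted: the whole expression arrives as a single argument.
show_args filter "mpg < 11"
# [filter]
# [mpg < 11]

# Unquoted, the shell would treat `<` as input redirection and `()` as
# shell syntax, so these lines are left as comments:
#   show_args filter mpg < 11
#   show_args summarise mpg = mean(mpg)

# Single quotes also stop the shell from expanding `$` and backticks.
show_args mutate 'label = paste0("$", cyl)'
# [mutate]
# [label = paste0("$", cyl)]
```

If an argument does not come out as one bracketed line, it needs more quoting in your shell.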
## Usage
```{sh}
dplyr --help
```
## History
#### v0.1.0 2020-04-20
* Initial release
#### v0.1.1 2020-04-21
* Switch to 'Rscript' for easier install for users
* rename 'dplyr.sh' to just 'dplyr'
#### v0.1.2 2020-04-21
* Support for joins e.g. `left_join`
#### v0.1.3 2020-04-22
* More robust tmpdir handling
#### v0.1.4 2022-01-23
* Fix handling for latest `read_csv()`. Fixes #9
## Contributors
* [aborruso](https://github.com/aborruso) - documentation
## Installation
Because this script straddles a great divide between R and the shell, you need
to ensure both are set up correctly for this to work.
1. Install R packages
2. Clone this repo and put `dplyr` in your path
#### Install R packages - within R
`dplyr-cli` is run from the shell, but every invocation starts a new
R session in which the following packages must be installed:
```{r eval=FALSE}
install.packages('readr') # read in CSV data
install.packages('dplyr') # data manipulation
install.packages('docopt') # CLI description language
```
<details>
<summary> Click to reveal instructions for installing packages on the command line</summary>
To do this from the command line on a Linux-like system, install `r-base` (`sudo apt -y install r-base`) and then run:
```bash
sudo su - -c "R -e \"install.packages('readr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('dplyr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('docopt', repos='http://cran.rstudio.com/')\""
```
</details>
#### Clone this repo and put `dplyr` in your path
You'll then need to download the shell script from this repository and put `dplyr`
somewhere in your path.
```
git clone https://github.com/coolbutuseless/dplyr-cli
cp dplyr-cli/dplyr ./somewhere/in/your/search/path
```
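
After copying the script, it can help to confirm that the shell can find both executables. The following is a minimal sketch; `check_cmd` is a hypothetical helper, and it only reports whether a name resolves on the current `PATH`. It does not check that the R packages are installed.

```bash
# Hypothetical helper: report whether a command can be found on the PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s: found\n' "$1"
  else
    printf '%s: NOT found on PATH\n' "$1"
  fi
}

check_cmd Rscript   # needed to run the script
check_cmd dplyr     # the script copied in the step above
```

If `dplyr` is reported as not found, check that the directory you copied it into is listed in `$PATH` and that the file is executable.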
# Example data
Write an example CSV file to disk. Note: this file is included in this
repository as `mtcars.csv`, along with a second CSV file, `cyl.csv`, for
demonstrating joins.
```{r}
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
```
# Example 1 - Basic Usage
```{sh}
# cat contents of input CSV into dplyr-cli.
# Use '-c' to output CSV if this is the final step
cat mtcars.csv | dplyr filter -c "mpg == 21"
```
```{sh}
# Put quotes around any commands which contain special characters like <>()
cat mtcars.csv | dplyr filter -c "mpg < 11"
```
```{sh}
# Combine dplyr commands with shell 'head' command
dplyr select --file mtcars.csv -c cyl | head -n 6
```
# Example 2 - Simple piping of commands (with shell pipe, not magrittr pipe)
```{sh}
cat mtcars.csv | \
dplyr mutate "cyl2 = 2 * cyl" | \
dplyr filter "cyl == 8" | \
dplyr kable
```
# Example 3 - Set up some aliases for convenience
```{sh}
alias mutate="dplyr mutate"
alias filter="dplyr filter"
alias select="dplyr select"
alias summarise="dplyr summarise"
alias group_by="dplyr group_by"
alias ungroup="dplyr ungroup"
alias count="dplyr count"
alias arrange="dplyr arrange"
alias kable="dplyr kable"
cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
```
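
Aliases are mainly an interactive-shell feature: bash does not expand them in non-interactive scripts unless `shopt -s expand_aliases` is set. If you want the same short names inside a script, shell functions are one alternative. The sketch below assumes `dplyr` is already on your path; the wrapper functions are an illustration and are not shipped with this repository.

```bash
# Function wrappers as an alternative to aliases.
# Each one forwards all of its arguments to the corresponding dplyr command.
mutate()    { dplyr mutate "$@"; }
filter()    { dplyr filter "$@"; }
summarise() { dplyr summarise "$@"; }
group_by()  { dplyr group_by "$@"; }
ungroup()   { dplyr ungroup "$@"; }
count()     { dplyr count "$@"; }
arrange()   { dplyr arrange "$@"; }
kable()     { dplyr kable "$@"; }
# `select` is a reserved word in bash, so no function is defined for it here;
# call `dplyr select` directly instead.

# Usage, once the functions are defined:
#   cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
```

Using `"$@"` keeps each quoted argument intact when it is passed on to `dplyr`.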
# Example 4 - Joins
Limitations:
* The first argument after a join command must be an existing file (either CSV or RDS)
* You can't yet specify a `by` argument for a join, so the two datasets must
share at least one column name to join by
```{sh}
cat cyl.csv
```
```{sh}
cat mtcars.csv | dplyr inner_join cyl.csv | dplyr kable
```
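
Because a join here relies on shared column names, it can be useful to check the two headers before joining. The following is a minimal sketch using standard tools only. `common_columns` is a hypothetical helper, the two sample files are made up for illustration (they are not the repository's `mtcars.csv` and `cyl.csv`), and the header parsing assumes plain comma-separated names without embedded commas.

```bash
# Hypothetical helper: print the column names that appear in the header
# row of both CSV files, in the order they occur in the first file.
common_columns() {
  tmp=$(mktemp)
  # One header name per line for the second file, used as the pattern list.
  head -n 1 "$2" | tr -d '"\r' | tr ',' '\n' > "$tmp"
  # Keep header names from the first file that exactly match one of them.
  head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' | grep -Fx -f "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Illustrative data only.
workdir=$(mktemp -d)
printf 'cyl,mpg\n4,26.7\n'   > "$workdir/left.csv"
printf 'cyl,label\n4,four\n' > "$workdir/right.csv"

common_columns "$workdir/left.csv" "$workdir/right.csv"
# cyl

rm -rf "$workdir"
```

If the helper prints nothing, the two files have no column name in common and there is nothing to join by.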
## Security warning
`dplyr-cli` uses `eval(parse(text = ...))` on user input. Do not expose this
program to the internet or random users under any circumstances.
## Inspirations
* [xsv](https://github.com/BurntSushi/xsv) - a fast CSV command line toolkit
written in Rust
* [jq](https://stedolan.github.io/jq/) - a command line JSON processor.
* [miller](http://johnkerl.org/miller/doc/)