forked from software-tools-books/js4ds
-
Notifications
You must be signed in to change notification settings - Fork 0
/
capstone.tex
703 lines (573 loc) · 19.8 KB
/
capstone.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
\chapter{Capstone Project}\label{s:capstone}
It's time to bring everything together in an extended example:
a (slightly) interactive visualization of species data from \url{https://figshare.com/articles/Portal_Project_Teaching_Database/1314459}.
Our plan is to:
\begin{itemize}
\item
slice data for testing,
\item
write a data server to serve that data,
\item
test the server,
\item
build an interactive tabular display of our data, and
\item
add visualization.
\end{itemize}
This will require a few new ideas,
but will mostly recapitulate what's come before.
\section{Data Manager}\label{s:capstone-data}
The data manager is exactly the same as the one we built in \chapref{s:dataman}.
As a reminder,
the key class is:
\begin{minted}{js}
class DataManager {
constructor (filename) {
// ...read and store data from CSV file...
}
getSurveyStats () {
// ...return summary statistics...
}
getSurveyRange (minYear, maxYear) {
// ...return slice of data...
}
}
\end{minted}
\section{Server}\label{s:capstone-server}
The server is going to be almost the same as the one in \chapref{s:server}.
However, we need to connect it to the data manager class.
We'll do this by having the driver create a data manager,
and then pass that data manager to the server as the latter is being created:
\begin{minted}{js}
const DataManager = require('./data-manager')
const server = require('./server-0')
const PORT = 3418
const filename = process.argv[2]
const db = new DataManager(filename)
const app = server(db)
app.listen(PORT, () => {
console.log(`listening on port ${PORT}...`)
})
\end{minted}
As you can probably guess from the fact that we've called it \texttt{driver-0},
we're going to be making some changes down the road\ldots{}
Here's the start of the server it works with:
\begin{minted}{js}
const express = require('express')
// 'dataManager' is a global variable that refers to our database.
// It must be set when the server is created.
let dataManager = null
// Main server object.
const app = express()
// ...handle requests...
module.exports = (dbm) => {
dataManager = dbm
return app
}
\end{minted}
We'll look at handling requests for data in the next section.
The most important thing for now is the way we manage the connection to the data manager.
Down at the bottom of \texttt{server-0.js},
we export a function that assigns its single argument to a variable called \texttt{dataManager}.
Inside the methods that handle requests,
we'll be able to send database queries to \texttt{dataManager}.
This variable is global within this file,
but since it's not exported,
it's invisible outside.
Variables like this are called \gref{g:module-variable}{module variables},
and give us a way to share information among the functions in a module
without giving anything outside the module a way to cause (direct) harm to that information.
\section{API}\label{s:capstone-api}
The next step is to decide what our server's API will be,
i.e.,
what URLs it will understand and what data they will fetch.
\texttt{GET\ /survey/stats} will get summary statistics as a single JSON record,
and \texttt{GET\ /survey/:start/:end} gets aggregate values for a range of years.
(We will add error checking on the year range as an exercise.)
Anything else will return a 404 error code and a snippet of HTML telling us we're bad people.
We will put this code in \texttt{server.js} and a command-line driver in \texttt{driver.js} for testability.
The server functions are:
\begin{minted}{js}
// Get survey statistics.
app.get('/survey/stats', (req, res, next) => {
const data = dataManager.getSurveyStats()
res.setHeader('Content-Type', 'application/json')
res.status(200).send(data)
})
// Get a slice of the survey data.
app.get('/survey/:start/:end', (req, res, next) => {
const start = parseInt(req.params.start)
const end = parseInt(req.params.end)
const data = dataManager.getSurveyRange(start, end)
res.setHeader('Content-Type', 'application/json')
res.status(200).send(data)
})
\end{minted}
We also write an error handling function:
\begin{minted}{js}
// Nothing else worked.
app.use((req, res, next) => {
page = `<html><body><p>error: "${req.url}" not found</p></body></html>`
res.status(404)
.send(page)
})
\end{minted}
Now let's write our first test:
\begin{minted}{js}
const path = require('path')
const assert = require('assert')
const request = require('supertest')
const DataManager = require('./data-manager')
const make_server = require('./server-0')
TEST_DATA_PATH = path.resolve(__dirname, 'test-data.csv')
describe('server', () => {
it('should return statistics about survey data', (done) => {
expected = {
minYear: 1979,
maxYear: 2000,
count: 10
}
const db = new DataManager(TEST_DATA_PATH)
const server = make_server(db)
.get('/survey/stats')
.expect(200)
.expect('Content-Type', 'application/json')
.end((err, res) => {
assert.deepEqual(res.body, expected, '')
done()
})
})
})
\end{minted}
Note that the range of years is 1979-2000,
which is \emph{not} the range in the full dataset.
We run this with:
\begin{minted}{shell}
$ npm test -- src/capstone/back/test-server.js
\end{minted}
\noindent
and it passes.
\section{The Display}\label{s:capstone-display}
The front end is a straightforward recapitulation of what we've done before.
There is a single HTML page called \texttt{index.html}:
\begin{minted}{html}
<!DOCTYPE html>
<html>
<head>
<title>Creatures</title>
<meta charset="utf-8">
<script src="app.js" async></script>
</head>
<body>
<div id="app"></div>
</body>
</html>
\end{minted}
The main application in \texttt{app.js} imports components to display summary statistics,
choose a range of years,
and display annual data.
There is not usually such a close coupling between API calls and components,
but it's not a bad place to start.
Note that we are using \texttt{import} because we're trying to be modern,
even though the back end still needs \texttt{require}.
\begin{minted}{js}
import React from 'react'
import ReactDOM from 'react-dom'
import SurveyStats from './SurveyStats'
import ChooseRange from './ChooseRange'
import DataDisplay from './DataDisplay'
class App extends React.Component {
constructor (props) {
// ...constructor...
}
componentDidMount = () => {
// ...initialize...
}
onStart = (start) => {
// ...update start year...
}
onEnd = (end) => {
// ...update end year...
}
onNewRange = () => {
// ...handle submission of year range...
}
render = () => {
// ...render current application state...
}
}
ReactDOM.render(
<App />,
document.getElementById('app')
)
\end{minted}
The constructor defines the URL for the data source and sets up the initial state,
which has summary data,
start and end years,
and data for those years:
\begin{minted}{js}
constructor (props) {
super(props)
this.baseUrl = 'http://localhost:3418'
this.state = {
summary: null,
start: '',
end: '',
data: null
}
}
\end{minted}
The method \texttt{componentDidMount} is new:
it fetches data for the very first time
so that the user sees something useful on the page when they first load it.
\begin{minted}{js}
componentDidMount = () => {
const url = `${this.baseUrl}/survey/stats`
fetch(url).then((response) => {
return response.json()
}).then((summary) => {
this.setState({
summary: summary
})
})
}
\end{minted}
We don't call this method ourselves;
instead,
React automatically calls it once our application and its children have been loaded and initialized.
We can't fetch the initial data in the application's constructor because
we have no control over the order in which bits of display are initialized.
On the upside,
we can use \texttt{response.json()} directly because we know the source is returning JSON data.
This method is the only place where the summary is updated,
since the data isn't changing underneath us:
Next up we need to handle typing in the ``start'' and ``end'' boxes.
The HTML controls in the web page will capture the characters without our help,
but we need those values in our state variables:
\begin{minted}{js}
onStart = (start) => {
this.setState({
start: start
})
}
onEnd = (end) => {
this.setState({
end: end
})
}
\end{minted}
When the button is clicked,
we send a request for JSON data to the appropriate URL
and record the result in the application's state.
React will notice the state change and call \texttt{render} for us.
More precisely,
the browser will call the first \texttt{then} callback when the response arrives,
and the second \texttt{then} callback when the data has been converted to JSON.
\begin{minted}{js}
onNewRange = () => {
const params = {
method: 'GET',
headers: {
'Accept': 'application/json',
'Content-Type': 'application/json'
}
}
const url = `${this.baseUrl}/survey/${this.state.start}/${this.state.end}`
fetch(url, params).then((response) => {
return response.json()
}).then((data) => {
this.setState({
data: data
})
})
}
\end{minted}
Now let's update the display with \texttt{SurveyStats}, \texttt{ChooseRange}, \texttt{DataChart}, and \texttt{DataDisplay},
which are all stateless components
(i.e., they display things but don't change anything):
\begin{minted}{js}
render = () => {
const tableStyle = {overflow: 'scroll', height: '200px'}
return (
<div>
<h1>Creatures</h1>
<SurveyStats data={this.state.summary} />
<ChooseRange
start={this.state.start} onStart={this.onStart}
end={this.state.end} onEnd={this.onEnd}
onNewRange={this.onNewRange} />
<DataChart data={this.state.data} />
<div style={tableStyle}>
<DataDisplay data={this.state.data} />
</div>
</div>
)
}
\end{minted}
\section{The Tables}\label{s:capstone-tables}
We will display survey stats as tables,
with a paragraph fallback when there's no data.
First, we display summary statistics for the whole data set
(as returned by the \texttt{GET\ /survey/stats} query we wrote a handler for earlier)
as a table at the top of the page.
(Again, we need parentheses around the HTML fragment so that it will parse properly.)
\begin{minted}{js}
import React from 'react'
const SurveyStats = ({data}) => {
if (data === null) {
return (<p>no data</p>)
}
return (
<table>
<tbody>
<tr><th>record count</th><td>{data.record_count}</td></tr>
<tr><th>year low</th><td>{data.year_low}</td></tr>
<tr><th>year high</th><td>{data.year_high}</td></tr>
</tbody>
</table>
)
}
export default SurveyStats
\end{minted}
Next, we display aggregated statistics for a given range of years
(the \texttt{GET\ /survey/:start/:end} query)
in another table.
\begin{minted}{js}
import React from 'react'
const DataDisplay = ({data}) => {
if (! data) {
return (<p>no data</p>)
}
const columns = [
'year',
'min_hindfoot_length',
'ave_hindfoot_length',
'max_hindfoot_length',
'min_weight',
'ave_weight',
'max_weight'
]
return (
<table>
<tbody>
<tr>{columns.map((c) => (<th>{c}</th>))}</tr>
{data.filter(r => r).map((record) => {
return (<tr>{columns.map((c) => (<td>{record[c]}</td>))}</tr>)
})}
</tbody>
</table>
)
}
export default DataDisplay
\end{minted}
Like \texttt{SurveyStats}, \texttt{DataDisplay} returns a table listing the results returned from the server.
Unlike \texttt{SurveyStats},
this component needs to check whether each record exists before it builds the table row.
Remember that,
when we defined how the year range query is handled in \texttt{DataManager} earlier,
we told it to only return record objects for those years that have data.
Here, we add \texttt{.filter(r\ =\textgreater{}\ r)} before mapping the data to the callback
to ensure that \texttt{DataDisplay} will only try to make \texttt{tr} elements from non-\texttt{null} records.
We do the same when plotting the data.
\section{The Chart}\label{s:capstone-chart}
We initially tried using Vega-Lite directly for the chart,
but after a few failures and some googling,
we switched to \texttt{react-vega-lite}:
Vega-Lite's \texttt{vega-embed} wants to modify an existing DOM element when called,
while \texttt{react-vega-lite} constructs an element to be put in place at the right time,
which proved easier to use.
The steps are:
\begin{enumerate}
\item
Create a paragraph placeholder if there's no data.
\item
Re-organize non-\texttt{null} data into the form the chart needs.
\item
Construct a spec like the ones we have seen before.
\item
Create options to turn off the annoying links (also seen before).
\item
Return an instance of the \texttt{VegaLite} component.
\end{enumerate}
\begin{minted}{js}
import React from 'react'
import VegaLite from 'react-vega-lite'
const DataChart = ({data}) => {
if (! data) {
return (<p>no data</p>)
}
const values = data
.filter(r => r)
.map(r => ({x: r.ave_hindfoot_length, y: r.ave_weight}))
let spec = {
'$schema': 'https://vega.github.io/schema/vega-lite/v2.0.json',
'description': 'Mean Weight vs Mean Hindfoot Length',
'mark': 'point',
'encoding': {
'x': {'field': 'x', 'type': 'quantitative'},
'y': {'field': 'y', 'type': 'quantitative'}
}
}
let options = {
'actions': {
'export': false,
'source': false,
'editor': false
}
}
let scatterData = {
'values': values
}
return (<VegaLite spec={spec} data={scatterData} options={options}/>)
}
export default DataChart
\end{minted}
The other components are similar to those we have seen before.
\section{Running It}\label{s:capstone-run}
In order to test our application,
we need to run a data server,
and then launch our application with Parcel.
The easiest way to do that is to open two windows on our computer
and make each half the width (or height) of our screen
so that we can see messages from both halves of what we're doing.
In one window,
we run:
\begin{minted}{shell}
$ node src/capstone/back/driver-0.js src/capstone/back/test-data.csv
\end{minted}
Note that we \emph{don't} use \texttt{npm\ run\ dev} to trigger Parcel:
this is running on the server,
so no bundling is necessary.
In our other window,
we run:
\begin{minted}{shell}
$ npm run dev src/capstone/front/index.html
\end{minted}
\noindent
which displays:
\begin{minted}{text}
> [email protected] dev /Users/stj/js4ds
> parcel serve -p 4000 "src/capstone/front/index.html"
Server running at http://localhost:4000
+ Built in 20.15s.
\end{minted}
\figpdf{figures/capstone-first-attempt.png}{First Attempt at Viewing Capstone Project}{f:capstone-first-attempt}
We then open \texttt{http://localhost:4000} in our browser and see \figref{f:capstone-first-attempt}.
That's unexpected: we should see the initial data displayed.
If we open the console in the browser and reload the page,
we see this error message:
\begin{minted}{text}
Cross-Origin Request Blocked:
The Same Origin Policy disallows reading the remote resource \
at http://localhost:3418/survey/stats.
(Reason: CORS header 'Access-Control-Allow-Origin' missing).
\end{minted}
The ``Learn More'' link given with the error message takes us to \href{https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSMissingAllowOrigin}{this page},
which uses many science words we don't know.
A web search turns up \href{https://en.wikipedia.org/wiki/Cross-origin_resource_sharing}{this article on Wikipedia},
which tells us that \gref{g:cors}{Cross-origin resource sharing} (CORS)
is a security mechanism.
If a page loads some JavaScript,
and that JavaScript is allowed to send requests to servers other than the one that the page came from,
then villains would be able to do things like send passwords saved in the browser to themselves.
The details are too complex for this tutorial;
the good news is that they've been wrapped up in a Node library called \texttt{cors},
which we can add to our server with just a couple of lines of code:
\begin{minted}{js}
const express = require('express')
const cors = require('cors') // added
let dataManager = null
const app = express()
app.use(cors()) // added
app.get('/survey/stats', (req, res, next) => {
// ...as before...
})
app.get('/survey/:start/:end', (req, res, next) => {
// ...as before...
})
app.use((req, res, next) => {
// ...as before...
})
module.exports = (dbm) => {
// ...as before...
}
\end{minted}
Since this code is saved in \texttt{server-1.js},
we need to create a copy of the driver called \texttt{driver-1.js} that invokes it.
Let's run that:
\begin{minted}{shell}
$ node src/capstone/back/driver-1.js src/capstone/back/test-data.csv
\end{minted}
and then re-launch our application to get \figref{f:capstone-second-attempt}.
\figpdf{figures/capstone-second-attempt.png}{Second Attempt at Viewing Capstone Project}{f:capstone-second-attempt}
Much better!
Now we can type some dates into the ``start'' and ``end'' boxes and,
after we press ``update'',
we get a chart and table of the aggregated statistics for the year range given
(\figref{f:capstone-complete}).
\figpdf{figures/capstone-complete.png}{Completed Capstone Project}{f:capstone-complete}
We've built an interface,
used it to submit queries that are then handled by a server,
which returns data that can be converted to content by our React components,
and our capstone project is complete.
\section{Exercises}\label{s:capstone-exercises}
\exercise{Reporting Other Data}
A user has asked for the number of male and female animals observed for each year.
\begin{enumerate}
\item
Should you add this to the existing query for yearly data or create a new API call?
\item
Implement your choice.
\end{enumerate}
\exercise{Error Checking}
\begin{enumerate}
\item
Modify the server to return 400 with an error message
if the range of years requested is invalid
\item
Compare your implementation to someone else's.
Did you define ``invalid'' in the same way?
I.e., will your programs always return the same status codes for every query?
\end{enumerate}
\exercise{Adding More Tests}
\begin{enumerate}
\item
Our updated \texttt{server-1.js} doesn't have a test associated with it.
Make a copy of \texttt{test-server.js} and adjust it to test this version of the
server.
\item
What other elements of the application could and should be tested?
Try writing tests for those.
In doing this, did you discover imperfections in the example code?
If so, we'd love to hear about them (see \appref{s:contributing}).
\end{enumerate}
\exercise{Use All the Data}
Create a database using all of the survey data and test the display.
What bugs or shortcomings do you notice compared to displaying test data?
\exercise{Merging Displays}
The \texttt{SurveyStats} and \texttt{DataDisplay} components in the front end both display tables.
\begin{enumerate}
\item
Write a new component \texttt{TableDisplay} that will display an arbitrary table
given a list of column names
and a list of objects which all have (at least) those fields.
\item
Replace \texttt{SurveyStats} and \texttt{DataDisplay} with your new component.
\item
Modify your component so that it generates a unique ID for each value in the table.
(Hint: you may need to pass in a third parameter to each call
to serve as the root or stem of those unique IDs.)
\end{enumerate}
\exercise{Formatting}
Modify \texttt{DataDisplay} to format fractional numbers with a single decimal place,
but leave the integers as they are.
Ask yourself why,
seven decades after the invention of digital computers,
this isn't easier.
\exercise{Data, Data Everywhere}
Modify \texttt{DataChart} so that the word \texttt{data} isn't used in so many different ways.
Does doing this make you feel better about yourself as a person?
Modify it again so that the height and width of the chart are passed in as well.
Did that help?
\section*{Key Points}
\input{keypoints/capstone}