-
Notifications
You must be signed in to change notification settings - Fork 1
/
camp2.html
213 lines (173 loc) · 10.9 KB
/
camp2.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Bootcamp day 2</title>
<meta name="description" content="Bootcamp Day 2">
<meta name="author" content="Chan Chi-Loong, V/R">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
<link rel="stylesheet" href="css/reveal.min.css">
<link rel="stylesheet" href="css/theme/night.css" id="theme">
<link rel="stylesheet" href="css/style.css">
<!-- If the query includes 'print-pdf', use the PDF print sheet -->
<script>
document.write( '<link rel="stylesheet" href="css/print/' + ( window.location.search.match( /print-pdf/gi ) ? 'pdf' : 'paper' ) + '.css" type="text/css" media="print">' );
</script>
<!--[if lt IE 9]>
<script src="lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section data-state="alert" data-transition="zoom">
<h1 class="shadowfont white">Data manipulation</h1>
<h3>Bootcamp day 2</h3>
</section>
<section>
<h2>Pivot Tables</h2>
<p><b>Pivot Tables</b> are an easy way to analyze data using a spreadsheet.</p>
<p>They exist in Microsoft Excel (cloud and desktop), Google sheets and opensource office tools (like Libre Office).</p>
</section>
<section>
<h2>Boot up Excel</h2>
<p>For this bootcamp we'll be using Microsoft Excel. I'm using the free office365 version, and it should be similar to your SUTD version.</p>
<p>Please download a sample version of this sample <a href="data/sales_data_sample.csv" target="_blank">sales data</a> CSV file (source: <a href="https://www.kaggle.com/datasets/kyanyoga/sample-sales-data?resource=download" target="_blank">Kaggle link</a>) and open it up in Excel.</p>
</section>
<section>
<h2>Analysis</h2>
<ul>
<li>See sales numbers of each product line and product code</li>
<li>Cross reference with year</li>
<li>Add basic visualization so that it is easier to see</li>
<li>Add deal size filter</li>
</ul>
</section>
<section>
<h2>Exercise</h2>
<ul>
<li>Which country had the most number of deals? For what product lines? What was the total sales amount?</li>
<li>What is the average deal size of a small motorcycle deal in 2003 in USA?</li>
<li>What product lines did each customer in the UK buy across 2003 to 2005 and what is the total sales amount?</li>
</ul>
</section>
<section>
<h2>Synthesis</h2>
<p>Now let's put everything together.</p>
<p>Which university in Singapore has the highest graduate employment rate in 2021?</p>
<p>What course and what's the mean monthly salary?</p>
</section>
<section>
<h2>Other perspectives</h2>
<p>Higher order analytic tools like <i>PowerBI</i> and <i>Tableau</i>. Really useful for connectors to various data platforms.</p>
<p>SQL is useful to know if you're doing deeper dives into data.</p>
<p><a href="https://www.simplilearn.com/elt-tools-article" target="_blank">ETL tools</a> (Extract, transform and load) tools. Mainly if you have loads of data and it is a warehousing and architecture problem.</p>
</section>
<section data-state="alert" data-transition="zoom">
<h1 class="shadowfont white">Data merge</h1>
<h3>Using some programming</h3>
</section>
<section>
<h2>Table merge</h2>
<p>If you're merging tables on a column Excel can do it using <a href="https://www.ablebits.com/office-addins-blog/excel-merge-tables-matching-columns/" target="_blank">power query or table wizard</a>.</p>
<p>Tools like PowerBI or Tableau can probably do this as well.</p>
</section>
<section>
<h2>JSON merge</h2>
<p>For JSON you can combine your datasets in a GIS (Geographic Information System) tool like ESRI ArcGIS or the opensource QGIS. Or use cloud-based tools like Carto or Mapbox.</p>
<p>If you take my module on visualization we'll go through these perspectives in more detail when we talk about geospatial viz.</p>
</section>
<section>
<h2>Libraries</h2>
<h5>for data manipulation</h5>
<p>There are tons of libraries out there. Here is just a small sample of popular libraries for some languages.</p>
<ul>
<li>R: dplyr. Very SQL-like</li>
<li>Python: pandas, numpy</li>
<li>javascript: underscore, D3</li>
</ul>
</section>
<section>
<h2>Exercise</h2>
<p>We're going to do some very simple programming using JS.</p>
<p>We're going to merge a JSON (GeoJSON format) and a CSV file on a common field.</p>
</section>
<section>
<h3>Why: Task</h3>
<p class="smallfont">We're preparing data for a chloropleth visualization from 2 data sources</p>
<img src="img/SGassignment4.jpg"/>
</section>
<section>
<h2>Exercise</h2>
<p>Download the sample <a href="assignments/camp2-exercise.html" target="_blank">HTML file</a>, the <a href="assignments/sgmap.json" target="_blank">JSON</a> and <a href="assignments/population2022.csv" target="_blank">CSV</a> in the same directory.</p>
<p>Fire up an editor and take a look.</p>
</section>
<section>
<h2>CORS</h2>
<p>Oh no! A whole bunch of CORS (cross origin resource sharing) errors when trying to run the html file.</p>
<p>This is for security reasons, making sure that javascript doesn't inject some malicious code into your system when you load data.</p>
</section>
<section>
<h2>Local web server</h2>
<p>The best way to get around this is to run a web server, which locks down the protocol to just web traffic.</p>
<p>(You can also disable security under developer mode but this is not recommended)</p>
</section>
<section>
<h2>VSCode or http-server</h2>
<p>Two easy ways to get a local live server (amongst many, many others!)</p>
<p>Fire up VSC and download the <a href="https://owlhowto.com/how-to-setup-live-server-on-visual-studio-code/" target="_blank">live server extension and run it</a>.</p>
<p>Alternatively, if you have NPM (node package manager) installed - just run <i>npx http-server</i> in the local directory.</p>
</section>
<section>
<section>
<h2>Code: JS</h2>
<p>Let's use the console.log to examine the data structure of the two files.</p>
<p>And then let's go through the merge step-by-step.</p>
</section>
<section>
<h2>Code: Python</h2>
<p>I'm not going through this but in Python you can use Pandas dataframe to merge the 2 files.</p>
</section>
</section>
<section>
<h2>Final code</h2>
<p>A version of the merged data file is <a href="assignments/assignment4-starter.html" target="_blank">here</a>.</p>
<p class="smallfont">You'll see this again in assinment 4 if you're taking my module.</p>
</section>
<section>
<h2>Questions?</h2>
<div class="plain">
<div class="rotate-icon"><a href="http://www.vslashr.com"><img width="100" src="img/VslashR_logo_white.svg"></a></div>
</div>
<p><small>Chi-Loong | V/R</small></p>
</section>
</div>
</div>
<script src="lib/js/head.min.js"></script>
<script src="js/reveal.js"></script>
<script>
// Full list of configuration options available here:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: true,
progress: true,
history: true,
center: true,
theme: Reveal.getQueryHash().theme, // available themes are in /css/theme
transition: Reveal.getQueryHash().transition || 'default', // default/cube/page/concave/zoom/linear/fade/none
// Optional libraries used to extend on reveal.js
dependencies: [
{ src: 'lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: 'plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'plugin/highlight/highlight.js', async: true, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: 'plugin/zoom-js/zoom.js', async: true, condition: function() { return !!document.body.classList; } },
{ src: 'plugin/notes/notes.js', async: true, condition: function() { return !!document.body.classList; } }
]
});
</script>
</body>
</html>