<!DOCTYPE HTML>
<!--
Landed by HTML5 UP
html5up.net | @n33co
Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)
-->
<html>
<head>
<link rel="icon"
type="image/ico"
href="images/fav.ico">
<title>DataHack</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!--[if lte IE 8]><script src="staticassets/js/ie/html5shiv.js"></script><![endif]-->
<link rel="stylesheet" href="staticassets/css/main.css" />
<link href="https://file.myfontastic.com/oXBpAhW2PdxMqAShGGCV59/icons.css" rel="stylesheet">
<!--[if lte IE 9]><link rel="stylesheet" href="staticassets/css/ie9.css" /><![endif]-->
<!--[if lte IE 8]><link rel="stylesheet" href="staticassets/css/ie8.css" /><![endif]-->
</head>
<body class="landing">
<div id="page-wrapper">
<!-- Header -->
<header id="header">
<h1 id="logo" class="special"><a href="index.html">DATAHACK</a></h1>
<nav id="nav">
<ul>
<!-- <li><a href="index.html">Home</a></li> -->
<li><a class="scrolly" href="index.html#about">Information</a></li>
<li><a class="scrolly" href="index.html#data">Data</a></li>
<li><a class="scrolly" href="index.html#sponsors">Sponsors </a></li>
<li><a class="scrolly" href="index.html#email">Newsletter </a></li>
<li><a href="mentors.html">Mentors </a></li>
<li><a href="#">FAQs </a></li>
<li><a class="scrolly" href="#footer">Social </a></li>
<li><a href="/register" target="_blank" class="button special">Sign Up!</a></li>
<li><a href="/login" target="_blank" class="button">Login</a></li>
</ul>
</nav>
</header>
<!-- Content -->
<section id="info" class="wrapper style1 special fade-up">
<div class="container">
<header class="major">
<h2>Highlighted Datasets</h2>
<p>These are some open datasets we think you might enjoy working on.</p>
</header>
<div class="box alt">
<div class="row uniform">
<sideimg class="2u 6u(medium) 12u$(xsmall)">
<a href="http://pokeapi.co/docsv2/#info" target="_blank"><h3>The Pokémon Pokédex</h3></a>
<a href="http://pokeapi.co/docsv2/#info" target="_blank">
<img class="icon alt major fa-area-chart circle link" src="images/data/logo_pokemon.png"></a>
</sideimg>
<section class="10u$ 6u(medium) 12u$(xsmall)">
<p class="align"> Pokémon started out as a Japanese video game and became a worldwide phenomenon. The link is to a public API providing access to information about every Pokémon across all seven existing generations, including berries!
Two great projects for this dataset would be to create PokéBots: (1) a bot that you could compete against and (2) a bot that could help you train your Pokémon.
<br><br>
A link to the API: <a href="http://pokeapi.co/docsv2/#info" target="_blank">http://pokeapi.co/docsv2/#info</a>
<br><br>
Some additional Pokémon resources are:
<ul class="align-left">
<li> <a href="https://github.com/veekun/pokedex" target="_blank"> Pokédex Python module </a> - The name says it all. </li>
<li> <a href="http://pokemondb.net/pokedex" target="_blank"> The Pokédex </a> - A website holding all information about Pokémon. It has no public API (as far as we could tell), but you can scrape it for info. </li>
</ul>
</p>
</section>
<br><br>
<sideimg class="2u 6u(medium) 12u$(xsmall)">
<a href="https://www.reddit.com/r/datasets/" target="_blank"><h3>Datasets Subreddit</h3></a>
<a href="https://www.reddit.com/r/datasets/" target="_blank">
<img class="icon alt major fa-area-chart circle link" src="images/data/logo_reddit.png"></a>
</sideimg>
<section class="10u$ 6u(medium) 12u$(xsmall)">
<p class="align">
Reddit is one of the biggest social bulletin boards and hosts many communities. One of them is the datasets community, used for both requesting and publishing open datasets.
<br><br>
A link to the community: <a href="https://www.reddit.com/r/datasets/" target="_blank">https://www.reddit.com/r/datasets/</a>
<br><br>
Some examples are:
<ul class="align-left">
<li> <a href="https://www.reddit.com/r/datasets/comments/50tmud/us_prisoner_admissions_term_and_population_data/" target="_blank"> United States Prisoner Dataset </a> - Containing ~18M records of prisoner admissions, terms, population, etc. </li>
<li> <a href="https://www.reddit.com/r/datasets/comments/4xc8jo/urbana_illinois_police_department_arrests_since/?ref=share&ref_source=link" target="_blank"> Urbana Police Arrests </a> - Urbana (a city in Champaign County, Illinois, USA) published a list of all police arrests since 1988. </li>
<li> <a href="https://www.reddit.com/r/datasets/comments/4r97sd/monthly_grain_prices_in_england_12701955/" target="_blank"> Monthly Grain Prices in England </a> - All grain prices from 1270 to 1955. </li>
<li> <a href="https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/" target="_blank"> All Reddit Comments </a> - A dataset containing every publicly available Reddit comment. </li>
<li> People also request datasets here (and are often provided with one), so when you browse this community, don't limit yourself to the posts tagged 'dataset'. </li>
<li> A <a href="https://morph.io/masterofpun/r_datasets" target="_blank">scraper</a> for this community was also developed, so you can easily browse all available datasets.</li>
</ul>
</p>
</section>
<br><br>
<sideimg class="2u 6u(medium) 12u$(xsmall)">
<a href="http://www.omuni.org/" target="_blank"><h3>Open Municipality Budgets</h3></a>
<a href="http://www.omuni.org/" target="_blank">
<img class="icon alt major fa-area-chart circle link" src="images/data/logo_omuni.png"></a>
</sideimg>
<section class="10u$ 6u(medium) 12u$(xsmall)">
<p class="align">
Open local budget is a project of The Public Knowledge Workshop aimed at making local authorities' budgets accessible to the public.
While the project is in beta, many budgets are already online (some in a more accessible format than others) and can be browsed,
though only a single budget at a time, on the project's <a href="http://www.omuni.org/" target="_blank" >home page</a>.
Being in beta means that there is plenty of room for improvement - this is where you can come in!
Some project ideas based on this data are:
<ul class="align-left">
<li>How can we compare the budgets of different municipalities, or of the same
municipality in different years? And how can such a comparison be visualized?</li>
<li>Given a budget, can we understand how it is invested geographically/demographically?
Moreover, can we derive how a given municipality invests the money it got from taxes/arnona
in proportion to the amount paid by each region in the city?</li>
<li>Generalize the above two ideas into a tool that is able to take complex queries and produce
meaningful results (not necessarily visualized).</li>
</ul>
<p class="align"> The budget data is available <a href="https://drive.google.com/drive/folders/0B9KCjEIdzJZUSUF2NVRaNW9JYTg?sort=7&direction=d">in this Google Drive folder</a>.</p>
</p>
</section>
<br><br>
<sideimg class="2u 6u(medium) 12u$(xsmall)">
<a href="https://archive.org/details/stackexchange" target="_blank"><h3>Stack Exchange</h3></a>
<a href="https://archive.org/details/stackexchange" target="_blank">
<img class="icon alt major fa-area-chart circle link" src="images/data/logo_stackexchange.png"></a>
</sideimg>
<section class="10u$ 6u(medium) 12u$(xsmall)">
<p class="align">
Starting with Stack Overflow, the Stack Exchange network is a collection of Q&amp;A websites, each dealing with a different topic - from programming to home improvement.
These vast knowledge bases, some containing millions of answers, are available to download in XML format.
<br><br>
A link to the dataset: <a href="https://archive.org/details/stackexchange" target="_blank">https://archive.org/details/stackexchange</a>
<br><br>
Some projects that you could attempt using this dataset are:
<ul class="align-left">
<li> How many questions are unique? We believe that most questions have been answered before (in some form or another) so why not develop an automated answering system? </li>
<li> Could we teach a machine to code based on answers from Stack Overflow? </li>
<li> Are there similarities between different sites dealing with related topics? For instance, do questions asked about Latin-based languages receive similar answers? </li>
</ul>
</p>
</section>
<br><br>
<sideimg class="2u 6u(medium) 12u$(xsmall)">
<a href="http://kikar.org/" target="_blank"><h3>Israeli MKs' Facebook Posts' Comments</h3></a>
<a href="http://kikar.org/" target="_blank">
<img class="icon alt major fa-area-chart circle link" src="images/data/logo_kikar.png"></a>
</sideimg>
<section class="10u$ 6u(medium) 12u$(xsmall)">
<p class="align">
We provide a unique dataset of Facebook comments on statuses published by Israeli MKs during 2015-2016.
In total there are about 5 million such comments, out of which 1,600 are labeled according to the sentiment of the comment's text.
A great challenge is to use the 1,600 labeled comments to infer the sentiment of all the remaining comments.
In this <a href = "https://drive.google.com/open?id=0Bz-MJkSVg93LaXRFRkdJckYtV0E">folder</a> you'll find the labeled data, some information about the labels, and the unlabeled data.
This dataset was collected by the team of <a href = "http://kikar.org/">Kikar Hamedina</a>, and they will be more than happy to help.
Contact the data team if you would like us to put you in touch with them.
</p>
</section>
<!-- <br><br>
<sideimg class="2u 6u(medium) 12u$(xsmall)">
<a href="https://www.githubarchive.org/" target="_blank"><h3>GitHub</h3></a>
<a href="https://www.githubarchive.org/" target="_blank">
<img class="icon alt major fa-area-chart circle link" src="images/data/logo_github.png"></a>
</sideimg>
<section class="10u$ 6u(medium) 12u$(xsmall)">
<p class="align"> GitHub is the largest host of open-source projects, containing over 31 million (!) projects with contributions of over 12 million people.
This dataset contains all of the public timeline (commits, issues, user info, etc.) of GitHub starting from 2011.
Some cool things you could do with this dataset are:
<ul class="align-left">
<li> Find trends in programming languages through time.</li>
<li> How many times did you use "copy-paste"? How many times did this happen on GitHub, or how unique are the projects?</li>
<li> Predict possible issues in new projects. </li>
</ul>
</p>
</section> -->
<br><br>
<sideimg class="2u 6u(medium) 12u$(xsmall)">
<h3>Additional Datasets</h3>
<img class="icon alt major fa-area-chart circle link" src="images/data/logo_data.png">
</sideimg>
<section class="10u$ 6u(medium) 12u$(xsmall)">
<p class="align">
Here are some additional resources which you can use to find open datasets.
This is really just the tip of the iceberg, so if you don't find anything interesting here,
it doesn't mean that it doesn't exist at all. If you have something specific in mind and need our help,
mail us at <a href="mailto:[email protected]">[email protected]</a>.
<ul class="align-left">
<li> <a href="https://www.kaggle.com/datasets" target="_blank">Kaggle</a> - A data science community with some nice datasets, such as school fires in Sweden from 1998 to 2014. </li>
<li> <a href="http://www.gdeltproject.org/" target="_blank">The GDELT Project</a> - The GDELT Project monitors broadcast, print, and web news from around the world and identifies people, locations, organizations, emotions and more. </li>
<li> <a href="http://developer.nytimes.com/" target="_blank">The New York Times</a> - One of the most widely read newspapers in the world.</li>
<li> <a href="https://nctu.partners.org/ProACT/" target="_blank">Prize4Life</a> - A dataset containing clinical records of ALS patients. </li>
<li> <a href="http://apps.ecmwf.int/datasets/" target="_blank">European Centre for Medium-Range Weather Forecasts</a> - Datasets containing weather information, one of which contains atmospheric data from 1900 until 2010!</li>
<li> <a href="https://data.nasa.gov/data" target="_blank"> NASA </a> - NASA's open data, ranging from biological measurements to software usage.</li>
</ul>
</p>
</section>
</div>
</div>
</section>
<!-- Footer -->
<footer id="footer">
Social
<ul class="icons">
<li><a href="https://www.snapchat.com/add/datahack" class="icon icon-snap" target="_blank"><span class="label">SnapChat</span></a></li>
<li><a href="http://www.meetup.com/DataHack" class="icon icon-meetup" target="_blank"><span class="label">Meetup</span></a></li>
<li><a href="https://twitter.com/datahackil" class="icon fa-twitter" target="_blank"><span class="label">Twitter</span></a></li>
<li><a href="https://www.facebook.com/datahackil" class="icon fa-facebook" target="_blank"><span class="label">Facebook</span></a></li>
<li><a href="https://www.linkedin.com/company/10256604" class="icon fa-linkedin" target="_blank"><span class="label">LinkedIn</span></a></li>
<li><a href="mailto:[email protected]" class="icon fa-envelope"><span class="label">Email</span></a></li>
</ul>
<ul class="copyright">
<li>© DataHack 2016 All rights reserved.</li><li>Thanks to HTML5 UP</li> <li>[email protected]</li>
</ul>
</footer>
</div>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js" type="text/javascript"></script>
<script type="text/javascript">
$(document).ready(function(){
$(".slidingDiv").hide();
$(".show_hide").show();
$('.show_hide').click(function(){
$(".slidingDiv").slideToggle();
});
});
</script>
<!-- Scripts -->
<script src="staticassets/js/jquery.min.js"></script>
<script src="staticassets/js/jquery.scrolly.min.js"></script>
<script src="staticassets/js/jquery.dropotron.min.js"></script>
<script src="staticassets/js/jquery.scrollex.min.js"></script>
<script src="staticassets/js/skel.min.js"></script>
<script src="staticassets/js/util.js"></script>
<!--[if lte IE 8]><script src="staticassets/js/ie/respond.min.js"></script><![endif]-->
<script src="staticassets/js/main.js"></script>
</body>
</html>