
Wharton Undergraduate Data Analytics Club (WUDAC)

Welcome!

Welcome to the Wharton Undergraduate Data Analytics Club. This repository hosts and compiles resources to help students learn.

Getting Started

If you are a member of WUDAC, our private resources are hosted below. Please make sure you have a GitHub account, and sign up for the Student pack to get free private repositories (2 years).

For the long haul

Get the tools

Text Editors
Visualization Frameworks
Database

Penn Academics & Organizations

From our WUDAC Alum, James Wang: What classes should I take at UPenn if I want to become a data scientist?

Newest Classes (Updated Fall 2017)

Penn has several organizations and academic programs tailored towards data analytics and data science; our club is only one of many.


Graduate Programs:

Fellowships


Sources / Repositories:

GitHub

Machine Learning Glossary - Google Developers

Good Reads:

Data

Open Source Projects

Data Sources by size

Massive

A. OpenDataSoft - 2600+ Open Data Portals Around the World

Large
Medium
Small

Career

Job Portals / Leads

Resume / Cover Letters

A. Blogs / Sources

B. Opinion Articles / Tips

Interview Prep

By Topic

Introduction
  • To-do: Elevator pitch
Product/Case
A/B Testing

Sample Q&A:

  • Explaining CIs and Significance: If the statistical test returns a significant result, you conclude that the effect is unlikely to have arisen from random chance alone. If you reject the null hypothesis at the 95% confidence level, then if there were no true effect, a result like ours (or one more extreme) would occur in less than 5% of all possible samples.

  • Why is randomization important in experimental design? How would you answer the question, does attending local meetups cause Etsy sellers to gather more sales?

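    • Randomization is at the core of experimentation because it balances out confounding variables. In the Etsy example, sellers who choose to attend meetups may already be more committed than those who do not, so a simple comparison cannot separate the effect of the meetup from the effect of commitment. By randomly assigning 50% of users to a control group and 50% of users to a treatment group, you can ensure that the level of seller commitment is, on average, balanced between the two groups, as is every other possible confounding variable, measured or not.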
  • What things might we need to be worried about if we have an experiment with 20 different metrics? What if we run 20 experiments simultaneously?

    • The more metrics you measure, the more likely you are to get at least one false positive. Ways to correct for this include tightening your significance threshold (e.g. Bonferroni correction) or running a family-wise test before you dive into the individual metrics (e.g. Fisher's Protected LSD). However, these are not often used in practice, and most people simply proceed with caution and stay wary of spurious results.
  • Null Hypothesis: "Disputing a null hypothesis is a matter of running the experiment long enough to rule out an incidental outcome. This concept is also referred to as reaching statistical significance."
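
The 20-metric question above can be made concrete with a small sketch. It assumes the 20 tests are independent and that no metric has a true effect, which real experiments rarely satisfy exactly, so treat the numbers as a rough guide. The function names are illustrative and do not come from any library. Under those assumptions, the chance of at least one false positive at a 5% threshold is roughly 64%, and a Bonferroni-adjusted threshold brings it back under 5%.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests
    when none of the metrics has a true effect."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold under the Bonferroni correction."""
    return alpha / num_tests


def simulate_any_false_positive(num_tests: int, alpha: float,
                                trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: fraction of simulated experiments in which
    at least one metric looks significant purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis a p-value is uniform on [0, 1],
        # so a uniform draw stands in for each metric's p-value.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    metrics = 20
    adjusted = bonferroni_alpha(metrics)
    print("Uncorrected family-wise error rate:",
          round(familywise_error_rate(metrics), 4))
    print("Bonferroni per-test threshold:", adjusted)
    print("Corrected family-wise error rate:",
          round(familywise_error_rate(metrics, adjusted), 4))
    print("Simulated, uncorrected:",
          simulate_any_false_positive(metrics, 0.05))
    print("Simulated, corrected:",
          simulate_any_false_positive(metrics, adjusted))
```

The trade-off is that the stricter per-test threshold also makes real effects harder to detect, which is one reason the correction is often skipped in practice.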

Coding
Statistics & Probability

General Topics / Papers

Chatbots

Guides / Tutorials
APIs & Engines

Computer Vision

Deep Learning

Deep Learning Reading Roadmap
Generating text with deep learning

Machine Learning

ML Algorithms explained

Time-dependent Analysis

Natural Language Processing (NLP) / Advanced Text Mining

Courses
Algorithms / Key Concepts
Courses

Online Courses

Stanford Repository, self-paced

  • Natural Language Processing (NLP) / Advanced Text Mining

CS224n: Natural Language Processing with Deep Learning

Portfolio

How to share your data science portfolio
Github Project - Best Practices

Regex


Programming Languages - Data Science Core (Python / R / SQL + Databases + Shell)

Databases and Data Engineering

Amazon Web Services (AWS) - Cloud Computing Services:

  • Create an account: Home
  • AWS Educate for Students: Apply
AWS Lambda
Command Line / Linux:

Python:

Topics

Tutorials:

Algorithms

PCA
K-Means
Beginner / Intermediate
Advanced:

R

Getting started: Installation, setup, learning from the basics

Check the WUDAC Dropbox for the main resources (WIP)

Tutorials:

Beginner / Intermediate
Advanced:

SQL

Tutorials:
Topics:

Overview: SQL As Understood By SQLite

Database Fundamentals - Microsoft Virtual Academy

  • Other

Query Planner


Programming Languages - Software Engineering / Other skills

Android

  • LinkedIn Learning (Paywall): Android

AngularJS

Bootstrap

C++ / C / C# / Objective-C

C++
C
C#
Objective-C

Cassandra

Excel / VBA

Excel
VBA (Visual Basic for Applications)

Git / GitHub

Go

  • LinkedIn Learning (Paywall): Go

Hadoop (Apache Hadoop)

Haskell

HTML/CSS/JS/Other:

HTML
CSS

iOS

  • LinkedIn Learning (Paywall): iOS

Java

  • CIS 110 / 120 (...)
  • Testing Framework - CompileJava
  • LinkedIn Learning (Paywall): Java

JavaScript:

General Topics
Node.js

Julia

  • LinkedIn Learning (Paywall): Julia

Kotlin

  • LinkedIn Learning (Paywall): Kotlin

MATLAB:

Markdown

MongoDB:

  • Quora:

Prerequisites Learning

Perl

  • LinkedIn Learning (Paywall): Perl

PHP

  • LinkedIn Learning (Paywall): PHP

Ruby:

Scala

  • LinkedIn Learning (Paywall): Scala

Spark

SPSS

  • LinkedIn Learning (Paywall): SPSS

Swift

  • LinkedIn Learning (Paywall): Swift

Tableau

  • LinkedIn Learning (Paywall): Tableau

WordPress


Official Blogs / Research Sites:

Airbnb

Facebook Research

Netflix

Wealthfront

University Data Science Teams / Clubs


Podcasts:


Other Blogs: