UMDBigDataClub/Challenges-MLB-Data

List of MLB Data Challenges

This is a set of challenges for the MLB pitch dataset, available at https://www.kaggle.com/pschale/mlb-pitch-data-20152018. They are intended for people new to pandas and scikit-learn who want to learn data analysis with Python. The challenges are ordered from easiest to most difficult.

Challenge 1: Windiest Venue

Which venue is the windiest? What is the average wind speed at this venue?
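
One possible starting point, shown on toy rows rather than the real data. It assumes the games table has a `venue_name` column and a `wind` column holding strings such as "10 mph, In from CF"; both the column names and the string format are assumptions to check against the downloaded files, and `windiest_venue` is a hypothetical helper name.

```python
import pandas as pd


def windiest_venue(games):
    """Return (venue, mean wind speed) for the venue with the highest mean."""
    # Assumed format "10 mph, In from CF": pull out the number before "mph".
    speed = games["wind"].str.extract(r"(\d+)\s*mph", expand=False).astype(float)
    # Average the parsed speeds per venue; rows that fail to parse are NaN
    # and are skipped by mean().
    mean_speed = speed.groupby(games["venue_name"]).mean()
    return mean_speed.idxmax(), float(mean_speed.max())


# Toy rows standing in for the games table (not real data).
toy_games = pd.DataFrame({
    "venue_name": ["Park A", "Park A", "Park B", "Park B"],
    "wind": ["10 mph, In from CF", "14 mph, Out to LF",
             "3 mph, None", "5 mph, L to R"],
})
venue, avg_speed = windiest_venue(toy_games)
print(venue, avg_speed)
```

On the real data, the same function could be called on the frame returned by `pd.read_csv` for the games file. Venues with only a handful of games may dominate the ranking, so a minimum game count is worth considering.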

Challenge 2: Pitcher Speed

Which pitchers throw the fastest? Do faster-throwing pitchers get more strikeouts? Do they give up fewer runs?
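
A minimal sketch of the first two questions on toy rows. It assumes a pitches table with `ab_id` and `start_speed`, and an at-bats table with `ab_id`, `pitcher_id`, and an `event` column in which strikeouts are labeled "Strikeout". These names and labels are assumptions, and `pitcher_speed_summary` is a hypothetical helper.

```python
import pandas as pd


def pitcher_speed_summary(pitches, atbats):
    """Per pitcher: mean pitch speed and share of at-bats ending in a strikeout."""
    # Attach pitcher_id to every pitch through the shared at-bat key.
    merged = pitches.merge(atbats[["ab_id", "pitcher_id"]], on="ab_id", how="inner")
    avg_speed = merged.groupby("pitcher_id")["start_speed"].mean().rename("avg_speed")
    # Assumed label: at-bats ending in a strikeout have event == "Strikeout".
    k_rate = (atbats["event"].eq("Strikeout")
              .groupby(atbats["pitcher_id"]).mean().rename("k_rate"))
    # One row per pitcher, fastest first.
    return pd.concat([avg_speed, k_rate], axis=1).sort_values("avg_speed", ascending=False)


# Toy rows standing in for the pitches and at-bats tables (not real data).
toy_pitches = pd.DataFrame({
    "ab_id": [1, 1, 2, 3, 3, 4],
    "start_speed": [98.0, 96.0, 97.0, 88.0, 90.0, 89.0],
})
toy_atbats = pd.DataFrame({
    "ab_id": [1, 2, 3, 4],
    "pitcher_id": [10, 10, 20, 20],
    "event": ["Strikeout", "Strikeout", "Single", "Strikeout"],
})
summary = pitcher_speed_summary(toy_pitches, toy_atbats)
print(summary)
```

With many pitchers in the table, `summary["avg_speed"].corr(summary["k_rate"])` gives a first look at whether speed and strikeout rate move together. The runs question would follow the same pattern once a runs-allowed column has been identified in the data.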

Challenge 3: Pitch Classification

Find some way to classify pitches based on the attributes in the dataset. Which types of pitches are most difficult for batters to hit?

Challenge 4: Batter Analysis

Select a batter. Which types of pitches are they best at hitting? Which types are they the worst at hitting? Write code that will perform this analysis on any given batter in the data set.
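
A sketch of one way to make the analysis reusable for any batter. It scores each pitch type by the share of at-bats that ended in a hit when that pitch type was the final pitch, which is only one of several reasonable definitions of "best at hitting". The column names (`ab_id`, `pitch_type`, `batter_id`, `event`) and the event spellings in `HIT_EVENTS` are assumptions; the function name is hypothetical.

```python
import pandas as pd

# Assumed spellings of the events that count as a hit.
HIT_EVENTS = {"Single", "Double", "Triple", "Home Run"}


def hit_rate_by_pitch_type(pitches, atbats, batter_id):
    """Hit rate per pitch type for one batter, judged on the final pitch of each at-bat."""
    ab = atbats.loc[atbats["batter_id"] == batter_id, ["ab_id", "event"]]
    # Keep the last pitch row of each at-bat; assumes rows are in thrown order.
    last_pitch = pitches.drop_duplicates("ab_id", keep="last")
    merged = last_pitch.merge(ab, on="ab_id", how="inner")
    merged["is_hit"] = merged["event"].isin(HIT_EVENTS)
    return (merged.groupby("pitch_type")["is_hit"]
            .agg(hit_rate="mean", at_bats="size")
            .sort_values("hit_rate", ascending=False))


# Toy rows standing in for the pitches and at-bats tables (not real data).
toy_pitches = pd.DataFrame({
    "ab_id": [1, 1, 2, 3, 4, 4],
    "pitch_type": ["FF", "SL", "FF", "SL", "FF", "CU"],
})
toy_atbats = pd.DataFrame({
    "ab_id": [1, 2, 3, 4],
    "batter_id": [7, 7, 7, 8],
    "event": ["Strikeout", "Single", "Groundout", "Home Run"],
})
result = hit_rate_by_pitch_type(toy_pitches, toy_atbats, batter_id=7)
print(result)
```

The `at_bats` column is kept next to the rate so that pitch types seen only a few times can be filtered out before drawing conclusions.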

Challenge 5: Pitch Prediction

Create an algorithm that, given a pitcher and sequence of pitches already thrown, will predict what type of pitch will be thrown next.
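
A simple baseline to compare against, not a full solution: a first-order transition count that predicts the pitch type most often thrown after the previous one. It uses only the standard library, ignores everything but the last pitch, and would be fit separately on each pitcher's sequences. The pitch type codes and the function names `fit_transitions` and `predict_next` are illustrative assumptions.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count how often each pitch type follows each other pitch type."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Walk over consecutive (previous, next) pairs within one at-bat.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, thrown):
    """Return the most common follower of the last pitch thrown, or None if unseen."""
    if not thrown or thrown[-1] not in counts:
        return None
    return counts[thrown[-1]].most_common(1)[0][0]


# Toy pitch sequences for a single pitcher, one list per at-bat (not real data).
toy_sequences = [
    ["FF", "SL", "FF", "CH"],
    ["FF", "SL", "SL"],
    ["CH", "FF", "SL"],
]
model = fit_transitions(toy_sequences)
print(predict_next(model, ["CH", "FF"]))
```

A learned model from scikit-learn that also uses count, score, or batter information can then be judged by how much it improves on this baseline's accuracy.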

Challenge 6: At-bat Outcome Prediction

Create an algorithm that will, given the variables in the dataset, predict the outcome of a given at-bat.
