Here are a number of challenges for the MLB dataset, which can be found at https://www.kaggle.com/pschale/mlb-pitch-data-20152018. They are intended for people new to pandas and scikit-learn who want to learn data analysis in Python. The challenges are ordered from easiest to most difficult.
Which venue is the windiest? What is the average wind speed at this venue?
Which pitchers throw the fastest? Do faster-throwing pitchers get more strikeouts? Do they give up fewer runs?
Find some way to classify pitches based on the attributes in the dataset. Which types of pitches are most difficult for batters to hit?
Select a batter. Which types of pitches are they best at hitting? Which types are they worst at hitting? Write code that will perform this analysis for any given batter in the dataset.
Create an algorithm that, given a pitcher and a sequence of pitches already thrown, will predict what type of pitch will be thrown next.
Create an algorithm that, given the variables in the dataset, will predict the outcome of an at-bat.
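As a possible starting point for the windiest-venue challenge, here is a minimal pandas sketch. It assumes the download includes a `games.csv` file with a `venue_name` column and a `wind` column holding strings such as "8 mph, Out To CF". Check the actual file and column names in your copy before relying on it. The function name `windiest_venue` is hypothetical, and the sample rows in the example are made up purely to show the assumed shape of the data.

```python
from __future__ import annotations

import pandas as pd


def windiest_venue(games: pd.DataFrame) -> tuple[str, float]:
    """Return (venue, mean wind speed in mph) for the venue with the highest average wind.

    Assumes `games` has a `venue_name` column and a `wind` column of strings
    like "8 mph, Out To CF"; adjust the names if your copy of games.csv differs.
    """
    # Pull the leading number out of strings such as "8 mph, Out To CF".
    # Rows that do not match the pattern become NaN and are ignored by mean().
    speed = pd.to_numeric(
        games["wind"].str.extract(r"(\d+)\s*mph", expand=False), errors="coerce"
    )
    # Average wind speed per venue, then pick the venue with the largest average.
    by_venue = speed.groupby(games["venue_name"]).mean()
    venue = by_venue.idxmax()
    return venue, float(by_venue[venue])


if __name__ == "__main__":
    # Tiny made-up sample in the assumed format. With the real data you would
    # instead use something like: games = pd.read_csv("games.csv")
    games = pd.DataFrame({
        "venue_name": ["Park A", "Park A", "Park B", "Park B"],
        "wind": ["10 mph, Out To CF", "6 mph, L To R", "3 mph, In From LF", "5 mph, None"],
    })
    print(windiest_venue(games))  # ('Park A', 8.0)
```

Extracting the number with a regular expression, rather than splitting on the comma, means rows with an unexpected wind format simply become NaN and drop out of the average instead of raising an error.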