Skip to content

Latest commit

 

History

History
104 lines (59 loc) · 7.34 KB

M1.md

File metadata and controls

104 lines (59 loc) · 7.34 KB

Milestone 1

Dataset

Find a dataset (or multiple) that you will explore. Assess the quality of the data it contains and how much preprocessing / data-cleaning it will require before tackling visualization. We recommend using a standard dataset as this course is not about scraping nor data processing.

Our work is based on two datasets from Kaggle, one with international football match records and the other with player data.

The first dataset ("match_results.csv") contains 43,170 results of international football matches starting from the very first official match in 1872 up to 2022. The matches range from FIFA World Cup to FIFI Wild Cup to regular friendly matches. The matches are strictly men's full internationals and the data does not include Olympic Games or matches where at least one of the teams was the nation's B-team, U-23 or a league select team. And we will be interested in the following information:

  • date : date of the match
  • home_team : the name of the home team
  • away_team : the name of the away team
  • home_score : full-time home team score including extra time, not including penalty-shootouts
  • away_score : full-time away team score including extra time, not including penalty-shootouts
  • tournament : the name of the tournament
  • city : the name of the city/town/administrative unit where the match was played
  • country : the name of the country where the match was played

The second dataset ("players_22.csv") provided include the players data for the Career Mode from FIFA 22 . This data provides more than 100 attributes for players such as

  • short_name and long_name : player name
  • player_position : player preferred positions in the court
  • overall : Player current overall score
  • valeur_eur : player value in EUR
  • ...

These two dataset are clean but we still need to do some data pre-processing, especially in terms of data transformation and data reduction.

  • Some new properties of data are created from existing attributes to help in the data analysis process. For example, home_team, home_score, away_score can be transformed to another attribute like home_win_rate for each team, which will be directly used in the future work.

  • We only focus on the following qualifaction matchs toward FIFA world cup after 1990 in the first dataset column tournament :

    • AFC Asian Cup

    • AFC Asian Cup qualification

    • African Cup of Nations

    • African Cup of Nations qualification

    • Copa América

    • Copa América qualification

    • FIFA World Cup

    • FIFA World Cup qualification

    • Oceania Nations Cup

    • UEFA Euro

    • UEFA Euro qualification

    • UEFA Nations League

  • In order to improve the efficiency of data retrieval, we remove some irrelevant or unnecessary columns in the second dataset. For example, the ability value of a player in a position he not belongs to (If we put a goalkeeper to play as a striker.), and the appearance value of a player, etc.

Problematic

Frame the general topic of your visualization and the main axis that you want to develop.

  • What am I trying to show with my visualization?
  • Think of an overview for the project, your motivation, and the target audience.

In the face of the upcoming 2022 World Cup, we wanted to visualize in our project the previous head-to-head records of the World Cup teams, the information of each team, and the information of each player. Our target audience is everyone who wants to know and see the World Cup data. In our website, you can see the head-to-head records of the teams, each team's match records in recent years, player information, etc. It is also possible to compare the teams that are about to play against each other, giving a more intuitive feeling from the data. This can even help viewers predict game scores and win football pools.

Exploratory Data Analysis

Pre-processing of the dataset you chosen

  • Show some Basic statistics and get insights about the data

We are doing analysis on our datasets independently at this stage to gain a general overview of the data. Moreover, we have applied data transformation to obtain crucial information from just home_score and away_score, which also enables a better efficiency for the coming milestones. For a detailed statistics and insights please refer to the Notebook here.

Related Work

  • What others have already done with the data? Visualizations that you found on other websites or magazines (might be unrelated to your data).

Many websites and professional organizations have shown similar visualization results of World Cup matches, but we are not sure whether they use the same data as we proposed. For example :

  • This website displays matches results of World Cup history. By clicking the button of a specific team, we are able to get its corresponding historical match records. Some other results also display players’ ability value :

image

  • Then, In this website, players’ rating data versus their personal value information are presented. The author also makes the comparison between each country’s players and the rest, which directly visualizes the level of each team in each position :

image

  • What's more, some football game producers also use players data to construct corresponding virtual players for their games, such like “FIFA online 4” :

2-1P1101Q226

  • Why is your approach original?

We will focus on analyzing and visualizing the data of the previous World Cup matches, the overall teams data, and the statistical information of players in each team. Most of other common visualization results only display relevant game data and team information. Unlike them, we hope to be able to analyze from multiple dimensions from both teams and players perspectives. We will strive to show the different characteristics of each team, the impact of the player's ability in each position on the overall ability of the team, and the most common elements needed to build a strong team, etc.

  • What source of inspiration do you take?

China has been making continuous efforts to enter the World Cup finals, but the results are always unsatisfactory. Considering such a large population base, we would like to use these data to display the necessary factors to become a strong football team, explore the shortcomings faced by Chinese football, and hopefully find possible improvement measures.