Introduction:
The purpose of this project is to combine my passion for NBA basketball with my growing expertise in data analysis. Ever since I could legally participate, sports betting has been a thrilling hobby of mine. As an avid NBA fan, I’ve always been captivated by the intricate details and statistics that define each game. By analyzing performance data of key players, I aim to make more informed betting decisions while deepening my appreciation for the sport. This project is not just about numbers; it’s about uncovering the stories behind the data and connecting with the game on a deeper level.
In this project, I will analyze the 2023–2024 season performance of Jaylen Brown, Jayson Tatum, Kyrie Irving, and Luka Dončić across several factors: three-point percentage, field goal percentage, free throw percentage, minutes played, turnovers vs. assists, and plus/minus (+/-). I will also use a supervised machine learning technique to predict each player’s points in the 2023–2024 NBA Finals.
Dataset:
The dataset used in this project is sourced directly from Stathead.com, a comprehensive sports statistics website powered by Sports Reference. Stathead offers in-depth statistical information, historical data, and advanced analytics for a variety of sports, including basketball. It’s an invaluable resource for sports enthusiasts, analysts, and bettors who seek detailed insights into player and team performances.
To acquire the data, I utilized Stathead’s robust search and filtering features to extract specific player statistics for the NBA. The process involved:
- Navigating to the Basketball Section: Accessing the basketball section of Stathead, which provides extensive data on players, teams, seasons, and individual games.
- Using the Player Season Finder Tool: Leveraging the Player Season Finder tool to filter and download player performance data for Jaylen Brown, Kyrie Irving, Luka Dončić, and Jayson Tatum.
- Customizing Data Extraction: Specifying the required seasons, statistics, and additional filters to tailor the dataset to my analysis needs.
- Exporting the Data: Downloading the data in a CSV format, which allows for easy manipulation and analysis using various data analysis tools.
After cleaning, the resulting dataset has 32 columns and 316 rows covering the regular season and playoffs (up to Game 3 of the Finals). Key dimensions include:
- Game Information: Date, opponent, home/away status.
- Performance Metrics: Points (PTS), minutes played (MP), field goal percentage (FG%), three-point percentage (3P%), free throw percentage (FT%), plus/minus (+/-).
- Advanced Statistics: Player efficiency rating, usage rate, and more.
By sourcing data from Stathead.com, I ensured that the dataset is accurate, comprehensive, and relevant to my analysis objectives. This rigorous data acquisition process forms the foundation of my project, enabling a detailed and insightful exploration of NBA player performances.
Guiding Questions & Objectives:
- How do the shooting percentages (3P%, FG%, FT%) of Jaylen Brown, Jayson Tatum, Kyrie Irving, and Luka Dončić vary throughout the 2023–2024 season?
- What is the relationship between turnovers and assists for these players, and how does it impact their overall performance?
- How does the number of minutes played (MP) influence the performance metrics of each player?
- How do the plus/minus (+/-) ratings of these players reflect their impact on the game’s outcome?
- Can we use supervised machine learning techniques to predict the individual points scored by these players in Games 3 through 7 of the 2023–2024 NBA Finals?
Data Wrangling Techniques:
1. Loading the Data
First, we load the datasets for each player from CSV files using the pandas library. This step initializes our dataframes, which we will then clean and transform.
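A minimal sketch of this step, assuming one Stathead CSV export per player. The file names in `PLAYER_FILES` and the helper `load_player_logs` are hypothetical, and the usage line reads a tiny made-up CSV from memory so the example runs without the real files.

```python
import io

import pandas as pd

# Hypothetical file names: the post does not say what the exported CSVs were called.
PLAYER_FILES = {
    "Jaylen Brown": "jaylen_brown.csv",
    "Jayson Tatum": "jayson_tatum.csv",
    "Kyrie Irving": "kyrie_irving.csv",
    "Luka Dončić": "luka_doncic.csv",
}


def load_player_logs(sources):
    """Read one CSV (path or file-like object) per player into {player: DataFrame}."""
    return {player: pd.read_csv(src) for player, src in sources.items()}


# With the real exports on disk: dfs = load_player_logs(PLAYER_FILES)
# In-memory stand-in with made-up values, so this runs on its own:
demo = io.StringIO("Date,Opp,MP,PTS\n2024-01-01,AAA,38:00,30\n2024-01-03,BBB,35:30,25\n")
dfs = load_player_logs({"Jayson Tatum": demo})
print(dfs["Jayson Tatum"].shape)  # (2, 4)
```

Because `pd.read_csv` accepts both paths and file-like objects, the same helper works for the real exports and for quick in-memory tests.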
2. Handling Missing Values
We ensure the integrity of our dataset by removing rows with missing values in key columns (most of these were games Kyrie Irving missed due to injury this season). This step is crucial because missing values in these columns can significantly skew our analysis.
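One way to write this step, as a hedged sketch. The post does not list its key columns, so `KEY_COLUMNS` below is an assumption, as is the helper name `drop_missing_games`. Keying on `MP` and `PTS` targets games a player did not play; if the export leaves a percentage blank when a player has no attempts, including 3P% or FT% in the subset would also drop games he actually played.

```python
import pandas as pd

# Assumed key columns; the original analysis does not name them.
KEY_COLUMNS = ["MP", "PTS"]


def drop_missing_games(df, key_columns=KEY_COLUMNS):
    """Drop rows with a missing value in any key column (e.g., games the player missed)."""
    return df.dropna(subset=key_columns).reset_index(drop=True)


# Made-up rows: the second is a missed game, the third has no free-throw attempts.
games = pd.DataFrame({
    "MP": ["38:00", None, "35:30"],
    "PTS": [30, None, 25],
    "FT%": [0.8, None, None],
})
clean = drop_missing_games(games)
print(len(clean))  # 2
```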
3. Converting Data Types
To facilitate analysis, we convert the MP (Minutes Played) column from a string format (e.g., '38:00') to an integer format representing total minutes. We extract the first two characters of the string and convert them to an integer.
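A short sketch of the conversion. It splits on ':' rather than taking the first two characters, which is a swapped-in variant: slicing `'8:30'[:2]` gives `'8:'`, which `int()` rejects, while splitting handles both one- and two-digit minutes. Like the original approach, it drops the seconds; `minutes_to_int` is a hypothetical helper name.

```python
import pandas as pd


def minutes_to_int(mp):
    """Convert an 'MM:SS' string such as '38:00' to whole minutes (38), dropping seconds."""
    return int(str(mp).split(":")[0])


# Made-up values, including a single-digit minutes entry.
df = pd.DataFrame({"MP": ["38:00", "41:12", "8:30"]})
df["MP"] = df["MP"].apply(minutes_to_int)
print(df["MP"].tolist())  # [38, 41, 8]
```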
4. Adding Home/Away Column
The Unnamed: 5 column indicates whether a game was played at home or away. We create a new column, Home/Away, based on the presence of '@' in the Unnamed: 5 column. This categorization helps us analyze performance differences between home and away games.
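As a sketch, assuming the '@' column was read in as `Unnamed: 5` (pandas' default name for a column with a blank header) with empty cells for home games:

```python
import numpy as np
import pandas as pd

# Made-up values: '@' marks a road game; an empty cell (read as NaN/None) marks a home game.
games = pd.DataFrame({"Unnamed: 5": ["@", None, "@", None]})
games["Home/Away"] = np.where(games["Unnamed: 5"].eq("@"), "Away", "Home")
print(games["Home/Away"].tolist())  # ['Away', 'Home', 'Away', 'Home']
```

Comparing with `eq("@")` returns False for missing cells, so blanks fall through to "Home" without a separate `fillna` step.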
5. Converting Percentages
Columns such as FG%, 3P%, and FT% are initially in decimal format (e.g., 0.45). We convert these columns to a more readable percentage format (e.g., '45.0%') by multiplying by 100 and adding a '%' sign.
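A sketch of the formatting step. It is a design choice, not necessarily what the original did: it writes the display strings to new `... (display)` columns and keeps the numeric columns, because a column of strings such as '45.0%' can no longer be averaged or fed to a model. Missing percentages are left as NaN.

```python
import pandas as pd

PCT_COLUMNS = ["FG%", "3P%", "FT%"]

# Made-up per-game values in decimal form.
games = pd.DataFrame({"FG%": [0.45, 0.5], "3P%": [0.333, None], "FT%": [1.0, 0.875]})

for col in PCT_COLUMNS:
    # 0.45 -> '45.0%'; na_action="ignore" leaves NaN cells untouched.
    games[f"{col} (display)"] = games[col].map(lambda v: f"{v * 100:.1f}%", na_action="ignore")

print(games["FG% (display)"].tolist())  # ['45.0%', '50.0%']
```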
6. Applying Transformations
We apply the above transformations to each dataframe for the players Jaylen Brown, Kyrie Irving, Luka Dončić, and Jayson Tatum.
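Putting the steps together, one possible sketch cleans each player’s DataFrame the same way and stacks them with a `Player` column, the long format that per-player, home/away summaries need. The helper names `clean_player_df` and `combine_players` are hypothetical; converting minutes by splitting on ':' and treating `MP` and `PTS` as the key columns are assumptions.

```python
import numpy as np
import pandas as pd


def clean_player_df(df):
    """Apply the cleaning steps to one player's game log and return a new DataFrame."""
    out = df.dropna(subset=["MP", "PTS"]).copy()  # assumed key columns
    out["MP"] = out["MP"].astype(str).str.split(":").str[0].astype(int)  # '38:00' -> 38
    out["Home/Away"] = np.where(out["Unnamed: 5"].eq("@"), "Away", "Home")
    return out


def combine_players(player_dfs):
    """Clean every player's DataFrame and stack them into one table with a Player column."""
    frames = [clean_player_df(df).assign(Player=name) for name, df in player_dfs.items()]
    return pd.concat(frames, ignore_index=True)


# Made-up game logs for two players.
raw = {
    "Jaylen Brown": pd.DataFrame({"MP": ["36:10", None], "PTS": [28, None], "Unnamed: 5": [None, "@"]}),
    "Luka Dončić": pd.DataFrame({"MP": ["9:45"], "PTS": [12], "Unnamed: 5": ["@"]}),
}
all_games = combine_players(raw)
print(all_games["Player"].tolist())     # ['Jaylen Brown', 'Luka Dončić']
print(all_games["MP"].tolist())         # [36, 9]
print(all_games["Home/Away"].tolist())  # ['Home', 'Away']
```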
Data Analysis Discussion:
- How do the shooting percentages (3P%, FG%, FT%) of Jaylen Brown, Jayson Tatum, Kyrie Irving, and Luka Dončić vary throughout the 2023–2024 season?
Objective: To analyze the consistency and effectiveness of each player’s shooting over the course of the season.
Shooting Percentages (3P%, FG%, FT%) by Player and Home/Away Status
The bar chart visualizes the average shooting percentages for Jaylen Brown, Jayson Tatum, Kyrie Irving, and Luka Dončić during the 2023–2024 NBA season, separated by home and away games.
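A sketch of how the averages behind the chart could be computed in pandas, assuming a combined table with `Player`, `Home/Away`, and numeric (not string-formatted) percentage columns; the rows below are made up. Note that this averages per-game percentages, so every game counts equally regardless of attempts; total makes divided by total attempts would give a different, volume-weighted figure.

```python
import pandas as pd

# Made-up per-game rows; in the project these would be the combined, cleaned game logs.
games = pd.DataFrame({
    "Player": ["Jaylen Brown"] * 4,
    "Home/Away": ["Home", "Home", "Away", "Away"],
    "FG%": [0.50, 0.60, 0.40, 0.50],
    "3P%": [0.25, 0.50, None, 0.40],
    "FT%": [1.00, 0.50, 0.75, 0.50],
})

# Mean of per-game percentages by player and venue, scaled to 0-100.
# mean() skips NaN, so a game with no 3-point attempts does not count toward 3P%.
avg = games.groupby(["Player", "Home/Away"])[["3P%", "FG%", "FT%"]].mean().mul(100).round(2)
print(avg)
```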
Three-Point Percentage (3P%)
- Jaylen Brown: Home: 35.66%, Away: 35.28%
- Jayson Tatum: Home: 35.66%, Away: 35.96%
- Kyrie Irving: Home: 37.80%, Away: 38.88%
- Luka Dončić: Home: 34.53%, Away: 39.15%
Insights: Luka Dončić (by about 4.6 percentage points) and Kyrie Irving perform better away. Jaylen Brown and Jayson Tatum are consistent across both.
Field Goal Percentage (FG%)
- Jaylen Brown: Home: 52.84%, Away: 48.87%
- Jayson Tatum: Home: 46.38%, Away: 45.89%
- Kyrie Irving: Home: 47.17%, Away: 49.28%
- Luka Dončić: Home: 46.95%, Away: 48.28%
Insights: Jaylen Brown excels at home. Kyrie Irving and Luka Dončić are slightly better away. Jayson Tatum remains consistent.
Free Throw Percentage (FT%)
- Jaylen Brown: Home: 75.76%, Away: 57.81%
- Jayson Tatum: Home: 83.30%, Away: 84.52%
- Kyrie Irving: Home: 91.21%, Away: 85.02%
- Luka Dončić: Home: 81.32%, Away: 76.96%
Insights: Jaylen Brown’s FT% drops sharply away (by about 18 percentage points). Jayson Tatum is nearly identical home and away, and Kyrie Irving stays high in both despite a dip of about 6 points on the road.
Recommendations:
- Jaylen Brown excels at home in FG% and FT%; utilize him more in home games.
- Kyrie Irving and Luka Dončić perform better away in FG% and 3P%.
- Jayson Tatum is steady across all metrics, regardless of location.
- Jaylen Brown needs to improve his FT% in away games.
What is the relationship between turnovers and assists for these players, and how does it impact their overall performance?
Objective: To explore how each player’s assist-to-turnover ratio affects their efficiency and contribution to the team’s success.
Analysis and Discussion
Assist/Turnover Ratio (Ast/TO) by Player and Home/Away Status
The table shows the assist-to-turnover ratio (Ast/TO), a calculated field obtained by dividing assists by turnovers, for Jaylen Brown, Jayson Tatum, Kyrie Irving, and Luka Dončić during the 2023–2024 NBA season, separated by home and away games.
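A sketch of the calculated field, assuming assist and turnover columns named `AST` and `TOV` (Basketball-Reference’s usual labels). Whether the original divided totals or averaged per-game ratios is not stated; this version divides totals per player and venue, which avoids the division by zero a per-game ratio hits in any game with no turnovers.

```python
import pandas as pd

# Made-up rows; the second home game has zero turnovers.
games = pd.DataFrame({
    "Player": ["Kyrie Irving"] * 4,
    "Home/Away": ["Home", "Home", "Away", "Away"],
    "AST": [6, 4, 5, 7],
    "TOV": [2, 0, 1, 3],
})

totals = games.groupby(["Player", "Home/Away"])[["AST", "TOV"]].sum()
totals["Ast/TO"] = (totals["AST"] / totals["TOV"]).round(3)
print(totals["Ast/TO"].tolist())  # [3.0, 5.0]  (Away first: groups are sorted)
```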
- Jaylen Brown: Home: 1.629, Away: 1.409
- Jayson Tatum: Home: 2.702, Away: 2.465
- Kyrie Irving: Home: 3.248, Away: 3.394
- Luka Dončić: Home: 2.495, Away: 3.030
Insights:
- Jaylen Brown: Better Ast/TO ratio at home.
- Jayson Tatum: Slightly better at home, consistent performance.
- Kyrie Irving: High Ast/TO ratio with a slight improvement away.
- Luka Dončić: Improved efficiency away.
Recommendations
- Jaylen Brown: Improve decision-making in away games.
- Jayson Tatum: Maintain consistent playmaking abilities.
- Kyrie Irving: Sustain a high Ast/TO ratio both home and away.
- Luka Dončić: Replicate away-game efficiency at home.
How does the number of minutes played (MP) influence the performance metrics of each player?
Objective: To investigate whether longer playing times correlate with better or worse performance statistics for the players.
Analysis and Discussion
Minutes Played (MP) vs. Game Score (GmSc) by Player and Home/Away Status
The bar chart visualizes the average ratio of minutes played (MP) to Game Score (GmSc), a feature-engineered column computed by dividing MP by GmSc, for Jaylen Brown, Jayson Tatum, Kyrie Irving, and Luka Dončić during the 2023–2024 NBA season, separated by home and away games. A higher ratio means more minutes were needed per point of Game Score, i.e., lower efficiency.
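A sketch of the engineered column and its per-venue average, with made-up values. Because GmSc sits in the denominator, a single game with a Game Score near zero (or negative) can dominate a per-game average; the sketch also computes total minutes divided by total Game Score as a steadier alternative, which is an addition rather than the original method.

```python
import pandas as pd

# Made-up rows; the last away game has a Game Score close to zero.
games = pd.DataFrame({
    "Player": ["Luka Dončić"] * 4,
    "Home/Away": ["Home", "Home", "Away", "Away"],
    "MP": [36, 40, 38, 34],
    "GmSc": [24.0, 20.0, 19.0, 0.5],
})

# Feature-engineered column as described: minutes per point of Game Score (lower = more efficient).
games["MP/GmSc"] = games["MP"] / games["GmSc"]
per_game_mean = games.groupby(["Player", "Home/Away"])["MP/GmSc"].mean()

# Alternative: total minutes / total Game Score, which one low-GmSc game cannot blow up.
totals = games.groupby(["Player", "Home/Away"])[["MP", "GmSc"]].sum()
ratio_of_totals = totals["MP"] / totals["GmSc"]

print(per_game_mean.round(3).tolist())    # [35.0, 1.75]   (Away, Home)
print(ratio_of_totals.round(3).tolist())  # [3.692, 1.727]
```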
MP/GmSc Ratio by Player and Home/Away Status
- Jaylen Brown: Home: 2.338, Away: 3.724
- Jayson Tatum: Home: 1.963, Away: 1.894
- Kyrie Irving: Home: 2.314, Away: 2.180
- Luka Dončić: Home: 1.602, Away: 2.871
Insights:
- Jaylen Brown: Higher ratio away, i.e., less efficient on the road.
- Jayson Tatum and Kyrie Irving: Consistent home and away.
- Luka Dončić: Higher ratio away, i.e., more efficient at home.
Recommendations
- Jaylen Brown: Improve road efficiency to match his home output.
- Jayson Tatum and Kyrie Irving: Maintain consistency.
- Luka Dončić: Replicate home efficiency on the road.
What are the average plus/minus (+/-) ratings of these players?
Objective: To show each player’s average plus/minus (+/-) throughout the season.
Plus/Minus (+/-) by Player and Home/Away Status
The bar chart visualizes the average plus/minus (+/-) rating for Jaylen Brown, Jayson Tatum, Kyrie Irving, and Luka Dončić during the 2023–2024 NBA season, separated by home and away games.
- Jaylen Brown: Home: 7.262, Away: 5.889
- Jayson Tatum: Home: 11.022, Away: 6.050
- Kyrie Irving: Home: 4.563, Away: 3.355
- Luka Dončić: Home: 5.357, Away: 3.783
Insights:
- Jaylen Brown: Better +/- at home.
- Jayson Tatum: Significantly better +/- at home.
- Kyrie Irving: Higher +/- at home.
- Luka Dončić: Higher +/- at home.
Recommendations
- Jaylen Brown: Capitalize on strong home performance.
- Jayson Tatum: Maximize home advantage.
- Kyrie Irving: Leverage better home performance.
- Luka Dončić: Utilize home game strategies to boost +/-
Can we use supervised machine learning techniques to predict the individual points scored by these players in the NBA Finals 2023–2024?
Objective: To apply an 80–20 train-test split to build and evaluate a predictive model for estimating the players’ points in crucial finals games.
This one is really fun, so I would like to explain each step in detail. First, I imported LinearRegression, mean_squared_error, r2_score, StandardScaler, and train_test_split from scikit-learn (sklearn).
The features are 3P%, FG%, FT%, +/-, Game Score (GmSc), and Minutes Played (MP), and the target is Points (PTS). I then normalized the features with StandardScaler(), calling scaler.fit_transform(x), where x is the feature matrix. After that, I split the data 80–20 into training and test sets. Lastly, I fit a linear regression model.
After fitting the model, I calculated each player’s average value for every feature and used those averages to predict his next game’s points, as shown above.
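The steps above can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the original notebook: the data are synthetic, `random_state=42` is arbitrary, and the scaler is wrapped in a pipeline with `make_pipeline` so it is fit on the training rows only; calling `scaler.fit_transform(x)` on all rows before splitting, as described above, lets test-set statistics leak into the scaling. It is also worth knowing that Game Score is computed from a player’s points (Hollinger’s formula starts with PTS), so using GmSc as a feature partly hands the model its target; the synthetic target here mimics that dependence.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["3P%", "FG%", "FT%", "+/-", "GmSc", "MP"]
TARGET = "PTS"

# Synthetic stand-in for the cleaned game logs (NOT real data), so the sketch runs on its own.
rng = np.random.default_rng(0)
n = 300
games = pd.DataFrame({
    "Player": rng.choice(["Jaylen Brown", "Jayson Tatum", "Kyrie Irving", "Luka Dončić"], size=n),
    "3P%": rng.uniform(0.2, 0.5, n),
    "FG%": rng.uniform(0.35, 0.6, n),
    "FT%": rng.uniform(0.6, 1.0, n),
    "+/-": rng.normal(5, 10, n),
    "GmSc": rng.normal(18, 6, n),
    "MP": rng.uniform(28, 42, n),
})
games[TARGET] = 0.9 * games["GmSc"] + 0.3 * games["MP"] + rng.normal(0, 3, n)

# 80-20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    games[FEATURES], games[TARGET], test_size=0.2, random_state=42
)

# StandardScaler is fit on the training rows only, then applied to the test rows.
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}, R^2: {r2:.3f}")

# "Next game" estimate: feed each player's average feature values to the model.
player_means = games.groupby("Player")[FEATURES].mean()
predicted_pts = pd.Series(model.predict(player_means), index=player_means.index).round(1)
print(predicted_pts)
```

The pipeline keeps the fitted scaler and regression together, so `model.predict` scales new rows (such as the per-player averages) with the training statistics automatically.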
Although an MSE of 8.58 is not ideal, the model achieves an R-squared of 0.835. In other words, the average squared prediction error is 8.58 (a root-mean-square error of about 2.93 points per game), and the predictors explain 83.5% of the variance in points. Overall, I believe this model is good enough to inform my player-points bets.
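To make the two metrics concrete, here is a small pure-Python sketch of the standard formulas: MSE is the mean of the squared residuals, and R² is 1 minus the residual sum of squares divided by the total sum of squares around the mean. The helper `regression_metrics` and the four data points are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return (MSE, RMSE, R^2) for paired sequences of actual and predicted points."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    mse = ss_res / n
    return mse, math.sqrt(mse), 1 - ss_res / ss_tot


# Made-up numbers purely to show the arithmetic.
mse, rmse, r2 = regression_metrics([20, 30, 25, 35], [22, 28, 25, 33])
print(round(mse, 3), round(rmse, 3), round(r2, 3))  # 3.0 1.732 0.904
```

RMSE is reported alongside MSE because it is in the same unit as the target (points), which makes the error easier to read.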
Limitations:
Outliers: Linear regression is sensitive to outliers (e.g., Luka’s 73-point night). However, I did not want to compromise the integrity of the data by transforming the points, since the spread of scores was already large.
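Before deciding how to treat an extreme game, a quick check with Tukey’s IQR fences shows which games count as outliers. This is a hedged sketch using only the standard library: `iqr_outliers`, the 1.5 multiplier, and the point totals below are illustrative assumptions, and flagging a game is separate from removing or transforming it.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up point totals with one extreme game.
points = [25, 30, 28, 33, 27, 31, 73]
print(iqr_outliers(points))  # [73]
```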
Time: This project was for my master’s class, and there was only one week between the approval of my data and the due date, which prevented me from diving deeper into more specific statistics.
Generalization: Because of the time constraint, I could not dig deeper and had to generalize some of the data to make meaningful comparisons. When I have more time, I will definitely come back to this and go deeper.
Defense: Its impact is hard to quantify because the NBA is a very dynamic game. For example, Jrue Holiday’s defense was definitely a factor in Kyrie Irving’s poor performances in Games 1 and 2, but there isn’t enough concrete data to analyze it.
Conclusion:
In conclusion, the four star players performed fairly consistently over the season, apart from a few home/away splits such as Jaylen Brown’s free-throw shooting. Although the project had limitations, I was still able to produce meaningful analyses and predictions using Tableau and pandas, with scikit-learn for the linear regression model and predictions.
In the future, I would like to explore deeper and create a more complex model for this.
Thank you very much for reading. I hope more people can connect data to something they love.