Introduction

The NBA introduced the three-point line in the 1979–80 season, but it was rarely used during its early years. For a long time, it was considered a novelty rather than a core part of team strategy.

Starting in the early 2000s, however, teams began using the three-point shot more consistently and effectively. This shift set the stage for what many now call the “three-point revolution”, a complete transformation in how NBA offenses are designed and executed.

In this project, we explore how three-point shooting efficiency (3P%) evolved between 2003 and 2023 and examine whether higher three-point efficiency is associated with greater team success, measured by win percentage.

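We use the NBA Games Stats Kaggle dataset covering games from 2003–2023. Each row is one game and records, for both the home and the away team, points, field-goal, free-throw, and three-point percentages, assists, and rebounds, plus a flag for whether the home team won. Note that the per-team game counts in the season summary below exceed the 82-game regular season, so the file appears to include preseason and playoff games as well.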

Other datasets (e.g., player-level stats) were excluded for focus and simplicity.

Glossary for non-NBA readers:

Data Import

library(tidyverse)  # provides read_csv(), the dplyr/tidyr verbs, and ggplot2 used below

games <- read_csv("games.csv")
glimpse(games)
## Rows: 26,651
## Columns: 21
## $ GAME_DATE_EST    <date> 2022-12-22, 2022-12-22, 2022-12-21, 2022-12-21, 2022…
## $ GAME_ID          <dbl> 22200477, 22200478, 22200466, 22200467, 22200468, 222…
## $ GAME_STATUS_TEXT <chr> "Final", "Final", "Final", "Final", "Final", "Final",…
## $ HOME_TEAM_ID     <dbl> 1610612740, 1610612762, 1610612739, 1610612755, 16106…
## $ VISITOR_TEAM_ID  <dbl> 1610612759, 1610612764, 1610612749, 1610612765, 16106…
## $ SEASON           <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022,…
## $ TEAM_ID_home     <dbl> 1610612740, 1610612762, 1610612739, 1610612755, 16106…
## $ PTS_home         <dbl> 126, 120, 114, 113, 108, 112, 143, 106, 110, 99, 101,…
## $ FG_PCT_home      <dbl> 0.484, 0.488, 0.482, 0.441, 0.429, 0.386, 0.643, 0.55…
## $ FT_PCT_home      <dbl> 0.926, 0.952, 0.786, 0.909, 1.000, 0.840, 0.875, 0.61…
## $ FG3_PCT_home     <dbl> 0.382, 0.457, 0.313, 0.297, 0.378, 0.317, 0.636, 0.42…
## $ AST_home         <dbl> 25, 16, 22, 27, 22, 26, 42, 25, 22, 23, 19, 29, 29, 2…
## $ REB_home         <dbl> 46, 40, 37, 49, 47, 62, 32, 38, 49, 39, 37, 46, 48, 4…
## $ TEAM_ID_away     <dbl> 1610612759, 1610612764, 1610612749, 1610612765, 16106…
## $ PTS_away         <dbl> 117, 112, 106, 93, 110, 117, 113, 113, 116, 104, 98, …
## $ FG_PCT_away      <dbl> 0.478, 0.561, 0.470, 0.392, 0.500, 0.469, 0.494, 0.44…
## $ FT_PCT_away      <dbl> 0.815, 0.765, 0.682, 0.735, 0.773, 0.778, 0.760, 0.90…
## $ FG3_PCT_away     <dbl> 0.321, 0.333, 0.433, 0.261, 0.292, 0.462, 0.364, 0.26…
## $ AST_away         <dbl> 23, 20, 20, 15, 20, 27, 32, 17, 19, 17, 29, 25, 25, 2…
## $ REB_away         <dbl> 44, 37, 46, 46, 47, 47, 36, 38, 45, 39, 36, 39, 40, 4…
## $ HOME_TEAM_WINS   <dbl> 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1,…

Data Wrangling

We reshape the dataset to treat home and away teams separately, allowing team-level analysis regardless of game location.

games_long <- games %>%
  select(SEASON, GAME_ID, HOME_TEAM_WINS,
         TEAM_ID_home, PTS_home, FG3_PCT_home,
         TEAM_ID_away, PTS_away, FG3_PCT_away) %>%
  pivot_longer(
    cols = c(TEAM_ID_home, PTS_home, FG3_PCT_home,
             TEAM_ID_away, PTS_away, FG3_PCT_away),
    names_to = c(".value", "home_away"),
    names_pattern = "(.*)_(home|away)"
  ) %>%
  mutate(
    # A team wins if it is the home side in a home win or the away side in a home loss
    win = case_when(
      home_away == "home" & HOME_TEAM_WINS == 1 ~ 1,
      home_away == "away" & HOME_TEAM_WINS == 0 ~ 1,
      TRUE ~ 0
    )
  )
glimpse(games_long)
## Rows: 53,302
## Columns: 8
## $ SEASON         <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2…
## $ GAME_ID        <dbl> 22200477, 22200477, 22200478, 22200478, 22200466, 22200…
## $ HOME_TEAM_WINS <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0…
## $ home_away      <chr> "home", "away", "home", "away", "home", "away", "home",…
## $ TEAM_ID        <dbl> 1610612740, 1610612759, 1610612762, 1610612764, 1610612…
## $ PTS            <dbl> 126, 117, 120, 112, 114, 106, 113, 93, 108, 110, 112, 1…
## $ FG3_PCT        <dbl> 0.382, 0.321, 0.457, 0.333, 0.313, 0.433, 0.297, 0.261,…
## $ win            <dbl> 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0…
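
To make the reshape easier to follow, here is a minimal sketch on a two-game table. The `toy` data frame and its values are hypothetical and only mimic the column naming of `games`. The `.value` entry in `names_to` tells `pivot_longer()` to keep the part of each name before `_home` or `_away` as the output column name, so every game becomes two rows, one per team.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-game table using the same naming scheme as `games`
toy <- tibble(
  GAME_ID = c(1, 2),
  HOME_TEAM_WINS = c(1, 0),
  TEAM_ID_home = c(10, 30),
  PTS_home = c(110, 95),
  TEAM_ID_away = c(20, 40),
  PTS_away = c(100, 99)
)

toy_long <- toy %>%
  pivot_longer(
    cols = c(TEAM_ID_home, PTS_home, TEAM_ID_away, PTS_away),
    names_to = c(".value", "home_away"),
    names_pattern = "(.*)_(home|away)"
  ) %>%
  mutate(
    win = case_when(
      home_away == "home" & HOME_TEAM_WINS == 1 ~ 1,
      home_away == "away" & HOME_TEAM_WINS == 0 ~ 1,
      TRUE ~ 0
    )
  )

# Sanity check: every game should contribute exactly one win
wins_per_game <- toy_long %>%
  group_by(GAME_ID) %>%
  summarise(wins = sum(win), .groups = "drop")
```

The same per-game check can be run on `games_long` to confirm that the win flag was assigned consistently.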

We then summarize at the team-season level to analyze season averages.

nba_clean <- games_long %>%
  group_by(SEASON, TEAM_ID) %>%
  summarise(
    avg_fg3_pct = mean(FG3_PCT, na.rm = TRUE),
    win_rate = mean(win),
    avg_pts = mean(PTS, na.rm = TRUE),
    games_played = n()
  ) %>%
  filter(games_played >= 30)  # Exclude teams with fewer than 30 games for stability

glimpse(nba_clean)
## Rows: 599
## Columns: 6
## Groups: SEASON [20]
## $ SEASON       <dbl> 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 200…
## $ TEAM_ID      <dbl> 1610612737, 1610612738, 1610612739, 1610612740, 161061274…
## $ avg_fg3_pct  <dbl> 0.3213253, 0.3361149, 0.3209762, 0.3185556, 0.3504819, 0.…
## $ win_rate     <dbl> 0.3666667, 0.4255319, 0.4555556, 0.5154639, 0.2888889, 0.…
## $ avg_pts      <dbl> 92.66265, 94.68966, 92.88095, 91.08889, 89.31325, 104.584…
## $ games_played <int> 90, 94, 90, 97, 90, 95, 95, 89, 95, 90, 112, 102, 95, 108…

Data Visualization

Average 3-Point Percentage Over Time

nba_clean %>%
  group_by(SEASON) %>%
  summarise(avg_3p_pct = mean(avg_fg3_pct)) %>%
  ggplot(aes(x = SEASON, y = avg_3p_pct)) +
  geom_line(linewidth = 1.2) +  # `size` for lines is deprecated since ggplot2 3.4.0
  geom_point(size = 2) +
  labs(title = "Average 3-Point Percentage in the NBA (2003–2023)",
       x = "Season", y = "3-Point Percentage") +
  theme_minimal()

Interpretation:
League-average three-point percentage rose from roughly 32% in 2003 to approximately 36% by 2023. This gradual rise is consistent with a league-wide strategic shift toward perimeter shooting, although efficiency alone does not show how many threes teams attempt.
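
One way to put a number on this trend is to regress the league average on the season. The sketch below uses a small hypothetical table (`league_trend`, with made-up values that are not taken from the dataset). With the real data, the per-season summary computed for the plot above could take its place.

```r
# Illustrative league averages by season (hypothetical values, not from the dataset)
league_trend <- data.frame(
  SEASON = 2003:2008,
  avg_3p_pct = c(0.320, 0.323, 0.325, 0.326, 0.330, 0.331)
)

trend_fit <- lm(avg_3p_pct ~ SEASON, data = league_trend)

# Slope = estimated change in league-average 3P% (as a proportion) per season
slope_per_season <- unname(coef(trend_fit)["SEASON"])

# The same slope expressed in percentage points per season
slope_pp_per_season <- slope_per_season * 100
```

A slope expressed in percentage points per season is easier to compare with the start and end values quoted in the interpretation than the raw line plot.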

3-Point Percentage vs. Win Rate

nba_clean %>%
  ggplot(aes(x = avg_fg3_pct, y = win_rate)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "green") +
  labs(title = "Relationship Between 3-Point Percentage and Win Rate",
       x = "Average 3-Point Percentage",
       y = "Win Rate") +
  theme_minimal()

Interpretation:
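There is a positive relationship: teams with higher season-average three-point percentages tend to have higher win rates, and teams shooting above the league average often win more than half of their games. This is consistent with the idea that shooting efficiency from beyond the arc contributes to team success, although the scatter around the fitted line shows that it is far from the only factor.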
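
The "above league average" comparison can be checked directly rather than read off the scatter plot. The sketch below assumes a table with the same columns as `nba_clean`. The `toy_clean` rows are hypothetical and exist only to make the example runnable. Each team is compared with the mean of its own season, so the league-wide rise in 3P% over time does not distort the comparison.

```r
library(dplyr)

# Hypothetical team-season rows with the same columns as `nba_clean`
toy_clean <- tibble(
  SEASON = c(2003, 2003, 2003, 2003, 2004, 2004, 2004, 2004),
  TEAM_ID = rep(1:4, times = 2),
  avg_fg3_pct = c(0.30, 0.33, 0.35, 0.38, 0.31, 0.34, 0.36, 0.37),
  win_rate = c(0.35, 0.45, 0.55, 0.65, 0.40, 0.52, 0.48, 0.60)
)

above_avg_summary <- toy_clean %>%
  group_by(SEASON) %>%
  # Flag teams that shot better than their own season's average
  mutate(above_avg_3p = avg_fg3_pct > mean(avg_fg3_pct)) %>%
  ungroup() %>%
  group_by(above_avg_3p) %>%
  summarise(
    teams = n(),
    share_winning = mean(win_rate > 0.5),  # share of teams with a winning record
    .groups = "drop"
  )
```

In this toy table, three of the four above-average teams have a winning record, compared with one of the four below-average teams.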

Modeling / Analysis

We fit a simple linear regression model to assess whether three-point percentage predicts win rate.

model <- lm(win_rate ~ avg_fg3_pct, data = nba_clean)
summary(model)
## 
## Call:
## lm(formula = win_rate ~ avg_fg3_pct, data = nba_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34134 -0.09006  0.00442  0.09145  0.30307 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.77719    0.09407  -8.262 9.29e-16 ***
## avg_fg3_pct  3.60411    0.26673  13.512  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1208 on 597 degrees of freedom
## Multiple R-squared:  0.2342, Adjusted R-squared:  0.2329 
## F-statistic: 182.6 on 1 and 597 DF,  p-value: < 2.2e-16

Interpretation of Results:
The model estimates that each one-percentage-point increase in season-average 3P% (0.01 on the 0–1 scale of avg_fg3_pct) is associated with a win rate about 3.6 percentage points higher (β = 3.60, p < 0.001). Three-point efficiency explains about 23% of the variation in win rates (R-squared = 0.23), so most of the variation reflects other factors such as defense, turnovers, and rebounding, which this model does not include.
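
The arithmetic behind this reading of the coefficient can be reproduced from the printed estimates alone. The sketch below copies the intercept, slope, and R-squared from the `summary(model)` output above. The helper `predict_win_rate()` and the 82-game conversion are illustrative additions, not part of the original analysis.

```r
# Estimates copied from the summary(model) output above
intercept <- -0.77719
slope <- 3.60411
r_squared <- 0.2342

# Hypothetical helper: fitted win rate for a given season-average 3P% (0-1 scale)
predict_win_rate <- function(fg3_pct) intercept + slope * fg3_pct

# One percentage point of 3P% is 0.01 on the scale used by avg_fg3_pct
delta_win_rate <- predict_win_rate(0.36) - predict_win_rate(0.35)

# The same difference expressed as wins over an 82-game regular season
delta_wins_82 <- delta_win_rate * 82

# With a single predictor, the correlation is the signed square root of R-squared
r <- sign(slope) * sqrt(r_squared)

round(delta_win_rate, 3)  # 0.036
round(delta_wins_82, 1)   # 3
round(r, 2)               # 0.48
```

In other words, the fitted line implies roughly three additional wins per 82 games for each percentage point of 3P%, with a correlation of about 0.48 between the two variables.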

Conclusion

This project finds that three-point shooting efficiency has risen over the past two decades and is significantly associated with team success. Because the analysis is correlational, it suggests, rather than proves, that teams with more efficient perimeter shooting hold an advantage in the modern NBA.

Limitations:

Future Directions:

References