Hockey: Strong Link or Whole Chain Sport?

I’m not a sports person. I’ve tried! I like going to games, and can totally get wrapped up in the excitement, but I’ve just never felt the loyalty to a team that so many fans have.

My five-year-old just started hockey (well, skating lessons) and I thought it would be fun to “Moneyball” hockey. First question: is it a strong link sport (one player matters), or a whole chain sport (all the players matter)?

The plan: for each team in the 2024 season, pull in the stats of the top players, the worst players, and a team-wide average, to see which is most correlated with net goals (goals scored minus goals allowed).

Scope

I’m going to pull in a complete season’s data and compare the player who scores the most goals, versus the average of the rest of the team, against the success metric of net goals. I’m then going to weigh the correlations and their significance against one another, to see which contributes more to a team’s success: the top player or the whole team (minus the top player).

Assumptions

Net goals is influenced by the goals scored by both the top player and the rest of the team, but this analysis ignores other factors. For example, a really strong goalie could prevent more goals, but isn’t included in the analysis. I.e., this is really only looking at the offensive side of the hockey… field? rink? court?

Methodology

I found a really cool website called MoneyPuck that already tracks a lot of the stats and offers simple CSV exports. I grabbed the Skaters and Team Level datasets, and went to work.

Import Datasets from MoneyPuck
Select Stats
Order by Season and Teams
Pull Top Player Stats per Team
Pull Average Stats per Team
Pull Success Metric
Correlate Top Player and Average Player Against Success Metric

Import Datasets from MoneyPuck

We are keeping it pretty simple, with just two imports: Pandas and SciPy.

import pandas as pd
from scipy.stats import pearsonr

I then visited the website MoneyPuck to download the player and team CSVs, and uploaded and converted them into two separate DataFrames.

## Load and preprocess data
skater_df = pd.read_csv('skaters.csv')
team_df = pd.read_csv('teams.csv')

There is a TON of data here that I had to learn about. Luckily, there was a handy data dictionary. First, I had to isolate the “situation” to “all”, using a filter.

## Filter to only include 'all' situation
team_df = team_df[team_df["situation"] == "all"]
skater_df = skater_df[skater_df["situation"] == "all"]
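Why the filter matters: the export appears to carry one row per player per situation, so skipping it would count the same goals more than once. Here is a minimal sketch on a made-up table (the player names and numbers are hypothetical, and "5on5" is only an assumed example of a non-"all" situation):

```python
import pandas as pd

# Toy stand-in for the skaters export: one row per player per situation.
toy_skaters = pd.DataFrame({
    "name": ["Player A", "Player A", "Player B", "Player B"],
    "situation": ["all", "5on5", "all", "5on5"],
    "I_F_goals": [30, 21, 12, 10],
})

# The boolean mask keeps only the rows whose situation is "all",
# leaving exactly one row per player.
all_only = toy_skaters[toy_skaters["situation"] == "all"].reset_index(drop=True)
print(all_only)
```

After the filter, each player shows up once, with their total across all situations.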

Then I kept only the stats I needed for the analysis, to help clean up the DataFrames, and sorted them by season and team.

## Sort dataframes
skater_df_sorted = skater_df[["season","name","team",'position','games_played','icetime','I_F_faceOffsWon',"I_F_goals"]]
skater_df_sorted = skater_df_sorted.sort_values(by=["season","team"], ascending=[True,False])

team_df_sorted = team_df[["season","team","goalsFor","goalsAgainst"]]
team_df_sorted = team_df_sorted.sort_values(by=["season","team"], ascending=[True,False])

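Now I need the success metric! Since I have “goalsFor” and “goalsAgainst”, I can create a new column for “net goals”. Then I am going to normalize the success metric into a z-score (subtract the mean, divide by the standard deviation), so every team is measured on a common scale. The code for this step is in the full code dump at the end of the post.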
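To make the success metric concrete, here is a small sketch of the same two calculations on three made-up teams (the team codes and goal totals are hypothetical, chosen so the arithmetic is easy to follow):

```python
import pandas as pd

# Three made-up teams; the column names mirror the MoneyPuck team export.
toy_teams = pd.DataFrame({
    "team": ["AAA", "BBB", "CCC"],
    "goalsFor": [280, 250, 220],
    "goalsAgainst": [230, 250, 270],
})

# Net goals: goals scored minus goals allowed.
toy_teams["goal_net"] = toy_teams["goalsFor"] - toy_teams["goalsAgainst"]

# Z-score: subtract the mean, then divide by the (sample) standard deviation.
toy_teams["net_normalized"] = (
    toy_teams["goal_net"] - toy_teams["goal_net"].mean()
) / toy_teams["goal_net"].std()

print(toy_teams)
```

A positive normalized value means the team finished above the league-average goal differential, and a negative one means below.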

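Next, for each team in each season, I grab the goal total of the top scorer (“best player”) and the average goals of everyone else on the roster (“other player average”).

## Update skater dataframe to get best player and other player average per team per season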
skater_team_grouped = skater_df_sorted.groupby(['season','team']).agg({
    'I_F_goals': [
        ('other player average', lambda x: x[x != x.max()].mean()),
        ('best player', 'max'),
    ]
}).reset_index()

skater_team_grouped.columns = ['season', 'team', 'other player average', 'best player',]
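One thing worth knowing about the `x != x.max()` filter: if two players tie for the team lead, both get left out of the “other player average”. The sketch below shows that on made-up data, next to a possible alternative that drops only one top scorer. The alternative and the helper function names are my own illustration, not part of the original analysis.

```python
import pandas as pd

# Made-up rosters: team BBB has two players tied for the lead with 30 goals.
toy = pd.DataFrame({
    "season": [2024] * 7,
    "team": ["AAA"] * 4 + ["BBB"] * 3,
    "I_F_goals": [40, 20, 10, 10, 30, 30, 6],
})

def mean_excluding_all_top(goals):
    # Same logic as the post: drop every player whose total equals the max.
    return goals[goals != goals.max()].mean()

def mean_dropping_one_top(goals):
    # Hypothetical alternative: drop only the first top scorer found.
    return goals.drop(goals.idxmax()).mean()

summary = (
    toy.groupby(["season", "team"])["I_F_goals"]
    .agg(
        best_player="max",
        other_avg_excluding_ties=mean_excluding_all_top,
        other_avg_dropping_one=mean_dropping_one_top,
    )
    .reset_index()
)
print(summary)
```

For team AAA the two averages agree. For team BBB they differ, because the second 30-goal scorer only counts toward the average in the alternative version.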

Now we’re going to create a new DataFrame by merging the Teams DataFrame (which holds the success metric) with the grouped Skater DataFrame. The Season and Team columns in both DataFrames should contain the same values, so we’re going to merge on those columns.

## Merge dataframes
merged_df = pd.merge(team_df_sorted, skater_team_grouped, on=['season', 'team'])
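The merge quietly drops any team whose season and team values don’t match in both tables. A quick way to check for that, sketched here on made-up data (the mismatched "C.C" code is a hypothetical example), is an outer merge with pandas’ `indicator` and `validate` options:

```python
import pandas as pd

# Made-up tables; the third team is spelled differently in the skater table.
teams = pd.DataFrame({
    "season": [2024, 2024, 2024],
    "team": ["AAA", "BBB", "CCC"],
    "goal_net": [50, 0, -50],
})
skaters = pd.DataFrame({
    "season": [2024, 2024, 2024],
    "team": ["AAA", "BBB", "C.C"],
    "best player": [40, 30, 25],
})

# how="outer" keeps unmatched rows, indicator=True adds a "_merge" column,
# and validate raises if a (season, team) pair repeats on either side.
check = pd.merge(
    teams, skaters,
    on=["season", "team"],
    how="outer",
    indicator=True,
    validate="one_to_one",
)

unmatched = check[check["_merge"] != "both"]
print(unmatched[["season", "team", "_merge"]])
```

If `unmatched` comes back empty, the default inner merge didn’t lose any teams.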

Finally, I’m going to run the correlation tests. I’m going to use the pearsonr function, which returns two values: the Correlation Coefficient and the P-Value. The Correlation Coefficient ranges from -1 to 1, and the closer it is to 1, the stronger the positive relationship. For the P-Value, any value under .05 means the correlation is statistically significant.

## Calculate Pearson correlation coefficients
corr_best, p_val_best = pearsonr(merged_df['best player'], merged_df['net_normalized'])
corr_other, p_val_other = pearsonr(merged_df['other player average'], merged_df['net_normalized'])
print(f"Pearson correlation between best player goals and team success: {corr_best:.4f} (p-value: {p_val_best:.4f})")
print(f"Pearson correlation between other player average goals and team success: {corr_other:.4f} (p-value: {p_val_other:.4f})")
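A side note on the normalized metric: Pearson’s correlation doesn’t change when a variable is shifted or rescaled, so correlating against raw net goals or the z-scored version should give the same coefficient. A small sketch with made-up numbers (none of these values come from the MoneyPuck data):

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up values for six teams.
top_goals = np.array([40, 35, 28, 22, 30, 25])
goal_net = np.array([45, 20, -5, -30, 10, -40])

# Z-score the net goals the same way the post does (sample standard deviation).
net_normalized = (goal_net - goal_net.mean()) / goal_net.std(ddof=1)

r_raw, p_raw = pearsonr(top_goals, goal_net)
r_norm, p_norm = pearsonr(top_goals, net_normalized)

print(f"raw net goals:        r = {r_raw:.4f}, p = {p_raw:.4f}")
print(f"normalized net goals: r = {r_norm:.4f}, p = {p_norm:.4f}")
```

Both lines print the same r and p, so the normalization makes the metric easier to read without affecting the correlation itself.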

Results

Both correlations are positive and statistically significant, so both matter. Overall, in the 2024 hockey season, the strength of the “non-best players” has a correlation of 0.7636 (P-Value < 0.0001) with the team’s net-goal differential, the stronger of the two relationships. Yet the strongest player undoubtedly has an impact too, with a correlation of 0.4432 (P-Value ≈ 0.0110).
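As a sanity check, the P-Values can be recomputed from the correlation coefficients alone using the standard t-test for a Pearson correlation. This sketch assumes the merged table had 32 rows (one per NHL team); that count is my assumption, not something printed by the analysis.

```python
from math import sqrt
from scipy.stats import t

def pearson_p_value(r, n):
    """Two-sided p-value for a Pearson r computed from n paired observations."""
    dof = n - 2
    t_stat = r * sqrt(dof / (1 - r ** 2))
    return 2 * t.sf(abs(t_stat), dof)

n_teams = 32  # assumption: one row per NHL team in the merged DataFrame

p_other = pearson_p_value(0.7636, n_teams)
p_best = pearson_p_value(0.4432, n_teams)

# r squared is the share of variance in net goals "explained" by each measure.
print(f"other players: r^2 = {0.7636 ** 2:.3f}, p = {p_other:.2e}")
print(f"best player:   r^2 = {0.4432 ** 2:.3f}, p = {p_best:.4f}")
```

A P-Value that prints as 0.0000 with four decimals is just a very small number rounded down, not a true zero.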

This brings up a few more interesting questions. Is this isolated to 2024? What about the years of that Crosby guy? How does this sport compare to the NBA? MLB?

Stay Curious!

Source

MoneyPuck.com. Download Data. n.d. Retrieved January 20, 2026. https://moneypuck.com/data.htm.

Full Code Dump

import pandas as pd
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

## Load and preprocess data
skater_df = pd.read_csv('skaters.csv')
team_df = pd.read_csv('teams.csv')

## Filter to only include 'all' situation
team_df = team_df[team_df["situation"] == "all"]
skater_df = skater_df[skater_df["situation"] == "all"]

## Sort dataframes
skater_df_sorted = skater_df[["season","name","team",'position','games_played','icetime','I_F_faceOffsWon',"I_F_goals"]]
skater_df_sorted = skater_df_sorted.sort_values(by=["season","team"], ascending=[True,False])

team_df_sorted = team_df[["season","team","goalsFor","goalsAgainst"]]
team_df_sorted = team_df_sorted.sort_values(by=["season","team"], ascending=[True,False])

## Calculate goal net and normalized net for teams, creating the success metric
team_df_sorted["goal_net"] = team_df_sorted["goalsFor"] - team_df_sorted["goalsAgainst"]
team_df_sorted["net_normalized"] = (team_df_sorted["goal_net"] - team_df_sorted["goal_net"].mean()) / team_df_sorted["goal_net"].std()
team_df_sorted.sort_values(by=["season","goal_net"], ascending=[True,False], inplace=True)

## Update skater dataframe to get best player and other player average per team per season
skater_team_grouped = skater_df_sorted.groupby(['season','team']).agg({
    'I_F_goals': [
        ('other player average', lambda x: x[x != x.max()].mean()),
        ('best player', 'max'),
    ]
}).reset_index()

skater_team_grouped.columns = ['season', 'team', 'other player average', 'best player',]

## Merge dataframes
merged_df = pd.merge(team_df_sorted, skater_team_grouped, on=['season', 'team'])


## Calculate Pearson correlation coefficients
corr_best, p_val_best = pearsonr(merged_df['best player'], merged_df['net_normalized'])
corr_other, p_val_other = pearsonr(merged_df['other player average'], merged_df['net_normalized'])
print(f"Pearson correlation between best player goals and team success: {corr_best:.4f} (p-value: {p_val_best:.4f})")
print(f"Pearson correlation between other player average goals and team success: {corr_other:.4f} (p-value: {p_val_other:.4f})")

fig = merged_df.plot.scatter(x='other player average', y='net_normalized', title='Other Player Average Goals vs Team Success')
fig.set_xlabel('Other Player Average Goals')
fig.set_ylabel('Team Success (Normalized Goal Net)')

fig.figure.savefig('other_player_average_vs_team_success.png')

fig2 = merged_df.plot.scatter(x='best player', y='net_normalized', title='Best Player Goals vs Team Success', color='orange')
fig2.set_xlabel('Best Player Goals')
fig2.set_ylabel('Team Success (Normalized Goal Net)')
fig2.figure.savefig('best_player_vs_team_success.png')
