NHL Prediction Model Performance & Calibration
How to Read These Metrics
Accuracy
The percentage of games where the predicted winner (team with >50% win
probability) actually won. Simple but incomplete — it ignores how
confident the model was.
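A minimal sketch of the accuracy calculation. It assumes games are stored as parallel lists of the home team's predicted win probability and a 1/0 home-win flag. That layout is hypothetical, not necessarily the one behind this page. The two example models pick the same sides with very different confidence, and both receive the same accuracy.

```python
def accuracy(probs, outcomes):
    """Share of games where the side given more than 50% actually won.

    probs: predicted home-win probabilities (hypothetical layout).
    outcomes: 1 if the home team won, 0 otherwise.
    A probability of exactly 0.5 counts as an away pick here; this
    tie-break is an arbitrary choice made for the sketch.
    """
    correct = 0
    for p, y in zip(probs, outcomes):
        picked_home = p > 0.5
        correct += int(picked_home == bool(y))
    return correct / len(probs)


outcomes = [1, 0, 1, 1]
timid = [0.51, 0.49, 0.52, 0.45]  # barely leans each way
bold = [0.90, 0.10, 0.85, 0.30]   # same picks, far more confident
print(accuracy(timid, outcomes), accuracy(bold, outcomes))  # 0.75 0.75
```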
Brier Score
Measures the mean squared error of probability predictions (0–1 scale,
lower is better). A coin-flip baseline yields 0.25; our model targets
values below 0.24. Brier score rewards well-calibrated confidence levels,
not just picking the right side.
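A minimal Brier score sketch. It assumes parallel lists of predicted probabilities and 1/0 outcomes, which is a hypothetical layout. A constant 50% forecast scores exactly 0.25 whatever the results are, because every game contributes (0.5 - y)^2 = 0.25. That is where the coin-flip baseline comes from.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


outcomes = [1, 0, 1, 1]
coin_flip = [0.5] * len(outcomes)
print(brier_score(coin_flip, outcomes))  # 0.25
print(brier_score([0.6, 0.4, 0.6, 0.4], outcomes))  # leans right 3 of 4 times
```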
Log Loss
A logarithmic scoring rule that heavily penalises confident wrong
predictions. Assigning 90% to a team that loses costs far more than
assigning 55%. This keeps the model honest about uncertainty.
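A minimal log loss sketch that illustrates the 90% versus 55% comparison. It uses the natural logarithm. That choice is an assumption, but it is consistent with the values near 0.69 (ln 2, the coin-flip score) in the tables on this page. The probability clipping is also a choice made for this sketch. It keeps a forecast of exactly 0 or 1 from producing an infinite penalty.

```python
import math


def log_loss(probs, outcomes, eps=1e-15):
    """Average negative log-likelihood of 0/1 outcomes (natural log)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


# Penalty for a single game in which the favoured team (outcome 0) loses:
print(round(log_loss([0.90], [0]), 3))  # 2.303
print(round(log_loss([0.55], [0]), 3))  # 0.799
# Coin-flip baseline:
print(round(log_loss([0.5], [1]), 3))   # 0.693
```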
Calibration
Shows whether stated probabilities match real outcomes. In a well-calibrated
model, games given a 70% win probability should be won about 70% of the
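time. The calibration tables below group predictions into decile bins, where
bin k covers predicted probabilities from 10k% up to 10(k+1)% (bin 5 holds
predictions between 50% and 60%), so you can check this directly.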
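A minimal sketch of a decile calibration table. It assumes parallel lists of predicted probabilities and 1/0 outcomes, which is a hypothetical layout. Each prediction goes into an equal-width bin. The sketch then reports the count, the mean prediction, and the observed win rate per bin. Empty bins are omitted, as they are in the tables on this page.

```python
from collections import defaultdict


def calibration_table(probs, outcomes, n_bins=10):
    """Return (bin, count, mean_pred, observed) for each non-empty bin."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[b].append((p, y))
    rows = []
    for b in sorted(bins):
        pairs = bins[b]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(y for _, y in pairs) / len(pairs)
        rows.append((b, len(pairs), mean_pred, observed))
    return rows


# Made-up predictions and results, for illustration only.
probs = [0.72, 0.68, 0.71, 0.55, 0.58, 0.31]
outcomes = [1, 1, 0, 1, 0, 0]
for b, n, mean_pred, observed in calibration_table(probs, outcomes):
    print(f"{b} {n} {mean_pred:.1%} {observed:.1%}")
```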
RMSE (Root Mean Squared Error)
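For goal total predictions, RMSE is the square root of the mean squared
difference between predicted and actual totals, expressed in goals. Because
errors are squared, one large miss raises RMSE more than several small ones.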
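A minimal RMSE sketch using made-up goal totals. Mean absolute error is included only for comparison. The single 3.2-goal miss pushes RMSE noticeably above the plain average error.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error, in the same units as the inputs (goals here)."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def mae(predicted, actual):
    """Mean absolute error, shown only for comparison with RMSE."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up predicted and actual goal totals for four games.
predicted_totals = [6.1, 5.8, 6.4, 5.5]
actual_totals = [5, 9, 6, 4]
print(round(rmse(predicted_totals, actual_totals), 3))  # 1.861
print(round(mae(predicted_totals, actual_totals), 3))   # 1.55
```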
Why calibration matters most: For probabilistic predictions,
calibration is more important than raw accuracy. A model that says "55%"
every game can be right 55% of the time and still be useless for
decision-making, because it never tells you which games are closer calls than
others. A well-calibrated model tells you how much to trust each prediction.
Learn more in our analytics guide and methodology.
Game Predictions (Multi-Window)
Window          Start       End         Games  Accuracy  Brier   Log Loss  Avg Winner Prob  RMSE Total
last 30         2026-03-16  2026-04-15  244    56.6%     0.2360  0.6631    53.2%            2.637
season to date  2025-10-01  2026-04-15  1306   56.3%     0.2440  0.6808    52.1%            2.588
multi season    2023-10-10  2026-04-15  3930   61.9%     0.2270  0.6447    53.9%            2.202
Totals (Over 5.5)
Window          Games  Accuracy  Brier   Log Loss  Avg Outcome Prob
last 30         244    50.4%     0.2685  0.7341    50.6%
season to date  1306   54.5%     0.2614  0.7210    51.5%
multi season    3930   62.7%     0.2260  0.6423    55.2%
Playoff Game Performance
Start End Games Accuracy Brier Log Loss
2024-04-20 2025-06-17 174 63.2% 0.2304 0.6530
Daily Performance
Date  Games  Accuracy  Brier  Log Loss
Prediction Recap Highlights
Definition: High confidence means the model assigned the predicted team
a win probability well above 50%. Biggest Model Edges That Hit lists the
highest-confidence correct calls, Biggest Misses lists the highest-confidence
incorrect calls, and Surprise Results ranks games by the absolute gap between
the predicted win probability and the actual outcome (1 for a win, 0 for a loss).
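A minimal sketch of how the three highlight lists could be built. It assumes each game is recorded as a matchup label, the predicted team's win probability, and whether that team won. The layout, the placeholder team codes, and the `top_n` cutoff are all hypothetical. The page does not state its exact confidence threshold, so the sketch simply ranks every game.

```python
from operator import itemgetter


def surprise(win_prob, won):
    """Absolute gap between a predicted win probability and the 1/0 result."""
    return abs(win_prob - (1.0 if won else 0.0))


def recap(games, top_n=3):
    """games: (matchup, predicted team's win probability, predicted team won?)."""
    by_prob = itemgetter(1)
    hits = [g for g in games if g[2]]
    misses = [g for g in games if not g[2]]
    return {
        "edges_that_hit": sorted(hits, key=by_prob, reverse=True)[:top_n],
        "biggest_misses": sorted(misses, key=by_prob, reverse=True)[:top_n],
        "surprises": sorted(games, key=lambda g: surprise(g[1], g[2]), reverse=True)[:top_n],
    }


# Made-up games with placeholder team codes.
games = [
    ("AAA @ BBB", 0.72, False),
    ("CCC @ DDD", 0.58, True),
    ("EEE @ FFF", 0.66, True),
    ("GGG @ HHH", 0.61, False),
]
for name, rows in recap(games, top_n=2).items():
    print(name, [matchup for matchup, _, _ in rows])
```

When probabilities are stated for the predicted team, as here, the largest surprises are always misses. This is because a correct call can differ from its outcome by at most 0.5.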
Biggest Model Edges That Hit
Date  Matchup  Win Prob  Outcome
Biggest Misses (High Confidence)
Date  Matchup  Win Prob  Outcome
Surprise Results
Date  Matchup  Surprise  Outcome
Calibration (Win Prob Deciles) — Last 30
Bin  Count  Mean Pred  Observed
2    2      25.2%      0.0%
3    22     37.3%      31.8%
4    66     45.5%      48.5%
5    72     54.6%      44.4%
6    49     65.2%      59.2%
7    26     73.4%      76.9%
8    7      83.3%      85.7%
Calibration (Win Prob Deciles) — Season To Date
Bin  Count  Mean Pred  Observed
2    10     26.8%      30.0%
3    104    36.8%      42.3%
4    311    45.7%      44.7%
5    447    54.7%      49.7%
6    287    64.6%      59.6%
7    131    73.5%      69.5%
8    16     83.5%      75.0%
Calibration (Win Prob Deciles) — Multi Season
Bin  Count  Mean Pred  Observed
1    1      16.6%      0.0%
2    57     26.7%      7.0%
3    368    36.3%      31.5%
4    969    45.5%      44.0%
5    1270   54.8%      54.3%
6    833    64.4%      66.0%
7    366    73.7%      77.6%
8    66     83.2%      92.4%
Calibration (Over 5.5) — Last 30
Bin  Count  Mean Pred  Observed
1    1      16.8%      0.0%
2    7      26.5%      71.4%
3    48     35.4%      60.4%
4    32     45.1%      59.4%
5    78     55.1%      55.1%
6    40     64.0%      57.5%
7    28     75.4%      57.1%
8    8      84.3%      50.0%
9    2      91.3%      100.0%
Calibration (Over 5.5) — Season To Date
Bin  Count  Mean Pred  Observed
1    10     18.7%      70.0%
2    48     25.9%      47.9%
3    187    35.6%      52.9%
4    145    44.5%      57.9%
5    445    55.4%      58.4%
6    202    63.8%      55.0%
7    204    74.5%      59.8%
8    56     84.8%      60.7%
9    9      94.3%      88.9%
Calibration (Over 5.5) — Multi Season
Bin  Count  Mean Pred  Observed
1    34     17.8%      26.5%
2    157    26.0%      24.2%
3    572    35.6%      38.5%
4    464    44.4%      46.8%
5    1251   55.3%      56.4%
6    622    63.6%      62.4%
7    614    74.5%      72.8%
8    176    84.8%      80.1%
9    40     93.5%      97.5%
Team Calibration (Home, Top 15 by Volume)
Bias is Mean Pred minus Observed, so a negative value means the home team won more often than the model predicted.
Team  Count  Mean Pred  Observed  Bias
FLA   146    60.9%      62.3%     -1.4%
EDM   144    61.1%      61.8%     -0.7%
DAL   143    55.9%      63.6%     -7.8%
CAR   137    66.2%      70.1%     -3.9%
TOR   133    53.5%      54.1%     -0.6%
WPG   132    56.4%      62.9%     -6.5%
VGK   132    58.5%      59.8%     -1.4%
WSH   131    54.8%      58.8%     -4.0%
NYR   131    53.8%      51.9%     +1.9%
VAN   130    53.0%      43.1%     +9.9%
BOS   130    51.1%      57.7%     -6.6%
COL   130    63.2%      66.2%     -2.9%
TBL   128    58.3%      63.3%     -5.0%
LAK   128    59.2%      54.7%     +4.5%
STL   126    50.5%      57.1%     -6.6%
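A minimal sketch of how per-team home calibration figures like these could be computed. It assumes one row per home game, holding the home team code, the predicted home-win probability, and a 1/0 home-win flag. That layout is hypothetical, and the team codes and values in the example are made up.

```python
from collections import defaultdict


def team_bias(rows):
    """Per-team home calibration.

    rows: (home_team, predicted home-win probability, 1 if home won else 0).
    Returns {team: (count, mean_pred, observed, bias)} with
    bias = mean_pred - observed (negative: model underrated the home team).
    """
    groups = defaultdict(list)
    for team, p, y in rows:
        groups[team].append((p, y))
    result = {}
    for team, pairs in groups.items():
        n = len(pairs)
        mean_pred = sum(p for p, _ in pairs) / n
        observed = sum(y for _, y in pairs) / n
        result[team] = (n, mean_pred, observed, mean_pred - observed)
    return result


# Made-up rows with placeholder team codes, for illustration only.
rows = [("AAA", 0.55, 0), ("AAA", 0.51, 1), ("BBB", 0.56, 1), ("BBB", 0.54, 1)]
# Sort by game count, mirroring a "top by volume" listing.
for team, (n, mp, obs, bias) in sorted(team_bias(rows).items(), key=lambda kv: -kv[1][0]):
    print(f"{team} {n} {mp:.1%} {obs:.1%} {bias:+.1%}")
```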
[Chart: Team Calibration (Pred vs Observed), plotting Mean Pred and Observed on a 0.0 to 1.0 scale for FLA, EDM, DAL, CAR, TOR, WPG, VGK, WSH, NYR, VAN, BOS, COL, TBL, LAK and STL]
Starter Calibration (Home)
Window          Starter Status  Games  Accuracy  Brier   Log Loss
last 30         Starter         244    56.6%     0.2360  0.6631
season to date  Starter         1306   56.3%     0.2440  0.6808
multi season    Unknown         18     72.2%     0.2276  0.6475
multi season    Starter         3912   61.9%     0.2270  0.6446
Cross-Validation (Expanding Window)
Summary: 3 folds | Brier: 0.2495 | Log Loss: 0.6925 | RMSE Total: 2.394 (unweighted means of the three folds below)
Fold    Train N  Val N  Brier   Log Loss  RMSE
Fold 1  701      2,097  0.2542  0.7024    2.432
Fold 2  1,399    1,399  0.2491  0.6915    2.376
Fold 3  2,103    695    0.2453  0.6838    2.373
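A minimal sketch of an expanding-window split over games sorted by date. In every fold, Train N + Val N = 2,798, which suggests that each fold validates on all games after its cutoff. The sketch assumes that. The cutoffs used here (equal fractions of the game count) are a hypothetical choice. They give 699/2,099, 1,399/1,399 and 2,098/700 rather than the exact counts in the table, so the real split probably cuts elsewhere, for example on dates. The last lines check that the summary row is the plain average of the fold scores.

```python
def expanding_window_folds(n_games, n_folds=3):
    """Yield (train, val) index ranges over chronologically sorted games.

    Hypothetical cutoffs: fold k trains on the first k/(n_folds + 1) of the
    games and validates on every game after that point, so the training
    window grows from fold to fold and never includes future games.
    """
    for k in range(1, n_folds + 1):
        cutoff = n_games * k // (n_folds + 1)
        yield range(0, cutoff), range(cutoff, n_games)


for train, val in expanding_window_folds(2798):
    print(len(train), len(val))

# Unweighted mean of the per-fold Brier scores from the fold table.
fold_brier = [0.2542, 0.2491, 0.2453]
print(round(sum(fold_brier) / len(fold_brier), 4))  # 0.2495
```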
In-Game Checkpoints — Last 30
Checkpoint  Games  Accuracy  Brier   Log Loss
end_p1      244    63.1%     0.2082  0.5982
end_p2      244    82.8%     0.1140  0.3590
ot_start    53     62.3%     0.1999  0.5650
p3_10       244    86.9%     0.0830  0.2619
p3_5        244    88.5%     0.0773  0.2435
pregame     244    54.9%     0.2422  0.6772
In-Game Checkpoints — Season To Date
Checkpoint  Games  Accuracy  Brier   Log Loss
end_p1      1306   66.5%     0.2080  0.6019
end_p2      1306   79.3%     0.1423  0.4362
ot_start    326    63.8%     0.1924  0.5496
p3_10       1306   84.0%     0.1027  0.3231
p3_5        1306   85.8%     0.0879  0.2758
pregame     1306   53.4%     0.2478  0.6888
In-Game Calibration — Pregame (Last 30 Days)
Bin  Count  Mean Pred  Observed
3    3      39.8%      0.0%
4    51     46.4%      45.1%
5    152    53.8%      50.0%
6    36     62.9%      69.4%
7    2      70.8%      100.0%
In-Game Calibration — End P2 (Last 30 Days)
Bin  Count  Mean Pred  Observed
0    40     4.5%       0.0%
1    19     14.2%      0.0%
2    19     24.2%      26.3%
3    29     34.4%      31.0%
4    11     46.1%      63.6%
5    18     54.6%      44.4%
6    22     65.9%      72.7%
7    17     76.5%      82.4%
8    25     83.8%      92.0%
9    44     93.8%      100.0%
In-Game Calibration — P3 10 (Last 30 Days)
Bin  Count  Mean Pred  Observed
0    55     3.1%       1.8%
1    29     15.0%      6.9%
2    8      23.1%      25.0%
3    3      38.9%      33.3%
4    11     44.8%      36.4%
5    27     54.2%      51.9%
6    10     62.1%      40.0%
7    1      77.6%      100.0%
8    26     86.6%      88.5%
9    74     97.2%      100.0%
xG Holdout — Contextual
Train: 2023-10-10 to 2025-11-16 | Test: 2025-11-17 to 2026-04-13
Games (test): 991 | Shots (test): 85192 | ROC AUC: 0.786 | Log Loss: 0.2216 | Brier: 0.0601
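For xG, ROC AUC measures how well the shot-level goal probabilities rank goals above non-goals. A value of 0.786 means that a randomly chosen goal gets a higher xG than a randomly chosen non-goal about 78.6% of the time. Below is a stdlib-only sketch using the rank-sum (Mann-Whitney) formulation, with tied scores given average ranks. The page does not say which implementation it uses. With very few goals in a split, AUC can land at extremes. For example, the 7-shot cradle row at 14.3% implies a single goal, and its AUC of 1.000 only says that this one goal had the highest xG.

```python
def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    scores: predicted goal probabilities (xG) per shot.
    labels: 1 if the shot was a goal, 0 otherwise.
    Tied scores receive their average rank, so ties count as half a win.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one goal and one non-goal")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores pairs[i..j].
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j + 1])
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```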
xG Splits — Contextual Strength State
Split     Shots  Goal Rate  AUC    Log Loss  Brier
Even      68274  6.2%       0.780  0.2022    0.0540
PP        14330  10.6%      0.729  0.2942    0.0810
PK        1823   7.1%       0.822  0.2143    0.0605
EmptyNet  765    50.7%      0.743  0.6100    0.2120
xG Splits — Contextual Shot Type
Split         Shots  Goal Rate  AUC    Log Loss  Brier
wrist         36458  7.1%       0.814  0.2045    0.0554
snap          21423  8.3%       0.776  0.2461    0.0693
slap          10084  5.0%       0.712  0.1852    0.0461
tip-in        8203   6.6%       0.667  0.2332    0.0602
backhand      6300   9.5%       0.827  0.2447    0.0698
deflected     1372   11.7%      0.713  0.3243    0.0944
wrap-around   542    4.4%       0.642  0.1760    0.0414
bat           436    8.9%       0.840  0.2307    0.0631
poke          269    9.7%       0.738  0.2692    0.0704
between-legs  56     12.5%      0.803  0.3043    0.0891
nan           42     61.9%      0.728  2.3192    0.2680
cradle        7      14.3%      1.000  0.2044    0.0574
xG Holdout — Neutral
Train: 2023-10-10 to 2025-11-16 | Test: 2025-11-17 to 2026-04-13
Games (test): 991 | Shots (test): 85192 | ROC AUC: 0.783 | Log Loss: 0.2247 | Brier: 0.0613
xG Splits — Neutral Strength State
Split     Shots  Goal Rate  AUC    Log Loss  Brier
Even      68274  6.2%       0.781  0.2021    0.0540
PP        14330  10.6%      0.705  0.3133    0.0885
PK        1823   7.1%       0.822  0.2146    0.0605
EmptyNet  765    50.7%      0.744  0.6043    0.2089
xG Splits — Neutral Shot Type
Split         Shots  Goal Rate  AUC    Log Loss  Brier
wrist         36458  7.1%       0.809  0.2083    0.0569
snap          21423  8.3%       0.774  0.2486    0.0703
slap          10084  5.0%       0.713  0.1850    0.0461
tip-in        8203   6.6%       0.669  0.2331    0.0602
backhand      6300   9.5%       0.812  0.2558    0.0741
deflected     1372   11.7%      0.712  0.3249    0.0946
wrap-around   542    4.4%       0.655  0.1758    0.0415
bat           436    8.9%       0.835  0.2328    0.0639
poke          269    9.7%       0.747  0.2677    0.0705
between-legs  56     12.5%      0.777  0.3116    0.0909
nan           42     61.9%      0.713  2.3896    0.2752
cradle        7      14.3%      1.000  0.2025    0.0577
Monthly Performance Trends
Track how model performance varies month-to-month across the season.
Month    Games  Accuracy  Brier   Log Loss
2023-10  140    62.9%     0.2195  0.6268
2023-11  213    66.2%     0.2152  0.6191
2023-12  219    66.7%     0.2185  0.6270
2024-01  208    62.5%     0.2167  0.6219
2024-02  172    67.4%     0.2064  0.6016
2024-03  228    71.1%     0.2035  0.5949
2024-04  132    62.9%     0.2265  0.6434
2024-10  166    70.5%     0.2040  0.5965
2024-11  220    66.8%     0.2146  0.6182
2024-12  214    69.2%     0.2056  0.6007
2025-01  224    60.7%     0.2275  0.6461
2025-02  122    55.7%     0.2473  0.6862
2025-03  234    62.0%     0.2279  0.6475
2025-04  132    53.8%     0.2466  0.6861
2025-10  180    57.2%     0.2494  0.6940
2025-11  225    53.8%     0.2439  0.6795
2025-12  226    54.0%     0.2499  0.6936
2026-01  240    56.7%     0.2439  0.6809
2026-02  74     67.6%     0.2234  0.6378
2026-03  242    54.5%     0.2493  0.6913
2026-04  119    59.7%     0.2266  0.6442
Playoff Model Performance
Game-level and series-level accuracy, Brier score, and log loss by playoff round.
Playoff Games
Round       Games  Accuracy  Brier   Log Loss
All Rounds  86     52.3%     0.2506  0.6945
Round 1     47     44.7%     0.2525  0.6982
Round 2     23     56.5%     0.2530  0.6993
Round 3     10     80.0%     0.2340  0.6613
Round 4     6      50.0%     0.2547  0.7026
Playoff Series
Round       Series  Accuracy  Brier   Log Loss
All Rounds  15      66.7%     0.2312  0.6553
Round 1     8       62.5%     0.2444  0.6827
Round 2     4       50.0%     0.2215  0.6346
Round 3     2       100.0%    0.1999  0.5913
Round 4     1       100.0%    0.2269  0.6470
[Chart: Playoff Calibration (Pred vs Observed), plotting Mean Pred and Observed on a 0.0 to 1.0 scale for bins 3, 4 and 5]