NHL Prediction Model Performance & Calibration
How to Read These Metrics
Accuracy
The percentage of games where the predicted winner (team with >50% win
probability) actually won. Simple but incomplete — it ignores how
confident the model was.
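A minimal Python sketch of the accuracy calculation. It assumes, hypothetically, that each game is stored as the model's home-team win probability plus a 1/0 flag for whether the home team won. The function name, the data layout and the tie rule at exactly 50% are illustrative, not the site's code:

```python
def accuracy(home_win_probs: list[float], home_won: list[int]) -> float:
    """Share of games where the side given more than 50% actually won.

    Assumes each probability is for the home team; a probability of exactly
    0.5 is treated as an away pick (an arbitrary, illustrative tie rule).
    """
    if len(home_win_probs) != len(home_won) or not home_won:
        raise ValueError("need equal-length, non-empty inputs")
    correct = 0
    for p, won in zip(home_win_probs, home_won):
        predicted_home = p > 0.5
        correct += int(predicted_home == bool(won))
    return correct / len(home_won)


# Three games: two favourites win, one upset.
print(accuracy([0.62, 0.41, 0.58], [1, 0, 0]))  # 0.6666666666666666
```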
Brier Score
Measures the mean squared error of probability predictions (0–1 scale,
lower is better). A coin-flip baseline yields 0.25; our model targets
values below 0.24. Brier score rewards well-calibrated confidence levels,
not just picking the right side.
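A short sketch of the Brier score as defined above, using plain Python lists (an assumed input format). It reproduces the 0.25 coin-flip baseline and shows how a single confident call is rewarded or punished:

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# A coin-flip model (always 0.5) scores exactly 0.25, whatever the outcomes.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1]))  # 0.25
# A confident correct call scores well; a confident miss scores badly.
print(brier_score([0.8], [1]))  # about 0.04
print(brier_score([0.8], [0]))  # about 0.64
```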
Log Loss
A logarithmic scoring rule that heavily penalises confident wrong
predictions. Assigning 90% to a team that loses costs far more than
assigning 55%. This keeps the model honest about uncertainty.
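The sketch below computes average log loss, assuming the natural logarithm. Under that convention a constant 50% forecast scores ln 2 ≈ 0.693, the log-loss counterpart of the 0.25 Brier baseline. The clipping constant `eps` is an illustrative safeguard, not a documented setting of this model:

```python
import math


def log_loss(probs: list[float], outcomes: list[int], eps: float = 1e-15) -> float:
    """Average negative log-likelihood of 0/1 outcomes under the predictions.

    Probabilities are clipped to [eps, 1 - eps] so a 0% or 100% call that
    turns out wrong gives a large but finite penalty.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


# Cost of backing a team that then loses (outcome 0 for that team):
print(round(log_loss([0.90], [0]), 3))  # 2.303
print(round(log_loss([0.55], [0]), 3))  # 0.799
# Always predicting 50% gives ln(2), the coin-flip baseline.
print(round(log_loss([0.5, 0.5], [1, 0]), 3))  # 0.693
```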
Calibration
Shows whether stated probabilities match real outcomes. In a well-calibrated
model, games given a 70% win probability should be won about 70% of the
time. The calibration tables below group predictions into decile bins so
you can verify this directly.
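The bin numbers in the calibration tables (0 to 9) appear to be tenths of probability: every reported Mean Pred falls inside [bin/10, (bin+1)/10). For example, all bin-8 rows report values between 83% and 85%. Assuming that rule, a decile calibration table can be built as follows (`calibration_table` is a hypothetical helper, not the site's code):

```python
from collections import defaultdict


def calibration_table(probs: list[float], outcomes: list[int]) -> list[tuple[int, int, float, float]]:
    """Group predictions into decile bins and compare mean prediction to observed rate.

    Bin k holds probabilities in [k/10, (k+1)/10); a probability of exactly 1.0
    is placed in bin 9. Returns (bin, count, mean_pred, observed) rows for
    non-empty bins only, sorted by bin.
    """
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        k = min(int(p * 10), 9)
        bins[k].append((p, y))
    rows = []
    for k in sorted(bins):
        members = bins[k]
        n = len(members)
        mean_pred = sum(p for p, _ in members) / n
        observed = sum(y for _, y in members) / n
        rows.append((k, n, mean_pred, observed))
    return rows


# Made-up predictions: three land in bin 5, two in bin 7.
for row in calibration_table([0.52, 0.57, 0.55, 0.71, 0.74], [1, 0, 1, 1, 1]):
    print(row)
```

In a well-calibrated model the mean_pred and observed columns of each row should be close, which is exactly what the Mean Pred and Observed columns in the tables let you check.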
RMSE (Root Mean Squared Error)
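For goal total predictions, RMSE is the square root of the average squared
gap between predicted and actual game totals, in goal units. Because errors
are squared before averaging, large misses weigh more than small ones.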
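A sketch of RMSE for goal totals, assuming predicted totals are real-valued and actual totals are integers (an illustrative input format). The example compares it with the plain average miss to show the effect of squaring:

```python
import math


def rmse(predicted_totals: list[float], actual_totals: list[int]) -> float:
    """Root mean squared error between predicted and actual goal totals, in goals."""
    if len(predicted_totals) != len(actual_totals) or not predicted_totals:
        raise ValueError("need equal-length, non-empty inputs")
    squared = [(p - a) ** 2 for p, a in zip(predicted_totals, actual_totals)]
    return math.sqrt(sum(squared) / len(squared))


# Misses of 0.5, 1.5 and 3.0 goals: the 3-goal miss dominates.
print(round(rmse([6.0, 5.5, 6.0], [5.5, 7, 9]), 3))  # 1.958 (mean absolute miss is about 1.667)
```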
Why calibration matters most: For probabilistic predictions,
calibration is more important than raw accuracy. Accuracy only records which
side of 50% each prediction fell on: a model that says "55%" every game can be
55% accurate yet give no signal about which games are safer calls, which makes
it useless for decision-making. A well-calibrated model tells you how much to
trust each prediction.
Learn more in our analytics guide and methodology.
Game Predictions (Multi-Window)
Window | Start | End | Games | Accuracy | Brier | Log Loss | Avg Winner Prob | RMSE Total (goals)
last 30 | 2026-03-17 | 2026-04-16 | 245 | 55.9% | 0.2362 | 0.6632 | 53.2% | 2.639
season to date | 2025-10-01 | 2026-04-16 | 1312 | 56.2% | 0.2439 | 0.6807 | 52.2% | 2.588
multi season | 2023-10-10 | 2026-04-16 | 3936 | 61.8% | 0.2271 | 0.6449 | 53.9% | 2.202
Totals (Over 5.5)
Window | Games | Accuracy | Brier | Log Loss | Avg Outcome Prob
last 30 | 245 | 50.6% | 0.2693 | 0.7363 | 50.6%
season to date | 1312 | 54.4% | 0.2616 | 0.7214 | 51.5%
multi season | 3936 | 62.9% | 0.2257 | 0.6417 | 55.2%
Playoff Game Performance
Start | End | Games | Accuracy | Brier | Log Loss
2024-04-20 | 2026-05-24 | 263 | 63.1% | 0.2396 | 0.6732
Daily Performance
Date | Games | Accuracy | Brier | Log Loss
Prediction Recap Highlights
Definition: High confidence means the model assigned the predicted team
a win probability well above 50%. Edges that hit are the highest-confidence correct calls,
misses are the highest-confidence incorrect calls, and surprise results show the largest
absolute gap between win probability and the actual outcome.
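The first two highlight lists can be sketched as a sort by the win probability of the side the model picked. This assumes, hypothetically, that each game is stored as a home-team win probability plus the result. `GameResult`, `confidence` and `edges_and_misses` are illustrative names, and treating exactly 50% as an away pick is an arbitrary choice:

```python
from typing import NamedTuple


class GameResult(NamedTuple):
    """Hypothetical per-game record (not the site's actual schema)."""
    matchup: str          # "AWAY @ HOME"
    home_win_prob: float  # model's pregame home win probability
    home_won: bool


def confidence(g: GameResult) -> float:
    """Win probability of the side the model picked (always >= 0.5)."""
    return max(g.home_win_prob, 1 - g.home_win_prob)


def call_correct(g: GameResult) -> bool:
    """True if the picked side won; exactly 0.5 counts as an away pick here."""
    return (g.home_win_prob > 0.5) == g.home_won


def edges_and_misses(games: list[GameResult], n: int = 5) -> tuple[list[GameResult], list[GameResult]]:
    """Top-n correct and top-n incorrect calls, highest confidence first."""
    ranked = sorted(games, key=confidence, reverse=True)
    edges = [g for g in ranked if call_correct(g)][:n]
    misses = [g for g in ranked if not call_correct(g)][:n]
    return edges, misses


# Made-up games for illustration only.
games = [
    GameResult("A @ B", 0.74, True),
    GameResult("C @ D", 0.69, False),
    GameResult("E @ F", 0.48, True),
]
edges, misses = edges_and_misses(games, n=2)
print([g.matchup for g in edges])   # ['A @ B']
print([g.matchup for g in misses])  # ['C @ D', 'E @ F']
```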
Biggest Model Edges That Hit
Date | Matchup | Win Prob | Outcome
Biggest Misses (High Confidence)
Date | Matchup | Win Prob | Outcome
Surprise Results
Date | Matchup | Surprise | Outcome
Calibration (Win Prob Deciles) — Last 30
Bin | Count | Mean Pred | Observed
2 | 2 | 25.2% | 0.0%
3 | 22 | 37.3% | 31.8%
4 | 66 | 45.4% | 50.0%
5 | 71 | 54.5% | 43.7%
6 | 49 | 65.2% | 57.1%
7 | 27 | 73.6% | 77.8%
8 | 8 | 83.5% | 87.5%
Calibration (Win Prob Deciles) — Season To Date
Bin | Count | Mean Pred | Observed
2 | 10 | 26.8% | 30.0%
3 | 104 | 36.8% | 42.3%
4 | 313 | 45.7% | 44.7%
5 | 447 | 54.7% | 49.7%
6 | 289 | 64.6% | 59.2%
7 | 132 | 73.5% | 69.7%
8 | 17 | 83.6% | 76.5%
Calibration (Win Prob Deciles) — Multi Season
Bin | Count | Mean Pred | Observed
1 | 1 | 18.3% | 0.0%
2 | 56 | 26.6% | 7.1%
3 | 368 | 36.3% | 31.8%
4 | 969 | 45.5% | 44.0%
5 | 1272 | 54.9% | 54.0%
6 | 835 | 64.4% | 66.0%
7 | 368 | 73.7% | 77.7%
8 | 67 | 83.3% | 92.5%
Calibration (Over 5.5) — Last 30
Bin | Count | Mean Pred | Observed
1 | 2 | 16.8% | 50.0%
2 | 7 | 26.5% | 71.4%
3 | 49 | 35.4% | 59.2%
4 | 31 | 44.9% | 58.1%
5 | 78 | 55.1% | 55.1%
6 | 39 | 63.9% | 56.4%
7 | 29 | 75.5% | 58.6%
8 | 8 | 84.3% | 50.0%
9 | 2 | 91.3% | 100.0%
Calibration (Over 5.5) — Season To Date
Bin | Count | Mean Pred | Observed
1 | 11 | 18.5% | 72.7%
2 | 48 | 25.9% | 47.9%
3 | 189 | 35.6% | 52.9%
4 | 146 | 44.5% | 58.2%
5 | 445 | 55.4% | 58.2%
6 | 203 | 63.8% | 55.2%
7 | 205 | 74.5% | 60.0%
8 | 56 | 84.8% | 60.7%
9 | 9 | 94.3% | 88.9%
Calibration (Over 5.5) — Multi Season
Bin | Count | Mean Pred | Observed
1 | 34 | 18.2% | 29.4%
2 | 157 | 26.0% | 23.6%
3 | 584 | 35.7% | 38.2%
4 | 459 | 44.4% | 46.4%
5 | 1254 | 55.3% | 56.5%
6 | 617 | 63.6% | 62.7%
7 | 614 | 74.5% | 73.1%
8 | 178 | 84.9% | 80.3%
9 | 39 | 93.5% | 97.4%
Team Calibration (Home, Top 15 by Volume)
Team | Count | Mean Pred | Observed | Bias (Mean Pred - Observed)
EDM | 149 | 61.0% | 61.7% | -0.7%
DAL | 147 | 56.0% | 62.6% | -6.6%
CAR | 147 | 66.4% | 68.7% | -2.3%
FLA | 146 | 60.9% | 62.3% | -1.4%
COL | 141 | 63.7% | 65.2% | -1.5%
VGK | 141 | 58.5% | 59.6% | -1.1%
WPG | 133 | 56.4% | 62.4% | -6.0%
BOS | 133 | 50.9% | 56.4% | -5.5%
TOR | 133 | 53.5% | 54.1% | -0.6%
TBL | 132 | 58.4% | 62.1% | -3.7%
MIN | 132 | 54.5% | 52.3% | +2.2%
LAK | 131 | 58.9% | 53.4% | +5.4%
WSH | 131 | 54.8% | 58.8% | -4.0%
NYR | 131 | 53.7% | 51.9% | +1.8%
MTL | 131 | 48.7% | 50.4% | -1.7%
[Chart: Team Calibration (Pred vs Observed), Mean Pred and Observed rates on a 0.0 to 1.0 axis for EDM, DAL, CAR, FLA, COL, VGK, WPG, BOS, TOR, TBL, MIN, LAK, WSH, NYR, MTL]
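In the team table, Bias is Mean Pred minus Observed (DAL: 56.0% - 62.6% = -6.6%), so a negative value means the team won at home more often than the model expected. A sketch of that per-team breakdown, assuming hypothetical (home team, home win probability, home won) records; `team_bias` is an illustrative name:

```python
from collections import defaultdict


def team_bias(rows: list[tuple[str, float, int]], top_n: int = 15) -> list[tuple[str, int, float, float, float]]:
    """Per-team home calibration.

    rows: (home_team, home_win_prob, home_won) tuples.
    Returns (team, count, mean_pred, observed, bias) for the top_n teams by
    game count, where bias = mean_pred - observed. Negative bias means the
    model under-rated the team at home.
    """
    by_team = defaultdict(list)
    for team, p, won in rows:
        by_team[team].append((p, won))
    out = []
    for team, games in by_team.items():
        n = len(games)
        mean_pred = sum(p for p, _ in games) / n
        observed = sum(w for _, w in games) / n
        out.append((team, n, mean_pred, observed, mean_pred - observed))
    out.sort(key=lambda r: r[1], reverse=True)  # "Top 15 by Volume"
    return out[:top_n]


# Made-up records: team "AAA" wins both home games despite ~55% predictions.
for row in team_bias([("AAA", 0.60, 1), ("AAA", 0.50, 1), ("BBB", 0.55, 0)]):
    print(row)
```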
Starter Calibration (Home)
Window | Starter Status | Games | Accuracy | Brier | Log Loss
last 30 | Starter | 245 | 55.9% | 0.2362 | 0.6632
season to date | Starter | 1312 | 56.2% | 0.2439 | 0.6807
multi season | Unknown | 18 | 72.2% | 0.2279 | 0.6480
multi season | Starter | 3918 | 61.8% | 0.2271 | 0.6449
Cross-Validation (Expanding Window)
Summary: 3 folds | Brier: 0.2495 | Log Loss: 0.6925 | RMSE Total: 2.394
Fold | Train N | Val N | Brier | Log Loss | RMSE
Fold 1 | 701 | 2,097 | 0.2542 | 0.7024 | 2.432
Fold 2 | 1,399 | 1,399 | 0.2491 | 0.6915 | 2.376
Fold 3 | 2,103 | 695 | 0.2453 | 0.6838 | 2.373
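In every fold, Train N + Val N = 2,798, which suggests each fold validates on all games after its training cutoff, and the Summary figures match the unweighted mean of the three folds (for example, (0.2542 + 0.2491 + 0.2453) / 3 ≈ 0.2495). The sketch below reproduces that scheme with index-based cutoffs at quarters of the data. The real cutoffs differ slightly (701, 1,399, 2,103), so they are probably date-based; both function names are hypothetical:

```python
def expanding_window_folds(n_games: int, n_folds: int = 3) -> list[tuple[range, range]]:
    """Split chronologically ordered games into expanding-window folds.

    Fold k (1-based) trains on the first k/(n_folds + 1) of the games and
    validates on every later game, so each fold's train and validation
    sizes add up to n_games and training data always precedes validation data.
    """
    folds = []
    for k in range(1, n_folds + 1):
        cutoff = n_games * k // (n_folds + 1)
        folds.append((range(0, cutoff), range(cutoff, n_games)))
    return folds


def summarize(fold_scores: list[float]) -> float:
    """Unweighted mean across folds."""
    return sum(fold_scores) / len(fold_scores)


for train, val in expanding_window_folds(2798):
    print(len(train), len(val))  # 699 2099, then 1399 1399, then 2098 700
print(round(summarize([0.2542, 0.2491, 0.2453]), 4))  # 0.2495
```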
In-Game Checkpoints — Last 30
Checkpoint | Games | Accuracy | Brier | Log Loss
end_p1 | 51 | 68.6% | 0.1947 | 0.5889
end_p2 | 51 | 78.4% | 0.1677 | 0.5093
ot_start | 14 | 71.4% | 0.2148 | 0.6131
p3_10 | 51 | 78.4% | 0.1114 | 0.3390
p3_5 | 51 | 84.3% | 0.0851 | 0.2730
pregame | 51 | 51.0% | 0.2527 | 0.6987
In-Game Checkpoints — Season To Date
Checkpoint | Games | Accuracy | Brier | Log Loss
end_p1 | 1384 | 67.0% | 0.2065 | 0.5979
end_p2 | 1384 | 78.7% | 0.1450 | 0.4423
ot_start | 345 | 64.1% | 0.1942 | 0.5547
p3_10 | 1384 | 83.5% | 0.1027 | 0.3212
p3_5 | 1384 | 85.5% | 0.0883 | 0.2767
pregame | 1384 | 53.4% | 0.2477 | 0.6886
In-Game Calibration — Pregame (Last 30 Days)
Bin | Count | Mean Pred | Observed
4 | 6 | 48.4% | 33.3%
5 | 43 | 54.9% | 48.8%
6 | 2 | 64.5% | 50.0%
In-Game Calibration — End P2 (Last 30 Days)
Bin | Count | Mean Pred | Observed
0 | 6 | 4.5% | 16.7%
1 | 6 | 14.4% | 16.7%
2 | 4 | 26.5% | 25.0%
3 | 7 | 34.9% | 28.6%
4 | 3 | 47.9% | 0.0%
5 | 3 | 54.5% | 66.7%
6 | 7 | 64.0% | 57.1%
7 | 5 | 75.5% | 60.0%
8 | 4 | 84.4% | 100.0%
9 | 6 | 94.3% | 100.0%
In-Game Calibration — P3 10 (Last 30 Days)
Bin | Count | Mean Pred | Observed
0 | 15 | 2.9% | 6.7%
1 | 7 | 15.4% | 14.3%
4 | 2 | 43.7% | 100.0%
5 | 7 | 53.9% | 28.6%
6 | 3 | 62.6% | 66.7%
8 | 4 | 84.8% | 75.0%
9 | 13 | 97.3% | 100.0%
xG Holdout — Contextual
Train: 2023-10-10 – 2025-12-11 | Test: 2025-12-12 – 2026-05-14
Games (test): 885 | Shots (test): 76090 | ROC AUC: 0.785 | Log Loss: 0.2214 | Brier: 0.0601
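ROC AUC measures how well the xG model ranks shots: it is the probability that a randomly chosen goal was assigned a higher xG than a randomly chosen non-goal, with ties counting half, so 0.5 is no better than chance and 1.0 is perfect ranking. A direct pairwise sketch (`roc_auc` is an illustrative name; the quadratic loop is for clarity, not for test sets of tens of thousands of shots):

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney form).

    Fraction of (goal, non-goal) pairs where the goal received the higher
    score; tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one goal and one non-goal")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up xG values: 3 of the 4 goal/non-goal pairs are ranked correctly.
print(roc_auc([0.30, 0.05, 0.07, 0.08], [1, 0, 1, 0]))  # 0.75
```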
xG Splits — Contextual Strength State
Split | Shots | Goal Rate | AUC | Log Loss | Brier
Even | 60771 | 6.2% | 0.778 | 0.2023 | 0.0539
PP | 12929 | 10.6% | 0.727 | 0.2915 | 0.0809
PK | 1690 | 7.2% | 0.841 | 0.2112 | 0.0602
EmptyNet | 700 | 51.1% | 0.743 | 0.6079 | 0.2114
xG Splits — Contextual Shot Type
Split | Shots | Goal Rate | AUC | Log Loss | Brier
wrist | 32222 | 7.2% | 0.812 | 0.2072 | 0.0564
snap | 19339 | 8.4% | 0.775 | 0.2483 | 0.0701
slap | 9096 | 4.8% | 0.720 | 0.1796 | 0.0446
tip-in | 7349 | 6.3% | 0.663 | 0.2264 | 0.0579
backhand | 5643 | 8.9% | 0.823 | 0.2357 | 0.0658
deflected | 1234 | 11.8% | 0.697 | 0.3301 | 0.0958
wrap-around | 461 | 5.2% | 0.738 | 0.1821 | 0.0455
bat | 398 | 8.5% | 0.789 | 0.2426 | 0.0661
poke | 248 | 8.9% | 0.658 | 0.2861 | 0.0747
between-legs | 52 | 9.6% | 0.783 | 0.2554 | 0.0678
nan | 41 | 61.0% | 0.732 | 1.6486 | 0.2915
cradle | 7 | 14.3% | 1.000 | 0.2286 | 0.0656
xG Holdout — Neutral
Train: 2023-10-10 – 2025-11-16 | Test: 2025-11-17 – 2026-04-13
Games (test): 991 | Shots (test): 85192 | ROC AUC: 0.783 | Log Loss: 0.2247 | Brier: 0.0613
xG Splits — Neutral Strength State
Split | Shots | Goal Rate | AUC | Log Loss | Brier
Even | 68274 | 6.2% | 0.781 | 0.2021 | 0.0540
PP | 14330 | 10.6% | 0.705 | 0.3133 | 0.0885
PK | 1823 | 7.1% | 0.822 | 0.2146 | 0.0605
EmptyNet | 765 | 50.7% | 0.744 | 0.6043 | 0.2089
xG Splits — Neutral Shot Type
Split | Shots | Goal Rate | AUC | Log Loss | Brier
wrist | 36458 | 7.1% | 0.809 | 0.2083 | 0.0569
snap | 21423 | 8.3% | 0.774 | 0.2486 | 0.0703
slap | 10084 | 5.0% | 0.713 | 0.1850 | 0.0461
tip-in | 8203 | 6.6% | 0.669 | 0.2331 | 0.0602
backhand | 6300 | 9.5% | 0.812 | 0.2558 | 0.0741
deflected | 1372 | 11.7% | 0.712 | 0.3249 | 0.0946
wrap-around | 542 | 4.4% | 0.655 | 0.1758 | 0.0415
bat | 436 | 8.9% | 0.835 | 0.2328 | 0.0639
poke | 269 | 9.7% | 0.747 | 0.2677 | 0.0705
between-legs | 56 | 12.5% | 0.777 | 0.3116 | 0.0909
nan | 42 | 61.9% | 0.713 | 2.3896 | 0.2752
cradle | 7 | 14.3% | 1.000 | 0.2025 | 0.0577
Monthly Performance Trends
Track how model performance varies month-to-month across the season.
Month | Games | Accuracy | Brier | Log Loss
2023-10 | 140 | 62.1% | 0.2210 | 0.6297
2023-11 | 213 | 66.2% | 0.2159 | 0.6207
2023-12 | 219 | 66.7% | 0.2185 | 0.6271
2024-01 | 208 | 62.5% | 0.2168 | 0.6222
2024-02 | 172 | 67.4% | 0.2064 | 0.6014
2024-03 | 228 | 71.1% | 0.2036 | 0.5951
2024-04 | 132 | 62.1% | 0.2266 | 0.6436
2024-10 | 166 | 69.9% | 0.2040 | 0.5965
2024-11 | 220 | 66.8% | 0.2145 | 0.6181
2024-12 | 214 | 69.2% | 0.2053 | 0.6002
2025-01 | 224 | 60.7% | 0.2275 | 0.6459
2025-02 | 122 | 55.7% | 0.2473 | 0.6864
2025-03 | 234 | 62.0% | 0.2280 | 0.6475
2025-04 | 132 | 53.8% | 0.2467 | 0.6861
2025-10 | 180 | 57.2% | 0.2494 | 0.6939
2025-11 | 225 | 53.8% | 0.2439 | 0.6795
2025-12 | 226 | 54.0% | 0.2499 | 0.6936
2026-01 | 240 | 56.7% | 0.2439 | 0.6809
2026-02 | 74 | 67.6% | 0.2234 | 0.6378
2026-03 | 242 | 54.5% | 0.2493 | 0.6913
2026-04 | 125 | 59.2% | 0.2273 | 0.6450
Playoff Model Performance
Game-level and series-level accuracy across playoff rounds.
Playoff Games
Round | Games | Accuracy | Brier | Log Loss
All Rounds | 72 | 58.3% | 0.2441 | 0.6814
Round 1 | 45 | 60.0% | 0.2394 | 0.6718
Round 2 | 22 | 59.1% | 0.2423 | 0.6777
Round 3 | 5 | 40.0% | 0.2943 | 0.7840
Playoff Series
Round | Series | Accuracy | Brier | Log Loss
All Rounds | 12 | 66.7% | 0.2210 | 0.6346
Round 1 | 8 | 62.5% | 0.2134 | 0.6189
Round 2 | 4 | 75.0% | 0.2361 | 0.6660
[Chart: Playoff Calibration (Pred vs Observed), Mean Pred and Observed rates on a 0.0 to 1.0 axis for bins 4, 5 and 6]