NHL Prediction Model Performance & Calibration
How to Read These Metrics
Accuracy
The percentage of games where the predicted winner (team with >50% win
probability) actually won. Simple but incomplete — it ignores how
confident the model was.
Brier Score
Measures the mean squared error of probability predictions (0–1 scale,
lower is better). A coin-flip baseline yields 0.25; our model targets
values below 0.24. Brier score rewards well-calibrated confidence levels,
not just picking the right side.
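As a minimal sketch of the definition above (not the site's actual scoring code), the Brier score is the mean of the squared gaps between each predicted probability and the 0/1 result. The function name `brier_score` and the sample outcomes are illustrative assumptions.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


outcomes = [1, 0, 1, 1, 0, 0]            # hypothetical results (1 = predicted side won)
coin_flip = [0.5] * len(outcomes)        # always predicting 50%
print(brier_score(coin_flip, outcomes))  # 0.25, the coin-flip baseline
```

Every 50% prediction misses by exactly 0.5 whichever side wins, so the baseline is 0.5 squared, or 0.25, regardless of the results.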
Log Loss
A logarithmic scoring rule that heavily penalises confident wrong
predictions. Assigning 90% to a team that loses costs far more than
assigning 55%. This keeps the model honest about uncertainty.
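To make the 90% versus 55% comparison concrete, here is a small sketch of the per-game penalty, using the natural logarithm. The helper name `log_loss_single` and the clipping constant are assumptions for illustration, not the site's implementation.

```python
import math


def log_loss_single(p_win, won, eps=1e-15):
    """Negative log of the probability assigned to what actually happened."""
    p = min(max(p_win, eps), 1 - eps)  # clip so log(0) can never occur
    return -math.log(p) if won else -math.log(1 - p)


# Penalty when the backed team loses, at three confidence levels.
for p in (0.50, 0.55, 0.90):
    print(p, round(log_loss_single(p, won=False), 4))
# 0.5 0.6931
# 0.55 0.7985
# 0.9 2.3026
```

A 50% prediction always costs ln 2, about 0.6931, so that value is the coin-flip reference point for the Log Loss columns in the tables.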
Calibration
Shows whether stated probabilities match real outcomes. In a well-calibrated
model, games given a 70% win probability should be won about 70% of the
time. The calibration tables below group predictions into decile bins so
you can verify this directly.
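One way such a table could be built is sketched below. It assumes a prediction with probability p goes into bin floor(10 × p), which matches the bin numbers and mean predictions shown in the tables; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def calibration_table(probs, outcomes, n_bins=10):
    """Return rows of (bin, count, mean predicted prob, observed win rate)."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        count = len(pairs)
        mean_pred = sum(p for p, _ in pairs) / count
        observed = sum(y for _, y in pairs) / count
        rows.append((idx, count, mean_pred, observed))
    return rows


# Hypothetical predictions and results (1 = predicted side won).
probs = [0.55, 0.58, 0.63, 0.67, 0.72]
outcomes = [1, 0, 1, 1, 0]
for idx, count, mean_pred, observed in calibration_table(probs, outcomes):
    print(idx, count, f"{mean_pred:.1%}", f"{observed:.1%}")
```

In a well-calibrated model the last two columns of each row are close to each other; bins with small counts are noisy and should be read with caution.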
RMSE (Root Mean Squared Error)
For goal total predictions, RMSE measures the typical gap between predicted
and actual totals, in goals. Because errors are squared before averaging,
large misses count more than small ones.
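A short sketch of the calculation follows, with hypothetical totals; the function name `rmse` and the sample numbers are assumptions for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error, in the same units as the inputs (goals)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


predicted_totals = [5.5, 6.0, 6.5]  # hypothetical model outputs
actual_totals = [5, 9, 6]           # hypothetical final goal totals
print(round(rmse(predicted_totals, actual_totals), 3))  # 1.78
```

Two of the three predictions are off by only half a goal, yet the single three-goal miss pulls the RMSE up to about 1.78.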
Why calibration matters most: For probabilistic predictions,
calibration is more important than raw accuracy. A model that says "55%"
for every game can be 55% accurate yet useless for decision-making, because
it never separates strong picks from near coin flips. A well-calibrated
model whose probabilities vary from game to game tells you how much to
trust each prediction.
Learn more in our analytics guide and methodology.
Game Predictions (Multi-Window)
Window          Start       End         Games  Accuracy  Brier   Log Loss  Avg Winner Prob  RMSE Total
last 30         2026-01-27  2026-02-26  100    51.0%     0.2576  0.7092    50.5%            2.906
season to date  2025-10-01  2026-02-26  928    51.8%     0.2527  0.6988    50.6%            2.430
multi season    2023-10-10  2026-02-26  3552   55.5%     0.2407  0.6740    51.8%            2.211
Totals (Over 5.5)
Window          Games  Accuracy  Brier   Log Loss  Avg Outcome Prob
last 30         100    55.0%     0.2361  0.6640    52.6%
season to date  928    56.1%     0.2510  0.6956    51.1%
multi season    3552   60.7%     0.2312  0.6542    53.0%
Playoff Game Performance
Start       End         Games  Accuracy  Brier   Log Loss
2024-04-20  2025-06-17  174    55.2%     0.2378  0.6682
Daily Performance
Date  Games  Accuracy  Brier  Log Loss
Prediction Recap Highlights
Definition: High confidence means the model assigned the predicted team
a win probability well above 50%. Edges that hit are the highest-confidence correct calls,
misses are the highest-confidence incorrect calls, and surprise results show the largest
absolute gap between win probability and the actual outcome.
Biggest Model Edges That Hit
Date  Matchup  Win Prob  Outcome
Biggest Misses (High Confidence)
Date  Matchup  Win Prob  Outcome
Surprise Results
Date  Matchup  Surprise  Outcome
Calibration (Win Prob Deciles) — Last 30
Bin  Count  Mean Pred  Observed
5    45     57.1%      51.1%
6    55     63.5%      50.9%
Calibration (Win Prob Deciles) — Season To Date
Bin  Count  Mean Pred  Observed
4    8      48.5%      37.5%
5    590    55.6%      49.3%
6    330    62.7%      56.1%
Calibration (Win Prob Deciles) — Multi Season
Bin  Count  Mean Pred  Observed
4    57     48.9%      10.5%
5    2257   55.8%      47.8%
6    1233   62.7%      68.0%
7    5      70.5%      100.0%
Calibration (Over 5.5) — Last 30
Bin  Count  Mean Pred  Observed
4    7      46.9%      57.1%
5    49     55.8%      42.9%
6    41     64.3%      68.3%
7    3      73.6%      100.0%
Calibration (Over 5.5) — Season To Date
Bin  Count  Mean Pred  Observed
3    8      35.2%      62.5%
4    85     46.8%      51.8%
5    445    55.8%      58.9%
6    355    63.6%      54.6%
7    34     72.5%      58.8%
8    1      89.6%      100.0%
Calibration (Over 5.5) — Multi Season
Bin  Count  Mean Pred  Observed
1    1      18.4%      0.0%
2    3      26.1%      0.0%
3    34     36.1%      20.6%
4    356    46.6%      29.2%
5    1822   55.7%      50.5%
6    1203   63.3%      69.4%
7    122    72.8%      86.9%
8    10     84.6%      100.0%
9    1      91.0%      100.0%
Team Calibration (Home, Top 15 by Volume)
Team  Count  Mean Pred  Observed  Bias
FLA   137    59.7%      60.6%     -0.9%
EDM   133    60.0%      62.4%     -2.4%
DAL   129    59.3%      63.6%     -4.3%
CAR   128    61.2%      68.8%     -7.6%
TOR   123    58.4%      56.9%     +1.4%
WSH   121    57.8%      57.9%     -0.1%
WPG   120    58.5%      63.3%     -4.9%
VGK   120    58.5%      60.8%     -2.3%
BOS   120    57.5%      55.8%     +1.7%
COL   119    60.3%      68.9%     -8.6%
VAN   117    57.6%      45.3%     +12.3%
NSH   116    58.4%      50.0%     +8.4%
NYR   116    58.3%      51.7%     +6.6%
STL   116    56.8%      56.0%     +0.7%
MIN   115    57.6%      52.2%     +5.4%
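The Bias column appears to be the mean predicted probability minus the observed win rate, in percentage points, so a positive value means the model overrated that team at home. The sketch below checks this assumed definition against four rows of the table; the function name `bias_points` is hypothetical.

```python
# Rows copied from the table above: (team, mean predicted %, observed %).
rows = [
    ("CAR", 61.2, 68.8),
    ("COL", 60.3, 68.9),
    ("VAN", 57.6, 45.3),
    ("NSH", 58.4, 50.0),
]


def bias_points(mean_pred_pct, observed_pct):
    """Assumed definition: predicted minus observed, in percentage points."""
    return round(mean_pred_pct - observed_pct, 1)


for team, pred, obs in rows:
    print(f"{team}: {bias_points(pred, obs):+.1f}")
# CAR: -7.6
# COL: -8.6
# VAN: +12.3
# NSH: +8.4
```

A few rows (TOR, WPG, STL) differ from this calculation by 0.1, presumably because the published bias is computed from unrounded values.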
Team Calibration (Pred vs Observed)
[Chart: Mean Pred and Observed on a 0.0 to 1.0 axis for FLA, EDM, DAL, CAR, TOR, WSH, WPG, VGK, BOS, COL, VAN, NSH, NYR, STL, MIN]
Starter Calibration (Home)
Window          Starter Status  Games  Accuracy  Brier   Log Loss
last 30         Starter         100    51.0%     0.2576  0.7092
season to date  Starter         928    51.8%     0.2527  0.6988
multi season    Unknown         18     66.7%     0.2500  0.6931
multi season    Starter         3534   55.5%     0.2407  0.6739
Cross-Validation (Expanding Window)
Summary: 3 folds | Brier: 0.2517 | Log Loss: 0.6971 | RMSE Total: 2.410
Fold    Train N  Val N  Brier   Log Loss  RMSE
Fold 1  701      2,097  0.2542  0.7025    2.449
Fold 2  1,399    1,399  0.2540  0.7020    2.396
Fold 3  2,103    695    0.2468  0.6868    2.386
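The fold table can be cross-checked with a few lines of arithmetic. The sketch below assumes the summary figures are unweighted means of the per-fold values, an assumption the numbers support, and confirms that each fold's training and validation sets together cover the same pool of games.

```python
folds = [
    # (train_n, val_n, brier, log_loss, rmse) copied from the fold table
    (701, 2097, 0.2542, 0.7025, 2.449),
    (1399, 1399, 0.2540, 0.7020, 2.396),
    (2103, 695, 0.2468, 0.6868, 2.386),
]


def column_mean(rows, index):
    """Unweighted mean of one column across folds."""
    return sum(row[index] for row in rows) / len(rows)


summary = {
    "brier": round(column_mean(folds, 2), 4),
    "log_loss": round(column_mean(folds, 3), 4),
    "rmse_total": round(column_mean(folds, 4), 3),
}
totals = {train_n + val_n for train_n, val_n, *_ in folds}

print(summary)  # {'brier': 0.2517, 'log_loss': 0.6971, 'rmse_total': 2.41}
print(totals)   # {2798}
```

Training size grows from fold to fold while validation shrinks by the same amount, which is the expanding-window pattern: each fold trains on the earliest games and validates on everything after the cutoff.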
In-Game Checkpoints — Last 30
Checkpoint  Games  Accuracy  Brier   Log Loss
end_p1      92     70.7%     0.1990  0.5870
end_p2      92     77.2%     0.1663  0.5179
ot_start    21     71.4%     0.1598  0.4703
p3_10       92     85.9%     0.1016  0.3314
p3_5        92     87.0%     0.0802  0.2697
pregame     92     58.7%     0.2365  0.6656
In-Game Checkpoints — Season To Date
Checkpoint  Games  Accuracy  Brier   Log Loss
end_p1      916    66.7%     0.2097  0.6050
end_p2      916    77.7%     0.1533  0.4662
ot_start    234    63.2%     0.2008  0.5694
p3_10       916    84.0%     0.1048  0.3303
p3_5        916    85.4%     0.0878  0.2766
pregame     916    52.3%     0.2514  0.6960
In-Game Calibration — Pregame (Last 30 Days)
Bin  Count  Mean Pred  Observed
5    59     57.8%      50.8%
6    33     61.7%      72.7%
In-Game Calibration — End P2 (Last 30 Days)
Bin  Count  Mean Pred  Observed
0    12     5.3%       33.3%
1    3      17.2%      66.7%
2    9      25.2%      0.0%
3    11     33.4%      36.4%
4    2      41.6%      0.0%
5    7      56.7%      57.1%
6    6      64.9%      50.0%
7    13     75.8%      69.2%
8    9      85.3%      88.9%
9    20     94.2%      100.0%
In-Game Calibration — P3 10 (Last 30 Days)
Bin  Count  Mean Pred  Observed
0    17     2.9%       11.8%
1    11     15.4%      18.2%
2    8      23.1%      12.5%
4    9      45.0%      55.6%
5    9      55.2%      66.7%
6    2      63.2%      100.0%
8    7      87.3%      100.0%
9    29     96.8%      100.0%
xG Model Holdout
Train: 2023-10-10 – 2025-10-28 | Test: 2025-10-28 – 2026-02-05
Shots (test): 64343 | ROC AUC: 0.763 | Log Loss: 0.2287 | Brier: 0.0623
xG Splits — Strength State
Split     Shots  Goal Rate  AUC    Log Loss  Brier
Even      51324  6.2%       0.763  0.2059    0.0548
PP        11119  10.4%      0.676  0.3162    0.0896
PK        1365   6.9%       0.773  0.2181    0.0605
EmptyNet  535    52.5%      0.703  0.6293    0.2204
xG Splits — Shot Type
Split         Shots  Goal Rate  AUC    Log Loss  Brier
wrist         27938  6.8%       0.787  0.2094    0.0565
snap          16061  8.3%       0.756  0.2531    0.0711
slap          7487   5.2%       0.701  0.1935    0.0485
tip-in        6046   6.9%       0.673  0.2375    0.0618
backhand      4739   10.0%      0.782  0.2767    0.0806
deflected     1055   10.7%      0.640  0.3243    0.0922
wrap-around   440    4.5%       0.628  0.1813    0.0429
bat           311    9.6%       0.739  0.2900    0.0814
poke          195    12.8%      0.621  0.3662    0.1078
between-legs  44     13.6%      0.675  0.3678    0.1071
nan           22     95.5%      0.405  0.2556    0.0636
cradle        5      0.0%       n/a    n/a       0.0180
Monthly Performance Trends
Track how model performance varies month-to-month across the season.
Month    Games  Accuracy  Brier   Log Loss
2023-10  140    58.6%     0.2467  0.6866
2023-11  213    54.0%     0.2404  0.6735
2023-12  219    58.9%     0.2328  0.6580
2024-01  208    53.4%     0.2363  0.6644
2024-02  172    51.2%     0.2384  0.6690
2024-03  228    59.2%     0.2264  0.6445
2024-04  132    58.3%     0.2326  0.6573
2024-10  166    58.4%     0.2436  0.6803
2024-11  220    54.5%     0.2402  0.6732
2024-12  214    57.9%     0.2279  0.6476
2025-01  224    56.7%     0.2347  0.6618
2025-02  122    53.3%     0.2497  0.6923
2025-03  234    60.3%     0.2352  0.6630
2025-04  132    60.6%     0.2356  0.6638
2025-10  180    53.9%     0.2485  0.6901
2025-11  225    52.0%     0.2522  0.6976
2025-12  226    50.4%     0.2547  0.7027
2026-01  240    52.1%     0.2528  0.6988
2026-02  57     49.1%     0.2603  0.7149
Playoff Model Performance
Game-level and series-level accuracy across playoff rounds.
Playoff Games
Round       Games  Accuracy  Brier   Log Loss
All Rounds  86     58.1%     0.2348  0.6624
Round 1     47     63.8%     0.2282  0.6492
Round 2     23     60.9%     0.2333  0.6594
Round 3     10     30.0%     0.2627  0.7186
Round 4     6      50.0%     0.2454  0.6839
Playoff Series
Round       Series  Accuracy  Brier   Log Loss
All Rounds  15      53.3%     0.2386  0.6702
Round 1     8       50.0%     0.2390  0.6709
Round 2     4       75.0%     0.2265  0.6459
Round 3     2       50.0%     0.2486  0.6904
Round 4     1       0.0%      0.2643  0.7218
Playoff Calibration (Pred vs Observed)
[Chart: Mean Pred and Observed on a 0.0 to 1.0 axis for bins 4, 5, 6]