NHL Prediction Accuracy | Model Performance Explained
This page explains how to interpret our model's prediction accuracy metrics and provides transparent performance tracking across the 2025–26 NHL season.
Current Season Performance
OOS Accuracy is the true holdout result (trained on 2023–25, tested on the 2025–26 season, 1,056 games) and is the most honest estimate of generalization. The in-sample figure includes training data and therefore overstates performance. Brier is a probabilistic score (lower is better; a constant 50% forecast scores 0.25). RMSE Total is the root-mean-square error on predicted total goals.
What the Metrics Mean
Accuracy
The fraction of games in which the model correctly predicted the winner (the team it gives a win probability above 50%). A naive 50/50 coin flip gets about 50%. Our model's accuracy typically falls between about 54% (true out-of-sample) and 63% (in-sample, which overstates performance), depending on the evaluation window. Hockey is highly random; research suggests ~58% may be near the practical ceiling for single-game NHL predictions.
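As a minimal sketch of the >50% rule (the function name and toy data are hypothetical, not the site's code), accuracy can be computed from home-win probabilities and 0/1 outcomes. How an exact 50% forecast is resolved is an assumption made here.

```python
def prediction_accuracy(home_win_probs, home_won):
    """Fraction of games whose predicted winner matches the actual winner.

    home_win_probs: model probability that the home team wins, one per game.
    home_won: 1 if the home team won, 0 otherwise.
    The pick is the home team when its probability exceeds 0.5, otherwise the
    away team (an exact 0.5 counts as an away pick: an arbitrary tie-break).
    """
    if len(home_win_probs) != len(home_won):
        raise ValueError("probabilities and outcomes must have the same length")
    if not home_won:
        raise ValueError("need at least one game")
    correct = sum((p > 0.5) == bool(y) for p, y in zip(home_win_probs, home_won))
    return correct / len(home_won)


probs = [0.62, 0.55, 0.48, 0.30, 0.51]  # made-up forecasts
outcomes = [1, 0, 0, 0, 1]              # made-up results
print(prediction_accuracy(probs, outcomes))  # 0.8
```

Only the side of 50% matters for accuracy, so a 51% pick and a 90% pick count the same; the probabilistic metrics below are what reward well-sized probabilities.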
Brier Score
The Brier score measures probabilistic accuracy: it is the mean squared difference between the predicted probability and the binary outcome (1 = home win, 0 = home loss). A model that predicts 50% for every game scores exactly 0.25, whatever the outcomes. Lower Brier scores are better. Our model targets <0.235, which would indicate meaningful skill beyond an uninformative 50% forecast.
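The definition translates directly into code. This is a minimal sketch with a hypothetical function name and made-up outcomes; it also shows why the constant 50% forecast lands on exactly 0.25.

```python
def brier_score(home_win_probs, home_won):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    if len(home_win_probs) != len(home_won) or not home_won:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(home_win_probs, home_won)) / len(home_won)


outcomes = [1, 0, 0, 1, 1, 0]  # made-up results (1 = home win)
# A constant 50% forecast scores exactly 0.25 whatever the outcomes are,
# because every game contributes (0.5 - 0)^2 = (0.5 - 1)^2 = 0.25.
print(brier_score([0.5] * len(outcomes), outcomes))  # 0.25
# A forecast that leans the right way in every game scores lower (better).
print(brier_score([0.7, 0.4, 0.2, 0.6, 0.8, 0.3], outcomes))
```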
Calibration
Calibration measures whether predicted probabilities match observed frequencies. If the model gives teams a 65% chance in 100 games, those teams should win about 65 of them. Our isotonic calibration post-processing is designed to make predicted probabilities honest, not just directionally correct. See the Performance page for calibration curves.
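The site does not publish its calibration code, so the following is only a sketch under stated assumptions, with hypothetical function names: (a) a binned reliability check of the kind behind calibration curves, and (b) isotonic regression fitted with the pool-adjacent-violators (PAV) algorithm, one standard way to perform isotonic calibration. Isotonic maps are normally fitted on data the model was not trained on.

```python
def reliability_table(probs, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins.

    Returns (mean forecast, observed win rate, count) for each non-empty bin;
    for a well-calibrated model the first two values are close in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            table.append((mean_p, rate, len(b)))
    return table


def isotonic_fit(probs, outcomes):
    """PAV isotonic regression of 0/1 outcomes on raw probabilities.

    Returns sorted (raw prob, calibrated value) pairs whose calibrated values
    never decrease as the raw probability rises. Tied raw probabilities are
    not pooled specially in this sketch.
    """
    pairs = sorted(zip(probs, outcomes))

    def mean(block):
        return block[0] / block[1]

    # Each block is [sum of outcomes, number of games]; a block shares one mean.
    blocks = []
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Pool adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and mean(blocks[-2]) > mean(blocks[-1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return [(p, f) for (p, _), f in zip(pairs, fitted)]


probs = [0.2, 0.3, 0.4, 0.6, 0.7, 0.8]  # made-up raw forecasts
outcomes = [0, 1, 0, 1, 0, 1]           # made-up results
print(reliability_table(probs, outcomes, n_bins=2))
print([f for _, f in isotonic_fit(probs, outcomes)])  # [0.0, 0.5, 0.5, 0.5, 0.5, 1.0]
```

To calibrate a new forecast, a full implementation would look up (or interpolate between) the fitted points for its raw probability; that step is omitted here.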
RMSE (Totals)
Root-mean-square error on predicted total goals (over/under). A perfect model would score 0; always predicting the league-average total (~5.8 goals) scores about 2.4, which is roughly the standard deviation of game totals. Our model typically achieves 2.1–2.4.
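A short sketch (hypothetical function name, made-up game totals) of RMSE and its no-skill baseline: predicting the sample mean for every game yields an RMSE equal to the population standard deviation of the totals.

```python
import math


def rmse(predicted_totals, actual_totals):
    """Root-mean-square error between predicted and actual total goals."""
    if len(predicted_totals) != len(actual_totals) or not actual_totals:
        raise ValueError("need equal-length, non-empty inputs")
    mse = sum((p - a) ** 2 for p, a in zip(predicted_totals, actual_totals)) / len(actual_totals)
    return math.sqrt(mse)


actual = [3, 7, 5, 9, 4, 8]             # made-up game totals
mean_total = sum(actual) / len(actual)  # 6.0
# Always predicting the mean gives RMSE = population standard deviation.
baseline = rmse([mean_total] * len(actual), actual)
print(round(baseline, 3))  # 2.16
```

A totals model only shows skill to the extent its RMSE falls below this constant-prediction baseline.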
Cross-Validation Results (3 folds)
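Walk-forward cross-validation, in which each fold trains only on games played before its validation games (no future data leakage). Avg Brier: 0.2519 | Avg Log-loss: 0.6975 | Avg RMSE (Totals): 2.402. Log-loss is the mean negative log-likelihood of the observed outcome (lower is better); for reference, a constant 50% forecast scores 0.25 Brier and ln 2 ≈ 0.693 log-loss.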
| Fold | Brier | Log-loss | RMSE Total | Train games | Validation games |
|---|---|---|---|---|---|
| 1 | 0.2542 | 0.7026 | 2.443 | 701 | 2,097 |
| 2 | 0.2515 | 0.6968 | 2.385 | 1,399 | 1,399 |
| 3 | 0.2499 | 0.6931 | 2.376 | 2,103 | 695 |
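In the table, the training and validation counts in each fold sum to 2,798, which suggests an expanding training window validated on all later games. The sketch below assumes that scheme; the function names are hypothetical, and the evenly spaced cutoffs are illustrative, so they do not exactly reproduce the table's fold sizes. It also includes a log-loss function for the metric reported above.

```python
import math


def walk_forward_splits(n_games, n_folds):
    """Yield (train, val) index ranges for expanding-window walk-forward CV.

    Assumes games are indexed in chronological order. Fold k trains on the
    first k / (n_folds + 1) of the games and validates on every later game,
    so no validation game precedes any training game.
    """
    for k in range(1, n_folds + 1):
        cutoff = n_games * k // (n_folds + 1)
        yield range(cutoff), range(cutoff, n_games)


def log_loss(home_win_probs, home_won, eps=1e-15):
    """Mean negative log-likelihood of the observed 0/1 outcomes (natural log)."""
    total = 0.0
    for p, y in zip(home_win_probs, home_won):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) cannot occur
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(home_won)


for train, val in walk_forward_splits(2798, 3):
    print(len(train), len(val))
# 699 2099
# 1399 1399
# 2098 700

# A constant 50% forecast scores ln 2 (about 0.6931) on any outcomes.
print(log_loss([0.5] * 4, [1, 0, 0, 1]))
```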
Monthly Accuracy Trend
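Win/loss prediction accuracy by calendar month. Months from 2023-10 through 2025-04 fall in the training seasons (in-sample); months from 2025-10 onward belong to the 2025–26 holdout season. Larger monthly samples give more stable estimates.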
Monthly Breakdown
| Month | Games | Accuracy | Brier Score |
|---|---|---|---|
| 2023-10 | 140 | 58.6% | 0.2483 |
| 2023-11 | 213 | 55.4% | 0.2409 |
| 2023-12 | 219 | 57.5% | 0.2352 |
| 2024-01 | 208 | 55.8% | 0.2342 |
| 2024-02 | 172 | 54.1% | 0.2339 |
| 2024-03 | 228 | 60.1% | 0.2246 |
| 2024-04 | 132 | 59.8% | 0.2365 |
| 2024-10 | 166 | 58.4% | 0.2438 |
| 2024-11 | 220 | 56.8% | 0.2382 |
| 2024-12 | 214 | 58.9% | 0.2268 |
| 2025-01 | 224 | 59.4% | 0.2341 |
| 2025-02 | 122 | 52.5% | 0.2484 |
| 2025-03 | 234 | 61.1% | 0.2344 |
| 2025-04 | 132 | 60.6% | 0.2345 |
| 2025-10 | 180 | 52.8% | 0.2488 |
| 2025-11 | 225 | 51.6% | 0.2501 |
| 2025-12 | 226 | 51.8% | 0.2561 |
| 2026-01 | 240 | 51.7% | 0.2510 |
| 2026-02 | 74 | 62.2% | 0.2238 |
| 2026-03 | 117 | 49.6% | 0.2601 |
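To make "larger samples give more stable estimates" concrete, here is a minimal sketch (hypothetical function name) of an approximate 95% interval for a monthly accuracy, using the normal-approximation (Wald) standard error of a proportion, sqrt(p(1 − p)/n), applied to two rows from the monthly table above.

```python
import math


def accuracy_interval(accuracy, n_games, z=1.96):
    """Approximate 95% normal (Wald) interval for an accuracy over n_games.

    The standard error of a proportion is sqrt(p * (1 - p) / n), so it
    shrinks with the square root of the sample size.
    """
    se = math.sqrt(accuracy * (1 - accuracy) / n_games)
    return accuracy - z * se, accuracy + z * se


# Two rows from the monthly table.
for month, acc, n in [("2026-02", 0.622, 74), ("2026-01", 0.517, 240)]:
    lo, hi = accuracy_interval(acc, n)
    print(f"{month}: {acc:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The 74-game February 2026 sample has a half-width of about 11 percentage points, against about 6 points for the 240-game January 2026 sample, so month-to-month swings such as 62.2% versus 51.7% should be read with caution. The Wald interval is a rough approximation; a Wilson interval behaves better for small samples.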
True Out-of-Sample Accuracy
Cross-validation on historical data can still overestimate real-world performance. Our strictest test trains on the 2023–24 and 2024–25 seasons and tests on the 2025–26 season, which was entirely unseen at training time.
This is the most honest estimate of how the model performs on genuinely new games. The gap between in-sample accuracy (63%) and holdout accuracy (54.6%) reflects the optimism of scoring the model on games it was trained on, together with ordinary variance and the inherent difficulty of hockey prediction.
The 2025–26 season accuracy shown above accumulates throughout the season as more games are played, so early-season figures may be noisier.
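The season-level holdout described above can be sketched as a split on a season label. The `Game` record, its fields, and `season_holdout` are hypothetical stand-ins for however the site stores games; the point is that the test season never appears in the training set.

```python
from dataclasses import dataclass


@dataclass
class Game:
    season: str    # e.g. "2024-25" (hypothetical field layout)
    home_won: int  # 1 if the home team won, else 0


def season_holdout(games, train_seasons, test_season):
    """Split games so the test season is never seen during training."""
    if test_season in train_seasons:
        raise ValueError("test season must not also be a training season")
    train = [g for g in games if g.season in train_seasons]
    test = [g for g in games if g.season == test_season]
    return train, test


games = [Game("2023-24", 1), Game("2024-25", 0), Game("2025-26", 1), Game("2025-26", 0)]
train, test = season_holdout(games, {"2023-24", "2024-25"}, "2025-26")
print(len(train), len(test))  # 2 2
```

Metrics computed on `train` correspond to the in-sample figure; metrics computed on `test` correspond to the out-of-sample figure.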
Why Is Accuracy Bounded?
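NHL games have one of the highest upset rates in professional sports. Even the best teams typically win only about 60% of their games, so single-game win probabilities rarely stray far from 50%. Because a model is right only when the team it favors actually wins, this randomness sets a natural ceiling on accuracy that no model can exceed. Key sources of unpredictability: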
- Goalie performance variance (hot/cold streaks)
- Injuries and lineup uncertainty announced close to game time
- Puck luck (post hits, lucky bounces)
- Small sample sizes — 82-game seasons with nightly scheduling
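The ceiling can be stated precisely: if each game had a known true home-win probability p, a model that always picks the likelier side would be right with probability max(p, 1 − p) in that game, so its expected accuracy is the average of those values. The sketch below uses a hypothetical function name and made-up probabilities (true probabilities are never observable) to show how games near 58/42 imply a ceiling near the ~58% mentioned in the Accuracy section.

```python
def accuracy_ceiling(true_home_win_probs):
    """Expected accuracy of a perfect-information model that always picks
    the team with the higher true win probability: mean of max(p, 1 - p).
    """
    if not true_home_win_probs:
        raise ValueError("need at least one game")
    return sum(max(p, 1 - p) for p in true_home_win_probs) / len(true_home_win_probs)


# Hypothetical true probabilities; real values are unobservable.
print(accuracy_ceiling([0.58] * 10))
print(accuracy_ceiling([0.50, 0.55, 0.60, 0.45, 0.70]))
```

Both toy inputs give a ceiling of about 58%: even perfect knowledge of every game's odds cannot push accuracy above the average strength of the favorite.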
Our goal is not to beat the ceiling — it is to provide well-calibrated probabilities that accurately reflect uncertainty, making them useful for decision-making even when no single prediction is guaranteed.
See also: Live Model Performance | Full Methodology | Today's Predictions