What is a Brier score in NHL predictions?

The Brier score measures probabilistic accuracy as the mean squared error between the predicted probability and the binary outcome. Lower is better; a model that predicts 50% for every game scores 0.25.

How accurate are NHL game predictions?

Our model's accuracy ranges from 54% to 63% depending on the sample; the higher figures include games the model was trained on. The true out-of-sample holdout accuracy is approximately 54.6%. Research suggests ~58% may be near the practical ceiling for NHL single-game predictions.

NHL Prediction Accuracy | Model Performance Explained

This page explains how to interpret our model's prediction accuracy metrics and provides transparent performance tracking across the 2025–26 NHL season.

Current Season Performance

OOS Accuracy: 54.6% (in-sample: 61.4%)
Brier Score: 0.228
RMSE (Totals): 2.25
Games Evaluated: 4,189

OOS Accuracy is the true holdout result (train on 2023–25, test on the 2025–26 season, 1,056 games) and is the honest estimate of generalization. The in-sample figure includes training data and overstates performance. Brier is the probabilistic score (lower is better; predicting 50% for every game scores 0.25). RMSE (Totals) is the root-mean-square error on predicted total goals.

What the Metrics Mean

Accuracy

The fraction of games where the model correctly predicted the winner (the team with win probability >50%). A naive 50/50 coin flip gives ~50%. Our model typically achieves 54–63%, depending on the game window. Hockey is highly random; research suggests ~58% may be near the practical ceiling for single-game NHL predictions.
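As a minimal sketch of the accuracy definition above, the function below counts a pick as correct when the side given more than 50% actually won. The function name, the tie-handling rule, and the toy data are assumptions for illustration, not the model's actual code or results.

```python
from typing import Sequence


def pick_accuracy(home_win_probs: Sequence[float], home_won: Sequence[int]) -> float:
    """Fraction of games where the side given >50% actually won."""
    if len(home_win_probs) != len(home_won):
        raise ValueError("inputs must have the same length")
    if not home_win_probs:
        raise ValueError("need at least one game")
    correct = 0
    for p, y in zip(home_win_probs, home_won):
        # Assumption: an exact 0.5 is treated as an away pick.
        predicted_home = p > 0.5
        correct += int(predicted_home == bool(y))
    return correct / len(home_won)


# Made-up data: four games, three picked correctly.
probs = [0.62, 0.48, 0.55, 0.70]
results = [1, 0, 0, 1]  # 1 = home win, 0 = home loss
print(pick_accuracy(probs, results))  # 0.75
```

Accuracy ignores how confident each pick was: a 51% pick and a 90% pick count the same, which is why the probabilistic metrics are reported alongside it.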

Brier Score

The Brier score measures probabilistic accuracy: it is the mean squared difference between the predicted probability and the binary outcome (1 = home win, 0 = home loss). A model predicting 50% for every game scores 0.25. Lower Brier scores are better. Our model targets a score below 0.235, which would indicate probabilities that are meaningfully better than chance.
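The definition above can be sketched in a few lines. The function name and the sample outcomes are illustrative assumptions; the only fact carried over from the text is that a constant 50% forecast scores 0.25.

```python
from typing import Sequence


def brier_score(home_win_probs: Sequence[float], home_won: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and outcome (1/0)."""
    if len(home_win_probs) != len(home_won):
        raise ValueError("inputs must have the same length")
    if not home_win_probs:
        raise ValueError("need at least one game")
    squared_errors = [(p - y) ** 2 for p, y in zip(home_win_probs, home_won)]
    return sum(squared_errors) / len(squared_errors)


# Made-up outcomes for six games (1 = home win, 0 = home loss).
outcomes = [1, 0, 1, 1, 0, 1]

# Predicting 50% every time scores 0.25 regardless of the results.
print(brier_score([0.5] * len(outcomes), outcomes))  # 0.25

# A forecast that leans the right way each game scores lower (about 0.09 here).
print(brier_score([0.7, 0.3, 0.7, 0.7, 0.3, 0.7], outcomes))
```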

Calibration

Calibration measures whether predicted probabilities match observed frequencies. If the model assigns a 65% win probability to a team in each of 100 games, those teams should win about 65 of them. Our isotonic calibration post-processing is designed to keep predicted probabilities honest, not just directionally correct. See the Performance page for calibration curves.
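A simple way to check calibration is to bin predictions and compare the average predicted probability in each bin with the observed win rate. The sketch below does only that check; it is not the isotonic post-processing step itself, and the function name, bin count, and data are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def reliability_table(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int, float, float]]:
    """Return (bin_low, bin_high, games, mean_predicted, observed_rate) per non-empty bin."""
    if len(probs) != len(outcomes):
        raise ValueError("inputs must have the same length")
    # Per bin: [game count, sum of predicted probabilities, number of wins]
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx][0] += 1
        bins[idx][1] += p
        bins[idx][2] += y
    rows = []
    for i, (count, prob_sum, wins) in enumerate(bins):
        if count:
            rows.append((i / n_bins, (i + 1) / n_bins, count, prob_sum / count, wins / count))
    return rows


# Made-up example matching the text: 100 games at 65%, of which 65 were won.
probs = [0.65] * 100
outcomes = [1] * 65 + [0] * 35
for low, high, games, predicted, observed in reliability_table(probs, outcomes):
    print(f"{low:.1f}-{high:.1f}: {games} games, predicted {predicted:.2f}, observed {observed:.2f}")
```

For a well-calibrated model the predicted and observed columns stay close in every bin that has enough games to be meaningful.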

RMSE (Totals)

Root-mean-square error on predicted total goals (over/under). A perfect model would score 0; a naive baseline that always predicts roughly the league average (~5.8 goals) scores ~2.4. Our model typically achieves 2.1–2.4.
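For reference, RMSE on totals can be computed as in the sketch below. The function name and the goal totals are made up for illustration; only the ~5.8 league-average figure comes from the text.

```python
import math
from typing import Sequence


def rmse(predicted_totals: Sequence[float], actual_totals: Sequence[float]) -> float:
    """Root-mean-square error between predicted and actual total goals."""
    if len(predicted_totals) != len(actual_totals):
        raise ValueError("inputs must have the same length")
    if not predicted_totals:
        raise ValueError("need at least one game")
    mean_sq = sum((p - a) ** 2 for p, a in zip(predicted_totals, actual_totals)) / len(actual_totals)
    return math.sqrt(mean_sq)


# Made-up actual totals for six games.
actual = [5, 7, 4, 6, 9, 3]

# Baseline: predict the league average for every game.
baseline = [5.8] * len(actual)
print(f"baseline RMSE: {rmse(baseline, actual):.2f}")
```

Because errors are squared before averaging, a single high-scoring game that the model misses by several goals weighs far more than several one-goal misses.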

Cross-Validation Results (3 folds)

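Walk-forward cross-validation: each fold trains only on games played before its validation games, so no future data leaks into training. Averages across folds: Brier 0.2485 | Log-loss 0.6905 | RMSE (Totals) 2.411. For reference, predicting 50% for every game gives a log-loss of ln 2 ≈ 0.6931.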

Fold	Brier	Log-loss	RMSE (Totals)	Train N	Val N
1	0.2524	0.6988	2.464	701	2,097
2	0.2478	0.6888	2.394	1,399	1,399
3	0.2454	0.6839	2.375	2,103	695
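The fold sizes in the table (Train N plus Val N is 2,798 in every fold) are consistent with an expanding training window where each fold validates on all later games. The sketch below illustrates that scheme under the assumption of equally spaced cut points; the real cut points evidently differ slightly (for example 701 rather than 699 training games in fold 1), and the function name is hypothetical.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_games: int, n_folds: int = 3) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, val_indices) for chronologically sorted games.

    Each fold trains on every game before a cut point and validates on
    every game from the cut point onward, so training never sees the future.
    """
    if n_folds < 1 or n_games <= n_folds:
        raise ValueError("need more games than folds")
    for fold in range(1, n_folds + 1):
        # Assumption: evenly spaced cut points; a real pipeline might cut on dates.
        cut = n_games * fold // (n_folds + 1)
        yield range(0, cut), range(cut, n_games)


for i, (train_idx, val_idx) in enumerate(walk_forward_splits(2798), start=1):
    print(f"fold {i}: train {len(train_idx)}, val {len(val_idx)}")
```

The key property is that the largest training index is always smaller than the smallest validation index, which is what rules out leakage from future games.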

Monthly Accuracy Trend

Win/loss prediction accuracy by calendar month. Months with more games give more stable estimates.

Monthly Breakdown

Month	Games	Accuracy	Brier Score
2023-10	140	66.4%	0.2172
2023-11	213	67.1%	0.2227
2023-12	219	63.5%	0.2199
2024-01	208	61.1%	0.2238
2024-02	172	64.5%	0.2177
2024-03	228	71.9%	0.2066
2024-04	132	63.6%	0.2261
2024-10	166	71.7%	0.1957
2024-11	220	62.7%	0.2178
2024-12	214	68.2%	0.2087
2025-01	224	60.3%	0.2328
2025-02	122	49.2%	0.2503
2025-03	234	62.8%	0.2250
2025-04	132	52.3%	0.2485
2025-10	180	59.4%	0.2423
2025-11	225	52.0%	0.2408
2025-12	226	53.1%	0.2493
2026-01	240	55.8%	0.2480
2026-02	74	64.9%	0.2236
2026-03	242	57.0%	0.2439
2026-04	125	63.2%	0.2234
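To put a rough number on how sample size affects stability, the sketch below computes a normal-approximation interval around a monthly accuracy figure. The 95% level, the normal approximation, and the function name are assumptions chosen for illustration; the two rows are taken from the table above.

```python
import math
from typing import Tuple


def accuracy_interval(accuracy: float, n_games: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate interval for an accuracy estimate (normal approximation)."""
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    std_err = math.sqrt(accuracy * (1 - accuracy) / n_games)
    return accuracy - z * std_err, accuracy + z * std_err


# (month, games, accuracy) rows copied from the monthly table.
for month, games, acc in [("2026-02", 74, 0.649), ("2026-03", 242, 0.570)]:
    low, high = accuracy_interval(acc, games)
    print(f"{month}: {acc:.1%} on {games} games, approx. range {low:.1%} to {high:.1%}")
```

Under these assumptions the 74-game month carries an uncertainty of roughly 11 percentage points either side, versus about 6 points for the 242-game month, so month-to-month swings of several points are expected from noise alone.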

True Out-of-Sample Accuracy

Cross-validation on historical data can still overestimate real-world performance. Our strictest test: train on 2023–24 and 2024–25 seasons, test on the 2025–26 season (entirely unseen data at training time).

True holdout accuracy (train 2023–25, test 2025–26): ~54.6%
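This is the most honest estimate of how the model performs on genuinely new games. The gap between in-sample accuracy (63%) and holdout accuracy (54.6%) arises mainly because the in-sample figure includes games the model was trained on; sampling variance and the inherent difficulty of hockey prediction also contribute.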
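The holdout design described here amounts to splitting games by season before any training happens. The sketch below shows that split; the record type, its fields, and the season label format are hypothetical, since the model's actual data schema is not described on this page.

```python
from typing import List, NamedTuple, Sequence, Tuple


class GameRecord(NamedTuple):
    """Hypothetical record; real games would also carry features and results."""
    game_id: int
    season: str  # assumed label format, e.g. "2024-25"


def split_holdout(
    games: Sequence[GameRecord], test_season: str
) -> Tuple[List[GameRecord], List[GameRecord]]:
    """Train on every season before `test_season`; test on `test_season` only."""
    # "YYYY-YY" labels sort chronologically as plain strings.
    train = [g for g in games if g.season < test_season]
    test = [g for g in games if g.season == test_season]
    return train, test


games = [
    GameRecord(1, "2023-24"),
    GameRecord(2, "2024-25"),
    GameRecord(3, "2025-26"),
    GameRecord(4, "2025-26"),
]
train, test = split_holdout(games, "2025-26")
print(len(train), len(test))  # 2 2
```

Any fitting, feature scaling, or calibration would then use the training list only, with the test list touched once for the final evaluation.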

The 2025–26 season accuracy shown above accumulates throughout the season as more games are played, so early-season figures may be noisier.

Why Is Accuracy Bounded?

NHL games have one of the highest upset rates in professional sports. Even the best teams win only ~60% of their games, so a model that always picks the true favorite would still be wrong often; no model can exceed this "natural ceiling" set by game randomness. Key sources of unpredictability:

Goalie performance variance (hot/cold streaks)
Injuries and lineup uncertainty announced close to game time
Puck luck (post hits, lucky bounces)
Small sample sizes — 82-game seasons with nightly scheduling
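One way to make the ceiling concrete: if each game had a known true home-win probability, the best any model could do is pick the likelier side every time, and its expected accuracy would be the average of max(p, 1 - p) over games. The sketch below computes that quantity for a made-up slate; the probabilities are illustrative assumptions, not estimates for real NHL games.

```python
from typing import Sequence


def accuracy_ceiling(true_home_win_probs: Sequence[float]) -> float:
    """Expected accuracy of always picking the truly likelier side."""
    if not true_home_win_probs:
        raise ValueError("need at least one game")
    best_per_game = [max(p, 1 - p) for p in true_home_win_probs]
    return sum(best_per_game) / len(best_per_game)


# Made-up true probabilities for five games; most NHL-like matchups sit near 50/50.
slate = [0.60, 0.55, 0.45, 0.52, 0.65]
print(f"{accuracy_ceiling(slate):.3f}")  # 0.574
```

When most games are close to coin flips, this ceiling sits only a few points above 50%, even for a model with perfect knowledge of the underlying probabilities.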

Our goal is not to beat the ceiling — it is to provide well-calibrated probabilities that accurately reflect uncertainty, making them useful for decision-making even when no single prediction is guaranteed.