Our Methodology
Transparency is a core principle of NHLForecasts. This page explains exactly how our prediction model works — from the data it ingests to the probabilities it outputs — so you can evaluate the forecasts on their merits rather than taking them on faith.
Model Overview
Every game prediction is produced by blending two complementary machine-learning models:
- Logistic Regression — A linear model with L2 regularisation that captures the direct relationship between team features and win probability. Its simplicity makes it resistant to overfitting on small samples.
- Gradient-Boosted Classifier (GBC) — A tree-based ensemble that captures non-linear interactions between features, such as the compounding effect of home ice and rest advantage.
The final win probability is an equal-weight (50/50) average of the two models' outputs, passed through isotonic calibration so that stated probabilities match observed win rates.
Model Inputs & Features
The model uses 16 features derived from recent team and goalie performance. All features use walk-forward construction — only data available before each game is used — to prevent data leakage.
| Feature Category | Examples | Purpose |
|---|---|---|
| Rolling Win Rate | 10-game win % (home & away) | Recent team form |
| Goal Differential | Goals scored minus allowed per game (10-game window) | Margin of victory / defeat |
| Goalie Performance | Recent save percentage of projected starter | Starting goaltender quality |
| Home Ice | Home/away indicator, home win rate | Venue advantage (~54% historical home win rate) |
| Rest & Schedule | Days since last game (both teams) | Fatigue and back-to-back effects |
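As a rough illustration of walk-forward construction, the sketch below computes a rolling win rate in which each game's feature is built only from that team's earlier games. The function name `rolling_win_rates` and the input layout (a chronological list of 1/0 results for one team) are assumptions made for this example, not the production pipeline.

```python
from collections import deque


def rolling_win_rates(results, window=10):
    """Return the pre-game rolling win rate for each game in `results`.

    `results` is a chronological list of 1 (win) / 0 (loss) for one team.
    The value at position i uses only games before i, so a game's own
    outcome never leaks into its feature. The first game has no history
    and gets None.
    """
    recent = deque(maxlen=window)  # holds at most `window` prior results
    features = []
    for outcome in results:
        # Record the feature BEFORE adding the current game's outcome.
        features.append(sum(recent) / len(recent) if recent else None)
        recent.append(outcome)
    return features


# Hypothetical five-game stretch with a 3-game window for readability.
print(rolling_win_rates([1, 0, 1, 1, 0], window=3))
```

The important detail is the ordering inside the loop: the feature is recorded first and the result is appended afterwards, which is what keeps the current game out of its own inputs.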
Win Probability Production
For each upcoming game the pipeline:
- Builds features from the latest available data
- Runs both models to get raw probabilities
- Averages the two outputs (50% logistic + 50% GBC)
- Applies isotonic calibration trained on historical predictions
- Outputs a calibrated home-win probability (away = 1 − home)
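The blend-then-calibrate steps above can be sketched in a few lines. This is a minimal standard-library illustration, assuming a step-function isotonic fit via the pool-adjacent-violators algorithm; the names `fit_isotonic`, `calibrate`, and `predict_home_win` and the toy history are hypothetical, and a production pipeline would more likely rely on a library implementation trained on far more games.

```python
from bisect import bisect_left


def fit_isotonic(raw_probs, outcomes):
    """Pool-adjacent-violators fit of a non-decreasing step function
    from raw probabilities to observed win rates."""
    # Group identical raw probabilities so ties share one fitted value.
    totals = {}
    for p, y in zip(raw_probs, outcomes):
        entry = totals.setdefault(p, [0.0, 0])
        entry[0] += y
        entry[1] += 1
    # Each block is [sum of outcomes, count, largest raw prob in block].
    blocks = []
    for p in sorted(totals):
        win_sum, count = totals[p]
        blocks.append([win_sum, count, p])
        # Merge backwards while block means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            win_sum, count, upper = blocks.pop()
            blocks[-1][0] += win_sum
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    rates = [b[0] / b[1] for b in blocks]
    return uppers, rates


def calibrate(raw, uppers, rates):
    """Look up the fitted win rate for one raw probability."""
    idx = min(bisect_left(uppers, raw), len(rates) - 1)
    return rates[idx]


def predict_home_win(p_logistic, p_gbc, uppers, rates):
    """Equal-weight blend, then calibration; away is the complement."""
    blended = 0.5 * p_logistic + 0.5 * p_gbc
    home = calibrate(blended, uppers, rates)
    return home, 1.0 - home


# Toy history of blended probabilities and results (1 = home win).
history_probs = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
history_wins = [0, 1, 0, 1, 1, 1]
uppers, rates = fit_isotonic(history_probs, history_wins)
print(predict_home_win(0.42, 0.48, uppers, rates))
```

With only six historical games the fitted curve is very coarse; the point is the shape of the procedure, not the numbers it produces.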
The blending and calibration steps are validated through expanding-window cross-validation to ensure they generalise to unseen games.
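For readers unfamiliar with expanding-window validation, here is a small sketch of how such splits can be generated over chronologically ordered games. The function name and the parameters `n_folds` and `min_train` are illustrative assumptions; the fold sizes used in the actual pipeline are not stated on this page.

```python
def expanding_window_splits(n_games, n_folds, min_train):
    """Yield (train_range, test_range) pairs over games sorted by date.

    The training window always starts at game 0 and grows with each
    fold; the test window is the block of games immediately after it.
    """
    fold_size = (n_games - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough games for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # The last fold absorbs any leftover games.
        test_end = n_games if k == n_folds - 1 else train_end + fold_size
        yield range(0, train_end), range(train_end, test_end)


for train, test in expanding_window_splits(n_games=10, n_folds=3, min_train=4):
    print(f"train on games {train.start}-{train.stop - 1}, "
          f"test on games {test.start}-{test.stop - 1}")
```

Because every test block lies strictly after its training block, any blending weights or calibration curves fitted in a fold are evaluated only on games the fit has not seen.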
Goal Totals Prediction
Game totals (over/under) are predicted separately from the winner:
- Separate gradient-boosted regressors predict expected home and away goals.
- A Monte Carlo simulation draws thousands of score outcomes using learned residual dispersion (a mix of negative-binomial and normal components) to produce over/under probabilities for common lines such as 5.5 and 6.5 total goals.
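The simulation step can be illustrated with a simplified sketch. It assumes the expected home and away goals are already available from the regressors, draws each side's score from a negative-binomial distribution (sampled as a gamma-Poisson mixture), and leaves out the normal component of the mix. The `dispersion` value, the function names, and the example means are placeholders for illustration, not fitted parameters.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for hockey-sized means."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_goals(rng, mean, dispersion):
    """Negative-binomial draw via a gamma-Poisson mixture.

    The resulting variance is mean + mean**2 / dispersion, so smaller
    `dispersion` values mean more spread than a plain Poisson.
    """
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(rng, rate)


def over_probabilities(home_mean, away_mean, lines=(5.5, 6.5),
                       dispersion=20.0, n_sims=20000, seed=0):
    """Estimate P(total goals > line) for each line by simulation."""
    rng = random.Random(seed)  # seeded so repeated runs agree
    overs = {line: 0 for line in lines}
    for _ in range(n_sims):
        total = (sample_goals(rng, home_mean, dispersion)
                 + sample_goals(rng, away_mean, dispersion))
        for line in lines:
            if total > line:
                overs[line] += 1
    return {line: count / n_sims for line, count in overs.items()}


# Hypothetical expected goals: 3.2 for the home team, 2.8 for the away team.
print(over_probabilities(3.2, 2.8))
```

The under probability for each line is simply one minus the over probability, since a half-goal line cannot be hit exactly.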
Expected Goals (xG) Model
Our shot-level xG model is a gradient-boosted classifier trained on individual shot events. It estimates the probability that each shot becomes a goal using features including:
- Shot distance and angle to the net
- Shot type (wrist, slap, backhand, deflection, etc.)
- Strength state (even strength, power play, shorthanded)
- Score differential at the time of the shot
- Period and game clock
The xG model is trained with strict temporal integrity — only shots from prior seasons are used for training — to prevent future data from leaking into historical metrics. See the xG Analysis page for team and player leaderboards.
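Two of the ideas above, shot geometry and the season-based training cutoff, can be sketched as follows. The coordinate convention (feet, origin at centre ice, attacking toward positive x), the field name `season`, and both function names are assumptions made for the example; the goal line sitting 89 feet from centre ice follows from standard NHL rink dimensions.

```python
import math

GOAL_X = 89.0  # goal line is 89 ft from centre ice on a standard NHL rink


def shot_geometry(x, y):
    """Return (distance in feet, absolute angle in degrees) to the net.

    An angle of 0 means the shot was taken from straight in front of
    the net; larger angles mean sharper positions toward the boards.
    """
    dx = GOAL_X - x
    distance = math.hypot(dx, y)
    angle = math.degrees(math.atan2(abs(y), dx))
    return distance, angle


def training_shots(shots, target_season):
    """Keep only shots from seasons strictly before `target_season`."""
    return [shot for shot in shots if shot["season"] < target_season]


# Hypothetical shot records; `season` is the starting year of the season.
shots = [
    {"season": 2021, "x": 59.0, "y": 0.0},
    {"season": 2022, "x": 59.0, "y": 30.0},
    {"season": 2023, "x": 80.0, "y": -5.0},
]
for shot in training_shots(shots, target_season=2023):
    print(shot["season"], shot_geometry(shot["x"], shot["y"]))
```

Filtering by season before any model fitting is what keeps xG values for a given season from being influenced by shots taken during or after it.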
Data Sources & Update Cadence
All data comes from the official NHL API. During the regular season and playoffs the pipeline runs daily to:
- Ingest completed game results and shot events
- Update rolling features for every team
- Re-run predictions for upcoming games
- Refresh performance metrics and calibration
- Regenerate the website with updated data
Transparency Commitment
We believe predictions without accountability are just noise. That's why we publish live performance metrics — including accuracy, Brier scores, calibration charts, and team-level breakdowns — updated with every site build. If the model is wrong, the data will show it.
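For reference, accuracy and the Brier score can be computed from a list of predicted home-win probabilities and actual results as in the sketch below. The function names and the four-game sample are illustrative only. The Brier score is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better, and a forecaster who always says 50% scores exactly 0.25.

```python
def brier_score(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def accuracy(probs, outcomes):
    """Share of games where the favoured side (p >= 0.5) won."""
    picks = [1 if p >= 0.5 else 0 for p in probs]
    return sum(pick == y for pick, y in zip(picks, outcomes)) / len(probs)


# Hypothetical predictions (home-win probability) and results (1 = home win).
predicted = [0.60, 0.55, 0.40, 0.70]
actual = [1, 0, 0, 1]

print("accuracy:", accuracy(predicted, actual))
print("Brier:", brier_score(predicted, actual))
print("coin-flip Brier:", brier_score([0.5] * 4, actual))  # 0.25
```

Accuracy only checks which side was favoured, while the Brier score also rewards or penalises how confident each forecast was, which is why both are worth tracking.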
For a broader introduction to the analytics concepts used here, see our NHL Analytics Guide.
Frequently Asked Questions
How are NHL game predictions made?
Each game prediction blends two machine-learning models — logistic regression and gradient-boosted classification — trained on thousands of historical NHL games. The models use rolling team stats, goalie performance, home-ice advantage, and rest days to produce a calibrated win probability for each team.
What data does the model use?
The model ingests game results, goalie stats, and shot-level data from the official NHL API. Features include 10-game rolling win percentages, goal differentials per game, recent goalie save percentages, home/away splits, and rest-day advantages. Data is updated daily during the NHL season.
What is isotonic calibration?
Isotonic calibration is a post-processing step that fits a non-decreasing mapping from raw model outputs to the win frequencies observed in historical predictions. If the model says 65% win probability, the calibrated output is adjusted so that teams in that range actually win about 65% of the time.
How accurate are the predictions?
Our model achieves roughly 55–60% accuracy on straight win/loss picks, with a Brier score around 0.235 (lower is better; a constant 50% forecast scores 0.25). Single-game hockey outcomes are highly random, and research suggests ~58% may be near the practical ceiling for single-game NHL predictions. See our performance page for live accuracy tracking.
Explore
- NHL Analytics Guide — Corsi, xG, and key terms explained
- Model Performance — Live accuracy and calibration
- Today's Predictions — Daily NHL game probabilities
- xG Analysis — Team and player expected goals