Enhancing In-Game Win Probability with Expected Goals
On November 25, 2025, we completed a major upgrade to our in-game win probability model by integrating Expected Goals (xG) metrics. This enhancement allows our model to consider shot quality alongside actual scoring, providing more nuanced and accurate win probability curves throughout each game.
Two-Phase Development
The integration happened in two critical phases:
Phase 1: Critical Bug Fixes (Nov 25, 12:04 UTC)
Before integration, we discovered and fixed two critical bugs in our xG model that were affecting prediction accuracy:
The Angle Calculation Bug
Our shot angle calculation was fundamentally broken. We were using atan(dy/dx), which only returns angles between -90° and 90°, so it cannot tell a shot in front of the net from one behind it (and it divides by zero when dx is 0).
Impact: Shots from behind the net were being treated as high-quality slot shots:
- Behind net (95, 0): Calculated as 0° (slot quality)
- Should be 180° (behind net, very low quality)
The Fix: We switched to atan2(dy, dx) for proper quadrant handling across the entire rink. This ensures shots from different ice positions get appropriate angle assessments.
import math

# Before (buggy): atan cannot distinguish opposite quadrants and fails when dx == 0
angle = math.atan(dy / dx)
# After (fixed): atan2 uses the signs of dy and dx to select the correct quadrant
angle = math.atan2(dy, dx)
This fix is implemented in nhl-revamp/shot_data.py lines 208-234.
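To make the difference concrete, here is a minimal sketch comparing the two formulas on the behind-the-net example above. The function names, the goal-line position `NET_X = 89.0`, and the convention `dx = NET_X - x` are assumptions made for illustration; they are not taken from shot_data.py.

```python
import math

NET_X = 89.0  # assumed x-coordinate of the goal line (hypothetical constant)


def shot_angle_buggy(x: float, y: float) -> float:
    """Old approach: atan(dy/dx) folds every shot into the range (-90°, 90°)."""
    dx, dy = NET_X - x, 0.0 - y
    # Raises ZeroDivisionError for shots taken exactly on the goal line (dx == 0).
    return abs(math.degrees(math.atan(dy / dx)))


def shot_angle_fixed(x: float, y: float) -> float:
    """New approach: atan2 keeps the sign of dx, so shots behind the net exceed 90°."""
    dx, dy = NET_X - x, 0.0 - y
    return abs(math.degrees(math.atan2(dy, dx)))


if __name__ == "__main__":
    for label, (x, y) in {"slot": (80, 0), "behind net": (95, 0)}.items():
        print(label, shot_angle_buggy(x, y), shot_angle_fixed(x, y))
```

Under these assumptions, the buggy version gives the same angle for the slot shot and the behind-the-net shot, while the atan2 version separates them.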
Enhanced Strength State Detection
Previously, our model could only distinguish between "Even" and "NonEven" situations. This meant we couldn't tell the difference between:
- High-danger power play shots (5-on-4)
- Low-danger penalty kill shots (4-on-5)
- Empty net situations
The Fix: We enhanced the strength state mapping to parse the NHL API's situation codes and determine the shooting team's context:
- Even: 5-on-5 or equal strength
- PowerPlay (PP): Team has man advantage
- PenaltyKill (PK): Team is short-handed
- EmptyNet: Opponent pulled their goalie
- Other: Unusual situations
This enhancement dramatically improved our xG model's ability to assess shot quality in different game situations. The implementation is in nhl-revamp/expected_goals.py lines 104-148.
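The mapping could look roughly like the sketch below. It assumes the situation code is a four-digit string laid out as away goalie, away skaters, home skaters, home goalie (with 0 meaning the goalie is pulled); that layout, the function name `strength_state`, and the choice to label extra-attacker shots as "Other" are illustrative assumptions, not the code in expected_goals.py.

```python
def strength_state(situation_code: str, shooter_is_home: bool) -> str:
    """Classify the shooting team's context from a 4-digit situation code.

    Assumed digit layout: away goalie, away skaters, home skaters, home goalie.
    """
    if len(situation_code) != 4 or not (situation_code.isascii() and situation_code.isdigit()):
        return "Other"  # malformed or missing code

    away_goalie, away_skaters, home_skaters, home_goalie = map(int, situation_code)
    if shooter_is_home:
        own_goalie, own_skaters = home_goalie, home_skaters
        opp_goalie, opp_skaters = away_goalie, away_skaters
    else:
        own_goalie, own_skaters = away_goalie, away_skaters
        opp_goalie, opp_skaters = home_goalie, home_skaters

    if opp_goalie == 0:
        return "EmptyNet"  # opponent pulled their goalie
    if own_goalie == 0:
        return "Other"  # shooting with an extra attacker; treated as unusual here
    if own_skaters > opp_skaters:
        return "PowerPlay"
    if own_skaters < opp_skaters:
        return "PenaltyKill"
    return "Even"
```

For example, under this layout `strength_state("1541", shooter_is_home=False)` would classify an away-team shot as a power play, because the away side has five skaters against four.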
Current Performance (Temporal Training):
- ROC AUC: 0.755 (excellent discrimination)
- Log Loss: 0.218 (well-calibrated probabilities)
- Brier Score: 0.058 (low prediction error)
- Training data: 141,407 shots with 7.16% goal rate
- Temporal split: 47K training, 12K test (chronological)
- Production accuracy: 0.983 finishing rate (near-perfect calibration)

Note: The December 2025 update implements temporal training (Issue #20) to prevent data leakage, providing honest performance metrics that reflect true predictive capability.
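A chronological split of this kind can be sketched as follows. The function name `chronological_split`, the `game_date` field, and the 20% test fraction are assumptions for illustration; the idea is only that every test shot comes from later in time than every training shot, so the model is never evaluated on games that precede its training data.

```python
def chronological_split(shots, test_fraction=0.2):
    """Split shot records into (train, test) by date instead of at random.

    `shots` is a list of dicts with a sortable 'game_date' value
    (for example an ISO 'YYYY-MM-DD' string or a datetime.date).
    """
    ordered = sorted(shots, key=lambda s: s["game_date"])
    n_test = max(1, round(len(ordered) * test_fraction))
    cutoff = len(ordered) - n_test
    # Everything before the cutoff is training data; the most recent shots are held out.
    return ordered[:cutoff], ordered[cutoff:]
```

A stricter variant would place the cutoff on a date boundary so that shots from the same game never land on both sides of the split.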
Phase 2: In-Game Integration (Nov 25, 17:08 UTC)
With the xG model fixed and validated, we integrated it into our in-game win probability model by adding three new features:
- home_xg: Cumulative expected goals for home team
- away_xg: Cumulative expected goals for away team
- xg_diff: Home xG minus Away xG
Technical Implementation
The integration required careful feature engineering to maintain prediction quality:
Batch xG Calculation
Rather than calculating xG for each shot individually during game progression, we pre-calculate xG probabilities for all shot events in a single batch. This is more efficient and ensures consistent predictions.
Cumulative Tracking
As we walk through each game event chronologically, we accumulate xG values for both teams. After each event (goal, shot, penalty, etc.), we calculate:
- Total expected goals generated by the home team so far
- Total expected goals generated by the away team so far
- The differential between them
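The running totals can be sketched with a single pass over the events. The function name `add_cumulative_xg` and the event fields `team` and `xg` are hypothetical; only the output names home_xg, away_xg, and xg_diff come from the feature list above.

```python
def add_cumulative_xg(events):
    """Attach running xG totals to chronologically ordered events.

    Each event is a dict with 'team' ('home' or 'away') and, for shots, an 'xg' value.
    Non-shot events (penalties, faceoffs, ...) carry the totals forward unchanged.
    """
    home_xg = away_xg = 0.0
    rows = []
    for event in events:
        xg = event.get("xg") or 0.0  # missing or None xG counts as zero
        if event["team"] == "home":
            home_xg += xg
        elif event["team"] == "away":
            away_xg += xg
        rows.append({
            **event,
            "home_xg": home_xg,
            "away_xg": away_xg,
            "xg_diff": home_xg - away_xg,  # home minus away
        })
    return rows
```

Because the totals only use events up to and including the current one, each row reflects what was known at that moment in the game.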
Event-Level Merging
To connect shot data with game events, we merge on both game_id and event_id from the NHL API. This ensures each shot event carries its xG value through the feature pipeline.
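As a rough illustration of that join, the sketch below attaches xG to events with a dictionary keyed on (game_id, event_id). The function name `attach_xg`, the `xg` field, and the default of 0.0 for events without a matching shot are assumptions; the actual pipeline may use a DataFrame merge instead.

```python
def attach_xg(events, shot_xg):
    """Left-join xG values onto game events by (game_id, event_id).

    Events with no matching shot record (penalties, faceoffs, ...) get xg = 0.0.
    """
    lookup = {(s["game_id"], s["event_id"]): s["xg"] for s in shot_xg}
    return [
        {**e, "xg": lookup.get((e["game_id"], e["event_id"]), 0.0)}
        for e in events
    ]
```

Keying on both identifiers matters if event IDs are only unique within a game, since a join on event_id alone could then pair a shot with an event from a different game.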
The implementation spans multiple files:
- Feature engineering: nhl-revamp/in_game/features.py lines 24-110
- Shot data merging: nhl-revamp/in_game/model.py lines 80-114
Model Performance
The enhanced in-game model now has 13 total features (up from 10):
Existing features:
- pregame_home_win_prob
- score_diff, period, time_remaining, is_overtime
- time_remaining_pct, lead_change
- score_diff_x_time, score_diff_squared, abs_score_diff

New xG features:
- home_xg, away_xg, xg_diff
Despite the added complexity, the model maintains excellent calibration:
- Brier Score: 0.1262 (well-calibrated)
- Log Loss: 0.3862 (strong predictive accuracy)
- Training: 2023-2024 and 2024-2025 seasons
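For readers unfamiliar with these two metrics, here is a minimal sketch of their standard definitions for binary outcomes (1 = home win, 0 = home loss). The function names are illustrative and this is not the project's evaluation code.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)
```

Lower is better for both. As a point of comparison, a forecast that always says 50% scores a Brier score of 0.25 and a log loss of about 0.693.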
Real-World Example: Game 2024020011
Let's see how xG integration affects win probability in a real game where the home team won 6-4:
| Period | Time | Score | Home xG | Away xG | Win Prob |
|---|---|---|---|---|---|
| 1 | 09:52 | 0-1 | 0.07 | 0.11 | 51.7% |
| 1 | 13:24 | 2-2 | 0.37 | 0.46 | 53.5% |
| 1 | 18:23 | 3-2 | 0.85 | 0.46 | 78.7% |
| 2 | 16:49 | 4-2 | 1.70 | 1.02 | 96.0% |
| 3 | 17:05 | 6-4 | 2.85 | 2.42 | 97.9% |
Final: 6-4 for the home team, with 2.85 home xG vs 2.54 away xG
Key Insight: At 09:52 in the first period, the home team trailed 0-1 and had generated slightly less xG than the visitors (0.07 vs 0.11). The model still assigned the home team a 51.7% win probability, which is slightly above 50% despite the one-goal deficit.
This demonstrates how xG provides crucial context: the gap in shot quality was small and the home team wasn't being dominated, so the model treated this as a close game rather than overreacting to the one-goal deficit.
Benefits of xG Integration
1. More Nuanced Predictions
Win probability now considers shot quality, not just the scoreboard. A team can be down 2-1 but have accumulated 2.5 xG to their opponent's 0.8, suggesting they're creating better chances and may deserve a higher win probability than the score suggests.
2. Better Game State Representation
Situations like "down 0-1 but out-shooting opponent with quality chances" are now properly distinguished from "down 0-1 and being dominated."
3. Industry-Standard Alignment
Expected Goals is a widely used measure of shot quality in hockey analytics. Our integration aligns us with modern sports analytics practices used by NHL teams and leading analytics platforms.
4. Maintained Calibration
Despite adding three new features, our model's calibration metrics (Brier score, log loss) remain excellent. The xG features enhance predictions without introducing instability.
5. Automatic Integration
The xG columns (home_xg, away_xg, xg_diff) are automatically included in all prediction outputs and in-game dashboards. No additional code changes are needed for downstream consumers.
What's Next
This integration represents a significant step forward in our in-game analytics. Future enhancements we're exploring include:
- Shot location heatmaps overlaid on win probability curves
- Player-level xG contributions during critical game moments
- xG-based "luck index" identifying games where scoring diverged from shot quality
- Integration of xG metrics into our pregame prediction models
The xG integration demonstrates our commitment to incorporating modern analytics while maintaining the transparency and calibration quality our users expect. You can explore these enhanced win probability curves in the in-game dashboards for any completed game this season.