Simming 2018

Using Elo ratings, Baseball America simulated the Division I college baseball season 100 times to predict how everything would shake out.

Elo ratings are a way to measure the overall skill level of an individual or team in any sport that’s a zero-sum game. In other words, for every game winner there must be a game loser. The system was originally popularized as a way to rank chess players, but has now grown to reach dozens of different sports. Ratings are usually centered around the number 1500, with teams of greater skill being rated higher. These ratings can then be used to determine the win probability between any two teams.
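The relationship between Elo ratings and win probability can be shown with a short sketch. This uses the standard Elo formula with the conventional 400-point scaling constant; the article doesn't specify its exact constants, so treat these as illustrative.

```python
# Expected win probability for team A against team B under the standard Elo model.
# The 400 scaling constant is the usual chess convention.
def win_probability(elo_a: float, elo_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

# Two evenly matched 1500-rated teams each win half the time.
print(win_probability(1500, 1500))  # 0.5

# A team rated 50 points higher wins roughly 57% of the time.
print(win_probability(1550, 1500))
```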

In order to begin, every game of the season had to be collected from the NCAA’s schedule. Using Python, a programming language, each game’s home team, away team, and date were placed into an organized file. Games versus non-Division I opponents as well as games that were to be announced were filtered from the schedule.
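The filtering step described above might look something like this. The team names, field names, and the `"TBA"` date convention here are hypothetical placeholders, not the actual NCAA data format.

```python
# Hypothetical sketch of the schedule-filtering step: keep only games where
# both teams are Division I and the date has been announced, then write the
# result to an organized CSV file. All data below is illustrative.
import csv

d1_teams = {"UCLA", "Washington", "Vanderbilt"}
raw_games = [
    {"home": "UCLA", "away": "Washington", "date": "2018-02-16"},
    {"home": "UCLA", "away": "Azusa Pacific", "date": "2018-02-20"},  # non-D1 opponent
    {"home": "Vanderbilt", "away": "Washington", "date": "TBA"},      # date not announced
]

games = [g for g in raw_games
         if g["home"] in d1_teams and g["away"] in d1_teams and g["date"] != "TBA"]

with open("schedule.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["home", "away", "date"])
    writer.writeheader()
    writer.writerows(games)
```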

Initial Elo ratings were carried over from the end of the 2017 season. Ratings were then weighted towards 1500 because every team should theoretically regress to the mean – be more average – in a following season. This occurs mainly because of changes in skill level – losing players to the draft, seniors moving on, transfers, a new freshman class – and luck.
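Weighting a rating back toward 1500 is just a weighted average. The 25% weight below is an assumed value for illustration; the article doesn't state the actual weight used.

```python
# Preseason regression toward the mean: blend last season's final rating
# with the league average of 1500. The 25% weight is an assumption for
# illustration, not the value used in the article.
def preseason_elo(final_2017_elo: float, weight: float = 0.25) -> float:
    return final_2017_elo * (1 - weight) + 1500 * weight

print(preseason_elo(1600))  # 1575.0 -- a strong team gives back a quarter of its edge
print(preseason_elo(1400))  # 1425.0 -- a weak team climbs back toward average
```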

With starting ratings in place, two more important values had to be found. The first is called a k-value – essentially a multiplier applied to a team's margin of victory that determines how much its Elo should change after a game.

For example, let’s say UCLA beats Washington 10-7. UCLA’s Elo was 1550 and Washington’s was 1500. How many points should be awarded to UCLA? Adjust it by too much and the system interprets UCLA’s actual skill level as having improved too much; by too little and UCLA’s skill seems unaffected by the win. Since Elo ratings directly drive win probability in this kind of simulation, changes that are too large in either direction can cause the system to spiral out of control.
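The UCLA–Washington example can be worked through in code. The k-value of 20 and the logarithmic margin-of-victory damping below are assumptions chosen for illustration; the article tunes its own values against the 2017 season.

```python
# One possible margin-of-victory Elo update. The k-value of 20 and the
# log-based margin multiplier are assumptions for illustration only.
import math

def win_probability(elo_a: float, elo_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def elo_update(winner_elo: float, loser_elo: float, margin: int, k: float = 20):
    expected = win_probability(winner_elo, loser_elo)
    mov_mult = math.log(abs(margin) + 1)   # damped margin-of-victory factor
    shift = k * mov_mult * (1 - expected)  # winner gains exactly what loser loses
    return winner_elo + shift, loser_elo - shift

# UCLA (1550) beats Washington (1500) by 3 runs; UCLA was already favored,
# so it gains less than it would for an upset win.
ucla, uw = elo_update(1550, 1500, margin=3)
```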

Testing which value works best involves running the simulation on the 2017 season and measuring how accurate each candidate k-value is. The same process was used to figure out just how important home-field advantage is. Now on to the simulation.
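One common way to score candidate k-values against a past season is the Brier score, the mean squared error of the predicted win probabilities. The article doesn't name its accuracy metric, so this is a hedged sketch of the idea rather than the exact method used.

```python
# Brier score: mean squared error between predicted win probabilities and
# actual outcomes (1 if the team won, 0 if it lost). Lower is better; the
# k-value producing the lowest score on the 2017 season would be kept.
def brier_score(predictions, outcomes):
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# A perfect forecaster scores 0; always guessing 50% scores 0.25.
print(brier_score([1.0, 0.0], [1, 0]))  # 0.0
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
```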

If you were to simulate the season just once, it’s possible that a team could be much stronger or weaker than originally predicted. After all, one season could go a number of different ways. However with each additional simulation, a team’s average is pushed towards their most probable outcome. That’s why running the simulation 100 times on all 7,943 games better captures the reality of the season.
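The repeated-simulation idea can be sketched as a small Monte Carlo loop. For brevity this sketch holds ratings fixed within each simulated season, whereas the real simulation updates Elo after every game; the schedule and ratings here are placeholders.

```python
# Minimal Monte Carlo sketch: replay the schedule many times, sampling each
# game from its Elo win probability, then average wins per simulation.
# Ratings are held fixed here for simplicity, unlike the real simulation.
import random

def win_probability(elo_a: float, elo_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def simulate(schedule, ratings, n_sims=100, seed=0):
    rng = random.Random(seed)
    totals = {team: 0 for team in ratings}
    for _ in range(n_sims):
        for home, away in schedule:
            p_home = win_probability(ratings[home], ratings[away])
            winner = home if rng.random() < p_home else away
            totals[winner] += 1
    return {team: wins / n_sims for team, wins in totals.items()}

# Hypothetical 50-game schedule between two teams.
schedule = [("UCLA", "Washington")] * 50
avg = simulate(schedule, {"UCLA": 1550, "Washington": 1500})
```

Averaging over 100 runs pushes each team's simulated win total toward its expected value, which is exactly why a single simulated season can mislead while many together do not.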

In the table below, the average of all 100 simulations is provided, along with a measure of how much variability there is to each team. Higher variability implies that there’s a larger range of likely outcomes due to – among other things – close matchups.

All code can be found in this GitHub repository.

