SABR Analytics Conference: Day Three

Proliferation Of 'Big' Data Allows For Complex Analysis



The Society for American Baseball Research hosts its second annual Analytics Conference in Phoenix this week. SABR president Vince Gennaro said in his opening remarks that the conference exists to assemble the thought leaders in the field of analytics to discuss topics important to the game. The three-day event includes research presentations, featured speakers such as Bill James, Joe Posnanski and Brian Kenny, as well as moderated panels with players, general managers and agents.

PHOENIX—In the 2001 edition of the "Historical Abstract," Bill James put forth the notion that most significant pitchers in history could be sorted into "families" based on the shape of their on-field performance.

The Kevin Brown family, for example, would consist of groundball pitchers who also had a strikeout pitch. The Jim Kaat family of pitchers would encompass physical lefties who worked gracefully and threw strikes.

The proliferation of baseball data in the decade since James published the "Historical Abstract" has opened new avenues by which to classify pitchers. Since being installed in every major league park in 2008, Pitch f/x cameras have tracked the type, velocity and horizontal and vertical movement of more than 712,000 pitches.

The complexity, granularity and volume of Pitch f/x data prompted SABR president Vince Gennaro, in his research presentation "The Big Data Approach To Baseball Analytics," to apply complex data-processing and graphing to the task of sorting contemporary big league pitchers into clusters of similar pitchers.

With so many data points at his disposal, Gennaro was able to define each pitcher cluster narrowly, using pitcher attributes rather than outcomes. That choice marks an important departure from James' initial work. Many of the attributes Gennaro included would seem more natural in a scouting report than a spreadsheet. They included velocity, horizontal and vertical movement, arm slot, preferred out-pitch, swinging-strike percentage, overall strike percentage and favored pitch sequences versus both lefthanded and righthanded batters.

Considering all those attributes, Gennaro presented a scatter plot in which pitchers were sorted into groups based on traits such as velocity, swing-and-miss rate and movement. For example, Justin Verlander and Craig Kimbrel belong to the group of high-velocity, high swing-and-miss rate, high-movement pitchers. Righthanders who favor fastballs and curveballs against lefthanded batters include the likes of Gavin Floyd, Yovani Gallardo, Wade Davis and Zack Greinke.
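For readers who want to try this kind of grouping themselves, here is a minimal sketch in Python. It assumes k-means clustering on standardized attributes, which is one common technique and not necessarily the method Gennaro used. The pitcher names and numbers are hypothetical placeholders, not real Pitch f/x measurements, and the three attributes (velocity in mph, swinging-strike percentage, movement in inches) are a simplification of the longer list above.

```python
import math
import random


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column has zero spread, to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: (velocity mph, swinging-strike %, movement inches).
pitchers = {
    "Pitcher A": (96.5, 13.0, 10.5),
    "Pitcher B": (95.8, 12.4, 9.8),
    "Pitcher C": (89.0, 7.1, 5.0),
    "Pitcher D": (88.4, 6.5, 5.6),
}

names = list(pitchers)
labels = kmeans(standardize(list(pitchers.values())), k=2)

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)

for label, members in sorted(clusters.items()):
    print(label, members)
```

Standardizing first matters because velocity and movement are on different scales; without it, whichever attribute has the largest raw numbers would dominate the distance calculation.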

Because the data include left/right splits, Gennaro took the process further, identifying the pitchers whose approach against lefthanded batters most and least resembles their own approach against righthanded batters. Matt Harrison, Wade Miley and Ross Detwiler are remarkably consistent in the way they attack lefties and righties. Meanwhile, C.C. Sabathia, Rick Porcello, Hiroki Kuroda and Matt Cain show distinct pitch preferences depending on which side of the plate the batter stands on.
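One simple way to put a number on that self-similarity is to compare a pitcher's pitch mix against lefties with his mix against righties. The sketch below assumes cosine similarity over pitch-usage shares, which is an illustrative choice and not a measure the presentation is known to have used. The pitch categories and usage shares are hypothetical.

```python
import math

# Hypothetical pitch categories; a real dataset would carry more types.
PITCH_TYPES = ("fastball", "curveball", "slider", "changeup")


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def split_similarity(vs_left, vs_right):
    """Compare one pitcher's pitch mix vs. lefthanded and righthanded batters."""
    u = [vs_left.get(p, 0.0) for p in PITCH_TYPES]
    v = [vs_right.get(p, 0.0) for p in PITCH_TYPES]
    return cosine_similarity(u, v)


# Hypothetical pitcher who attacks both sides the same way.
consistent = split_similarity(
    {"fastball": 0.60, "curveball": 0.20, "changeup": 0.20},
    {"fastball": 0.62, "curveball": 0.19, "changeup": 0.19},
)

# Hypothetical pitcher who leans on different secondary pitches by batter side.
split_heavy = split_similarity(
    {"fastball": 0.50, "slider": 0.45, "changeup": 0.05},
    {"fastball": 0.55, "slider": 0.05, "changeup": 0.40},
)

print(round(consistent, 3), round(split_heavy, 3))
```

A score near 1.0 points to a pitcher who uses the same mix against both sides, while a lower score flags one whose pitch selection shifts with the batter's handedness. Pitch mix alone ignores sequencing and location, so this is a rough first cut.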

Presumably, many major league teams have built their own datasets along the lines of the one Gennaro presented, taking the integration of scouting and statistics to its natural conclusion. The resulting information would aid them in making pitcher acquisitions, because clubs could find precedent for any pitcher type they're considering and base predictions on real-life results. The data also condense many of the elements that surely show up in advance scouting reports on an upcoming opponent.

For the rest of us, it looks like we need to fire up Excel and get to work.