The Novice's Guide to Advanced Stats Part II: Baseball

Posted by Taylor Nigrelli on February 13, 2015 · 9 mins read

When discussing advanced statistical analysis in sports, baseball is a great place to start. Because of the nature of the game, it has adapted to the advanced stats revolution more quickly than the other major North American sports.

Baseball can be easily quantified because it unfolds as a series of discrete events. A pitcher throws to a batter, the batter hits the ball and reaches first. The next batter strikes out. The third batter hits the ball to the shortstop and is put out. These events occur largely independently of each other, which makes them easy to count and compare. Unlike basketball and hockey, which feature free-flowing action, and football, which features difficult-to-quantify aspects such as blocking, baseball easily lends itself to statistical analysis.

Baseball has always been a game of numbers. Since the game’s beginnings, people have obsessed over home runs, RBI and pitcher win totals. Around 1977, a man named Bill James began to turn his curiosity about unorthodox stats into actual data and research. While working the night shift as a security guard, he wrote and published Baseball Abstract, a short book featuring in-depth, advanced statistical analysis on each MLB team. (It should be said that there were many others who helped advance the cause of analytics in baseball, but Bill James is considered the most prominent and important. Also, this guide is more about the themes than the history.)

It took years for James’ work to gain mainstream recognition, the price that a pioneer must pay. In 2003, Michael Lewis extensively profiled Oakland Athletics General Manager Billy Beane for his book Moneyball. Beane had taken to running his team partially through the use of advanced statistics. The book launched a revolution. Websites dedicated to advanced baseball stats grew in popularity, teams took heed and hired their own analytics guys, and advanced stats entered the national consciousness. (Poor James had to wait to become a “household name,” while Beane was played by Brad Pitt in the 2011 Oscar-nominated film adaptation of Moneyball. Some guys have it good.)

The basic premise of this revolution is simple: to find out which stats are useful for player evaluation and which aren’t. If a stat doesn’t directly measure a skill, then it isn’t a good stat. Then, new stats can be created to exploit areas that aren’t being considered enough. There are five tools/skills baseball players are judged by: arm strength, fielding ability, ability to hit for power, ability to hit for contact and speed.

Here’s an example of where advanced metrics changed the way we talk about the game. Throughout baseball history, pitcher wins and RBI were both considered important stats. Many still consider them important, but they are largely circumstantial and, next to the metrics now available, close to inconsequential. Judging a pitcher on his win total is foolish because win total is dependent on factors beyond the pitcher’s control, such as how many runs his own offense scores. The same goes for RBI; why judge a player by his RBI total when he has no control over how many players will be on base in front of him?

So, alternative stats emerged. Instead of wins, analysts began to consider earned run average. Then, to eliminate any advantages defenses might be giving pitchers, statisticians began to use defense-independent stats. The most popular, Fielding Independent Pitching (FIP), looks only at outcomes the pitcher largely controls (strikeouts, walks, hit batters and home runs) and is calculated and compiled by popular baseball statistics websites Fangraphs and Baseball Reference. This statistic comes into play often when a pitcher is putting up worse-than-expected numbers early in a season. If his FIP hasn’t changed and his infield hasn’t gotten significantly worse defensively, you can reasonably expect his results to regress back toward his usual level.
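For the curious, here is a rough sketch of how ERA and FIP compare for a single pitcher. The FIP weights (13, 3 and 2) are the commonly published ones. The constant of 3.10 is only an assumed ballpark figure, since the real constant is recalculated for each league and season. The pitcher's stat line is made up, and innings are assumed to be true decimals (60.0, not scorebook notation like 60.1).

```python
def era(earned_runs: int, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * earned_runs / innings


def fip(hr: int, bb: int, hbp: int, k: int, innings: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching.

    Uses only home runs, walks, hit batters and strikeouts.
    `constant` is an assumed value; the real one varies by season.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# Hypothetical pitcher off to a rough start (invented numbers).
season_era = era(earned_runs=40, innings=60.0)
season_fip = fip(hr=5, bb=15, hbp=2, k=65, innings=60.0)

print(f"ERA: {season_era:.2f}  FIP: {season_fip:.2f}")
if season_era - season_fip > 1.0:  # 1.0 is an arbitrary illustrative gap
    print("ERA is well above FIP; his results may improve.")
```

A big gap between the two numbers is the signal described above: the pitcher's own contributions look fine, so the ugly ERA may owe more to defense or luck than to him.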

The statistic batting average on balls in play (BABIP) is used in a similar way to FIP. It does what its name says: it measures a player’s batting average on balls he puts into play. Generally, a player should have a steady BABIP throughout his prime. If his BABIP is much higher than usual, he is due for negative regression. If it is much lower than usual, positive regression should be in store.
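As a sketch of the idea, BABIP can be computed from a standard stat line using the commonly published formula, which removes home runs and strikeouts (neither involves a fielder) and counts sacrifice flies as balls in play. The `regression_outlook` helper and its 30-point tolerance are hypothetical, invented here only to illustrate the "higher or lower than usual" comparison.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def regression_outlook(season: float, career: float,
                       tolerance: float = 0.030) -> str:
    """Compare a season BABIP to a career norm.

    `tolerance` is an arbitrary illustrative threshold, not a standard.
    """
    gap = season - career
    if gap > tolerance:
        return "negative regression likely"
    if gap < -tolerance:
        return "positive regression likely"
    return "in line with career norm"


# Hypothetical hitter: invented season line and career BABIP.
season = babip(hits=150, home_runs=20, at_bats=500,
               strikeouts=100, sac_flies=5)
print(f"Season BABIP: {season:.3f}")
print(regression_outlook(season, career=0.300))
```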

Some other statistical innovations in baseball include:

On-base percentage – This was an early one and is fairly simple. It’s the percentage of the time a player reaches base when he comes up to bat. It’s more effective than just measuring batting average because it includes previously undervalued walks.

Park-adjusted stats – Some fields are easier to hit in than others due to stadium dimensions, outfield setup and, in some cases, thin air at altitude (Colorado). Thus, players who play in parks that are harder to hit in will have worse stats over long periods of time while guys who play home games in easier parks will have inflated stats. Park-adjusted stats make up for this. The formula behind calculating them is complicated, but luckily sites like Fangraphs and Baseball Reference do the math and post the results. So, a team that has the “best park-adjusted offense” is the best hitting team after considering the difficulty of the parks they’ve played in at that point in the season.

Historical trends – It seems like common sense that teams shouldn’t give 10-year contracts to players in their early 30s, but they do it every year anyway. Why? They either don’t pay close enough attention to historical trends or irrationally believe that their player can buck those trends. For example, players tend to start declining in their early 30s and usually aren’t capable of playing at the same level into their mid-to-late 30s. So, if a player has been an MVP-caliber guy for 10 years and just turned 30, a team is better off trying to get him to sign a short-term deal or letting him sign elsewhere.

WAR – Wins above replacement, or WAR, has become one of the most popular stats because of the simplicity of its use. It measures the number of wins a player is worth to his team against the common baseline of a Triple-A “replacement” for that player. So, if Los Angeles Angels outfielder Mike Trout has a WAR of 10 in a season, he was worth 10 more wins to the Angels than a typical Triple-A replacement would have been. The formula behind calculating WAR is fairly complicated, but if you understand the basic premise I laid out, that’s all you need to know.

PITCHf/x – As I said above, MLB fully embraced advanced stats before the other major sports leagues. Thus, it has made advancements beyond what the other leagues have achieved. I could take this in many directions, but PITCHf/x gives a good enough example. It is essentially a system of cameras used to track the movements of pitched balls with an accuracy better than one inch or one mile per hour. This allows teams to track just how fast a pitcher throws, which is important because diminishing velocity is often a sign a pitcher will soon become less effective. Additionally, it allows teams to track exactly how much a pitch breaks, curves or moves in general. All this with two cameras in each stadium. Although this seems like a very recent innovation, or something that’s theoretical and not in place yet, PITCHf/x has been used by MLB teams since 2006. Like I said, MLB is ahead of the other leagues.
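To make the on-base percentage entry in the list above a little more concrete, here is a small sketch using the standard OBP formula (hits, walks and hit-by-pitches divided by at-bats, walks, hit-by-pitches and sacrifice flies). Both hitters and their stat lines are invented for illustration.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits per at-bat; walks are ignored entirely."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hbp: int,
                       at_bats: int, sac_flies: int) -> float:
    """Share of qualifying plate appearances in which the batter reaches base."""
    chances = at_bats + walks + hbp + sac_flies
    if chances <= 0:
        raise ValueError("no plate appearances")
    return (hits + walks + hbp) / chances


# Two hypothetical hitters with identical batting averages.
patient = dict(hits=150, walks=90, hbp=5, at_bats=500, sac_flies=5)
free_swinger = dict(hits=150, walks=20, hbp=5, at_bats=500, sac_flies=5)

for name, line in (("patient", patient), ("free swinger", free_swinger)):
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

Both players hit .300, but the one who draws walks reaches base far more often, which is exactly the value batting average misses.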

Although the NFL, NBA and NHL aren’t exactly on baseball’s level, all have made significant strides in recent years, which I will discuss in upcoming posts. Enjoy the weekend.