Collecting Retrosheet Data
One reason for the contributions to baseball research is the availability of data. Retrosheet was founded in 1989 with the purpose of computerizing baseball game records, and game-by-game and play-by-play data are available for download from retrosheet.org. Retrosheet's contribution was remarkable and deserves special recognition. David W. Smith received the […]
Working for Teams
As many of you know, sports analytics has grown greatly in recent years, as demonstrated by the many analysts currently employed by professional sports teams. Although I have been very fortunate to have the opportunity to do baseball research as an academic and in retirement, I have […]
Rocky Colavito, a Cleveland All-Star slugger, passed away this week. He was a very popular player for the Indians in the 1950s; a statue of Colavito was recently unveiled in the Little Italy section of Cleveland. One event that defined his career was the shocking trade between the Indians and the Detroit Tigers […]
Tom Tango recently posted on Bluesky the color codes that Baseball Savant uses for pitch types. He asked me by email whether there was a way to create a dictionary or library to represent these colors. Here’s how to use these color codes in R for graphs involving pitch types. First I […]
Previous Post on W/L Records
A couple of years ago, I wrote a post, “Is a Pitcher’s Win/Loss Record Meaningful?” It was motivated by Bill Ripken’s belief (stated in his book State of Play) that a pitcher’s W/L record is meaningful beyond his other pitching measures such as ERA, SO, WHIP, etc. So the purpose […]
Introduction: Should Joe Carter be in the Hall of Fame?
Last week, the 2025 Hall of Fame (HOF) ballot was released. At this time of year, baseball fans reflect on popular players who have not been inducted into the HOF. One popular player, especially among Canadian fans, is Joe Carter, who primarily played for the Indians […]
Introduction
In Chapter 5 of our ABDWR book, we discuss the process of constructing the runs expectancy table, which gives the average number of runs scored in the remainder of the inning for each bases/outs situation. Using these runs expectancies, one can find the runs value of any play. By accumulating these runs values over […]
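As a worked sketch of how a runs value is typically derived from the table: the value of a play is the change in runs expectancy from the state before the play to the state after it, plus any runs that scored on the play. The notation below ($RE(s)$ for the table entry of bases/outs state $s$, $R_{\text{play}}$ for runs scored) is assumed here for illustration, and the numbers in the example are hypothetical round values, not entries from the book's table.

```latex
% Runs value of a single play (notation assumed for illustration)
\[
  \text{runs value} = RE(s_{\text{after}}) - RE(s_{\text{before}}) + R_{\text{play}},
  \qquad RE(s_{\text{after}}) = 0 \text{ if the play ends the inning.}
\]
% Hypothetical example: a single with the bases empty and no outs,
% using illustrative values RE(empty, 0 outs) = 0.50 and RE(runner on 1st, 0 outs) = 0.90
\[
  \text{runs value} = 0.90 - 0.50 + 0 = 0.40
\]
```

Under the same illustrative values, a solo home run with the bases empty and no outs leaves the state unchanged, so its runs value is $0.50 - 0.50 + 1 = 1$.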
Carnegie Mellon Sports Analytics Conference
I recently participated in the Carnegie Mellon Sports Analytics Conference, an annual event dedicated to showcasing current research in sports statistics. (Back in April 2018, I visited CMU for a Baseball Analytics Workshop, and it was nice to return for a sports statistics meeting.) It was a great conference including a […]
Introduction
In last week’s post, I explored the run production of the different spots in the batting order for the 2024 season. We found some interesting patterns that may run counter to conventional thinking about how different batting lineups produce runs. As a follow-up, I thought I’d use this post to explore this issue across recent […]
Introduction
The 2024 World Series starts tonight between the Yankees and the Dodgers, the two teams with the highest payrolls. (One of my tennis friends is upset about this; he commented that teams with small payrolls rarely make it to the World Series.) It should be an exciting series featuring the likely MVPs of […]