Pitch Type Prediction in Major League Baseball
Citation: Plunkett, Ryan. 2019. Pitch Type Prediction in Major League Baseball. Bachelor's thesis, Harvard College.
Abstract: This thesis began as an extension of previous research on the "pitch prediction" problem, defined here as the process by which Major League Baseball pitchers decide which of their pitches to throw in a given situation. Prior work suggests that binary classification (i.e., predicting whether an upcoming pitch will be a fastball or a non-fastball) is possible; we therefore aim to improve upon the reported accuracies and to extend the research to multi-class classification.
Since every pitcher is unique, rather than weighting all observations equally when attempting to make predictions for a specific individual, we instead introduce the idea of "similarity analysis" via kernel-weighting mechanisms, in which pitchers deemed comparable via some metric are leveraged more heavily during the training of our localized models. To identify similar pitchers, we represent individuals as three-dimensional clouds of points based on the physical attributes of their pitches, then apply the Earth Mover's Distance algorithm to obtain a measure of distance before using unsupervised learning to cluster pitchers.
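The similarity step can be illustrated with a minimal sketch. It is not the thesis's implementation: the three coordinates per pitch, the Gaussian form of the kernel, the bandwidth value, and the function names `emd_equal_clouds` and `gaussian_kernel_weight` are all assumptions made for illustration. The sketch relies on one known simplification: for two clouds of equal size with uniform weights, the Earth Mover's Distance equals the cost of the best one-to-one matching divided by the number of points. The brute-force search over matchings is only practical for tiny clouds; a real analysis would use an optimal-transport or assignment solver.

```python
import math
from itertools import permutations


def emd_equal_clouds(cloud_a, cloud_b):
    """Earth Mover's Distance between two equal-size, uniformly weighted clouds.

    With equal sizes and uniform weights, the optimal transport plan is a
    one-to-one matching, so we take the cheapest matching and average it.
    Brute force over permutations: toy-sized inputs only.
    """
    if len(cloud_a) != len(cloud_b):
        raise ValueError("this sketch assumes clouds of equal size")
    n = len(cloud_a)
    best_total = min(
        sum(math.dist(cloud_a[i], cloud_b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best_total / n


def gaussian_kernel_weight(distance, bandwidth):
    """Map a distance to a training weight in (0, 1] (assumed kernel form)."""
    return math.exp(-(distance ** 2) / (2 * bandwidth ** 2))


# Hypothetical pitchers, each a cloud of (x, y, z) pitch attributes.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
similar = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
dissimilar = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]

for name, cloud in [("similar", similar), ("dissimilar", dissimilar)]:
    d = emd_equal_clouds(target, cloud)
    w = gaussian_kernel_weight(d, bandwidth=1.0)
    print(f"{name}: distance={d:.3f}, weight={w:.4f}")
```

Under these assumptions, observations from the closer pitcher receive a much larger weight when training a model localized to the target pitcher, which is the intuition behind kernel weighting. The resulting pairwise distances could also feed a clustering step.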
Despite our best attempts at classification (both binary and multi-class), we find that our models struggle to surpass the naive baselines set forth in prior research, suggesting that predicting pitch selection may be more difficult than previously reported. The thesis concludes by hypothesizing why our work may have yielded results differing from those of earlier authors while simultaneously examining the possibility of a link between pitcher predictability and performance.
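To make the notion of a naive baseline concrete, the sketch below shows one common form: always predict the pitch type a pitcher threw most often in the training data. Whether this matches the exact baselines used in the prior research is an assumption, and the function name `majority_baseline_accuracy` and the pitch-type codes are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test pitch.

    Returns the predicted label and the fraction of test pitches it matches.
    """
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, hits / len(test_labels)


# Hypothetical pitch sequences for one pitcher.
train = ["FF", "FF", "SL", "FF", "CH"]
test = ["FF", "SL", "FF", "CU"]
label, accuracy = majority_baseline_accuracy(train, test)
print(label, accuracy)
```

A pitcher who throws one pitch type most of the time gives this baseline a high accuracy with no modeling at all, so a learned classifier is only useful to the extent that it beats this figure.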
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364634
- FAS Theses and Dissertations