Pitch Type Prediction in Major League Baseball
Citation
Plunkett, Ryan. 2019. Pitch Type Prediction in Major League Baseball. Bachelor's thesis, Harvard College.
Abstract
This thesis was first conceived as an extension of previous research done in the realm of the "pitch prediction" problem, defined here as the process by which individuals competing in Major League Baseball contests decide which of their pitches they will throw in a given situation. Prior work suggests that binary classification (i.e. predicting whether an upcoming pitch will be a fastball or non-fastball) is possible, and thus, we hope to improve upon these accuracies and extend the research to multi-class classification systems as well.
Since every pitcher is unique, rather than weighting all observations equally when attempting to make predictions for a specific individual, we instead introduce the idea of "similarity analysis" via kernel-weighting mechanisms, in which pitchers deemed comparable via some metric are leveraged more heavily during the training of our localized models. To identify similar pitchers, we represent individuals as three-dimensional clouds of points based on the physical attributes of their pitches, then apply the Earth Mover's Distance algorithm to obtain a measure of distance before using unsupervised learning to cluster pitchers.
Despite our best attempts at classification (both binary and multi-class), we find that our models struggle to surpass the naive baselines set forth in prior research, suggesting that predicting pitch selection may be more difficult than previously reported. The thesis concludes by hypothesizing why our work may have yielded results differing from those of earlier authors while simultaneously examining the possibility of a link between pitcher predictability and performance.
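The similarity analysis outlined in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes each pitcher is a small cloud of three-dimensional points with equal size and uniform weights, in which case the Earth Mover's Distance reduces to the cheapest one-to-one matching (found here by brute force, which is only feasible for tiny clouds). The feature choice (velocity, horizontal break, vertical break), the Gaussian kernel, the bandwidth value, the function names, and all numbers below are hypothetical and chosen purely for illustration.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]


def emd_equal_clouds(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Earth Mover's Distance between two equal-size, uniformly weighted clouds.

    With equal sizes and uniform masses, an optimal transport plan is a
    one-to-one matching, so we take the cheapest permutation. This brute-force
    search is an illustrative assumption and only practical for tiny clouds.
    """
    if len(a) != len(b) or not a:
        raise ValueError("clouds must be non-empty and of equal size")
    best_total = min(
        sum(math.dist(p, b[j]) for p, j in zip(a, perm))
        for perm in itertools.permutations(range(len(b)))
    )
    # Each point carries mass 1/n, so the cost is the mean matched distance.
    return best_total / len(a)


def gaussian_kernel_weights(
    distances: Dict[str, float], bandwidth: float
) -> Dict[str, float]:
    """Map distances to weights in (0, 1]; closer pitchers get larger weights.

    The Gaussian kernel and its bandwidth are assumptions, not the thesis's
    documented choice.
    """
    return {
        name: math.exp(-(d * d) / (2.0 * bandwidth ** 2))
        for name, d in distances.items()
    }


# Toy clouds: (velocity, horizontal break, vertical break). Made-up values.
target = [(95.0, -5.0, 9.0), (85.0, 3.0, -2.0), (80.0, 6.0, -6.0)]
others = {
    "similar_pitcher": [(94.0, -4.0, 8.0), (84.0, 2.0, -1.0), (79.0, 5.0, -5.0)],
    "different_pitcher": [(88.0, -8.0, 4.0), (78.0, 1.0, 0.0), (70.0, 8.0, -10.0)],
}

distances = {name: emd_equal_clouds(target, cloud) for name, cloud in others.items()}
weights = gaussian_kernel_weights(distances, bandwidth=5.0)
```

In a kernel-weighted scheme of this kind, `weights` would scale how much each other pitcher's observations count when fitting a model localized to the target pitcher. For realistically sized clouds, an assignment or optimal-transport solver would replace the brute-force search.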
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364634
Collections
- FAS Theses and Dissertations