Publication: Pitch Type Prediction in Major League Baseball
Date
2019-08-23
Citation
Plunkett, Ryan. 2019. Pitch Type Prediction in Major League Baseball. Bachelor's thesis, Harvard College.
Abstract
This thesis was first conceived as an extension of previous research on the "pitch prediction" problem, defined here as the process by which individuals competing in Major League Baseball decide which of their pitches to throw in a given situation. Prior work suggests that binary classification (i.e., predicting whether an upcoming pitch will be a fastball or a non-fastball) is possible; we therefore hope to improve upon the reported accuracies and to extend the research to multi-class classification as well.
Because every pitcher is unique, we do not weight all observations equally when making predictions for a specific individual. Instead, we introduce the idea of "similarity analysis" via kernel-weighting mechanisms, in which pitchers deemed comparable under some metric are weighted more heavily during the training of our localized models. To identify similar pitchers, we represent each individual as a three-dimensional cloud of points based on the physical attributes of their pitches, apply the Earth Mover's Distance algorithm to obtain a measure of distance between pitchers, and then use unsupervised learning to cluster them.
Despite our best attempts at classification (both binary and multi-class), we find that our models struggle to surpass the naive baselines set forth in prior research, suggesting that predicting pitch selection may be more difficult than previously reported. The thesis concludes by hypothesizing why our work may have yielded results differing from those of earlier authors while simultaneously examining the possibility of a link between pitcher predictability and performance.
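The similarity analysis described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the three features (velocity, horizontal break, vertical break), the synthetic pitchers, the Gaussian kernel, and the bandwidth value are all assumptions chosen for illustration. It also assumes both pitchers have the same number of pitches with uniform mass, in which case the Earth Mover's Distance reduces to an optimal one-to-one matching. The clustering step is not shown.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def emd_equal_clouds(a, b):
    """Earth Mover's Distance between two equal-size, uniformly weighted clouds.

    Under these assumptions the optimal transport plan is a one-to-one
    matching, so solving the assignment problem gives the exact distance.
    """
    cost = cdist(a, b)  # pairwise Euclidean distances between pitches
    rows, cols = linear_sum_assignment(cost)  # minimum-cost matching
    return cost[rows, cols].mean()


def gaussian_kernel_weights(distances, bandwidth):
    """Map distances to weights in (0, 1]; closer pitchers get larger weights."""
    d = np.asarray(distances, dtype=float)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


rng = np.random.default_rng(0)

# Hypothetical pitchers: 50 pitches each, with assumed columns
# (velocity, horizontal break, vertical break).
target = rng.normal([93.0, -5.0, 9.0], 1.0, size=(50, 3))
similar = rng.normal([92.5, -5.5, 9.5], 1.0, size=(50, 3))
different = rng.normal([84.0, 4.0, -3.0], 1.0, size=(50, 3))

d_self = emd_equal_clouds(target, target)
d_similar = emd_equal_clouds(target, similar)
d_different = emd_equal_clouds(target, different)

# The bandwidth is an assumed tuning parameter, not a value from the thesis.
weights = gaussian_kernel_weights([d_self, d_similar, d_different], bandwidth=5.0)
print(d_self, d_similar, d_different)
print(weights)
```

In a setup like this, the resulting weights could be passed as per-observation sample weights when fitting a classifier for the target pitcher, so that pitches thrown by comparable pitchers influence the localized model more than pitches thrown by dissimilar ones.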
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service