Publication:

Pitch Type Prediction in Major League Baseball

Date

2019-08-23

Citation

Plunkett, Ryan. 2019. Pitch Type Prediction in Major League Baseball. Bachelor's thesis, Harvard College.

Abstract

This thesis was first conceived as an extension of previous research on the "pitch prediction" problem, defined here as the process by which individuals competing in Major League Baseball contests decide which of their pitches they will throw in a given situation. Prior work suggests that binary classification (i.e., predicting whether an upcoming pitch will be a fastball or non-fastball) is feasible; we therefore aim to improve upon the reported accuracies and to extend the research to multi-class classification.

Because every pitcher is unique, we do not weight all observations equally when making predictions for a specific individual. Instead, we introduce the idea of "similarity analysis" via kernel-weighting mechanisms, in which pitchers deemed comparable under some metric are weighted more heavily during the training of our localized models. To identify similar pitchers, we represent individuals as three-dimensional clouds of points based on the physical attributes of their pitches, then compute the Earth Mover's Distance between these clouds to obtain a measure of distance before using unsupervised learning to cluster pitchers.
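The similarity idea can be illustrated with a minimal sketch; it is not the thesis's implementation. Several assumptions are made purely for illustration: the three coordinates are taken to be velocity, horizontal break, and vertical break; each cloud has the same number of equally weighted points (so the Earth Mover's Distance reduces to the best one-to-one matching, found here by brute force); and the kernel is a Gaussian with a hypothetical bandwidth. The function names and sample data are invented for this example.

```python
import math
from itertools import permutations


def emd_equal_clouds(a, b):
    """Earth Mover's Distance between two equal-size, uniformly weighted clouds.

    With equal sizes and uniform weights, the optimal transport plan is a
    one-to-one matching, so we take the cheapest matching and average its cost.
    Brute force over permutations: suitable only for very small clouds.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes clouds of equal size")
    n = len(a)
    best_total = min(
        sum(math.dist(a[i], b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best_total / n


def gaussian_kernel_weight(distance, bandwidth):
    """Weight in (0, 1]; closer pitchers get weights nearer to 1 (assumed kernel)."""
    return math.exp(-(distance ** 2) / (2 * bandwidth ** 2))


# Hypothetical pitch clouds: (velocity mph, horizontal break in, vertical break in).
target = [(95.0, -5.0, 9.0), (85.0, 3.0, 1.0), (78.0, 6.0, -6.0)]
similar = [(94.0, -5.0, 9.0), (84.0, 3.0, 1.0), (77.0, 6.0, -6.0)]
different = [(88.0, -8.0, 4.0), (80.0, 1.0, -2.0), (72.0, 8.0, -9.0)]

bandwidth = 3.0  # hypothetical; would need tuning in practice
weight_similar = gaussian_kernel_weight(emd_equal_clouds(target, similar), bandwidth)
weight_different = gaussian_kernel_weight(emd_equal_clouds(target, different), bandwidth)
```

In a kernel-weighted training scheme of this kind, `weight_similar` and `weight_different` would scale how much each comparison pitcher's observations contribute to the target pitcher's model. Real pitch clouds are large and of unequal size, so a general optimal-transport solver would be needed in place of the brute-force matching.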

Despite our best attempts at classification (both binary and multi-class), we find that our models struggle to surpass the naive baselines established in prior research, suggesting that predicting pitch selection may be more difficult than previously reported. The thesis concludes by hypothesizing why our results may differ from those of earlier authors and by examining the possibility of a link between pitcher predictability and performance.
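For readers unfamiliar with the term, a naive baseline can be sketched as follows. The abstract does not define the baseline, so this example assumes the common convention of always predicting the most frequent class seen in training; the function name, pitch-type labels, and data are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Always predict the most frequent training label; return it with test accuracy."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, hits / len(test_labels)


# Hypothetical pitch-type labels for one pitcher (illustrative data only).
train = ["FF", "FF", "SL", "FF", "CH", "FF", "SL"]
test = ["FF", "SL", "FF", "FF", "CH"]

prediction, accuracy = majority_baseline_accuracy(train, test)
```

Under this assumed definition, a model is only informative if its accuracy exceeds the value returned here, which can already be high for pitchers who rely heavily on a single pitch.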

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
