Publication:

Using Machine Learning to Predict Future Points in the NHL

Date

2017-11-07

The Harvard community has made this article openly available.

Citation

Matsuzawa, Takehiro. 2017. Using Machine Learning to Predict Future Points in the NHL. Bachelor's thesis, Harvard College.

Abstract

Goal creation in ice hockey is complicated and difficult to understand. Unlike in baseball, the players and the puck are in constant motion, and a goal is produced by a combination of 5 players. Recently, the NHL has published new data about each game. As statistical analysis tools such as neural networks, k-nearest neighbors, and random forest regression become available, it becomes possible to analyze goals and assists more holistically. The aim of this study was to predict each player's average number of points over the next 5 games from the statistics of the player, his team, and his opponents in the previous 10 games. Through this study, I identified the variables that are important for predicting the number of points. Random forest regression predicted the average number of points in the next 5 games with a mean squared error of 0.0675. This is about a 70% improvement over the mean squared error of a baseline model that predicts that every player scores the average number of points in the next 5 games.
This paper starts with an exploration of relevant sports statistics and their history, then shifts its focus to related work by other researchers (Chapter 1). Next, it explains the machine learning algorithms used in this research: neural network regression, random forest regression, and k-nearest neighbors regression (Chapter 2). It then seeks to minimize prediction error and identify significant features with these methods (Chapter 3). Finally, it summarizes the main findings (Chapter 4).
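
The prediction setup described in the abstract (a 10-game history window, a 5-game target window, and a comparison against a predict-the-average baseline) can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the toy game log, and the use of points alone as the history feature are assumptions made for illustration, and the random forest itself is not implemented here. The thesis also uses team and opponent statistics, which this sketch omits.

```python
def make_windows(points, history=10, horizon=5):
    """Pair each `history`-game window with the mean points of the next `horizon` games."""
    samples = []
    for start in range(len(points) - history - horizon + 1):
        past = points[start:start + history]
        future = points[start + history:start + history + horizon]
        samples.append((past, sum(future) / len(future)))
    return samples


def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def baseline_mse(targets):
    """MSE of a model that predicts the overall average target for every sample."""
    avg = sum(targets) / len(targets)
    return mse([avg] * len(targets), targets)


def relative_improvement(baseline, model):
    """Fraction by which the model's MSE is lower than the baseline's MSE."""
    return 1.0 - model / baseline


# Hypothetical per-game point totals for one player (not data from the thesis).
game_log = [0, 1, 0, 2, 1, 0, 0, 1, 3, 0, 1, 0, 0, 2, 2, 0, 1]
samples = make_windows(game_log)
targets = [target for _, target in samples]
toy_baseline = baseline_mse(targets)
```

As a rough consistency check on the reported figures, `relative_improvement(0.225, 0.0675)` evaluates to about 0.70. The value 0.225 is back-computed from the stated 70% improvement for illustration only; it is not a baseline error reported in the abstract.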

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
