Publication: A Forest for the Trees: Using Random Forests for Small Area Estimation on US Forest Inventory Data
Date
2023-06-30
Citation
Schmitt, Julian Francis. 2023. A Forest for the Trees: Using Random Forests for Small Area Estimation on US Forest Inventory Data. Bachelor's thesis, Harvard University Engineering and Applied Sciences.
Abstract
Methods that estimate population parameters of interest across small areas are a growing field of research. These problems arise frequently in election prediction, healthcare monitoring, and environmental studies. The Forest Inventory and Analysis Program (FIA) of the US Forest Service tracks forest metrics, such as basal area and above-ground carbon, to ensure sustainable stewardship of the nation's forests and preserve her resources for future generations. Its estimates combine expensive ground plot observations of the variables of interest with inexpensive and plentiful auxiliary data collected by remote sensing. Historically, estimators in this setting have relied on either means or linear parametric models, such as the post-stratified estimator, the area-level empirical best linear unbiased predictor (area-EBLUP), and the unit-level empirical best linear unbiased predictor (unit-EBLUP). Here, we present the results of a simulation study comparing these standard estimators to a new problem-specific estimator, as well as to machine learning models. The problem-specific zero-inflated estimator is introduced to address the overabundance of zero observations in FIA ground plot observations, while machine learning methods, including the random forest and mixed-effects random forest (SMERF), seek to flexibly capture non-linear relationships between the predictors and the response variable to improve performance while also addressing the zero-inflation problem. We track both bias and root mean squared error across the six estimators to assess their performance and find that there is no universal "best model." Instead, we find a complex story in which the post-stratified and area-EBLUP models have exceptionally low bias, particularly across areas with low carbon levels; when examining root mean squared error, however, the zero-inflated estimator performs well. Across higher carbon levels, model performance is even more complex.
We close with the implications of these results, alongside avenues to improve estimation at scale across the US.
Keywords
carbon, forestry, MERF, random forest, small area estimation, Applied mathematics, Environmental science, Statistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service