Prediction With Systematically Missing Data: Methods for Health Plan Payment and Cancer Stage Classification
Access Status
Full text of the requested work is not available in DASH at this time ("restricted access"). For more information on restricted deposits, see our FAQ.
Author
Bergquist, Savannah
Citation
Bergquist, Savannah. 2019. Prediction With Systematically Missing Data: Methods for Health Plan Payment and Cancer Stage Classification. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
Missing data is a common barrier in health services research and has important implications for both health plan payment policy and cancer outcomes research. This dissertation assesses two approaches for leveraging data in plan payment risk adjustment, and evaluates lung cancer stage classification algorithms and subsequently estimates survival outcomes.
Chapter one evaluates non-representative sampling in Medicare Advantage risk adjustment. Setting per-person payments based on data samples that differ nontrivially from their target populations may incorrectly characterize expected costs and create unintended adverse incentives. A propensity-score matched sample of traditional Medicare beneficiaries who resemble Medicare Advantage enrollees is used to estimate risk adjustment formulas. Matching improves balance on observables but fitting the risk adjustment formulas on a random versus a matched sample yields little difference in plan payments, suggesting that employing a random sample for risk adjustment estimation is not a large contributor to problematic selection incentives.
Chapter two proposes to break the feedback loop between insurer actions and health plan payments by transforming the data used to set payments. Data modified to reflect the researcher or policymaker’s beliefs about efficient and fair levels of spending versus observed spending levels can be used for calibrating payments. The proposed data modification approach is demonstrated in two Medicare applications and compared to two other common methods, illustrating that the “side effects” of the approaches vary by context and that data transformation is an effective tool for addressing misallocations in individual health insurance markets.
Chapter three examines using health insurance claims data to classify lung cancer stage and compares survival outcomes based on observed and predicted stage. Oncology health outcomes research has been limited by the difficulty of identifying cancer stage in claims data, and this study first demonstrates the feasibility of employing machine learning-based methods to classify early versus late stage lung cancer. This work is then extended to predicting a tripartite outcome of stages I-II, stage III, and stage IV, which is more clinically relevant due to the survival differences between these groups. The machine learning-based classification algorithms approximate the separation obtained by stratifying survival on the observed lung cancer stages.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029496
Collections
- FAS Theses and Dissertations [5370]