Publication: Statistical Methods for Complex Dependent Data
Abstract
In this dissertation, we consider two topics involving dependent data. The first concerns clustered longitudinal semi-continuous data subject to mortality; the second examines the properties of multiple period cluster randomized crossover trials in the small sample setting. For each topic, many real-world obstacles make estimation and inference challenging.
In Chapter 1, we focus on clustered longitudinal semi-continuous data subject to mortality. There is a need to develop statistical methods to evaluate such data, particularly in the nursing home setting. The semi-continuous distribution, which consists of a mass of zero values and a long right tail, introduces challenges in both model fitting and interpretation. Furthermore, residents of nursing homes are at high risk of mortality, which acts as a competing risk. To address this knowledge gap, we propose a novel Bayesian hierarchical discrete time model for analyzing clustered and/or longitudinal semi-continuous data truncated by death. The model captures the semi-continuous nature of the data by modeling the probability mass at zero and the skewed cost separately, and combines these two parts with a discrete time framework for semi-competing risks. We demonstrate the applicability of this model by examining factors longitudinally associated with cost and health care utilization in and across nursing homes, using nationally collected data from the Centers for Medicare & Medicaid Services.
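The two-part idea behind a semi-continuous outcome can be illustrated with a minimal sketch. This is not the Bayesian hierarchical model proposed in the chapter: it ignores clustering, repeated measures, and truncation by death, and the function names, the log-normal choice for positive costs, and all parameter values are assumptions made purely for illustration. It only shows how the overall mean cost splits into the probability of a nonzero cost times the mean cost among those with nonzero cost.

```python
import random


def simulate_costs(n, p_positive, mu, sigma, seed=0):
    """Draw n semi-continuous costs: zero with probability 1 - p_positive,
    otherwise a log-normal draw (an assumed shape for the long right tail)."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n):
        if rng.random() < p_positive:
            costs.append(rng.lognormvariate(mu, sigma))
        else:
            costs.append(0.0)
    return costs


def two_part_summary(costs):
    """Summarize the two parts separately, then recombine them.

    Part 1: the share of observations with a positive cost.
    Part 2: the mean cost among the positive observations.
    The overall mean is the product of the two parts.
    """
    positives = [c for c in costs if c > 0]
    p_hat = len(positives) / len(costs)
    mean_pos = sum(positives) / len(positives) if positives else 0.0
    return {
        "p_positive": p_hat,
        "mean_given_positive": mean_pos,
        "overall_mean": p_hat * mean_pos,
    }


# Hypothetical parameter values, chosen only to produce many zeros and a long tail.
costs = simulate_costs(n=5000, p_positive=0.3, mu=7.0, sigma=1.0)
summary = two_part_summary(costs)
```

In a regression setting, each part would typically receive its own covariates and random effects, which is where the hierarchical structure described above would enter.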
Chapter 2 extends the work in the previous chapter, focusing on creating a quantifiable measure of performance for clustered longitudinal semi-continuous data subject to mortality. Comparing and understanding nursing home costs is essential for families, policymakers, and healthcare providers. These costs are a significant component of financial planning for aging individuals, and the rising demand for long-term care due to an aging population places a substantial economic burden on public and private healthcare systems. However, several statistical issues must be considered when analyzing these data, such as the semi-continuous distribution and the semi-competing risk of death. Chapter 1 proposes a hierarchical discrete time model that accounts for these complexities. Building on this model, we propose several metrics for profiling nursing homes. We apply these performance metrics to profile nursing homes in Massachusetts and California on the basis of 90-day costs and mortality, using nationally collected data from the Centers for Medicare & Medicaid Services.
Chapter 3 shifts focus towards cluster randomized trials (CRTs) in the small sample setting, another common setting for dependent data. CRTs are becoming more commonly used to assess the effectiveness of health interventions; however, they are often designed with a small number of clusters (around 10 or fewer). It is unclear which statistical methods are optimal in this extreme scenario, as common approaches typically assume more than 40 clusters. More complex designs such as the multiple period cluster randomized crossover trial (MPCRCT) can be used to mitigate issues caused by a small number of clusters, but little work has been done on analyzing their properties in this setting. This chapter evaluates the performance of generalized linear mixed models (GLMMs) and generalized estimating equations (GEEs), along with their corresponding small sample variance corrections, for MPCRCTs with fewer than 10 clusters. Results of this simulation study will be used as guidance for MI-VACUNA, an ongoing MPCRCT examining the effectiveness of motivational interviewing on vaccine hesitancy at two Boston community health centers.
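To make the design concrete, the following is a rough sketch of how data from a multiple period cluster randomized crossover trial might be simulated and analyzed with a simple cluster-level summary. It is not the GLMM or GEE analysis studied in the chapter, and it applies no small sample variance correction; it swaps in a per-cluster difference in means, whose standard error is based on the number of clusters minus one degrees of freedom. All function names, the continuous outcome, the variance components, and the balanced assignment of starting arms are assumptions made for illustration.

```python
import random
import statistics


def simulate_crossover(n_clusters, n_periods, n_per_period, effect,
                       sd_cluster, sd_cluster_period, sd_error, seed=0):
    """Simulate a continuous outcome under a cluster crossover design.

    Each cluster alternates between control (0) and treatment (1) across
    periods. Starting arms are balanced and then shuffled across clusters.
    """
    rng = random.Random(seed)
    start_arms = [i % 2 for i in range(n_clusters)]
    rng.shuffle(start_arms)
    rows = []
    for c in range(n_clusters):
        u = rng.gauss(0.0, sd_cluster)             # cluster random effect
        for t in range(n_periods):
            v = rng.gauss(0.0, sd_cluster_period)  # cluster-period random effect
            treated = (start_arms[c] + t) % 2
            for _ in range(n_per_period):
                y = u + v + effect * treated + rng.gauss(0.0, sd_error)
                rows.append((c, t, treated, y))
    return rows


def cluster_level_estimate(rows):
    """Estimate the treatment effect from within-cluster differences in means.

    Returns the estimate, its standard error across clusters, and the
    degrees of freedom (number of clusters minus one).
    """
    by_cluster = {}
    for c, _t, treated, y in rows:
        by_cluster.setdefault(c, {0: [], 1: []})[treated].append(y)
    diffs = [statistics.mean(arms[1]) - statistics.mean(arms[0])
             for arms in by_cluster.values()]
    k = len(diffs)
    estimate = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / k ** 0.5
    return estimate, std_error, k - 1


# Hypothetical scenario: 6 clusters, 4 periods, 20 participants per cluster-period.
rows = simulate_crossover(n_clusters=6, n_periods=4, n_per_period=20, effect=0.5,
                          sd_cluster=1.0, sd_cluster_period=0.3, sd_error=1.0)
estimate, std_error, df = cluster_level_estimate(rows)
```

Because each cluster contributes both treated and control periods, the cluster random effect cancels in the within-cluster difference, which is one reason crossover designs can help when few clusters are available. With so few clusters, the standard error rests on very few degrees of freedom, which is the kind of difficulty small sample corrections are meant to address.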