Association pattern discovery via theme dictionary models
Access Status: Full text of the requested work is not available in DASH at this time ("dark deposit"). For more information on dark deposits, see our FAQ.
Citation: Deng, Ke, Zhi Geng, and Jun S. Liu. 2013. “Association Pattern Discovery via Theme Dictionary Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (2) (September 18): 319–347. doi:10.1111/rssb.12032.
Abstract: Discovering patterns from a set of text or, more generally, categorical data is an important problem in many disciplines such as biomedical research, linguistics, artificial intelligence and sociology. We consider here the well‐known ‘market basket’ problem that is often discussed in the data mining community, and is also quite ubiquitous in biomedical research. The data under consideration are a set of ‘baskets’, where each basket contains a list of ‘items’. Our goal is to discover ‘themes’, which are defined as subsets of items that tend to co‐occur in a basket. We describe a generative model for such data structures, the theme dictionary model, together with two likelihood‐based methods for inferring the themes hidden in a collection of baskets. We also propose a novel sequential Monte Carlo method to overcome the computational challenges involved. Using both simulation studies and real applications, we demonstrate that the proposed approach is significantly more powerful than existing methods, such as association rule mining and topic modelling, in detecting weak and subtle interactions in the data.
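To make the basket/item/theme vocabulary concrete, the following is a minimal Python sketch of a toy generative process in the spirit of a theme dictionary. It is a simplified assumption, not the authors' theme dictionary model: the dictionary `THEME_DICTIONARY`, its probabilities, and the helpers `generate_basket` and `pair_support` are hypothetical names chosen for illustration. Each basket is formed as the union of a few themes sampled from the dictionary, so items in the same theme always co-occur. The pair counting at the end mimics the support counting used in association rule mining. It does not implement the paper's likelihood-based inference or its sequential Monte Carlo method.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy dictionary (an assumption for illustration): each theme is a
# set of items that co-occur, paired with an assumed usage probability.
# Singleton themes play the role of items that appear on their own.
THEME_DICTIONARY = [
    (frozenset({"bread", "butter"}), 0.25),
    (frozenset({"beer", "chips", "salsa"}), 0.15),
    (frozenset({"milk"}), 0.20),
    (frozenset({"eggs"}), 0.20),
    (frozenset({"apples"}), 0.20),
]


def generate_basket(rng, dictionary, max_themes=3):
    """Simplified generative step (assumed, not the paper's exact model):
    draw a theme count uniformly from 1..max_themes, sample that many themes
    with replacement according to their weights, and return the union of
    their items as one basket."""
    themes, weights = zip(*dictionary)
    k = rng.randint(1, max_themes)
    chosen = rng.choices(themes, weights=weights, k=k)
    return frozenset().union(*chosen)


def pair_support(baskets):
    """Count the number of baskets containing each unordered pair of items,
    as in the support counts used by association rule mining."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    baskets = [generate_basket(rng, THEME_DICTIONARY) for _ in range(1000)]
    # Pairs drawn from the same theme should rank near the top of the counts.
    for pair, count in pair_support(baskets).most_common(5):
        print(pair, count)
```

In this toy setting, themes are easy to spot by counting pairs because they are strong and noise-free. The abstract's claim concerns weak and subtle interactions, where such raw co-occurrence counts are less informative, which is the motivation for a model-based approach.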
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:33719945