Publication:

Association pattern discovery via theme dictionary models

Date

2013

Publisher

Wiley-Blackwell

Citation

Deng, Ke, Zhi Geng, and Jun S. Liu. 2013. “Association Pattern Discovery via Theme Dictionary Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (2) (September 18): 319–347. doi:10.1111/rssb.12032.

Abstract

Discovering patterns from a set of text or, more generally, categorical data is an important problem in many disciplines such as biomedical research, linguistics, artificial intelligence and sociology. We consider here the well‐known ‘market basket’ problem that is often discussed in the data mining community, and is also quite ubiquitous in biomedical research. The data under consideration are a set of ‘baskets’, where each basket contains a list of ‘items’. Our goal is to discover ‘themes’, which are defined as subsets of items that tend to co‐occur in a basket. We describe a generative model, i.e. the theme dictionary model, for such data structures and describe two likelihood‐based methods to infer themes that are hidden in a collection of baskets. We also propose a novel sequential Monte Carlo method to overcome computational challenges. Using both simulation studies and real applications, we demonstrate that the new approach proposed is significantly more powerful than existing methods, such as association rule mining and topic modelling, in detecting weak and subtle interactions in the data.
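To make the data structure concrete, the following is a minimal sketch of the 'market basket' setting described in the abstract: baskets are represented as sets of items, and a naive co-occurrence count is computed for every item pair. This is not the theme dictionary model or the sequential Monte Carlo method of the paper; it is only a simple support count of the kind used as a starting point in association rule mining. The function name `pair_support` and the toy baskets are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return, for each unordered item pair, the fraction of baskets containing both items.

    Illustrative baseline only: it counts raw co-occurrence and does not
    implement the likelihood-based inference described in the abstract.
    """
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in counts.items()}


# Hypothetical toy data: four baskets over four items.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

support = pair_support(baskets)
# Pairs with high support are candidate two-item 'themes'.
candidates = sorted(support.items(), key=lambda kv: -kv[1])
```

In this toy data the pair ("bread", "butter") appears in three of the four baskets, so it would rank first among candidates. Raw support of this kind tends to miss weak or subtle interactions, which is the gap the abstract says the model-based approach is intended to address.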
