Publication:
Bayesian Text Classification and Summarization via a Class-Specified Topic Model

Date

2021

Publisher

JMLR

Citation

Wang, Feifei, Junni L. Zhang, Yichao Li, Ke Deng, and Jun S. Liu. "Bayesian Text Classification and Summarization via a Class-Specified Topic Model." Journal of Machine Learning Research 22 (89): 1–48, 2021.

Abstract

We propose the Class-Specified Topic Model (CSTM) for the tasks of text classification and class-specific text summarization. The model assumes that, besides a set of latent topics shared across classes, each class has its own set of class-specific latent topics. Each document is a probabilistic mixture of the shared topics and the class-specific topics associated with its class. Each class-specific or shared topic has its own probability distribution over a given dictionary. We develop Bayesian inference for CSTM in the semi-supervised scenario, with the supervised scenario as a special case. We analyze in detail the 20 Newsgroups dataset, a benchmark dataset for text classification, and demonstrate that CSTM performs better than a two-stage approach based on latent Dirichlet allocation (LDA), several existing supervised extensions of LDA, and an L1-penalized logistic regression. The good performance of CSTM is also demonstrated through Monte Carlo simulations and an analysis of the Reuters dataset.
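
The generative structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the symmetric Dirichlet priors, the concentration values alpha and beta, the topic counts, the vocabulary size, and all function names are assumptions chosen only for illustration. A document with a given class label draws its words from a mixture over the shared topics plus that class's own topics.

```python
import random


def sample_dirichlet(rng, dim, concentration):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_topics(rng, n_topics, vocab_size, beta=0.5):
    """Each topic is a probability distribution over the dictionary.
    The Dirichlet prior and beta are illustrative assumptions."""
    return [sample_dirichlet(rng, vocab_size, beta) for _ in range(n_topics)]


def generate_document(rng, label, shared_topics, class_topics, doc_len, alpha=0.5):
    """Generate word ids for one document of class `label`."""
    # Topics available to this document: shared ones plus those of its class.
    topics = shared_topics + class_topics[label]
    # Document-level mixture weights over the available topics (assumed prior).
    theta = sample_dirichlet(rng, len(topics), alpha)
    vocab = range(len(topics[0]))
    # For each word position, pick a topic, then pick a word from that topic.
    z = rng.choices(range(len(topics)), weights=theta, k=doc_len)
    words = [rng.choices(vocab, weights=topics[k])[0] for k in z]
    return words, z, theta


# Hypothetical sizes: 3 classes, 2 shared topics, 2 topics per class, 50 words.
rng = random.Random(0)
n_classes, n_shared, n_specific, vocab_size = 3, 2, 2, 50
shared_topics = sample_topics(rng, n_shared, vocab_size)
class_topics = [sample_topics(rng, n_specific, vocab_size) for _ in range(n_classes)]
words, z, theta = generate_document(rng, 1, shared_topics, class_topics, doc_len=40)
```

In this sketch, topic indices below n_shared refer to shared topics and the remaining indices refer to topics specific to the document's class. Classification would invert this process by inferring which class's topics best explain an unlabeled document; that inference step is not shown here.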
