Publication:

Cross-Language News Article Clustering

Date

2017-09-27

Published Version

The Harvard community has made this article openly available.

Abstract

This thesis describes a method of delivering topically clustered English and Chinese news articles to monolingual readers and provides a fully implemented application. In today’s highly polarized political climate, we are inundated with a diversity of opinions in television and online news media markets. Yet there are some topics, particularly those pertaining to foreign policy, in which a nation’s news media exhibits bias by nature of who is reporting the news and to whom it is being reported. One potential way for the media’s audience to counteract this bias is to compare and contrast news articles about the same topic written in different languages and in different countries. Such comparisons can expose perspectives that are unique to each article’s origin. The application developed for this thesis allows one to quickly identify articles about the same topic in different languages. It does this by clustering news articles by topic and presenting them in groups. For monolingual readers, the application integrates with Google Translate to provide a translated version of the source text. To provide these services, the application scrapes Chinese and English news articles from the web, extracts their relevant features, translates these features into a common human language, uses machine-learning techniques to reduce the dimensionality of the features, and stores those features for on-demand clustering and similar-article retrieval. This thesis and similar projects have many possible applications, from giving the casual bilingual reader the chance to explore news coverage from different viewpoints to helping researchers in both the US and China better understand the media and how it shapes public opinion. Both the application and its relevant source code are accessible on the author’s website.
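
The pipeline outlined in the abstract (translate features into a common language, then group articles by topic) can be illustrated with a minimal sketch. This is not the thesis's implementation: the glossary `GLOSSARY`, the sample `ARTICLES`, the function names, the 0.5 similarity threshold, and the choice of greedy single-pass clustering over cosine similarity are all assumptions made for illustration. A small dictionary stands in for a translation service, and the scraping and dimensionality-reduction steps are omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical Chinese-to-English keyword glossary standing in for a real
# translation service; the abstract does not specify this step in detail.
GLOSSARY = {"贸易": "trade", "关税": "tariff", "峰会": "summit",
            "气候": "climate", "排放": "emissions"}


def to_common_language(keywords):
    """Map each keyword to English, leaving unknown or English terms as-is."""
    return [GLOSSARY.get(k, k).lower() for k in keywords]


def vectorize(keywords):
    """Build a sparse term-count vector from the translated keywords."""
    return Counter(to_common_language(keywords))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def cluster(articles, threshold=0.5):
    """Greedy single-pass clustering: an article joins the first cluster
    whose seed (first member) is at least `threshold` similar to it."""
    clusters = []  # each cluster is a list of (title, vector); item 0 is the seed
    for title, keywords in articles:
        vec = vectorize(keywords)
        for members in clusters:
            if cosine(vec, members[0][1]) >= threshold:
                members.append((title, vec))
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([(title, vec)])
    return [[title for title, _ in members] for members in clusters]


# Made-up sample data: (title, extracted keywords) in either language.
ARTICLES = [
    ("US raises tariffs", ["trade", "tariff", "summit"]),
    ("中美贸易谈判", ["贸易", "关税", "谈判"]),
    ("Climate pledge", ["climate", "emissions"]),
    ("气候协议", ["气候", "排放", "峰会"]),
]

if __name__ == "__main__":
    for group in cluster(ARTICLES):
        print(group)
```

Run as a script, the sketch groups the two trade articles together and the two climate articles together, even though each pair spans both languages. A fuller system would likely weight terms (for example with TF-IDF) and reduce the dimensionality of the vectors before clustering, as the abstract describes.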

Keywords

Computer Science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
