Publication:

Assessing Corporate Growth and Bankruptcy Risk Using Public Data Proxies

Date

2025-06-24

The Harvard community has made this article openly available.

Citation

Tang, Eric. 2025. Assessing Corporate Growth and Bankruptcy Risk Using Public Data Proxies. Bachelors Thesis, Harvard University Engineering and Applied Sciences.

Abstract

This thesis applies natural language processing (NLP) to job listing data as a novel predictive and explanatory tool for evaluating bankruptcy risk and corporate growth. Models based on traditional financial ratios are limited by the sparsity of data for small and private companies; job postings, by contrast, provide abundant, real-time information relevant to nearly all corporations and enhance published corporate evaluation methods.

In this report, we demonstrate that the textual context within job listing data offers a meaningful signal for both predictive and descriptive purposes. Our analysis is applied to a corpus of approximately 51.8 million job listings from 6,764 unique corporations over the roughly decade-long period from 2010 to 2020.

For bankruptcy prediction, our research presents models with robust predictive performance (accuracy 0.8652; specificity 0.8653; sensitivity 0.8042; ROC-AUC 0.9162) that matches or exceeds the predictive capabilities of reference baseline models from the literature. We then discuss the potential of topic models as both a predictive and a descriptive tool for the economy as a whole, though our work indicates limited performance of the topics as a predictive feature. Instead, their descriptive potential lies in the ability to track desired employment and thematic distribution across fields, such as remote work (concentrated in tech and management) or entry-level positions (concentrated in retail or delivery). Finally, we present models evaluating corporate growth, where our predictions reflect the theoretical economic behaviors of debt-based and cyclical investment in the corporate bond and commodity-driven sectors.
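
For readers unfamiliar with the four metrics quoted above, the following is a minimal sketch of how accuracy, specificity, sensitivity, and ROC-AUC are conventionally computed for a binary classifier. It is not the thesis's evaluation code: the function names, the label encoding (1 = bankrupt), the 0.5 decision threshold, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); label 1 = bankrupt is an assumed encoding."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the rank-based, Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> dict[str, float]:
    """Threshold the scores, then compute the four metrics named in the abstract."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "roc_auc": roc_auc(y_true, y_score),
    }


# Toy data (hypothetical, not drawn from the thesis corpus).
toy_labels = [1, 1, 0, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3, 0.05]
toy_metrics = classification_metrics(toy_labels, toy_scores)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is one reason the four figures are usually reported together.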

Keywords

Big Data, Chapter 11 Bankruptcy, Corporate Growth, Credit Risks, Natural Language Processing, Finance, Banking, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
