Publication:
Towards Automated Healthcare: Deep Vision and Large Language Models for Radiology Report Generation

Date

2023-06-30

Published Version

The Harvard community has made this article openly available.

Citation

Tian, Katherine. 2023. Towards Automated Healthcare: Deep Vision and Large Language Models for Radiology Report Generation. Bachelor's thesis, Harvard College.

Abstract

Automatic radiology report generation has the potential to improve patient care and reduce diagnosis delays. Deep learning approaches have shown promising progress but are still not accurate enough for clinical deployment. In this thesis, we investigate and develop two approaches for report generation, one retrieval-based and one generation-based, both of which leverage deep vision-language pre-training. Our retrieval-based method uses a multimodal encoder and contrastive loss to learn pre-trained radiology image and text representations, followed by a learned image-text matching similarity metric for retrieval. This method achieves state-of-the-art results on clinical accuracy and natural language metrics, including CheXpert vector disease profile similarity and BLEU-2 score. We also conduct an expert evaluation study on a subset of samples, in which we collect radiologists' error annotations on our generated reports, a baseline method's generated reports, and human-written reports. The study confirms that our method improves significantly upon the baseline, and we will release the dataset of error annotations to aid future research on generated-report error types and on aligning evaluation metrics with radiologists' assessments. For our generation-based method, we use a querying transformer module for modality alignment between an image encoder and a text decoder. We also investigate a novel prompting method to generate both the impression and findings report sections with the same model to increase efficiency. The model is trained on a mixed report section dataset and can be prompted to generate both report sections with performance comparable to separate single-section models. Finally, we study the impact of different pre-training methods for the querying transformer and find that unlocking the image encoder during pre-training helps with domain adaptation and clinical accuracy but not natural language metrics.
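The retrieval-based approach described above rests on two pieces: a symmetric contrastive objective that pulls paired image and report embeddings together, and nearest-neighbor retrieval of a report for a new image by embedding similarity. The following is a minimal NumPy sketch of those two pieces only; it is an illustrative simplification, not the thesis's actual model (which additionally learns an image-text matching head), and all function and variable names here are hypothetical.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss over paired embeddings.

    img_emb, txt_emb: (batch, dim) arrays where row i of each is a matched
    image/report pair. Lower loss means matched pairs are more similar than
    mismatched ones. (Hypothetical sketch, not the thesis implementation.)
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(logits))         # true pairs lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logprob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logprob[labels, labels].mean()

    # average of image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def retrieve_report(query_img_emb, corpus_txt_emb):
    """Return the index of the corpus report most similar to a query image."""
    q = query_img_emb / np.linalg.norm(query_img_emb)
    corpus = corpus_txt_emb / np.linalg.norm(corpus_txt_emb, axis=1, keepdims=True)
    return int(np.argmax(corpus @ q))
```

In this sketch, training minimizes `contrastive_loss` over batches of paired chest X-ray and report embeddings, and inference returns the best-matching existing report via `retrieve_report`; the thesis refines the retrieval step with a learned image-text matching similarity rather than raw cosine similarity.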

Keywords

deep learning, generation, pre-training, radiology, report, vision-language, Computer science, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
