Publication: Towards Automated Healthcare: Deep Vision and Large Language Models for Radiology Report Generation
Date
2023-06-30
Authors
Tian, Katherine
Citation
Tian, Katherine. 2023. Towards Automated Healthcare: Deep Vision and Large Language Models for Radiology Report Generation. Bachelor's thesis, Harvard College.
Abstract
Automatic radiology report generation has the potential to improve patient care and reduce diagnosis delays. Deep learning approaches have shown promising progress but are still not accurate enough for clinical deployment. In this thesis, we investigate and develop two approaches for report generation, one retrieval-based and one generation-based, both of which leverage deep vision-language pre-training.
Our retrieval-based method uses a multimodal encoder and contrastive loss to learn pre-trained radiology image and text representations, followed by a learned image-text matching similarity metric for retrieval. This method achieves state-of-the-art results on clinical accuracy and natural language metrics, including CheXpert vector disease profile similarity and BLEU-2 score. We also conduct an expert evaluation study on a subset of samples, where we collect radiologists' error annotations on our generated reports, a baseline method's generated reports, and human-written reports. The study confirms that our method improves significantly upon the baseline, and we will release the dataset of error annotations to aid future research into the types of generated-report errors and the alignment of evaluation metrics with human radiologists' assessments.
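The abstract gives no implementation detail, so the following is only a minimal sketch of the kind of contrastive image-text pre-training and similarity-based retrieval it describes (in PyTorch; the symmetric InfoNCE-style loss, the dimensions, and the plain cosine retrieval are all assumptions, and in the thesis a learned image-text matching metric, not the raw cosine similarity shown here, scores the candidates):

# Minimal sketch (assumptions, not the thesis's actual code): CLIP-style
# contrastive pre-training of paired image/text embeddings, then report
# retrieval by embedding similarity.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))        # diagonal entries are the true pairs
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def retrieve_reports(query_img_emb, corpus_txt_emb, corpus_reports, k=1):
    """Return the k corpus reports whose text embeddings best match the image."""
    sims = F.normalize(query_img_emb, dim=-1) @ F.normalize(corpus_txt_emb, dim=-1).t()
    topk = sims.topk(k, dim=-1).indices.squeeze(0)
    return [corpus_reports[i] for i in topk.tolist()]

# Toy usage with random tensors standing in for encoder outputs.
B, D = 8, 256
loss = contrastive_loss(torch.randn(B, D), torch.randn(B, D))
reports = [f"report {i}" for i in range(B)]
print(loss.item(), retrieve_reports(torch.randn(1, D), torch.randn(B, D), reports, k=2))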
For our generation-based method, we use a querying transformer module for modality alignment between an image encoder and a text decoder. We also investigate a novel prompting method to generate both impression and findings report sections with the same model to increase efficiency. The model is trained on a mixed report section dataset and can be prompted to generate both report sections with similar performance to separate single-section models. Finally, we study the impact of different pre-training methods for the querying transformer and find that unlocking the image encoder during pre-training helps with domain adaptation and clinical accuracy but not natural language metrics.
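Again only as a hedged sketch (the prompt wording and the decoder interface below are hypothetical, not drawn from the thesis): the section-prompting idea amounts to conditioning a single text decoder on a section prompt alongside the querying transformer's image query outputs, so one model can be steered to produce either the findings or the impression section.

# Sketch (assumed interface, not the thesis's code): one decoder serves both
# report sections by conditioning generation on a section prompt.
from typing import Callable

def build_prompt(section: str) -> str:
    """Map a section name to the text prompt the decoder is conditioned on.
    The prompt wording here is hypothetical."""
    assert section in ("findings", "impression")
    return f"Generate the {section} section of the radiology report:"

def generate_report_section(decode: Callable[[str], str],
                            image_queries, section: str) -> str:
    """`decode` stands in for a text decoder; in the actual model the querying
    transformer's outputs (`image_queries`) would be fed to the decoder as
    prefix embeddings together with the section prompt. Only the prompting
    flow is shown here."""
    return decode(build_prompt(section))

# Toy decoder so the sketch runs end to end.
toy_decode = lambda p: f"[generated text conditioned on: '{p}']"
print(generate_report_section(toy_decode, image_queries=None, section="findings"))
print(generate_report_section(toy_decode, image_queries=None, section="impression"))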
Keywords
deep learning, generation, pre-training, radiology, report, vision-language, Computer science, Statistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.