Publication:

Considerations for a Machine Learning Approach to Classification of Cancer Driver Mutations

Date

2022-04-20

Citation

Smith, Daniel. 2022. Considerations for a Machine Learning Approach to Classification of Cancer Driver Mutations. Master's thesis, Harvard University Division of Continuing Education.

Abstract

Cancer is one of the leading causes of death worldwide. Since the completion of the Human Genome Project, next-generation sequencing has enabled major advances in understanding the cancer genome. This deeper understanding has allowed researchers to develop novel targeted therapies and improve survival rates. As the volume of complex genomic data grows, powerful tools are needed to identify the underlying genomic drivers and therapeutic targets in a patient’s cancer. Machine learning has proven valuable for discovering new relationships in cancer genomes and is the approach explored in this research. Using publicly available genomic data from several databases, machine learning models were designed and implemented to classify variants as pathogenic or benign in the APC, RB1, TP53, EGFR, ERBB2, and PIK3CA genes, all of which have previously been implicated in various cancers. The results of the classification experiments demonstrate the utility of random forest and extremely randomized trees classifiers and highlight the value of several key data features across these datasets. In addition, the implementations offer guidelines for future researchers by emphasizing the reproducibility and generalizability of similar models. With this framework, future machine learning research using real-world data may be faster to implement. By leveraging the power of machine learning, scientists can continue to expand the cancer genomics knowledge base and take steps toward improved outcomes for patients.
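
For readers unfamiliar with the two classifier families named in the abstract, the following is a minimal sketch of how they differ, not the thesis's implementation. It assumes synthetic data and hypothetical feature names (`conservation`, `allele_freq`, `noise`), and it uses a small pure-Python tree builder rather than any particular library. In this sketch, the random forest variant trains each tree on a bootstrap sample and searches all midpoint thresholds, while the extremely randomized trees variant trains on the full sample and draws one random threshold per candidate feature.

```python
import random
from collections import Counter

# Hypothetical feature names, for illustration only (not taken from the thesis).
FEATURES = ["conservation", "allele_freq", "noise"]

def make_variants(n, rng):
    """Synthetic variants: label 1 ('pathogenic') when conservation > 0.5."""
    X = [[rng.random() for _ in FEATURES] for _ in range(n)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    return X, y

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def thresholds(values, extra, rng):
    """Extra trees: one uniform random cut. Random forest: every midpoint."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return []
    if extra:
        return [rng.uniform(lo, hi)]
    uniq = sorted(set(values))
    return [(a + b) / 2.0 for a, b in zip(uniq, uniq[1:])]

def build_tree(X, y, depth, extra, rng, n_try=2):
    """Return a leaf label (int) or a node tuple (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    best = None
    # Both methods consider only a random subset of features at each split.
    for f in rng.sample(range(len(FEATURES)), n_try):
        for t in thresholds([row[f] for row in X], extra, rng):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Size-weighted impurity of the two children (lower is better).
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       depth - 1, extra, rng, n_try),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       depth - 1, extra, rng, n_try))

def fit_ensemble(X, y, n_trees, extra, seed):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        if extra:
            idx = list(range(len(X)))                 # full training set
        else:
            idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                5, extra, rng))
    return trees

def predict(trees, row):
    """Majority vote over all trees; returns 0 (benign) or 1 (pathogenic)."""
    votes = 0
    for node in trees:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes += node
    return 1 if 2 * votes > len(trees) else 0

def accuracy(trees, X, y):
    return sum(predict(trees, r) == label for r, label in zip(X, y)) / len(y)

data_rng = random.Random(0)
X_train, y_train = make_variants(120, data_rng)
X_test, y_test = make_variants(200, data_rng)
rf_trees = fit_ensemble(X_train, y_train, 21, extra=False, seed=1)
et_trees = fit_ensemble(X_train, y_train, 21, extra=True, seed=1)
rf_acc = accuracy(rf_trees, X_test, y_test)
et_acc = accuracy(et_trees, X_test, y_test)
print(f"random forest accuracy: {rf_acc:.2f}")
print(f"extra trees accuracy:   {et_acc:.2f}")
```

The synthetic labels depend on a single feature, so the accuracies printed here say nothing about performance on real variant data. Work with real data would more likely rely on an established library implementation and on curated features from the source databases.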

Keywords

artificial intelligence, bioinformatics, cancer, machine learning, modeling, mutation

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
