Publication:

Modeling Astronomical Data using Deep Learning by Integrating Embeddings

Date

2025-07-11

Citation

Chiong, Gabriel Zhen. 2025. Modeling Astronomical Data using Deep Learning by Integrating Embeddings. Masters Thesis, Harvard University Division of Continuing Education.

Abstract

Astronomical surveys produce time-series data by observing stellar objects across multiple wavelength bands. Foundational transformer-based models, such as Astromer, encode each time series as a sequence of embeddings of uniform dimension. However, such models process one band at a time and do not natively leverage information across telescope filters. We extend this framework by introducing a fusion mechanism that maps the collection of single-band embeddings to a unified sequence representation, enabling multiband modeling for downstream tasks. The challenge lies in devising a mechanism within the encoder that coordinates data from different wavelengths, which are often recorded at asynchronous times. We pre-train multiband models on a subset of 600,000 high signal-to-noise light curves from the MACHO survey and fine-tune them on the Alcock and ATLAS survey datasets. Experimental results show that both of our proposed multiband architectures outperform the single-band models by approximately 10% in F1-score, and that jointly pre-trained multiband encoders further improve performance over a collection of independently pre-trained single-band encoders. Our experiments also show minimal differences in multiband performance between sampling each band asynchronously and sampling all bands on the same set of time steps. However, jointly pre-trained models can take more than twice as long to pre-train. These results demonstrate the trade-offs of the multiband approach where multivariate data are available.
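
The abstract does not specify how the fusion mechanism is built, so the following is only an illustrative sketch of one possible way to turn per-band embedding sequences recorded at asynchronous times into a single sequence. It is not the thesis's architecture. The function name `fuse_bands`, the random "band tag" vectors, and the toy blue and red inputs are all assumptions made for this example.

```python
import numpy as np

def fuse_bands(times_by_band, embeddings_by_band, seed=0):
    """Merge per-band embedding sequences into one time-ordered sequence.

    times_by_band:      list of 1-D arrays of observation times, one per band
    embeddings_by_band: list of (n_obs, d) arrays, same d for every band
    Returns (times, band_ids, tokens), all sorted by observation time.
    """
    d = embeddings_by_band[0].shape[1]
    n_bands = len(times_by_band)

    times = np.concatenate(times_by_band)
    band_ids = np.concatenate(
        [np.full(len(t), b) for b, t in enumerate(times_by_band)]
    )
    tokens = np.concatenate(embeddings_by_band, axis=0)

    # Hypothetical band tag: a fixed vector per band added to every token so
    # a downstream encoder could tell the bands apart. A trained model would
    # learn these vectors; here they are random placeholders.
    rng = np.random.default_rng(seed)
    band_tags = rng.normal(scale=0.1, size=(n_bands, d))
    tokens = tokens + band_tags[band_ids]

    # Interleave the bands by time; the bands need not share time steps.
    order = np.argsort(times, kind="stable")
    return times[order], band_ids[order], tokens[order]

# Toy example: two bands observed at different (asynchronous) times.
t_blue = np.array([0.0, 2.0, 5.0])
t_red = np.array([1.0, 2.5])
e_blue = np.ones((3, 4))   # stand-in for single-band encoder output
e_red = np.zeros((2, 4))

times, bands, fused = fuse_bands([t_blue, t_red], [e_blue, e_red])
print(times)        # merged observation times in ascending order
print(bands)        # which band each fused token came from
print(fused.shape)  # (total observations, embedding dimension)
```

Concatenating and sorting by time is the simplest choice for a sketch like this because it needs no resampling onto a shared time grid. Other designs, such as cross-attention between bands, are equally plausible readings of the abstract.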

Keywords

Deep Learning, Embeddings, Foundational Models, Multivariate Data, Representation Learning, Transformer, Computer science, Artificial intelligence, Astrophysics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
