Publication: Spectromer: a Visual Transformer-Based Model for Spectral Data
Abstract
We present Spectromer, a novel framework that applies Vision Transformers (ViTs), a class of deep learning models originally developed for image recognition, to the analysis of astronomical spectral data. By converting traditional one-dimensional spectral data, where each spectrum is represented as a sequence of intensity values over wavelength, into two-dimensional image-like representations, Spectromer allows Vision Transformers to use their spatial self-attention mechanism to capture both local and global spectral features.
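The abstract does not specify how a spectrum is mapped to an image, so the following is only a minimal sketch of one possible mapping, assuming a simple row-major fold of min-max scaled intensities into a square grid. The function name `spectrum_to_image`, the `side` parameter, the zero padding, and the toy flux values are all hypothetical and are not taken from the paper.

```python
import math


def spectrum_to_image(flux, side=None):
    """Fold a 1D flux sequence into a square grid of 8-bit pixel values.

    Assumption: row-major folding with min-max scaling. This is an
    illustrative choice, not the transformation used by Spectromer.
    """
    if not flux:
        raise ValueError("flux must be non-empty")
    if side is None:
        # Smallest square that can hold every sample.
        side = math.ceil(math.sqrt(len(flux)))

    lo, hi = min(flux), max(flux)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat spectrum

    # Scale intensities to the 0..255 range used by 8-bit images.
    pixels = [round(255 * (f - lo) / span) for f in flux]

    # Truncate or zero-pad so the sequence fills the grid exactly.
    pixels = pixels[: side * side]
    pixels += [0] * (side * side - len(pixels))

    # Cut the flat sequence into rows.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Toy spectrum with five samples, folded into a 3 x 3 grid.
image = spectrum_to_image([0.0, 1.0, 2.0, 3.0, 4.0])
```

In a fold like this, samples that are adjacent in wavelength stay adjacent within a row, while samples one row apart are separated by `side` wavelength bins, which is one way a patch-based model could see both nearby and distant spectral features in a single patch.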
We fine-tune a base model pretrained on ImageNet using two-dimensional images constructed from one-dimensional SDSS and LAMOST spectra, which together encompass several million spectra of diverse astronomical objects. We then validate this model on key downstream tasks, including stellar object classification and redshift estimation, demonstrating strong performance and scalability. Depending on the downstream task, Spectromer achieves results comparable to or better than those of other models: its R^2 is similar to that of AstroCLIP’s spectrum encoder even when data from different types of astronomical objects are included, and its classification accuracy is higher than that of solutions based on Support Vector Machines and Random Forests. Our results highlight Spectromer’s potential to advance spectral analysis by applying pretrained vision models beyond their original design to enable precise interpretation of large-scale astronomical datasets. To our knowledge, this is the first application of ViTs to spectroscopic data and among the first to demonstrate results on a large-scale, real observational dataset without relying on synthetic data.
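For readers unfamiliar with the two metrics named above, the sketch below shows how R^2 (for redshift estimation) and classification accuracy are conventionally computed. The helper names `r2_score` and `accuracy`, the class labels, and all numbers are hypothetical toy values chosen for illustration; none of them are results from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(l == p for l, p in zip(labels, predictions))
    return correct / len(labels)


# Toy redshift values (illustrative only, not from the paper).
perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])

# Toy class labels (illustrative only, not from the paper).
acc = accuracy(["STAR", "GALAXY", "QSO", "STAR"],
               ["STAR", "GALAXY", "STAR", "STAR"])
```

An R^2 of 1 means the predictions reproduce the targets exactly, while a model that always predicts the mean of the targets scores 0, which is why R^2 is a convenient scale for comparing redshift estimators.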