Publication: Natural Language Search for NASA ADS
Abstract
The NASA Astrophysics Data System (ADS) is a critical resource for researchers and students in astronomy, astrophysics, and beyond. ADS indexes a vast collection of papers and other scholarly literature that researchers can search through the ADS website or API. ADS’s database is powered by Apache Solr, which lets users formulate highly expressive and precise search queries over more than 50 allowable search fields. However, the sophistication of ADS’s search capabilities comes at the cost of usability: users must familiarize themselves with Solr and with ADS’s documentation to fully exploit its features. This thesis proposes to make ADS more accessible through a chat application in which users request papers in natural language rather than by constructing Solr queries. The application uses state-of-the-art transformer-based large language models (LLMs) to translate natural language requests into Solr queries, simplifying user interaction with the ADS database without compromising the precision of search results. In this work, we use in-context learning (ICL) with retrieval-augmented generation (RAG) to enhance the translation capabilities of the LLM, leading to a significant improvement in translation performance.
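As an illustration of the ICL-with-RAG idea described in the abstract, the following is a minimal sketch, not the thesis's implementation. It assumes a small hypothetical store of (request, query) example pairs, uses a simple token-overlap (Jaccard) score as a stand-in for whatever retriever the thesis actually uses, and only assembles the few-shot prompt; the call to an LLM is omitted. The names `EXAMPLES`, `retrieve`, and `build_prompt` are invented for this sketch.

```python
import re

# Hypothetical example pairs (natural language request, ADS query).
# A real system would draw on a much larger curated set.
EXAMPLES = [
    ("papers by Huchra on galaxy redshifts",
     'author:"Huchra, J" abs:"galaxy redshifts"'),
    ("refereed papers about dark matter from 2020",
     'abs:"dark matter" year:2020 property:refereed'),
    ("papers with exoplanet in the title",
     'title:"exoplanet"'),
]


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for a learned retriever."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(request, k=2):
    """Return the k stored examples most similar to the request."""
    ranked = sorted(EXAMPLES, key=lambda ex: jaccard(request, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(request, k=2):
    """Assemble a few-shot prompt from the retrieved examples (the ICL step)."""
    lines = ["Translate each request into an ADS search query."]
    for nl, query in retrieve(request, k):
        lines += [f"Request: {nl}", f"Query: {query}"]
    lines += [f"Request: {request}", "Query:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("refereed dark matter papers from 2021"))
```

The printed prompt would then be sent to an LLM, whose completion after the final `Query:` line is taken as the candidate Solr query. Retrieving examples per request, rather than using a fixed set, keeps the in-context demonstrations close to the fields the user is likely to need.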