Publication:

R / Python Pipelines for Biomedical LLM Semantic Search Apps

Date

2025-03-08

Published Version

The Harvard community has made this article openly available. Please share how this access benefits you.

Citation

Thomas Charlon, Joseph Hoche, and Tianxi Cai. 2025. "R / Python Pipelines for Biomedical LLM Semantic Search Apps." Talk presented at SoCal Linux Expo 22x.

Abstract

Leveraging PyTorch's GPU indexing and R's data management, evaluation, and visualization capabilities

At the CELEHS laboratory, we are particularly interested in LLM-based embeddings such as BGE and BERT. As the number of models increases, we need methods to compare their clinical usefulness. While some R packages can leverage GPU capabilities, PyTorch is far more widely used for GPU computation. In contrast, R is efficient for data management and visualization. How should one build robust and reproducible pipelines that incorporate both? My answer is well-designed pipelines built with Docker, Makefile, and Elasticsearch. In this talk, I will showcase my design approaches to these challenges.

Terms of Use

This article is made available under the terms and conditions applicable to Open Access Policy Articles (OAP), as set forth at Terms of Service
