Publication:

PyBio: An Open Source Bioinformatics Library for Python.

Date

2016-06-25

Citation

Ellis, Jon R. 2016. PyBio: An Open Source Bioinformatics Library for Python. Master's thesis, Harvard Extension School.

Abstract

PyBio is an easy-to-install, open-source library for working with bioinformatics data in Python, designed to encourage interactive data exploration and scripting.

The PyBio API is designed to be explorable in an IPython session or in the Jupyter Notebook. The modules are laid out in a hierarchical structure. All interface classes, functions, and modules are documented. Function hiding and module interface declarations are used to present users with a simple, powerful interface.
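As an illustration of what function hiding and module interface declarations usually mean in Python, here is a minimal sketch. The module and its names (`reverse_complement`, `_validate`, `_COMPLEMENT`) are hypothetical and are not taken from PyBio; the sketch only shows the general convention of listing public names in `__all__` and prefixing internal helpers with an underscore.

```python
"""A hypothetical sequence-utility module (illustrative only, not PyBio code)."""

# Names listed in __all__ form the declared public interface: they are what
# a star-import exports and what most documentation tools display.
__all__ = ["reverse_complement"]

# A leading underscore marks a name as internal by convention.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _validate(seq: str) -> str:
    """Upper-case the sequence and reject characters other than A, C, G, T."""
    seq = seq.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return _validate(seq).translate(_COMPLEMENT)[::-1]
```

In an interactive session, tab completion and `help()` on such a module then surface the documented public function, while the helpers stay out of the way.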

PyBio's modules are integrated with each other to encourage smooth workflows and simple, direct code. The same class abstractions are used throughout the library. The output of one function can often be used as the input for another. Opinionated, higher-level APIs abstract away complexity, lower the barrier to entry, and encourage expressive, functional code.

Bioinformatics data from NCBI's Entrez service is available from within Python through the pybio.entrez module. The module parses sequences directly from Entrez into PyBio's sequence representation, so acquiring and using new sequence data is fast and seamless.

PyBio is built for speed; NumPy and Cython are used extensively to achieve execution speeds not normally associated with Python, while maintaining Python's simplicity and clarity. Execution speed is critical for productive interactive data science. Performant tools written mostly in Python make bioinformatics code more accessible to less advanced programmers, fostering a closer connection between developers and biologists.
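The kind of speed-up NumPy makes possible can be sketched with a small example. This is not PyBio's implementation; the function name `gc_content` is an assumption chosen for illustration. It computes the GC fraction of a DNA string by viewing the sequence as an array of byte codes and letting NumPy do the comparison in compiled code rather than in a Python loop.

```python
import numpy as np


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Hypothetical helper for illustration; not part of PyBio.
    """
    # View the ASCII-encoded sequence as an array of byte codes (no copy).
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    if codes.size == 0:
        return 0.0
    # Element-wise comparisons run in compiled code over the whole array.
    is_gc = (codes == ord("G")) | (codes == ord("C"))
    return float(is_gc.mean())
```

For example, `gc_content("ATGC")` evaluates to `0.5`, since two of the four bases are G or C.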

PyBio is a tool for bioinformatics developers. Classes like Sequence and Alignment, and modules like pybio.parse, provide a shared environment for new code, reducing the need to write parsers and data abstraction classes, and making it easier to write code that interoperates with existing code. PyBio could potentially provide a platform for new useful bioinformatics algorithms and implementations to become quickly available to the community.
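To make the idea of a shared sequence class and parser concrete, the following sketch pairs a small record type with a FASTA reader. Both `SequenceRecord` and `parse_fasta` are hypothetical stand-ins written for this illustration; they do not reproduce the actual `Sequence` class or the `pybio.parse` interface, whose signatures are not given here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical stand-in for a shared sequence abstraction (not PyBio's)."""

    identifier: str
    residues: str

    def __len__(self) -> int:
        return len(self.residues)


def parse_fasta(lines: Iterable[str]) -> Iterator[SequenceRecord]:
    """Yield one SequenceRecord per FASTA entry found in an iterable of lines."""
    identifier = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield SequenceRecord(identifier, "".join(chunks))
            header = line[1:].split(maxsplit=1)
            identifier = header[0] if header else ""
            chunks = []
        elif identifier is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if identifier is not None:
        yield SequenceRecord(identifier, "".join(chunks))
```

Because every parser yields the same record type, downstream functions can accept its output directly, which is the interoperability benefit described above.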

PyBio has the potential to become a powerful tool for bioinformatics, encouraging a data science approach to bioinformatics data and stimulating innovation in the implementations of bioinformatics algorithms.

Keywords

Biology, Bioinformatics, Computer Science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
