Publication: PyBio: An Open Source Bioinformatics Library for Python.
Abstract
PyBio is an easy-to-install, open-source library for working with bioinformatics data in Python, designed to encourage interactive data exploration and scripting.
The PyBio API is designed to be explorable in an IPython session or in the Jupyter Notebook. The modules are laid out in a hierarchical structure. All interface classes, functions, and modules are documented. Function hiding and module interface declarations are used to present users with a simple, powerful interface.
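As an illustration of function hiding and module interface declarations, the following is a minimal sketch using standard Python conventions: a leading underscore marks a helper as private, and `__all__` declares the public interface. The module name `demo_seqtools` and the functions `gc_content` and `_count` are hypothetical and are not taken from PyBio's actual API.

```python
import sys
import types

# Hypothetical module source; the names are illustrative, not PyBio's real API.
SOURCE = '''
__all__ = ["gc_content"]  # interface declaration: only this name is public

def _count(seq, letters):
    """Private helper, hidden from users by the leading underscore."""
    return sum(seq.upper().count(ch) for ch in letters)

def gc_content(seq):
    """Return the fraction of G and C bases in seq."""
    return _count(seq, "GC") / len(seq) if seq else 0.0
'''

# Build the module in memory so the example is self-contained.
demo = types.ModuleType("demo_seqtools")
exec(SOURCE, demo.__dict__)
sys.modules["demo_seqtools"] = demo

# A star-import exposes only the names listed in __all__.
namespace = {}
exec("from demo_seqtools import *", namespace)
public_names = sorted(name for name in namespace if not name.startswith("__"))

print(public_names)                      # the declared interface
print(namespace["gc_content"]("GGCA"))   # the public function still works
```

In an interactive session, tab completion and `help()` then surface a short list of documented entry points rather than every internal helper.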
PyBio's modules are integrated with each other to encourage smooth workflows and simple, direct code. The same class abstractions are used throughout the library. The output of one function can often be used as the input for another. Opinionated, higher-level APIs abstract away complexity, lower the barrier to entry, and encourage expressive, functional code.
Bioinformatics data from NCBI's Entrez service is available from within Python through the pybio.entrez module. The module parses sequences directly from Entrez into PyBio's sequence representation, so acquiring and using new sequence data is fast and seamless.
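The kind of work such a module automates can be sketched with the standard library alone. The example below builds a request URL for NCBI's public E-utilities `efetch` endpoint and parses FASTA text into records. It is not the pybio.entrez implementation: the helper names `efetch_fasta_url` and `parse_fasta` are assumptions made for illustration, and the sample record is made up so that the sketch runs without network access.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that requests one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


url = efetch_fasta_url("NM_000518")

# Made-up FASTA text standing in for a downloaded response.
sample = ">demo_record made-up example\nATGC\nGGTA\n"
records = parse_fasta(sample)
print(url)
print(records)
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded response to `parse_fasta` would complete the round trip. A library-level module hides both steps behind a single call.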
PyBio is built for speed: NumPy and Cython are used extensively to achieve execution speeds not normally associated with Python, while maintaining Python's simplicity and clarity. Execution speed is critical for productive interactive data science. Performant tools written mostly in Python make bioinformatics code more accessible to less advanced programmers, fostering a closer connection between developers and biologists.
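A small example of the vectorized style that NumPy enables is sketched below: GC content is computed by viewing the sequence as an array of byte codes and comparing the whole array at once, instead of looping over characters in Python. The function name `gc_fraction` is hypothetical, and the sketch assumes ASCII input. It does not reproduce PyBio's internal code.

```python
import numpy as np


def gc_fraction(seq):
    """Return the fraction of G/C bases, computed with array operations."""
    # View the ASCII bytes of the sequence as an array of small integers.
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    if codes.size == 0:
        return 0.0
    # Elementwise comparisons run in compiled code, not in a Python loop.
    is_gc = (codes == ord("G")) | (codes == ord("C"))
    return float(is_gc.mean())


print(gc_fraction("ATGCGC"))
```

The same pattern extends to other per-base statistics, where the cost of the Python interpreter is paid once per sequence rather than once per base.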
PyBio is a tool for bioinformatics developers. Classes like Sequence and Alignment, and modules like pybio.parse, provide a shared environment for new code, reducing the need to write parsers and data abstraction classes and making it easier to write code that interoperates with existing code. PyBio could provide a platform through which useful new bioinformatics algorithms and implementations become quickly available to the community.
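To make the idea of a shared abstraction concrete, here is a minimal sketch of a sequence class that independent functions can accept and return. The class below is a hypothetical stand-in: its fields (`name`, `residues`), its `reverse_complement` method, and the helper `longest` are assumptions made for illustration and are not PyBio's actual Sequence interface.

```python
from dataclasses import dataclass

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class Sequence:
    """Hypothetical immutable DNA sequence record."""

    name: str
    residues: str

    def __len__(self):
        return len(self.residues)

    def reverse_complement(self):
        """Return a new Sequence holding the reverse complement."""
        flipped = self.residues.translate(_COMPLEMENT)[::-1]
        return Sequence(f"{self.name}_rc", flipped)


def longest(sequences):
    """Return the longest Sequence from an iterable of Sequence objects."""
    return max(sequences, key=len)


# The output of one function feeds directly into the next.
seqs = [Sequence("a", "ATGC"), Sequence("b", "ATGCCG")]
result = longest(seqs).reverse_complement()
print(result)
```

Because both functions speak in terms of the same class, new code written against it composes with existing code without conversion layers.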
PyBio has the potential to become a powerful tool for bioinformatics, encouraging a data science approach to bioinformatics data and stimulating innovation in the implementations of bioinformatics algorithms.