Person:

Alterovitz, Gil

Last Name

Alterovitz

First Name

Gil

Name

Alterovitz, Gil

Search Results

Now showing 1 - 10 of 11
  • Publication

    A Bayesian Translational Framework for Knowledge Propagation, Discovery, and Integration Under Specific Contexts

    (American Medical Informatics Association, 2012) Deng, Michelle; Zollanvari, Amin; Alterovitz, Gil

    The immense corpus of biomedical literature existing today poses challenges in information search and integration. Many links between pieces of knowledge occur or are significant only under certain contexts—rather than under the entire corpus. This study proposes using networks of ontology concepts, linked based on their co-occurrences in annotations of abstracts of biomedical literature and descriptions of experiments, to draw conclusions based on context-specific queries and to better integrate existing knowledge. In particular, a Bayesian network framework is constructed to allow for the linking of related terms from two biomedical ontologies under the queried context concept. Edges in such a Bayesian network allow associations between biomedical concepts to be quantified and inference to be made about the existence of some concepts given prior information about others. This approach could potentially be a powerful inferential tool for context-specific queries, applicable to ontologies in other fields as well.

  • Publication

    An Automated Bayesian Framework for Integrative Gene Expression Analysis and Predictive Medicine

    (American Medical Informatics Association, 2012) Parikh, Neena; Zollanvari, Amin; Alterovitz, Gil

    Motivation: This work constructs a closed loop Bayesian Network framework for predictive medicine via integrative analysis of publicly available gene expression findings pertaining to various diseases. Results: An automated pipeline was successfully constructed. Integrative models were made based on gene expression data obtained from GEO experiments relating to four different diseases using Bayesian statistical methods. Many of these models demonstrated a high level of accuracy and predictive ability. The approach described in this paper can be applied to any complex disorder and can include any number and type of genome-scale studies.

  • Publication

    Context-Specific Ontology Integration: A Bayesian Approach

    (American Medical Informatics Association, 2012) Marwah, Kshitij; Katzin, Dustin; Zollanvari, Amin; Noy, Natalya F.; Ramoni, Marco; Alterovitz, Gil

    We introduce a principled computational framework and methodology for automated discovery of context-specific functional links between ontologies. Our model draws on disparate free-text literature resources to score the model of dependency linking two terms under a context against their model of independence. We identify linked terms as those having a significant Bayes factor (p < 0.01). To scale our algorithm over massive ontologies, we propose a heuristic pruning technique as an efficient algorithm for inferring such links. We have applied this method to translationalize Gene Ontology to all other ontologies available at the National Center for Biomedical Ontology (NCBO) BioPortal under the context of the Human Disease Ontology. Our results show that, in addition to broadening the scope of hypotheses for researchers, our work can potentially be used to explore the continuum of relationships among ontologies to guide various biological experiments.

  • Publication

    Order-Disorder Interface Characterization Reveals Critical Factors for Disease and Drug Targets

    (American Medical Informatics Association, 2013) Kallenbach, Jonah; Hsu, Wei-Lun; Dunker, A. Keith; Alterovitz, Gil

    Signal transduction pathways are of critical importance in disease and regulation of cellular functions. Proteins that do not fold to a state of stable tertiary structure, known as intrinsically disordered proteins, are highly represented in signaling pathways and protein interaction networks. Important examples of disordered signaling proteins include p53 and BRCA1, and approximately 40% of eukaryotic proteins are estimated to have significant disordered regions. Certain regions within these disordered proteins, however, can take on an ordered structure upon binding to a partner. The nature of the resulting protein-protein interactions has not yet been established. Here we categorize and identify interactions between binding segments of disordered proteins and their ordered partners using a Bayesian network framework, constructed on a test set of 964 proteins mined for Molecular Recognition Feature (MoRF) characteristics from the PDB. This framework, more specifically Bayesian network learning, enables us to investigate the underlying biological processes involved, including the sequential and structural determinants of these interactions. After the construction of the training set (80% of data), features were successively eliminated to determine their relative significance. The Bayesian network model was validated on the test set with excellent accuracy (>90% AUC). Examining features underlying the model provides a plethora of new and potentially useful biological information. The results also lend themselves to a strategy for rational drug design whereby disordered regions can be targeted with a high degree of specificity and small molecule peptide mimetics of their binding regions can be utilized as drugs.

  • Publication

    Automated Synthesis and Visualization of a Chemotherapy Treatment Regimen Network

    (2013) Warner, Jeremy; Yang, Peter; Alterovitz, Gil

    Cytotoxic treatments for cancer remain highly toxic, expensive, and variably efficacious. Many chemotherapy regimens are never directly compared in randomized clinical trials (RCTs); as a result, the vast majority of guideline recommendations are ultimately derived from human expert opinion. We introduce an automated network meta-analytic approach to this clinical problem, with nodes representing regimens and edges direct comparison via RCT(s). A chemotherapy regimen network is visualized for the primary treatment of chronic myelogenous leukemia (CML). Node and edge color, size, and opacity are all utilized to provide additional information about the quality and strength of the depicted evidence. Historical versions of the network are also created. With this approach, we were able to compactly compare the results of 17 CML regimens involving RCTs of 9700 patients, representing the accumulation of 45 years of evidence. Our results closely parallel the recommendations issued by a professional guidelines organization, the National Comprehensive Cancer Network (NCCN). This approach offers a novel method for interpreting complex clinical data, with potential implications for future objective guideline development.

  • Publication

    Nonlinear dimensionality reduction methods for synthetic biology biobricks’ visualization

    (BioMed Central, 2017) Yang, Jiaoyun; Wang, Haipeng; Ding, Huitong; An, Ning; Alterovitz, Gil

    Background: Visualizing data by dimensionality reduction is an important strategy in bioinformatics, which could help to discover hidden data properties and detect data quality issues, e.g. data noise and inappropriately labeled data. As crowdsourcing-based synthetic biology databases face similar data quality issues, we propose to visualize biobricks to tackle them. However, existing dimensionality reduction methods could not be directly applied to biobricks datasets. Here, we use normalized edit distance to enhance dimensionality reduction methods, including Isomap and Laplacian Eigenmaps. Results: By extracting biobricks from the synthetic biology database Registry of Standard Biological Parts, six combinations of various types of biobricks are tested. The visualization graphs illustrate discriminated biobricks and inappropriately labeled biobricks. The clustering algorithm K-means is adopted to quantify the reduction results. The average clustering accuracies for Isomap and Laplacian Eigenmaps are 0.857 and 0.844, respectively. In addition, Laplacian Eigenmaps is 5 times faster than Isomap, and its visualization graph is more concentrated to discriminate biobricks. Conclusions: By combining normalized edit distance with Isomap and Laplacian Eigenmaps, synthetic biology biobricks are successfully visualized in two-dimensional space. Various types of biobricks could be discriminated and inappropriately labeled biobricks could be determined, which could help to assess the quality of crowdsourcing-based synthetic biology databases and to select biobricks. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1484-4) contains supplementary material, which is available to authorized users.
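    As a rough illustration of the distance step described in this abstract, the sketch below computes a normalized edit distance between sequences and builds a pairwise distance matrix. The normalization used here (Levenshtein distance divided by the longer sequence length) is an assumption, since the abstract does not state which normalization the authors chose, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def distance_matrix(sequences):
    """Symmetric matrix of pairwise normalized edit distances."""
    return [[normalized_edit_distance(s, t) for t in sequences] for s in sequences]
```

    A matrix built this way could then be passed to a manifold-learning routine that accepts precomputed distances, which is one plausible way to combine a string distance with methods such as Isomap.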

  • Publication

    SNP by SNP by environment interaction network of alcoholism

    (BioMed Central, 2017) Zollanvari, Amin; Alterovitz, Gil

    Background: Alcoholism has a strong genetic component. Twin studies have demonstrated the heritability of a large proportion of phenotypic variance of alcoholism ranging from 50–80%. The search for genetic variants associated with this complex behavior has epitomized sequence-based studies for nearly a decade. The limited success of genome-wide association studies (GWAS), possibly precipitated by the polygenic nature of complex traits and behaviors, however, has demonstrated the need for novel, multivariate models capable of quantitatively capturing interactions between a host of genetic variants and their association with non-genetic factors. In this regard, capturing the network of SNP by SNP or SNP by environment interactions has recently gained much interest. Results: Here, we assessed 3,776 individuals to construct a network capable of detecting and quantifying the interactions within and between plausible genetic and environmental factors of alcoholism. In this regard, we propose the use of first-order dependence tree of maximum weight as a potential statistical learning technique to delineate the pattern of dependencies underpinning such a complex trait. Using a predictive based analysis, we further rank the genes, demographic factors, biological pathways, and the interactions represented by our SNP × SNP × E network. The proposed framework is quite general and can be potentially applied to the study of other complex traits. Electronic supplementary material: The online version of this article (doi:10.1186/s12918-017-0403-7) contains supplementary material, which is available to authorized users.
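    A first-order dependence tree of maximum weight is commonly built as a maximum-weight spanning tree over pairwise mutual information (the Chow-Liu construction). The sketch below shows that idea on discrete variables. It is a minimal illustration under that assumption, not the authors' pipeline, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def max_weight_dependence_tree(columns):
    """Return tree edges (i, j, weight) chosen by Kruskal's algorithm,
    taking the highest mutual-information edges first."""
    n_vars = len(columns)
    edges = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        key=lambda e: e[0], reverse=True)

    parent = list(range(n_vars))

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # adding the edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j, weight))
    return tree
```

    Each column would hold one variable (for example a SNP genotype or an environmental factor) across individuals, and the resulting tree keeps the n - 1 strongest pairwise dependencies that form no cycle.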

  • Publication

    Robust Prediction-Based Analysis for Genome-Wide Association and Expression Studies

    (American Medical Informatics Association, 2013) Koppula, Skanda K.; Zollanvari, Amin; An, Ning; Alterovitz, Gil

    Here we describe a prediction-based framework to analyze omic data and generate models for both disease diagnosis and identification of cellular pathways that are significant in complex diseases. Our framework differs from previous analyses in its use of underlying biology (cellular pathways/gene-sets) to produce predictive feature-disease models. In our study of alcoholism, lung cancer, and schizophrenia, we demonstrate the framework’s ability to robustly analyze omic data of multiple types and sources, identify significant feature sets, and produce accurate predictive models.

  • Publication

    Gene expression prediction using low-rank matrix completion

    (BioMed Central, 2016) Kapur, Arnav; Marwah, Kshitij; Alterovitz, Gil

    Background: An exponential growth of high-throughput biological information and data has occurred in the past decade, supported by technologies such as microarrays and RNA-Seq. Most data generated using such methods are used to encode large amounts of rich information, and determine diagnostic and prognostic biomarkers. Although data storage costs have fallen, the process of capturing data using the aforementioned technologies is still expensive. Moreover, the time required for the assay, from sample preparation to raw value measurement, is excessive (on the order of days). There is an opportunity to reduce both the cost and time for generating such expression datasets. Results: We propose a framework in which complete gene expression values can be reliably predicted in-silico from partial measurements. This is achieved by modelling expression data as a low-rank matrix and then applying recently discovered techniques of matrix completion by using nonlinear convex optimisation. We evaluated prediction of gene expression data based on 133 studies, sourced from a combined total of 10,921 samples. It is shown that such datasets can be constructed with a low relative error even at high missing value rates (>50%), and that such predicted datasets can be reliably used as surrogates for further analysis. Conclusion: This method has potentially far-reaching applications including how bio-medical data is sourced and generated, and transcriptomic prediction by optimisation. We show that gene expression data can be computationally constructed, thereby potentially reducing the costs of gene expression profiling. In conclusion, this method shows great promise of opening new avenues in research on low-rank matrix completion in biological sciences. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1106-6) contains supplementary material, which is available to authorized users.
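    To make the low-rank completion idea concrete, the sketch below fills in missing entries of a synthetic matrix by iterative soft-thresholding of singular values (the Soft-Impute scheme for nuclear-norm regularized completion). This is a swapped-in, generic technique chosen for brevity. The abstract does not specify the authors' solver, and the matrix sizes, rank, thresholds, and function names here are all assumptions on toy data.

```python
import numpy as np


def soft_impute(observed, mask, lambdas=(8.0, 4.0, 2.0, 1.0, 0.5),
                iters_per_lambda=100):
    """Estimate a low-rank matrix from the entries where mask is True.

    Each step keeps the measured entries, fills the rest with the current
    estimate, and shrinks the singular values by lam. The thresholds are
    decreased gradually, warm-starting from the previous estimate.
    """
    estimate = np.zeros(observed.shape)
    for lam in lambdas:
        for _ in range(iters_per_lambda):
            working = np.where(mask, observed, estimate)
            u, s, vt = np.linalg.svd(working, full_matrices=False)
            s = np.maximum(s - lam, 0.0)   # soft-threshold singular values
            estimate = (u * s) @ vt
    return estimate


def relative_error_on_missing(truth, estimate, mask):
    """Relative Frobenius error restricted to the unmeasured entries."""
    missing = ~mask
    return (np.linalg.norm((truth - estimate)[missing])
            / np.linalg.norm(truth[missing]))


def demo(seed=0):
    """Toy example: a 40 x 30 rank-2 matrix with about 60% of entries measured."""
    rng = np.random.default_rng(seed)
    genes, samples, rank = 40, 30, 2
    truth = rng.normal(size=(genes, rank)) @ rng.normal(size=(rank, samples))
    mask = rng.random(truth.shape) < 0.6
    observed = np.where(mask, truth, np.nan)   # NaN marks unmeasured values
    return truth, mask, soft_impute(observed, mask)
```

    Calling `demo()` and passing its outputs to `relative_error_on_missing` gives a quick check of how well the unmeasured entries are recovered on this synthetic example. Real expression data would be noisier and only approximately low-rank.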

  • Publication

    On the Bayesian Derivation of a Treatment-based Cancer Ontology

    (American Medical Informatics Association, 2014) Gao, Michael; Warner, Jeremy; Yang, Peter; Alterovitz, Gil

    Traditional cancer classifications are primarily based on anatomical locations. As knowledge is heavily compartmentalized in the oncological specialties, discovering new targets for existing drugs (drug inference) can take years. Furthermore, our lack of understanding of the mechanisms underlying drug efficacy sometimes undercuts the effectiveness of genetic approaches to drug inference. This study tackles the twin problems of cancer reclassification and drug inference by constructing a global cancer ontology inductively from treatment regimens. A topological abstraction algorithm was performed on the bipartite graph of drugs and cancers to highlight important edges, and a Bayesian algorithm was then applied to determine a new treatment-based classification of cancer, producing 6 highly significant clusters (p < 0.05), confirmed by Fisher’s exact test and enrichment analyses. Edge probabilities derived from its drug inference routine matched real edge frequencies (R² ≈ 0.96). Drug inference results were reinforced by the identification of relevant published Phase II and III clinical trials, and the drug inference routine differentiated between high- and low-likelihood targets (p < 0.05). This novel treatment-based ontology has the potential to reorganize cancer research and provide powerful tools for drug inference using global patterns of drug efficacy.