Person:
Liu, Jun

Last Name

Liu

First Name

Jun

Search Results

Now showing 1 - 10 of 113
  • Publication
    Sparse Sliced Inverse Regression via Lasso
    (Informa UK Limited, 2019-03-09) Lin, Qian; Zhao, Zhigen; Liu, Jun
    For multiple index models, it has recently been shown that the sliced inverse regression (SIR) is consistent for estimating the sufficient dimension reduction (SDR) space if and only if rho = lim p/n = 0, where p is the dimension and n is the sample size. Thus, when p is of the same or a higher order of n, additional assumptions such as sparsity must be imposed in order to ensure consistency for SIR. By constructing artificial response variables made up from top eigenvectors of the estimated conditional covariance matrix, we introduce a simple Lasso regression method to obtain an estimate of the SDR space. The resulting algorithm, Lasso-SIR, is shown to be consistent and achieve the optimal convergence rate under certain sparsity conditions when p is of order o(n^2 lambda^2), where lambda is the generalized signal-to-noise ratio. We also demonstrate the superior performance of Lasso-SIR compared with existing approaches via extensive numerical studies and several real data examples.
  • Publication
    Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases
    (Nature Publishing Group, 2015) Yang, Jialiang; Huang, Tao; Petralia, Francesca; Long, Quan; Zhang, Bin; Argmann, Carmen; Zhao, Yong; Mobbs, Charles V.; Schadt, Eric E.; Zhu, Jun; Tu, Zhidong; Ardlie, Kristin G.; Deluca, David S.; Segrè, Ayellet V.; Sullivan, Timothy J.; Young, Taylor R.; Gelfand, Ellen T.; Trowbridge, Casandra A.; Maller, Julian B.; Tukiainen, Taru; Lek, Monkol; Ward, Lucas D.; Kheradpour, Pouya; Iriarte, Benjamin; Meng, Yan; Palmer, Cameron D.; Winckler, Wendy; Hirschhorn, Joel; Kellis, Manolis; MacArthur, Daniel; Getz, Gad; Shablin, Andrey A.; Li, Gen; Zhou, Yi-Hui; Nobel, Andrew B.; Rusyn, Ivan; Wright, Fred A.; Lappalainen, Tuuli; Ferreira, Pedro G.; Ongen, Halit; Rivas, Manuel A.; Battle, Alexis; Mostafavi, Sara; Monlong, Jean; Sammeth, Michael; Mele, Marta; Reverter, Ferran; Goldman, Jakob; Koller, Daphne; Guigo, Roderic; McCarthy, Mark I.; Dermitzakis, Emmanouil T.; Gamazon, Eric R.; Konkashbaev, Anuar; Nicolae, Dan L.; Cox, Nancy J.; Flutre, Timothée; Wen, Xiaoquan; Stephens, Matthew; Pritchard, Jonathan K.; Lin, Luan; Liu, Jun; Brown, Amanda; Mestichelli, Bernadette; Tidwell, Denee; Lo, Edmund; Salvatore, Mike; Shad, Saboor; Thomas, Jeffrey A.; Lonsdale, John T.; Choi, Christopher; Karasik, Ellen; Ramsey, Kimberly; Moser, Michael T.; Foster, Barbara A.; Gillard, Bryan M.; Syron, John; Fleming, Johnelle; Magazine, Harold; Hasz, Rick; Walters, Gary D.; Bridge, Jason P.; Miklos, Mark; Sullivan, Susan; Barker, Laura K.; Traino, Heather; Mosavel, Magboeba; Siminoff, Laura A.; Valley, Dana R.; Rohrer, Daniel C.; Jewel, Scott; Branton, Philip; Sobin, Leslie H.; Qi, Liqun; Hariharan, Pushpa; Wu, Shenpei; Tabor, David; Shive, Charles; Smith, Anna M.; Buia, Stephen A.; Undale, Anita H.; Robinson, Karna L.; Roche, Nancy; Valentino, Kimberly M.; Britton, Angela; Burges, Robin; Bradbury, Debra; Hambright, Kenneth W.; Seleski, John; Korzeniewski, Greg E.; Erickson, Kenyon; Marcus, Yvonne; Tejada, Jorge; Taherian, Mehran; Lu, Chunrong; Robles, Barnaby E.; Basile, Margaret; Mash, Deborah C.; Volpi, Simona; Struewing, Jeff; Temple, Gary F.; Boyer, Joy; Colantuoni, Deborah; Little, Roger; Koester, Susan; Carithers, Latarsha J.; Moore, Helen M.; Guan, Ping; Compton, Carolyn; Sawyer, Sherilyn J.; Demchok, Joanne P.; Vaught, Jimmie B.; Rabiner, Chana A.; Lockhart, Nicole C.
    Aging is one of the most important biological processes and is a known risk factor for many age-related diseases in humans. Studying age-related transcriptomic changes in tissues across the whole body can provide valuable information for a holistic understanding of this fundamental process. In this work, we catalogue age-related gene expression changes in nine tissues from nearly two hundred individuals collected by the Genotype-Tissue Expression (GTEx) project. In general, we find the aging gene expression signatures are very tissue specific. However, enrichment for some well-known aging components such as mitochondria biology is observed in many tissues. Different levels of cross-tissue synchronization of age-related gene expression changes are observed, and some essential tissues (e.g., heart and lung) show much stronger “co-aging” than other tissues based on a principal component analysis. The aging gene signatures and complex disease genes show a complex overlapping pattern, and only in some cases do we see that they overlap significantly in the tissues affected by the corresponding diseases. In summary, our analyses provide novel insights into the co-regulation of age-related gene expression in multiple tissues; they also present a tissue-specific view of the link between aging and age-related diseases.
  • Publication
    Network analysis of gene essentiality in functional genomics experiments
    (BioMed Central, 2015) Jiang, Peng; Wang, Hongfang; Li, Wei; Zang, Chongzhi; Li, Bo; Wong, Yinling J.; Meyer, Cliff; Liu, Jun; Aster, Jon; Liu, X. Shirley
    Many genomic techniques have been developed to study gene essentiality genome-wide, such as CRISPR and shRNA screens. Our analyses of public CRISPR screens suggest protein interaction networks, when integrated with gene expression or histone marks, are highly predictive of gene essentiality. Meanwhile, the quality of CRISPR and shRNA screen results can be significantly enhanced through network neighbor information. We also found network neighbor information to be very informative on prioritizing ChIP-seq target genes and survival indicator genes from tumor profiling. Thus, our study provides a general method for gene essentiality analysis in functional genomic experiments (http://nest.dfci.harvard.edu). Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0808-9) contains supplementary material, which is available to authorized users.
  • Publication
    Landscape of tumor-infiltrating T cell repertoire of human cancers
    (2016) Li, Bo; Li, Taiwen; Pignon, Jean-Christophe; Wang, Binbin; Wang, Jinzeng; Shukla, Sachet; Dou, Ruoxu; Chen, Qianming; Hodi, F. Stephen; Choueiri, Toni K.; Wu, Catherine; Hacohen, Nir; Signoretti, Sabina; Liu, Jun; Liu, X. Shirley
    We developed a computational method to infer the complementarity determining region 3 (CDR3) sequences of tumor infiltrating T-cells in 9,142 RNA-seq samples across 29 cancer types. We identified over 600 thousand CDR3 sequences, including 15% with full-length. CDR3 sequence length distribution and amino acid conservation, as well as variable gene usage of infiltrating T-cells in many tumors, except brain and kidney cancers, resembled those in the peripheral blood of healthy donors. We observed a strong association between T-cell diversity and tumor mutation load, and predicted SPAG5 and TSSK6 as putative immunogenic cancer/testis antigens in multiple cancers. Finally, we identified 3 potential immunogenic somatic mutations based on their co-occurrence with CDR3 sequences. One of them, PRAMEF4 F300V, was predicted to bind strongly to both MHC-I and MHC-II, with matched HLA types in its carriers. Our analyses have the potential to simultaneously identify immunogenic neoantigens and the tumor-reactive T-cell clonotypes.
  • Publication
    L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs
    (2016) Neykov, Matey; Liu, Jun; Cai, Tianxi
    It is known that for a certain class of single index models (SIMs) Y = f(X^T β0, ε), support recovery is impossible when X ~ N(0, I_{p×p}) and a model complexity adjusted sample size is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested. These algorithms work provably under the assumption that the design X comes from an i.i.d. Gaussian distribution. In the present paper we analyze algorithms based on covariance screening and least squares with L1 penalization (i.e. LASSO) and demonstrate that they can also enjoy optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on f and ε compared to the SIR based algorithms. Furthermore, we show more generally that LASSO succeeds in recovering the signed support of β0 if X ~ N(0, Σ) and the covariance Σ satisfies the irrepresentable condition. Our work extends existing results on the support recovery of LASSO for the linear model to a more general class of SIMs.
  • Publication
    On consistency and sparsity for sliced inverse regression in high dimensions
    (Institute of Mathematical Statistics, 2018) Lin, Qian; Zhao, Zhigen; Liu, Jun
    We provide here a framework to analyze the phase transition phenomenon of sliced inverse regression (SIR), a supervised dimension reduction technique introduced by Li [J. Amer. Statist. Assoc. 86 (1991) 316–342]. Under mild conditions, the asymptotic ratio ρ = lim p/n is the phase transition parameter and the SIR estimator is consistent if and only if ρ = 0. When dimension p is greater than n, we propose a diagonal thresholding screening SIR (DT-SIR) algorithm. This method provides us with an estimate of the eigenspace of var(E[x|y]), the covariance matrix of the conditional expectation. The desired dimension reduction space is then obtained by multiplying the inverse of the covariance matrix on the eigenspace. Under certain sparsity assumptions on both the covariance matrix of predictors and the loadings of the directions, we prove the consistency of DT-SIR in estimating the dimension reduction space in high-dimensional data analysis. Extensive numerical experiments demonstrate superior performances of the proposed method in comparison to its competitors.
  • Publication
    Robust Variable and Interaction Selection for Logistic Regression and General Index Models
    (Informa UK Limited, 2017) Li, Yang; Liu, Jun
    Under the logistic regression framework, we propose a forward-backward method, SODA, for variable selection with both main and quadratic interaction terms. In the forward stage, SODA adds in predictors that have significant overall effects, whereas in the backward stage SODA removes unimportant terms so as to optimize the extended Bayesian Information Criterion (EBIC). Compared with existing methods for quadratic discriminant analysis variable selection, SODA can deal with high-dimensional data with the number of predictors much larger than the sample size and does not require the joint normality assumption on predictors, leading to much enhanced robustness. We further extend SODA to conduct variable selection and model fitting for general index models. Compared with existing variable selection methods based on the Sliced Inverse Regression (SIR) (Li, 1991), SODA requires neither linearity nor constant variance condition and is much more robust. Our theory establishes the variable-selection consistency of SODA under high-dimensional settings, and our simulation studies as well as real-data applications demonstrate superior performances of SODA in dealing with non-Gaussian design matrices in both logistic and general index models.
  • Publication
    Generalized R-squared for detecting dependence
    (Oxford University Press (OUP), 2017) Wang, X; Jiang, B; Liu, Jun
    Detecting dependence between two random variables is a fundamental problem. Although the Pearson correlation coefficient is effective for capturing linear dependence, it can be entirely powerless for detecting nonlinear and/or heteroscedastic patterns. We introduce a new measure, G-squared, to test whether two univariate random variables are independent and to measure the strength of their relationship. The G-squared statistic is almost identical to the square of the Pearson correlation coefficient, R-squared, for linear relationships with constant error variance, and has the intuitive meaning of the piecewise R-squared between the variables. It is particularly effective in handling nonlinearity and heteroscedastic errors. We propose two estimators of G-squared and show their consistency. Simulations demonstrate that G-squared estimators are among the most powerful test statistics compared with several state-of-the-art methods.
  • Publication
    Fast parameter estimation in loss tomography for networks of general topology
    (Institute of Mathematical Statistics, 2016) Deng, Ke; Li, Yang; Zhu, Weiping; Liu, Jun
    As a technique to investigate link-level loss rates of a computer network with low operational cost, loss tomography has received considerable attention in recent years. A number of parameter estimation methods have been proposed for loss tomography of networks with a tree structure as well as a general topological structure. However, these methods suffer from either high computational cost or insufficient use of information in the data. In this paper, we provide both theoretical results and practical algorithms for parameter estimation in loss tomography. By introducing a group of novel statistics and alternative parameter systems, we find that the likelihood function of the observed data from loss tomography keeps exactly the same mathematical formulation for tree and general topologies, revealing that networks with different topologies share the same mathematical nature for loss tomography. More importantly, we discover that a reparametrization of the likelihood function belongs to the standard exponential family, which is convex and has a unique mode under regularity conditions. Based on these theoretical results, novel algorithms to find the MLE are developed. Compared to existing methods in the literature, the proposed methods enjoy great computational advantages.
  • Publication
    CLIC, a tool for expanding biological pathways based on co-expression across thousands of datasets
    (Public Library of Science, 2017) Li, Yang; Jourdain, Alexis A.; Calvo, Sarah; Liu, Jun; Mootha, Vamsi
    In recent years, there has been a huge rise in the number of publicly available transcriptional profiling datasets. These massive compendia comprise billions of measurements and provide a special opportunity to predict the function of unstudied genes based on co-expression to well-studied pathways. Such analyses can be very challenging, however, since biological pathways are modular and may exhibit co-expression only in specific contexts. To overcome these challenges we introduce CLIC, CLustering by Inferred Co-expression. CLIC accepts as input a pathway consisting of two or more genes. It then uses a Bayesian partition model to simultaneously partition the input gene set into coherent co-expressed modules (CEMs), while assigning the posterior probability for each dataset in support of each CEM. CLIC then expands each CEM by scanning the transcriptome for additional co-expressed genes, quantified by an integrated log-likelihood ratio (LLR) score weighted for each dataset. As a byproduct, CLIC automatically learns the conditions (datasets) within which a CEM is operative. We implemented CLIC using a compendium of 1774 mouse microarray datasets (28628 microarrays) or 1887 human microarray datasets (45158 microarrays). CLIC analysis reveals that of 910 canonical biological pathways, 30% consist of strongly co-expressed gene modules for which new members are predicted. For example, CLIC predicts a functional connection between protein C7orf55 (FMC1) and the mitochondrial ATP synthase complex that we have experimentally validated. CLIC is freely available at www.gene-clic.org. We anticipate that CLIC will be valuable both for revealing new components of biological pathways as well as the conditions in which they are active.
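
Several abstracts in this listing build on sliced inverse regression (SIR). As a rough illustration only, the classical low-dimensional SIR recipe described in them (estimate the eigenspace of var(E[x|y]) from slice means, then multiply by the inverse covariance matrix) might be sketched as follows. This is not the Lasso-SIR or DT-SIR algorithm from the listed papers; the function name `sir_directions` and its parameters are hypothetical, and the sketch assumes n is much larger than p so that the sample covariance is invertible.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    """Classical SIR sketch (hypothetical helper, assumes n >> p)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center the predictors
    cov = Xc.T @ Xc / n              # sample covariance of x

    # Sort observations by the response and cut them into slices.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the slice means: an estimate of var(E[x|y]).
    lam = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        lam += (len(idx) / n) * np.outer(m, m)

    # Top eigenvectors of the estimated var(E[x|y]).
    # np.linalg.eigh returns eigenvalues in ascending order, so reverse.
    _, evecs = np.linalg.eigh(lam)
    top = evecs[:, ::-1][:, :n_directions]

    # Multiply by the inverse covariance to get the reduction directions,
    # then normalize each column to unit length.
    B = np.linalg.solve(cov, top)
    return B / np.linalg.norm(B, axis=0)
```

On simulated data with a single true direction, the returned column should be close to that direction up to sign. When p is comparable to or larger than n, this plain version is exactly the setting where the abstracts above say additional structure such as sparsity is needed.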