Person:

Li, Cheng

Last Name

Li

First Name

Cheng

Search Results

Now showing 1 - 10 of 14
  • Publication

    Comparative linkage analysis and visualization of high-density oligonucleotide SNP array data

    (BioMed Central, 2005) Leykin, Igor; Hao, Ke; Cheng, Junsheng; Meyer, Nicole; Pollak, Martin; Smith, Richard JH; Wong, Wing Hung; Rosenow, Carsten; Li, Cheng

    Background: The identification of disease-associated genes using single nucleotide polymorphisms (SNPs) has been increasingly reported. In particular, the Affymetrix Mapping 10 K SNP microarray platform uses one PCR primer to amplify the DNA samples and determine the genotype of more than 10,000 SNPs in the human genome. This provides the opportunity for large-scale, rapid and cost-effective genotyping assays for linkage analysis. However, the analysis of such datasets is nontrivial because of the large number of markers, and visualizing the linkage scores in the context of genome maps remains less automated with current linkage analysis software packages. For example, haplotyping results are commonly represented in text format. Results: Here we report the development of a novel software tool called CompareLinkage for automated formatting of the Affymetrix Mapping 10 K genotype data into the "Linkage" format and subsequent analysis with multipoint linkage programs such as Merlin and Allegro. The new software can visualize the results of all these programs in dChip in the context of genome annotations and cytoband information. In addition, we implemented a variant of the Lander-Green algorithm in the dChipLinkage module of dChip software (V1.3) to perform parametric linkage analysis and haplotyping of SNP array data. These functions are integrated with the existing modules of dChip to visualize SNP genotype data together with LOD score curves. We have analyzed three families with recessive and dominant diseases using the new software programs, and the comparison results are presented and discussed. Conclusions: The CompareLinkage and dChipLinkage software packages are freely available. They provide visualization tools for high-density oligonucleotide SNP array data, as well as automated functions for formatting SNP array data for the linkage analysis programs Merlin and Allegro and calling these programs for linkage analysis. The results can be visualized in dChip in the context of genes and cytobands. In addition, a variant of the Lander-Green algorithm is provided that allows parametric linkage analysis and haplotyping.
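
The formatting step that CompareLinkage automates, turning SNP array genotype calls into pedigree rows for Merlin or Allegro, can be pictured with a small sketch. It assumes calls coded as the strings "AA", "AB" and "BB" (anything else, such as "NoCall", treated as missing) and uses the conventional LINKAGE pedigree column order: family, individual, father, mother, sex, affection status, then two allele columns per marker with 0 for missing. The function `to_linkage_line` and the 1/2 allele coding are hypothetical illustrations, not CompareLinkage code, and the companion marker and map files these programs also need are not shown.

```python
# Hypothetical sketch: one row of a LINKAGE-style pedigree file (the layout Merlin also reads).
# Assumed call coding: "AA" -> 1 1, "AB" -> 1 2, "BB" -> 2 2; any other call -> 0 0 (missing).
CALL_TO_ALLELES = {"AA": ("1", "1"), "AB": ("1", "2"), "BB": ("2", "2")}


def to_linkage_line(family, person, father, mother, sex, affection, calls):
    """Return one whitespace-separated pedigree row.

    sex: 1 = male, 2 = female; affection: 0 = unknown, 1 = unaffected, 2 = affected.
    Founders use "0" for father and mother.
    """
    fields = [family, person, father, mother, str(sex), str(affection)]
    for call in calls:
        fields.extend(CALL_TO_ALLELES.get(call, ("0", "0")))
    return " ".join(fields)


# An affected daughter (ID 3) of founders 1 and 2, genotyped at four SNPs:
print(to_linkage_line("F1", "3", "1", "2", 2, 2, ["AA", "AB", "NoCall", "BB"]))
# F1 3 1 2 2 2 1 1 1 2 0 0 2 2
```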

  • Publication

    Inferring Loss-of-Heterozygosity from Unpaired Tumors Using High-Density Oligonucleotide SNP Arrays

    (Public Library of Science, 2006) Park, Yuhyun; Hao, Ke; Zhao, Xiaojun; Mellinghoff, Ingo K; Hofer, Matthias D; Descazeaud, Aurelien; Rubin, Mark A; Sellers, William R; Bourne, Philip; Beroukhim, Rameen; Lin, Ming; Garraway, Levi; Fox, Edward Alvin; Hochberg, Ephraim; Meyerson, Matthew; Wong, Wing H; Li, Cheng

    Loss of heterozygosity (LOH) of chromosomal regions bearing tumor suppressors is a key event in the evolution of epithelial and mesenchymal tumors. Identification of these regions usually relies on genotyping tumor and counterpart normal DNA and noting regions where heterozygous alleles in the normal DNA become homozygous in the tumor. However, paired normal samples for tumors and cell lines are often not available. With the advent of oligonucleotide arrays that simultaneously assay thousands of single-nucleotide polymorphism (SNP) markers, genotyping can now be done at high enough resolution to allow identification of LOH events by the absence of heterozygous loci, without comparison to normal controls. Here we describe a hidden Markov model-based method to identify LOH from unpaired tumor samples, taking into account SNP intermarker distances, SNP-specific heterozygosity rates, and the haplotype structure of the human genome. When we applied the method to data genotyped on 100 K arrays, we correctly identified 99% of SNP markers as either retention or loss. We also correctly identified 81% of the regions of LOH, including 98% of regions greater than 3 megabases. By integrating copy number analysis into the method, we were able to distinguish LOH from allelic imbalance. Application of this method to data from a set of prostate samples without paired normals identified known regions of prevalent LOH. We have developed a method for analyzing high-density oligonucleotide SNP array data to accurately identify regions of LOH and retention in tumors without the need for paired normal samples.
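
The core idea of calling LOH without a paired normal (a long stretch of homozygous calls at SNPs that are usually heterozygous is unlikely under retention) can be sketched as a two-state hidden Markov model decoded with the Viterbi algorithm. This is a simplified illustration, not the published model: the per-SNP heterozygosity rates act as SNP-specific emission probabilities, while the distance-dependent switching formula, its 5 Mb scale, the 1% genotyping-error rate and the 10% prior are all assumptions, and haplotype structure and copy number are ignored.

```python
import math

# Simplified two-state HMM for LOH calling in an unpaired tumor (illustrative assumptions only).
# States: "RET" (heterozygosity retained) and "LOH" (loss of heterozygosity).
# Observations per SNP: "het", "hom", or None for a no-call.
STATES = ("RET", "LOH")


def emission(state, obs, het_rate, error=0.01):
    """P(observation | state); under LOH a heterozygous call is treated as a genotyping error."""
    if obs is None:
        return 1.0  # a no-call carries no information
    p_het = het_rate if state == "RET" else error
    return p_het if obs == "het" else 1.0 - p_het


def switch_prob(distance_bp, scale_bp=5e6):
    """Assumed form: the chance of changing state grows with the distance between SNPs (max 0.5)."""
    return max(0.5 * (1.0 - math.exp(-distance_bp / scale_bp)), 1e-12)


def viterbi_loh(obs, het_rates, positions, prior_loh=0.1):
    """Most likely RET/LOH state per SNP; het_rates must lie strictly between 0 and 1."""
    prior = {"RET": 1.0 - prior_loh, "LOH": prior_loh}
    score = {s: math.log(prior[s] * emission(s, obs[0], het_rates[0])) for s in STATES}
    back = []  # back[i - 1][s] = best predecessor of state s at SNP i
    for i in range(1, len(obs)):
        p = switch_prob(positions[i] - positions[i - 1])
        log_trans = {(r, s): math.log(1.0 - p if r == s else p) for r in STATES for s in STATES}
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: score[r] + log_trans[(r, s)])
            new_score[s] = (score[best] + log_trans[(best, s)]
                            + math.log(emission(s, obs[i], het_rates[i])))
            pointers[s] = best
        score = new_score
        back.append(pointers)
    path = [max(STATES, key=score.get)]  # best final state, then trace back
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


calls = ["het"] * 5 + ["hom"] * 30 + ["het"] * 5
path = viterbi_loh(calls, [0.5] * len(calls), [i * 100_000 for i in range(len(calls))])
print(path.count("LOH"))  # 30
```

With a heterozygosity rate of 0.5 everywhere and markers 100 kb apart, each homozygous call favors LOH by a factor of about two, which over 30 calls outweighs the cost of entering and leaving the LOH state, so exactly the homozygous run is labeled LOH.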

  • Publication

    Allele-Specific Amplification in Cancer Revealed by SNP Array Analysis

    (Public Library of Science, 2005) LaFramboise, Thomas; Weir, Barbara Ann; Zhao, Xiaojun; Beroukhim, Rameen; Li, Cheng; Harrington, David; Sellers, William R; Meyerson, Matthew

    Amplification, deletion, and loss of heterozygosity of genomic DNA are hallmarks of cancer. In recent years a variety of studies have emerged measuring total chromosomal copy number at increasingly high resolution. Similarly, loss-of-heterozygosity events have been finely mapped using high-throughput genotyping technologies. We have developed a probe-level allele-specific quantitation procedure that extracts both copy number and allelotype information from single nucleotide polymorphism (SNP) array data to arrive at allele-specific copy number across the genome. Our approach applies an expectation-maximization algorithm to a model derived from a novel classification of SNP array probes. This method is the first to our knowledge that is able to (a) determine the generalized genotype of aberrant samples at each SNP site (e.g., CCCCT at an amplified site), and (b) infer the copy number of each parental chromosome across the genome. With this method, we are able to determine not just where amplifications and deletions occur, but also the haplotype of the region being amplified or deleted. The merit of our model and general approach is demonstrated by very precise genotyping of normal samples, and our allele-specific copy number inferences are validated using PCR experiments. Applying our method to a collection of lung cancer samples, we are able to conclude that amplification is essentially monoallelic, as would be expected under the mechanisms currently believed responsible for gene amplification. This suggests that a specific parental chromosome may be targeted for amplification, whether because of germ line or somatic variation. An R software package containing the methods described in this paper is freely available at http://genome.dfci.harvard.edu/~tlaframb/PLASQ.

  • Publication

    Computational inference of mRNA stability from histone modification and transcriptome profiles

    (Oxford University Press, 2012) Wang, Chengyang; Tian, Rui; Zhao, Qian; Xu, Han; Meyer, Clifford; Li, Cheng; Zhang, Yong; Liu, Xiaole

    Histone modifications play important roles in regulating eukaryotic gene expression and have been used to model expression levels. Here, we present a regression model to systematically infer mRNA stability by comparing transcriptome profiles with ChIP-seq of H3K4me3, H3K27me3 and H3K36me3. The results from multiple human and mouse cell lines show that the inferred unstable mRNAs have significantly longer 3′ untranslated regions (UTRs) and more microRNA binding sites within the 3′ UTR than the inferred stable mRNAs. Regression residuals derived from RNA-seq, but not from GRO-seq, are highly correlated with the half-lives measured by pulse-labeling experiments, supporting the rationale of our inference. Whereas the functions enriched in the inferred stable and unstable mRNAs are consistent with those from pulse-labeling experiments, we found that the unstable mRNAs have higher cell-type specificity under functional constraint. We conclude that the systematic use of histone modifications can differentiate non-expressed mRNAs from unstable mRNAs, and distinguish stable mRNAs from highly expressed ones. In summary, we present the first computational model of mRNA stability inference that compares transcriptome and epigenome profiles, and it provides an alternative strategy for directing experimental measurements.
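
The inference rests on regressing expression on histone-mark signals and reading the residuals as stability signals. A minimal sketch of that idea follows, assuming a plain ordinary-least-squares fit with an intercept on the three marks; the published model's exact form, transformations and normalization are not given in the abstract. The sign convention (a positive residual means more mRNA than the marks predict, i.e. a relatively stable transcript) is likewise an assumption, and the function names are hypothetical.

```python
# Hypothetical sketch: regress log expression on three histone-mark signals by ordinary least
# squares and treat each gene's residual as a relative stability score.


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def stability_residuals(marks, log_expr):
    """marks: per-gene [H3K4me3, H3K27me3, H3K36me3] signals; log_expr: per-gene log expression."""
    X = [[1.0] + [float(v) for v in row] for row in marks]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, log_expr)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    # Assumed reading: residual > 0 means more mRNA than the marks predict (relatively stable).
    return [y - f for y, f in zip(log_expr, fitted)]
```

Solving the normal equations directly is adequate for three predictors; at genome scale a QR-based solver such as `numpy.linalg.lstsq` would be the more numerically robust choice.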

  • Publication

    The Polyoma Virus Large T Binding Protein p150 Is a Transcriptional Repressor of c-MYC

    (Public Library of Science, 2012) Sung, Chang Kyoo; Yim, Hyungshin; Gu, Hongcang; Li, Dawei; Andrews, Erik; Duraisamy, Sekhar; Li, Cheng; Drapkin, Ronny; Benjamin, Thomas

    p150, the product of the SALL2 gene, is a binding partner of the polyoma virus large T antigen and a putative tumor suppressor. p150 binds to the nuclease hypersensitive element of the c-MYC promoter and represses c-MYC transcription. In human ovarian surface epithelial cells, overexpression of p150 leads to decreased c-MYC expression, and downregulation of p150 to increased c-MYC expression. c-MYC is repressed upon restoration of p150 to ovarian carcinoma cells. Induction of apoptosis by etoposide results in recruitment of p150 to the c-MYC promoter and repression of c-MYC. Analysis of data in The Cancer Genome Atlas shows negative correlations between SALL2 and c-MYC expression in four common solid tumor types.

  • Publication

    Major Copy Proportion Analysis of Tumor Samples Using SNP Arrays

    (BioMed Central, 2008) Li, Cheng; Beroukhim, Rameen; Weir, Barbara Ann; Winckler, Wendy; Garraway, Levi; Sellers, William R; Meyerson, Matthew

    Background: Single nucleotide polymorphisms (SNPs) are the most common genetic variations in the human genome and are useful as genomic markers. Oligonucleotide SNP microarrays have been developed for high-throughput genotyping of up to 900,000 human SNPs and have been used widely in linkage and cancer genomics studies. We have previously used Hidden Markov Models (HMM) to analyze SNP array data for inferring copy numbers and loss of heterozygosity (LOH) from paired normal and tumor samples and from unpaired tumor samples. Results: We proposed and implemented major copy proportion (MCP) analysis of oligonucleotide SNP array data. An HMM was constructed to infer unobserved MCP states from observed allele-specific signals through emission and transition distributions. We used 10 K, 100 K and 250 K SNP array datasets to compare MCP analysis with LOH and copy number analysis, and showed that MCP performs better than LOH analysis for allelic-imbalanced chromosome regions and normal-contaminated samples. The major and minor copy alleles can also be inferred from allelic-imbalanced regions by MCP analysis. Conclusion: MCP extends tumor LOH analysis to allelic imbalance analysis and supplies information complementary to total copy numbers. MCP analysis of mixtures of normal and tumor samples suggests its utility for normal-contaminated tumor samples. The described analysis and visualization methods are readily available in the user-friendly dChip software.
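
The quantity being modeled can be made concrete. Reading MCP, as its name suggests, as the larger allele-specific copy number divided by the total copy number (0.5 for balanced alleles, 1.0 when only one allele remains), and assuming a simple linear mixing model for contamination by diploid heterozygous normal cells, a short sketch shows why MCP keeps its signal in normal-contaminated samples. The mixing model and both function names are illustrative assumptions, not dChip's implementation.

```python
def major_copy_proportion(copies_a, copies_b):
    """MCP = major allele copies / total copies: 0.5 when balanced, 1.0 when one allele is lost."""
    total = copies_a + copies_b
    if total <= 0:
        raise ValueError("allele-specific copy numbers must sum to a positive value")
    return max(copies_a, copies_b) / total


def contaminated(tumor_a, tumor_b, normal_fraction):
    """Assumed linear mixing with diploid heterozygous normal cells (one copy of each allele)."""
    f = normal_fraction
    return (1 - f) * tumor_a + f, (1 - f) * tumor_b + f


# Hemizygous deletion (tumor keeps 1 copy of allele A, 0 of B) with 30% normal cells:
print(round(major_copy_proportion(*contaminated(1, 0, 0.3)), 3))  # 0.769
# Copy-neutral LOH (2 copies of A, 0 of B) with the same contamination:
print(round(major_copy_proportion(*contaminated(2, 0, 0.3)), 3))  # 0.85
```

In both cases the normal cells contribute signal from the lost allele, so a hard genotype call may come out heterozygous, yet MCP stays well above the balanced value of 0.5.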

  • Publication

    Integrative analysis of gene and miRNA expression profiles with transcription factor–miRNA feed-forward loops identifies regulators in human cancers

    (Oxford University Press, 2012) Yan, Zhenyu; Shah, Parantu K.; Amin, Samir B.; Samur, Mehmet K.; Huang, Norman; Wang, Xujun; Misra, Vikas; Ji, Hongbin; Gabuzda, Dana; Li, Cheng

    We describe here a novel method for integrating gene and miRNA expression profiles in cancer using feed-forward loops (FFLs) consisting of transcription factors (TFs), miRNAs and their common target genes. The dChip-GemiNI (Gene and miRNA Network-based Integration) method statistically ranks computationally predicted FFLs by their explanatory power to account for differential gene and miRNA expression between two biological conditions such as normal and cancer. GemiNI integrates not only gene and miRNA expression data but also computationally derived information about TF–target gene and miRNA–mRNA interactions. Literature validation shows that the integrated modeling of expression data and FFLs better identifies cancer-related TFs and miRNAs compared to existing approaches. We have utilized GemiNI for analyzing six data sets of solid cancers (liver, kidney, prostate, lung and germ cell) and found that top-ranked FFLs account for ∼20% of transcriptome changes between normal and cancer. We have identified common FFL regulators across multiple cancer types, such as known FFLs consisting of MYC and miR-15/miR-17 families, and novel FFLs consisting of ARNT, CREB1 and their miRNA partners. The results and analysis web server are available at http://www.canevolve.org/dChip-GemiNi.

  • Publication

    Global Gene Expression Profiling in Whole-Blood Samples From Individuals Exposed to Metal Fumes

    (National Institute of Environmental Health Sciences, 2005) Wang, Zhaoxi; Neuburg, Donna; Li, Cheng; Su, Li; Kim, Jee Young; Chen, Jiu Chiuan; Christiani, David

    Accumulating evidence demonstrates that particulate air pollutants can cause both pulmonary and airway inflammation. However, few data show that particulates can induce systemic inflammatory responses. We conducted an exploratory study using microarray techniques to analyze whole-blood total RNA in boilermakers before and after occupational exposure to metal fumes. A self-controlled study design was used to overcome the problem of large between-individual variation obscuring the relatively small changes caused by environmental exposure. Moreover, we incorporated dichotomous data on absolute gene expression status in the microarray analyses. Compared with nonexposed controls, we observed that genes with altered expression in response to particulate exposure were clustered in biologic processes related to inflammatory response, oxidative stress, intracellular signal transduction, cell cycle, and programmed cell death. In particular, the proinflammatory cytokine interleukin 8 and one of its receptors, chemokine receptor 4, seemed to play important roles in the early-stage response to heavy metal exposure and were down-regulated. Furthermore, most observed expression variations were from nonsmoking exposed individuals, suggesting that smoking profoundly affects whole-blood expression profiles. Our study is the first to demonstrate that, with a paired sampling design of pre- and postexposure samples from the same individuals, small changes in gene expression can be measured in whole-blood total RNA from a population-based study. This technique can be applied to evaluate the host response to other forms of environmental exposure.
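
The value of the self-controlled design is that each person serves as their own baseline. The abstract does not name the test statistic, so the following is only an assumed illustration: a paired t statistic for one gene, computed from within-person post-minus-pre differences so that large differences in baseline expression between individuals cancel out. The toy values are invented for illustration.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired t statistic: mean within-person change divided by its standard error."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pre/post measurements")
    diffs = [after - before for before, after in zip(pre, post)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Toy log-expression values for one gene in three people: baselines differ by up to 4 units,
# but each person drops by about 0.9 after exposure.
pre = [10.0, 6.0, 8.0]
post = [9.0, 5.2, 7.1]
print(round(paired_t(pre, post), 1))  # -15.6
```

The between-person spread in baseline (6 to 10) dwarfs the 0.9-unit shift, which would swamp an unpaired comparison, while the within-person differences are tight enough to give a large paired statistic.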

  • Publication

    A Plasma Biomarker Signature of Immune Activation in HIV Patients on Antiretroviral Therapy

    (Public Library of Science, 2012) Kamat, Anupa U; Misra, Vikas; Cassol, Edana; Ancuta, Petronela; Yan, Zhenyu; Li, Cheng; Morgello, Susan; Gabuzda, Dana

    Background: Immune activation is a strong predictor of disease progression in HIV infection. Combinatorial plasma biomarker signatures that represent surrogate markers of immune activation in both viremic and aviremic HIV patients on combination antiretroviral therapy (cART) have not been defined. Here, we identify a plasma inflammatory biomarker signature that distinguishes both viremic and aviremic HIV patients on cART from healthy controls and examine relationships of this signature to markers of disease progression. Methods: Multiplex profiling and ELISA were used to detect 15 cytokines/chemokines, soluble IL-2R (sIL-2R), and soluble CD14 (sCD14) in plasma from 57 HIV patients with CD4 nadir <300 cells/µl and 29 healthy controls. Supervised and unsupervised analyses were used to identify biomarkers explaining variance between groups defined by HIV status or drug abuse. Relationships between biomarkers and disease markers were examined by Spearman correlation. Results: The majority (91%) of HIV subjects were on cART, with 38% having undetectable viral loads (VL). Hierarchical clustering identified a biomarker cluster in plasma consisting of two interferon-stimulated gene products (CXCL9 and CXCL10), a T cell activation marker (sIL-2R), and a monocyte activation marker (sCD14) that distinguished both viremic and aviremic HIV patients on cART from controls (p<0.0001); these markers were also top-ranked in variable importance in projection plots. IL-12 and CCL4 were also elevated in viremic and aviremic patients compared to controls (p<0.05). IL-12 correlated with IFN-α, IFN-γ, CXCL9, and sIL-2R (p<0.05). CXCL10 correlated positively with plasma VL and percentage of CD16+ monocytes, and inversely with CD4 count (p = 0.001, <0.0001, and 0.04, respectively). Conclusion: A plasma inflammatory biomarker signature consisting of CXCL9, CXCL10, sIL-2R, and sCD14 may be useful as a surrogate marker to monitor immune activation in both viremic and aviremic HIV patients on cART during disease progression and therapeutic responses.
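
The biomarker relationships were examined with Spearman correlation, which measures monotonic association by correlating ranks rather than raw values and is therefore less sensitive to outliers and skewed distributions. A self-contained sketch follows, computing Spearman's rho as the Pearson correlation of average ranks (tied values share the mean of their ranks). It returns only the coefficient, not the p-values reported above, and the example inputs are toy numbers.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


print(round(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))  # 0.8
```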

  • Publication

    Identifying Rare Variants Using a Bayesian Regression Approach

    (BioMed Central, 2011) Yan, Aimin; Laird, Nan; Li, Cheng

    Recent advances in next-generation sequencing technologies have made it possible to generate large amounts of sequence data with rare variants in a cost-effective way. Statistical methods that test variants individually are underpowered to detect rare variants, so it is desirable to perform association analysis of rare variants by combining the information from all variants. In this study, we use a Bayesian regression method to model all variants simultaneously to identify rare variants in a data set from Genetic Analysis Workshop 17. We studied the association between the quantitative risk traits Q1, Q2, and Q4 and the single-nucleotide polymorphisms and identified several positive single-nucleotide polymorphisms for traits Q1 and Q2. However, the model also generated several apparent false positives and missed many true positives, suggesting that there is room for improvement in this model.