IDENTIFICATION of CELL SURFACE MARKERS WHICH CORRELATE WITH SALL4 in a B-CELL ACUTE LYMPHOBLASTIC LEUKEMIA WITH T(8;14) DISCOVERED THROUGH BIOINFORMATIC ANALYSIS of MICROARRAY GENE EXPRESSION DATA
Weinberg, Robert Paul
Abstract
Acute lymphoblastic leukemia (ALL) is the most common leukemia in children and causes significant morbidity and mortality annually in the U.S. We performed exploratory data analysis on several microarray gene expression datasets publicly available in the Gene Expression Omnibus (GEO) repository, maintained by the National Center for Biotechnology Information of the National Library of Medicine at the National Institutes of Health (http://ncbi.nlm.nih.gov), looking for novel associations between the zinc finger transcription factor SALL4 and leukemia.
Through this data mining, we found a subset of B-cell ALL in which multiple cell surface markers have relatively high correlation with SALL4. However, in part because of the small number of samples in this group (n = 13), the results of these analyses must be treated with caution until they can be validated experimentally in the lab with living leukemia cells.
We evaluated the transcriptome changes in these leukemia datasets that are associated with SALL4 expression. Correlation analysis of the microarray data revealed that in one small subset of 13 samples, mature B-cell acute lymphoblastic leukemia with the translocation t(8;14) [B-ALL with t(8;14)], multiple cell surface marker genes showed relatively high correlation with SALL4 expression (|r| > 0.60), whereas the 16 other leukemia subsets showed only low to moderate correlation of the same cell surface markers with SALL4 (|r| < 0.45).
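The screening step described above can be illustrated with a minimal sketch. It assumes a Pearson correlation coefficient (the abstract reports r but does not name the estimator), and the marker names and expression values are invented for illustration; they are not taken from GSE13159.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

def markers_correlated_with(reference, markers, threshold=0.60):
    """Return {marker: r} for markers whose |r| with the reference exceeds threshold."""
    hits = {}
    for name, values in markers.items():
        r = pearson_r(reference, values)
        if abs(r) > threshold:
            hits[name] = r
    return hits

# Hypothetical log2 expression values for five samples of one leukemia subset.
sall4 = [5.1, 5.9, 6.4, 7.2, 7.8]
markers = {
    "MARKER_A": [8.0, 8.7, 9.1, 9.9, 10.4],  # rises with SALL4
    "MARKER_B": [6.3, 6.1, 6.4, 6.2, 6.3],   # essentially flat
}
hits = markers_correlated_with(sall4, markers)
```

With only 13 samples in the subset of interest, a coefficient above the 0.60 cutoff can still arise by chance, which is one reason the results are presented as exploratory.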
The microarray gene expression data were obtained using the Affymetrix HG-U133Plus2 gene chip, a 3’ IVT oligonucleotide array for the detection of cDNA, which is synthesized from mRNA extracted from the relevant human cells. The array consists of both Perfect Match and Mismatch probes for the detection and differential analysis of some 23,520 probe-gene pairs. The luminosity read-out from the gene chip assay then undergoes a number of statistical manipulations, including standardization and normalization, before the data are deposited in the GEO library. Within each dataset the gene expression data are normalized, but special methods must be used to compare data between datasets from different experiments in the GEO repository. Some datasets include the raw luminosity read-outs.
The majority of this thesis focuses on one specific microarray gene expression dataset, GSE13159, which comprises some 2,096 samples taken from patients with acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), myelodysplastic syndrome (MDS) and normal healthy controls.
After identifying the B-ALL with t(8;14) subset, in which the cell surface markers correlate highly with SALL4, we used the limma package from the R-based Bioconductor platform to perform a linear regression analysis of differential gene expression across the transcriptome. This analysis shows that the B-cell leukemia subset has a set of differentially expressed genes distinct from the average gene expression pattern of the other lymphoblastic leukemias.
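The idea behind a per-gene linear model can be sketched in a simplified form. The example below is not limma: it computes an ordinary pooled-variance t-statistic for a two-group comparison (equivalent to the t-statistic of the group coefficient in a per-gene least-squares fit) and omits limma's empirical Bayes moderation of the variances. Gene names, values, and the helper names `two_group_t` and `rank_genes` are hypothetical.

```python
from math import sqrt

def two_group_t(group_a, group_b):
    """Return (log fold change, pooled-variance t-statistic) for one gene.

    Inputs are log2 expression values; the log fold change is mean_a - mean_b.
    """
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two samples")
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    if se == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    log_fc = mean_a - mean_b
    return log_fc, log_fc / se

def rank_genes(expression, idx_a, idx_b):
    """Rank genes by |t|, largest first. Returns a list of (gene, log_fc, t)."""
    rows = []
    for gene, values in expression.items():
        a = [values[i] for i in idx_a]
        b = [values[i] for i in idx_b]
        log_fc, t = two_group_t(a, b)
        rows.append((gene, log_fc, t))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)

# Hypothetical log2 expression: samples 0-2 are the subset, 3-5 the comparison group.
expression = {
    "GENE_FLAT": [7.0, 7.1, 6.9, 7.0, 6.9, 7.1],
    "GENE_UP": [9.0, 9.2, 8.8, 6.0, 6.1, 5.9],
}
ranked = rank_genes(expression, idx_a=[0, 1, 2], idx_b=[3, 4, 5])
```

With small groups, per-gene variance estimates are unstable; limma's moderation of those estimates is the part this sketch leaves out.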
Extensive bioinformatic analyses were carried out on this small group of samples, and the limitations of these analyses are examined further in the discussion section. Preliminary functional genomic analysis was carried out on the differentially expressed genes (DEGs), which were grouped into specific Gene Ontology (GO) terms and KEGG pathways, including the hematopoietic pathway. These supporting data can be found in the attached appendices.
There is some overlap in the Gene Ontology terms and KEGG pathways, including the hematopoietic pathway, among the 17 leukemia / myelodysplastic groups analyzed, but the B-ALL with t(8;14) subset showed differences from the other leukemias.
SALL4 is a zinc-finger transcription factor important in maintaining the pluripotency of embryonic and hematopoietic stem cells as evidenced in transgenic animal models and genetically modified cell lines with either deletion of SALL4 or forced over-expression of SALL4. Experimental evidence also suggests that SALL4 plays an important role in leukemogenesis as well as other oncogenic processes in other neoplasms.
The association found between these specific cell surface biomarkers and SALL4 expression in the B-ALL with t(8;14) subset may facilitate future research on SALL4. The iPathway tool (www.advaitabio.com) was used to further characterize this subset. Compared with the normal samples, it identified 549 differentially expressed genes (DEGs) out of a total of 20,388 genes with measured expression. These 549 DEGs have a significant impact on 34 biological pathways by KEGG analysis. They are also significantly enriched for 1,431 Gene Ontology (GO) terms, 237 predicted miRNAs and 57 diseases, based on uncorrected p-values. The DEGs were analyzed in the context of pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Gene Ontology Consortium database (GO), and the miRBase and TARGETSCAN databases. Some of the iPathway results can be found in the appendices. These results must be treated with caution given the significant limitations of this study.
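To make the notion of enrichment concrete, here is a minimal sketch of a hypergeometric over-representation test. This is an assumption chosen for illustration and is not the impact analysis that iPathway performs. The totals of 20,388 measured genes and 549 DEGs come from the abstract; the pathway size of 100 and the overlap of 12 are invented.

```python
from math import comb

def hypergeom_upper_tail(total, n_deg, n_pathway, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    total     : number of genes with measured expression
    n_deg     : number of differentially expressed genes (the draw)
    n_pathway : number of measured genes annotated to the pathway or term
    overlap   : number of DEGs that fall in the pathway or term
    """
    max_overlap = min(n_deg, n_pathway)
    denominator = comb(total, n_deg)
    numerator = sum(
        comb(n_pathway, k) * comb(total - n_pathway, n_deg - k)
        for k in range(overlap, max_overlap + 1)
    )
    # Python integers are arbitrary precision, so this division stays exact
    # until the final conversion to float.
    return numerator / denominator

# Hypothetical pathway: 100 annotated genes, 12 of which are among the 549 DEGs.
p_value = hypergeom_upper_tail(total=20388, n_deg=549, n_pathway=100, overlap=12)
```

A p-value from this test is uncorrected. When thousands of GO terms are tested, some will fall below a nominal cutoff by chance, so counts based on uncorrected p-values are best read as an upper bound.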
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:38962442