Show simple item record

dc.contributor.advisor    Kung, H.T.
dc.contributor.advisor    Parkes, David
dc.contributor.advisor    Lu, Yue
dc.contributor.author    Cha, Miriam
dc.date.accessioned    2019-12-12T09:10:13Z
dc.date.created    2019-05
dc.date.issued    2019-05-16
dc.date.submitted    2019
dc.identifier.citation    Cha, Miriam. 2019. Multimodal Sparse Representation Learning and Cross-Modal Synthesis. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
dc.identifier.uri    http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029738
dc.description.abstract    Humans have a natural ability to process and relate concurrent sensations in different sensory modalities such as vision, hearing, smell, and taste. For artificial intelligence to be more human-like in its capabilities, it needs to be able to interpret and translate multimodal information. However, multimodal data are heterogeneous, and the relationship between modalities is often complex. For example, there are many correct ways to draw an image given a text description; similarly, many text descriptions can be valid for a single image. Summarizing and translating multimodal data is therefore challenging. In this thesis, I describe multimodal sparse coding schemes that learn to represent multiple data modalities jointly. A key premise behind joint sparse coding is its representational power: it captures complementary information across modalities while reducing statistical redundancy. As a result, my schemes can improve the performance of classification and retrieval tasks involving co-occurring data modalities. Building on the deep learning framework, I also present probabilistic generative models that produce new data conditioned on an input from another data modality. Specifically, I develop text-to-image synthesis models based on generative adversarial networks (GANs). To improve the visual realism and diversity of generated images, I propose additional objective functions and a new GAN architecture. Furthermore, I propose a novel sampling strategy for training data that promotes output diversity in an adversarial setting.
dc.description.sponsorship    Engineering and Applied Sciences - Computer Science
dc.format.mimetype    application/pdf
dc.language.iso    en
dash.license    LAA
dc.subject    multimodal learning
dc.subject    generative adversarial net
dc.subject    sparse coding
dc.title    Multimodal Sparse Representation Learning and Cross-Modal Synthesis
dc.type    Thesis or Dissertation
dash.depositing.author    Cha, Miriam
dc.date.available    2019-12-12T09:10:13Z
thesis.degree.date    2019
thesis.degree.grantor    Graduate School of Arts & Sciences
thesis.degree.level    Doctoral
thesis.degree.name    Doctor of Philosophy
dc.type.material    text
thesis.degree.department    Engineering and Applied Sciences - Computer Science
dash.identifier.vireo
dash.author.email    cha.miriam@gmail.com
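The abstract's premise of joint sparse coding over co-occurring modalities can be sketched in a few lines. This is a minimal illustration, not code from the thesis: the toy random features, the dimensions, and the use of scikit-learn's `DictionaryLearning` are all assumptions. The idea shown is that a single dictionary learned over concatenated modality features yields one sparse code per sample that summarizes both modalities at once.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)

# Hypothetical toy data: 100 samples with two co-occurring modalities
X_image = rng.randn(100, 20)  # stand-in for image features
X_text = rng.randn(100, 10)   # stand-in for text features

# Joint sparse coding premise: concatenate the modalities and learn
# one shared dictionary, so each sparse code captures complementary
# information from both views while reducing redundancy
X_joint = np.hstack([X_image, X_text])  # shape (100, 30)

dico = DictionaryLearning(n_components=15, alpha=1.0, max_iter=20,
                          transform_algorithm='lasso_lars',
                          random_state=0)
codes = dico.fit_transform(X_joint)  # sparse joint codes, shape (100, 15)
```

The sparse `codes` would then serve as the joint representation fed to downstream classification or retrieval, which is the role the abstract assigns to the learned multimodal representation.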

