Publication:

Applied Linear Algebra and Big Data Course Book

Date

2019-10-25

Published Version

Citation

Gandhi, Kabir K. 2019. Applied Linear Algebra and Big Data Course Book. Bachelor's thesis, Harvard College.

Abstract

This project involves the development of course notes for Harvard’s Applied Mathematics 120: Applied Linear Algebra and Big Data. The course notes span eight chapters, beginning with a review of foundational concepts in linear algebra that are used throughout the course. Then, methods for solving linear equations are discussed, including the LU decomposition, iterative methods, the MapReduce algorithm, and techniques for handling the large, sparse matrices that often arise in large-scale applications. Next, eigenvalues and eigenvectors are discussed along with their applications, including Google’s PageRank algorithm, spectral clustering, solutions to systems of linear ordinary differential equations, transient amplification and the Jordan form. Chapters 4 and 5 explore principal component analysis and singular value decomposition, along with several applications of these techniques: image compression, the matrix norm, the condition number, polar decomposition, solving under-determined and over-determined systems of linear equations, multivariate PCA, maximum covariance analysis, and SVD-based recommendation systems. Chapter 6 discusses the identification and analysis of frequent patterns and applications of similarity analysis. Chapter 7 considers, analyzes and demonstrates several clustering algorithms, including hierarchical clustering, k-means, and self-organizing maps. In addition to the clustering algorithms, the notes cover related issues such as unusually shaped data (and the application of the Mahalanobis distance in these cases), the "curse of dimensionality", and techniques for clustering very large datasets such as the BFR and CURE algorithms. Finally, Chapter 8 discusses machine learning classification algorithms, including perceptrons, support vector machines and feedforward neural networks, together with over-fitting and neural network optimization.
The notes include numerous images, graphics and numerical examples, generated using MATLAB and Python, designed to clarify challenging concepts and help students digest complex course material. The notes emphasize step-by-step numerical examples for frequently arising problems. The course is designed to be predominantly applications-focused, and proofs are provided only when they contribute to the understanding of important concepts.

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
