Applied Linear Algebra and Big Data Course Book
Gandhi, Kabir K.
Citation: Gandhi, Kabir K. 2019. Applied Linear Algebra and Big Data Course Book. Bachelor's thesis, Harvard College.
Abstract: This project involves the development of course notes for Harvard’s Applied Mathematics 120: Applied Linear Algebra and Big Data. The course notes span eight chapters, beginning with a review of foundational concepts in linear algebra that are used throughout the course. Next, methods for solving linear equations are discussed, including the LU decomposition, iterative methods, the MapReduce algorithm, and techniques for handling the large, sparse matrices that often arise in large-scale applications. Eigenvalues, eigenvectors and their applications follow, including Google’s PageRank algorithm, spectral clustering, solutions to systems of linear ordinary differential equations, transient amplification and the Jordan form. Chapters 4 and 5 explore principal component analysis and singular value decomposition, along with several applications of these techniques: image compression, the matrix norm, the condition number, polar decomposition, solving under-determined and over-determined linear equations, multivariate PCA, maximum covariance analysis, and SVD-based recommendation systems. Chapter 6 discusses the identification and analysis of frequent patterns and applications of similarity analyses. Chapter 7 considers, analyzes and demonstrates multiple clustering algorithms, including hierarchical clustering, k-means, and self-organizing maps. In addition to the clustering algorithms, the notes cover related issues such as unusually shaped data (and the application of the Mahalanobis distance in these cases), the "curse of dimensionality", and techniques for clustering very large datasets such as the BFR and CURE algorithms. Finally, Chapter 8 discusses machine learning classification algorithms, including perceptrons, support vector machines and feedforward neural networks, together with over-fitting and neural network optimization.
The notes include numerous images, graphics and numerical examples, generated using MATLAB and Python, designed to clarify challenging concepts and make complex course material easier for students to digest. The notes emphasize step-by-step numerical examples for frequently arising problems. The course is designed to be predominantly applications-focused, and proofs are provided only when they contribute to the understanding of important concepts.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364654
- FAS Theses and Dissertations