Linear Algebra for Data Science
Shaina Race, PhD

Companion Textbook (Work in Progress)

This course is meant to instill a working knowledge of linear algebra terminology and to lay the foundations of advanced data mining techniques such as Principal Component Analysis, Singular Value Decomposition, Factor Analysis, Multidimensional Scaling, Correspondence Analysis, Network Analysis, Support Vector Machines, and many more. We will cover as many of these methods as time allows. To fully comprehend these tools and techniques, we need to understand the language in which they are presented: linear algebra. This is NOT a rigorous, proof-based mathematics course. It is a surface-level introduction to the definitions and concepts most needed to understand these data mining methodologies.


Topic | Video Tutorials | Slides + Notes | Worksheet | Code | Assignments
Review of Introductory Material
(LA Bootcamp)
Scalar Multiplication and Addition (9:59)
Matrix Multiplication (10:45)
Special Matrices and Operations (5:31)
Systems of Equations & Least Squares (11:51)
Norms, Distances, Similarity (7:39)
Covariance (12:58)
Slides
Printable Notes
Review Pack 2
Solutions
Spans, Bases, and Subspaces
(Class 0 and Class 1)
1 - Spans and Subspaces (18:13)
2 - Coordinates and Bases (6:37)
3 - A Change of Basis (16:45)
4 - One Step Further (7:20)
5 - A More Complete Example (12:16)
Slides
Printable Notes

Textbook Chapter
Worksheet 1
Solution
Quiz 1
Orthogonality
(Class 2)
6 - Orthonormal Basis (8:12)
7 - Orthogonal Matrix (10:20)
8 - Orthogonal Projections (8:24)
9 - Regression as a Projection (8:35)
Slides
Printable Notes

Textbook Chapter
Worksheet 2
Solution
Quiz 2
Eigenvectors and Introduction to PCA
(Class 3)
10 - Eigenvalues & Eigenvectors (9:16)
11 - EigenFacts (8:51)
12 - Introduction to PCA (10:51)
Slides
Printable Notes

Textbook Chapter
Worksheet 3
Solution
Quiz 3
Principal Components Analysis
(Class 4)
13 - 2-D data in a 3-D world (4:23)
14 - Eigenvalues Give Variance (13:35)
15 - Loadings and Scores (5:29)
16 - Covariance vs. Correlation PCA w/ SAS (20:23)
Slides
Printable Notes

PCA in R Chapter
PCA Text Chapter
Worksheet 4
Solution
TestScores SAS data
TestScores SAS code
Quiz 4
Case Studies with PCA
(Class 5)
17 - The BiPlot (UK Food) (10:44)
18 - FIFA Soccer Players PCA (9:59)
Slides
Cancer Gene Data
Cancer Gene LAB
Cancer Gene SOLUTION

Star Wars Data
Star Wars LAB
Star Wars SOLUTION
TestScores csv data
TestScores BiPlot

UKfood Data
UKfood Markdown .pdf


FIFA .csv data
FIFA .Rdata
FIFA Markdown .pdf
No Quiz!
TEAM HW Assignment


Homework data

(Email due 9/2 5pm)
Factor Analysis via PC Rotations
(Class 6)
19 - PCA Recap (optional) (7:33)
20 - Factor Analysis via PCA Rotations (23:32)
Slides
Factor Analysis Text Chapter

Factor Analysis R Chapter
Worksheet 5
Solution
Big5 SAS data
Big5 SAS code

Big5 RData
Big5 Markdown .pdf
Quiz 5

PCR: Principal Components Regression
(Class 7)
21 - Principal Components Regression (PCR) (7:25)
22 - Choosing the number of components (4:20)
23 - PCR Example with Baseball Data (10:33)
24 - (optional) PCR Example with BIG Data (6:00)
25 - (optional) Partial Least Squares (PLS) (5:17)
Slides
ISL 6.7 Lab 3
Worksheet 6
Solution
Baseball SAS Data
Baseball .RData Data
SAS (Viya) code - Baseball

BigData .csv Data
BigData .RData
SAS (Viya) code - BigData

Quiz 6

Read: ISL Sections 6.3 + 6.4
Singular Value Decomposition
(Class 8)
26 - The Singular Value Decomposition (10:05)
27 - Noise Reduction via SVD (5:38)
28 - The SVD of Dr. Rappa (6:30)
Slides

PCA via SVD in R Example
SVD Text Chapter
Worksheet 7
Solution
Images
SVD Markdown
NO Quiz!
Review
Blue Recording
Orange Recording
Review Jeopardy

With Solutions
The Curse of Dimensionality
Curse of Dimensionality.mp4
Slides
Carl Sagan's Introduction to Flatland
(Just for fun)
Topic | Links
Principal Components Analysis
Eigenstyle - The Principal Components of Dresses. A clever analyst attempts to build a machine to pick out dresses she will like on Amazon.com.

A Different Ball Game. Analyzing styles of play in football (soccer).

Advanced Statistical Analysis of the SPFL et al. A new team rating for Scottish football (soccer), zeroing in on team strength as a latent variable.
Nonnegative Matrix Factorization
Quick Intro to NMF, The method and the R package.

Singular Value Decomposition
The Netflix Prize, Big Data, the SVD, and R.

The "Most Basic" SVD tutorial on the market (by Kirk Baker).
Distance Metrics Etc.
Ben Frederickson explores distance metrics.
Useful Blogs and Webpages
The Shape of Data. Jesse Johnson explores MANY common analytical tools and explains them in relatively plain language.