Topic |
Video |
Slides + Notes |
Code + Data |
Exercises + Homework |
Introduction to Data Mining
(Modeling Part I)
(self-paced by 10/5)
|
Bias-Variance Tradeoff (Train-Valid-Test)
Missing Values
Transactional Data
Variable Transformations
|
Introduction
|
|
Exercises
(Solutions) |
Association Analysis
|
|
Association |
Grocery Data (SAS)
Viya Code
Association Analysis in R
Association Analysis in Python
Grocery Data (.csv)
|
Exercises
(Solutions)
Due BY 9/25 5pm:
Homework
Homework Data (.csv)
public.orderData2 (SAS Viya)
Submit link
Due BY 9/25 11:59pm:
Homework Quiz
|
Classification and Regression Trees (CART) Part 1
|
|
CART
|
Viya Demo 1
Telco Churn (.csv)
Viya Tree Details
A Python run-through
|
Exercises (#2,3,5,7,8)
(Solutions)
|
Classification and Regression Trees (CART) Part 2
|
|
CART
|
Viya Demo 2
Breast Cancer (SAS Dataset)
Demo 2 in R
Breast Cancer Description (.txt)
Decision Trees in R
Breast Cancer (Rdata)
|
Exercises (ALL problems)
(Solutions)
Homework
Homework Data (.csv)
Homework Data (.RData)
Homework Data Dictionary (.txt)
Submit link
Due BY Friday 10/16 11:59pm:
Homework Quiz
|
Clustering Part 1
k-means
|
|
Clustering
|
Clustering in R (Code)
Adult Data (.RData)
Adult Data (.csv)
|
|
The Curse of Dimensionality
(self-paced by 10/9)
|
Curse of Dimensionality.mp4 |
Slides
|
|
Carl Sagan's Introduction to Flatland (Just for fun)
|
Clustering Part 2
Hierarchical Clustering
|
|
Clustering
|
Hierarchical Clustering using k-means centroids in R (Adult data)
Clustering in SAS (code)
Breast Cancer (SAS Dataset)
|
Exercises
(Solutions)
(Detailed Solution, Problem 2)
|
Clustering Lab
|
|
Choose Your Own Adventure - Clustering
|
TeenSNS Data (.sas7bdat) (also in public library)
TeenSNS Data (.csv)
TeenSNS Data (.RData)
TeenSNS Description (.txt)
|
Link to Submit Profiles
One submission per team!
|
k-Nearest Neighbor
|
|
kNN
|
PenDigitTrain (SAS)
PenDigitTest (SAS)
PenDigit Description (.txt)
kNN in Base SAS (.pdf)
kNN in R
PenDigits.Rdata
|
kNN Exercises
(kNN Solutions)
|
Modeling Part II
(self-paced by 10/19)
|
|
Model Evaluation
|
|
Exercises
(Solutions)
|
Ensemble Models
|
|
Ensemble Models |
Telco Churn (.csv)
|
|
Review
|
|
Review Slides (with solutions) |
|
|
Exam opens Monday 10/19; due Friday 10/23 11:59pm
You may use any notes, but you may not accept exam-specific help from another person. |
Ensemble Clustering
(Faculty Workshop - 10/1)
|
Recording
|
Consensus Clustering |
R Code
Adult Data (.RData)
Adult Data (.csv)
|
|