
MCS

Big Data

Statistical Machine Learning


(CSE 575)
Lead: Jingrui He | Updated 6/28/2018

Note: The information below and course outline are subject to modifications and updates.

About this Course


The link between inference and computation is central to statistical machine learning, which
combines the computational sciences with statistics. Beyond artificial intelligence, fields
such as information management, finance, bioinformatics, and communications are significantly
influenced by developments in statistical machine learning. This course investigates the
data mining and statistical pattern recognition techniques that support artificial intelligence.
The main topics covered are supervised learning, unsupervised learning, and deep learning:
major components of machine learning and of the data analytics that enable it.

Specific topics covered include:


• Probability distributions
• Maximum likelihood estimation
• Naive Bayes
• Logistic regression
• Support vector machines
• Clustering
• Principal component analysis
• Neural networks
• Convolutional neural networks

Required Prior Knowledge and Skills


• Basics of linear algebra, statistics, calculus, and algorithm design and analysis
• Programming (in a language such as Python or MATLAB)

Learning Outcomes

Learners completing this course will be able to:

• Distinguish between supervised learning and unsupervised learning
• Apply common probability distributions in machine learning applications
• Use cross-validation to select parameters
• Use maximum likelihood estimation (MLE) for parameter estimation
• Implement fundamental learning algorithms such as logistic regression and k-means clustering


• Implement more advanced learning algorithms such as support vector machines and convolutional neural networks
• Design a deep network using an exemplar application to solve a specific problem
• Apply key techniques employed in building deep learning architectures

Projects
• Project 1: TBD
• Project 2: Implement the k-means algorithm and its variants on a real document clustering task
• Project 3: Image classification: train a deep network on a standard dataset to classify images and evaluate its performance on both the training set and the test set

Course Content

Instruction
• Video lectures
• Other videos
• Readings
• Interactive learning objects
• Live office hours

Assessments
• Practice activities and quizzes (auto-graded)
• Practice assignments (instructor- or peer-reviewed)
• Team and/or individual project(s) (instructor-graded)
• Midterm or final exam (proctored, graded)

Estimated Workload/Time Commitment Per Week


Approximately 15-20 hours per week.

Technology Requirements

Hardware
• Standard hardware with a major OS

Software and Other


• Standard; technology integrations will be provided through Coursera


Course Outline
Unit 1: Introduction to Machine Learning

Learning Objectives
1.1 Describe common misconceptions of machine learning
1.2 Define machine learning
1.3 Distinguish between supervised learning and unsupervised learning
1.4 Compare numerical and graphical data representations
1.5 Describe applications of machine learning

Module 1: Defining Machine Learning


Common misconceptions
What is Machine Learning?
Related fields

Module 2: Styles of Machine Learning


Supervised learning
Unsupervised learning

Module 3: Data Representations


Data representation
Numerical representation
Graph representation

Module 4: Applications of Machine Learning


Recognizing examples
Familiar applications
Emerging applications

Unit 2: Statistical Core of Machine Learning

Learning Objectives
2.1 Apply common probability distributions in machine learning applications
2.2 Use maximum likelihood estimation (MLE) for parameter estimation


Module 1: Probability
Discrete Random Variables
Probability Mass Function (PMF)
Common Discrete Distributions
• Uniform
• Binomial
Joint Probability Mass Function
Conditional Probability
Relationship Between Marginal and Joint Probability
Bayes' Theorem
Independent Random Variables
Continuous Random Variables
Probability Density Function (PDF)
Common Continuous Distributions
• Normal
• Beta
Joint Probability Density Function
Moments of Random Variables
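
The distributions and rules listed above can be explored numerically. Below is a minimal sketch (not part of the official course materials) using scipy.stats; all parameter values are arbitrary choices for illustration.

```python
from scipy import stats

# Discrete: Binomial(n=10, p=0.3), a common PMF-based distribution
binom = stats.binom(n=10, p=0.3)
print(binom.pmf(3))                 # P(X = 3)
print(binom.mean(), binom.var())    # moments of the random variable

# Continuous: Normal(0, 1), a common PDF-based distribution
norm = stats.norm(loc=0.0, scale=1.0)
print(norm.pdf(0.0), norm.cdf(1.96))

# Bayes' theorem with made-up numbers: P(A|B) = P(B|A) P(A) / P(B)
p_a, p_b_given_a, p_b = 0.01, 0.9, 0.05
print(p_b_given_a * p_a / p_b)      # posterior P(A|B) = 0.18
```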

Module 2: Maximum Likelihood Estimation


Likelihood function
• For discrete probability distribution
• For continuous probability distribution
Maximum likelihood estimation
• For discrete probability distribution
• For continuous probability distribution
• For mean and standard deviation
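
As a concrete illustration of the cases above, here is a hedged numpy sketch of MLE for the mean and standard deviation of a normal sample; the data are synthetic, and the closed forms follow from setting the derivative of the log-likelihood to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)    # synthetic Normal data

mu_mle = x.mean()                                # MLE of the mean: sample mean
sigma_mle = np.sqrt(((x - mu_mle) ** 2).mean())  # MLE of sigma: divides by n, not n-1
print(mu_mle, sigma_mle)                         # close to 2.0 and 1.5

# Discrete case: for Bernoulli(p), the MLE of p is the fraction of ones
coin = rng.binomial(1, 0.3, size=1000)
print(coin.mean())                               # close to 0.3
```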

Unit 3: Supervised Learning: Two Models

Learning Objectives
3.1 Differentiate between generative and discriminative models for supervised learning
3.2 Implement fundamental learning algorithms such as Naive Bayes and Logistic Regression
3.3 Interpret empirical comparisons of Naive Bayes and Logistic Regression

Module 1: Generative vs Discriminative Model of Supervised Learning


Generative vs Discriminative models for supervised learning
• Essential distinction
• Generative model: Naive Bayes
• Discriminative model: Logistic Regression


Module 2: Naive Bayes


• Naive Bayes Assumption
• Decision Rule
Parameters of Naive Bayes
Maximum Likelihood Estimation (MLE) for Naive Bayes Parameters
Text Classification using Naive Bayes
• Bag of Words Model for Text
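
A small sketch of the text-classification setup above, assuming scikit-learn is available; the four-document "spam" corpus is invented purely for illustration, not course data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["win cash prize now", "meeting agenda attached",
        "cheap prize offer", "project status meeting"]
labels = [1, 0, 1, 0]              # 1 = spam, 0 = not spam

vec = CountVectorizer()            # bag-of-words: word counts per document
X = vec.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)

print(clf.predict(vec.transform(["cash offer now"])))  # expect [1]
```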

Module 3: Logistic Regression


Logistic Function
Linear Classifier
Parameter Estimation
Maximizing Conditional Log Likelihood
Gradient Ascent
Optimization Algorithm
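
The estimation procedure above (maximize the conditional log-likelihood by gradient ascent) fits in a few lines of numpy. This is a sketch under illustrative assumptions: synthetic data, a fixed step size, and no intercept term.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) > 0).astype(float)   # synthetic labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, lr = np.zeros(2), 0.1
for _ in range(500):
    grad = X.T @ (y - sigmoid(X @ w))   # gradient of the conditional log-likelihood
    w += lr * grad / len(y)             # ascent step
print(w)   # should roughly recover the direction of [1.5, -2.0]
```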

Module 4: Comparing the Models


Empirical Comparison of Naive Bayes and Logistic Regression

Unit 4: Supervised Learning: Support Vector Machines

Learning Objectives
4.1 Differentiate between linearly separable and non-separable support vector machines
4.2 Explain the role of the kernel trick in support vector machines
4.3 Explain options for picking magic parameters in support vector machines
4.4 Implement the more advanced learning algorithm known as the support vector machine (SVM)

Module 1: Introduction to Support Vector Machines


SVM: Separable vs non-separable

Module 2: Separable
Linearly Separable Example
Max-margin Separating Hyperplane
Margin Maximization with Canonical Hyperplanes
Optimization Problem of SVM: separable case
Dual SVM Formulation: separable case

Module 3: Non-separable
Linearly Non-separable Example
Hinge Loss
Optimization Problem of SVM: non-separable case
Dual SVM Formulation: non-separable case
Input Space to Feature Space


Kernel Trick
• Common Kernels
• Test Example
• SVM with the Kernel Trick
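
A brief scikit-learn sketch contrasting the separable and non-separable machinery above: make_circles produces data with no separating hyperplane in the input space, which an RBF kernel handles via the kernel trick. The parameter values are illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X, y)
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print(linear.score(X, y))  # poor: no separating hyperplane in input space
print(rbf.score(X, y))     # near 1.0: separable in the induced feature space
```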

Module 4: Parameter Selection


How to Pick the Magic Parameters?
• Option #1: Leave-One-Out Cross Validation (LOOCV)
• Option #2: Cross Validation
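
Both options map onto scikit-learn's model-selection utilities, as this sketch shows; the candidate grid below is an illustrative assumption, not a course-recommended setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)

# Option #1: LOOCV score for one candidate setting
loo_scores = cross_val_score(SVC(C=1.0, gamma=0.1), X, y, cv=LeaveOneOut())
print(loo_scores.mean())

# Option #2: 5-fold cross validation over a grid of candidates
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```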

Unit 5: Unsupervised Learning: Clustering

Learning Objectives
5.1 Differentiate between clustering in supervised vs. unsupervised learning
5.2 Explain how to efficiently cluster data
5.3 Apply the k-means algorithm
5.4 Explain the relationships among the various k-means variants

Module 1: Introduction to Clustering


The role of clustering in machine learning
Clustering in supervised versus unsupervised learning
How to find a good clustering
• Intuition
• An example
• Mathematical formulation
How to efficiently cluster data
• Challenge - combinatorial nature
• Solution:
• High-level idea: alternation (sketched in code below)
• Details - step 1: fix the cluster centers, find the cluster membership
• Details - step 2: fix the cluster membership, update the cluster centers
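
A minimal numpy sketch of that alternation, assuming a simple random initialization (empty clusters are not handled, for brevity):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Alternating minimization: assign memberships, then update centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # Step 1: fix the centers, find the cluster membership
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: fix the membership, update each center to its cluster mean
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels

centers, labels = kmeans(np.random.default_rng(1).normal(size=(100, 2)), k=3)
```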

Module 2: K-means
K-means for clustering
K-means models
Properties of the K-means algorithm
• Initialization
• fix the cluster centers, find the cluster membership
• fix the cluster membership, update the cluster centers
• Repeat the above two steps until convergence
Comparing K-means clusterings


A Numerical Example
• Input data, plotted in 1-D space
• Pick the initial cluster centers
• Run one iteration of the k-means algorithm
• Show how the cluster membership changes
• Show how the cluster centers change
K-means algorithm considerations

Module 3: K-means Variants


K-means as matrix factorization
The k-means problem
• Input of k-means
• Mathematical formulation
• Two special cases (k=1 vs. k=n)
Hardness of the k-means problem
• When d ≥ 2, k-means is NP-hard
• When d = 1, k-means is polynomially solvable
Optimality of K-means
• In general, it only finds a local optimum
• Convergence of k-means
• The impact of initial cluster centers
• A numerical example of the impact of initial cluster centers
• Impact of outliers
Alternatives to random initialization (see the sketch below)
• Multiple runs
• k-means++
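
Both alternatives map directly onto scikit-learn's KMeans, as this sketch shows; the blob data is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Multiple runs: keep the clustering with the lowest objective (inertia)
multi = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(X)

# k-means++: spread out the initial centers before the alternating steps
plus = KMeans(n_clusters=3, init="k-means++", n_init=1, random_state=0).fit(X)

print(multi.inertia_, plus.inertia_)   # sum of squared distances to centers
```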

Unit 6: Unsupervised Learning: Dimensionality Reduction

Learning Objectives
6.1 Illustrate the process of dimensionality reduction
6.2 Apply the PCA algorithm
6.3 Explain the relationship between PCA and SVD

Module 1: Introduction to Dimensionality Reduction


What is dimensionality reduction?
The role of dimensionality reduction in machine learning


Module 2: Using Principal Component Analysis (PCA)


Introduction to using PCA
• Inputs of PCA
• Outputs of PCA
• A numerical example
Maximizing the projected variance for the numerical example (d=1)
• How to calculate the projected data from the original data and the projection direction
• How to calculate the projected mean
• How to calculate the projected variance
Maximizing the projected variance for the general case (d=1)
• A single projected data point
• Projected sample mean
• Sample variance matrix
• Projected variance
Optimization formulation for PCA (d=1)
• Objective function
• Constraint & why we need it
• Optimization variable
Solving the optimization problem for PCA (d=1), written out symbolically below
• Overall strategy: the Lagrangian
• Step 1: write down the Lagrangian function
• Step 2: calculate the partial derivative
• Step 3: set the partial derivative to zero
• Step 4: plug the result of step 3 back into the objective function J
• Step 5: seek the largest eigenvalue of S
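
For reference, the five steps written out symbolically, with u the projection direction and S the sample variance matrix (notation assumed from the module):

```latex
\begin{aligned}
\max_{u}\;& J(u) = u^{\top} S u \quad \text{subject to } u^{\top}u = 1\\
\text{Step 1: } & L(u,\lambda) = u^{\top} S u - \lambda\,(u^{\top}u - 1)\\
\text{Step 2: } & \frac{\partial L}{\partial u} = 2\,S u - 2\,\lambda u\\
\text{Step 3: } & 2\,S u - 2\,\lambda u = 0 \;\Rightarrow\; S u = \lambda u\\
\text{Step 4: } & J(u) = u^{\top} S u = \lambda\, u^{\top} u = \lambda\\
\text{Step 5: } & \text{so maximizing } J \text{ means taking the largest eigenvalue } \lambda \text{ of } S
\end{aligned}
```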
Solving the optimization problem for PCA (d>1)
• Fact: the d principal components are the first d eigenvectors of the sample variance matrix S
• Prove it by induction
• Step 0: base case
• Step 1: projected variance when d>1
• Step 2: the optimization formulation
• Step 3: solve the optimization problem using the Lagrangian
Minimizing the reconstruction error
• Input data
• Projected data
• Reconstruction error
• Minimizing reconstruction error = maximizing projected variance
A matrix representation for minimizing reconstruction error
• Assumption
• Input data matrix
• Projected data matrix
• PC matrix
• Objective function
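
The whole pipeline above condenses to a few numpy lines. This is a sketch on synthetic data, with W standing in for the PC matrix and Z for the projected data matrix (names assumed here for illustration, not fixed by the course).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # n x p input data matrix
Xc = X - X.mean(axis=0)                # center each column

S = Xc.T @ Xc / len(X)                 # sample variance (covariance) matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]

d = 2
W = eigvecs[:, order[:d]]              # PC matrix: top-d eigenvectors
Z = Xc @ W                             # projected data matrix
print(Z.shape)                         # (100, 2)
```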


PCA versus SVD


• Assumption
• Input data matrix X
• SVD of X
• Left singular matrix = projected data matrix
• Singular value matrix and right singular vector matrix = PC matrix
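
Continuing the previous sketch (same numpy import and the Xc, W, Z variables defined there), the correspondence above can be checked directly; the absolute values absorb the sign ambiguity of singular vectors.

```python
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # SVD of the centered data

# Rows of Vt are the principal components; U * s gives the projected data
print(np.allclose(np.abs(Vt[:2]), np.abs(W.T)))         # True (signs may differ)
print(np.allclose(np.abs(U[:, :2] * s[:2]), np.abs(Z)))  # projected data match
```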
PCA versus Feature Selection
• Input data matrix
• Rows of input data matrix
• Columns of input data matrix
• Two key points of PCA
• Unsupervised learning
• Generate a few new features
• Two key points of feature selection
• Typically supervised learning
• Select a few original features

Unit 7: Deep Learning: Key Techniques

Learning Objectives
7.1: Describe the big-picture view of how neural networks work.
7.2: Identify the basic building blocks and notations of deep neural networks.
7.3: Explain how in principle learning is achieved in a deep network.
7.4: Explain key techniques that enable efficient learning in deep networks.
7.5: Appraise the detailed architecture of a basic convolutional neural network.
7.6: Compare the basic concepts and corresponding architecture for recurrent neural networks
and autoencoders.

Module 1: Introduction to Deep Learning


Brief historical view of artificial neural networks and deep learning
Early models of artificial neural networks and their learning algorithms
Deep learning: what it is and what it is not

Module 2: Key Techniques Enabling Deep Learning


Back-propagation algorithm for learning
Choice of activation functions
A few regularization methods
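
A compact numpy sketch of back-propagation for a one-hidden-layer network, combining the three ingredients above: the backward pass itself, a ReLU activation, and L2 weight decay as one regularization method. Every size and constant is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))                 # batch of inputs
y = rng.normal(size=(64, 1))                  # regression targets

W1, b1 = rng.normal(size=(10, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)) * 0.1, np.zeros(1)
lr, lam = 0.01, 1e-3                          # step size, L2 strength

for _ in range(200):
    # Forward pass
    h = np.maximum(0, X @ W1 + b1)            # ReLU activation
    pred = h @ W2 + b2
    # Backward pass: chain rule, layer by layer
    g_pred = 2 * (pred - y) / len(X)          # d(MSE)/d(pred)
    g_W2 = h.T @ g_pred + lam * W2            # gradient plus L2 penalty
    g_h = g_pred @ W2.T
    g_z = g_h * (h > 0)                       # ReLU derivative
    g_W1 = X.T @ g_z + lam * W1
    W2 -= lr * g_W2
    b2 -= lr * g_pred.sum(axis=0)
    W1 -= lr * g_W1
    b1 -= lr * g_z.sum(axis=0)
```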

Module 3: Some Basic Deep Architectures


Convolutional Neural Networks
Recurrent Neural Networks
Autoencoders


Unit 8: Deep Learning: Exemplar Applications

Learning Objectives
8.1: Appraise image classification for deep learning
8.2: Appraise video-based inference for deep learning
8.3: Appraise Generative Adversarial Networks (GANs) for deep learning
8.4: Design a deep network using an exemplar application to solve a specific problem

Module 1: Image Classification


A typical network architecture used for image classification
Parameters for defining an image classification network
Common tricks for improving classification performance
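
As a sketch of such an architecture, here is a small PyTorch convolutional network for 32x32 RGB inputs (a CIFAR-10-style dataset); the layer sizes are assumptions for illustration, not the course's reference design.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                     # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                     # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),           # 10 output classes
)

x = torch.randn(4, 3, 32, 32)            # a dummy batch of 4 images
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))
print(logits.shape, loss.item())
```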

Module 2: Video-Based Inference


Challenges in using deep networks for sequential data
Difference between image-based and video-based classification
Using video action recognition to contrast these two classification tasks
A sample network for video-based inference

Module 3: Generative Adversarial Networks (GANs)


Basic concepts behind GANs
GAN variants and their applications


About ASU

Established in Tempe in 1885, Arizona State University (ASU) has developed a new model
for the American Research University, creating an institution that is committed to access,
excellence and impact.

As the prototype for a New American University, ASU pursues research that contributes to the
public good, and ASU assumes major responsibility for the economic, social and cultural vitality
of the communities that surround it. Recognizing the university’s groundbreaking initiatives,
partnerships, programs and research, U.S. News & World Report has named ASU the most
innovative university in each of the three years the category has existed.

The innovation ranking is due at least in part to a more than 80 percent improvement in ASU’s
graduation rate over the past 15 years, the fact that ASU is the fastest-growing research university
in the country, and an emphasis on inclusion and student success that has led to more than 50
percent of the school’s in-state freshmen coming from minority backgrounds.

About Ira A. Fulton Schools of Engineering

Structured around grand challenges and improving the quality of life on a global scale, the Ira
A. Fulton Schools of Engineering at Arizona State University integrates traditionally separate
disciplines and supports collaborative research in the multidisciplinary areas of biological and
health systems; sustainable engineering and the built environment; matter, transport and energy;
and computing and decision systems. As the largest engineering program in the United States,
the Fulton Schools enable students to pursue their educational and career goals through 25
undergraduate degrees, 39 graduate programs, and rich experiential education offerings. The
Fulton Schools are dedicated
to engineering programs that combine a strong core foundation with top faculty and a reputation
for graduating students who are aggressively recruited by top companies or become superior
candidates for graduate studies in medicine, law, engineering and science.

About the School of Computing, Informatics, & Decision Systems Engineering

The School of Computing, Informatics, and Decision Systems Engineering advances
developments and innovation in artificial intelligence, big data, cybersecurity and digital forensics,
and software engineering. Our faculty are winning prestigious honors in professional societies,
resulting in leadership of renowned research centers in homeland security operational efficiency,
data engineering, and cybersecurity and digital forensics. The school’s rapid enrollment growth
is not limited to ASU’s Tempe and Polytechnic campuses, as the school continues to lead in
online education. In addition to the Online Master of Computer Science,
the school also offers an Online Bachelor of Science in Software Engineering, and the first four-
year, completely online Bachelor of Science in Engineering program in engineering management.


Creators
Jingrui He is an assistant professor in the School of Computing, Informatics, and Decision
Systems Engineering at Arizona State University. She received her Ph.D. from Carnegie
Mellon University. She joined ASU in 2014 and directs the Statistical Learning Lab (STAR
Lab). Her research focuses on rare category analysis, heterogeneous machine learning,
active learning and semi-supervised learning, with applications in social media analysis,
healthcare, manufacturing processes, etc.

Baoxin Li is currently a professor and the chair of the Computer Science & Engineering
Program and a Graduate Faculty Endorsed to Chair in the Electrical Engineering and
Computer Engineering programs. From 2000 to 2004, he was a Senior Researcher with
SHARP Laboratories of America, where he was the technical lead in developing SHARP’s
HiIMPACT Sports™ technologies. He was also an Adjunct Professor at Portland State
University from 2003 to 2004. His general research interests are in visual computing and
machine learning, especially their application in the context of human-centered computing.

Hanghang Tong has been an assistant professor in the School of Computing, Informatics,
and Decision Systems Engineering (CIDSE) at Arizona State University since August 2014.
Before that, he was an assistant professor in the Computer Science Department at City College,
City University of New York, a research staff member at IBM T.J. Watson Research Center,
and a postdoctoral fellow at Carnegie Mellon University. His research interest is in large-scale
data mining for graphs and multimedia.
