Machine Learning
CS 691A
Fall 2016
Prof. Gianfranco Doretto
West Virginia University, Department of Computer Science and
Electrical Engineering
Time and Place
First lecture: August 18, 2016
Last lecture: December 6, 2016
- Tuesday, 2:00pm - 3:15pm in ESB-E G84
- Thursday, 2:00pm - 3:15pm in ESB-E G84
Office Hours
Thursdays 3:30pm to 4:30pm, or by appointment (send email)
Important Dates
- Midterm Exam: December 1, 2016
- Final Project: 5pm on Friday, December 9, 2016
Syllabus .pdf
What could be achieved by building automated systems that
improve their performance through experience? Robots would learn
to navigate based on experience gained by roaming their
environment. Cellphones would get better at recognizing your
voice, and at executing and anticipating your commands.
Physicians would prescribe personalized therapies that best cure
diseases based on historical health records. Machine learning
addresses these problems, and many more, through the study and
development of techniques that enable a system to learn
automatically from observed data.
The purpose of this graduate course is to introduce the
fundamental set of techniques and algorithms that constitute
machine learning today, while providing a thorough grounding in
their methodologies and theoretical foundations. The course will
also discuss recent applications of machine learning, such as
computer vision, robotic control, data mining, autonomous
navigation, bioinformatics, speech recognition, and text and web
data processing.
The topics covered can be roughly summarized as follows:
- Basic statistics : Probabilities, MLE, MAP
- Instance-based Learning :
K-Nearest Neighbors, collaborative filtering
- Generative Models :
Naïve Bayes, linear discriminant analysis
- Linear Rules :
Linear regression, logistic regression, perceptron
- Support Vector Machines :
Optimal hyperplane, margin, kernels
- Statistical Learning Theory :
PAC learning, VC dimension
- Component Analysis : PCA, Kernel PCA
- Clustering :
HAC, k-means, mixture of Gaussians
- Hidden Markov Models: probabilistic
model, estimation, Viterbi
- Graphical Models :
directed, undirected, learning, inference
- Decision Trees :
attribute selection, pruning and overfitting
- Ensemble Learning : Bagging, boosting
Prerequisites
Students entering the class are expected to have a working
knowledge (at the level of an undergraduate class) of
multivariate calculus, linear algebra, probability and
statistics, and computer programming and algorithms, though the
class has been designed to allow students with a strong
quantitative background to catch up and fully participate. Basic
knowledge of optimization is a plus. Please ask the instructor
for the self-assessment exam to make sure you have the right
background.
Schedule
and Class Notes
- 08/18: Introduction [slides]
[slides
3/page]
- Course policies
- What is machine learning, and its applications
- Machine learning problem types
- Materials: M:1
- 08/23: Basic statistics [slides]
[slides
3/page]
- Probabilities, independence
- Parameter estimation: maximum likelihood and maximum a posteriori
- Hoeffding's bounds
- Materials: M:2.1-2.5, M:3.1-3.4, M:4.6.1-4.6.2, M:6.4.2 or
probability and statistics handout
- 08/25: Naive Bayes [slides]
[slides
3/page]
- Zero-one Bayesian predictor
- Naive Bayes assumption, learning
- Bag of words model
- Gaussian Naive Bayes
- Materials: M:5.7-5.7.1.1, M:3.5, M:4.2-4.2.1
- 08/28: Logistic Regression [slides]
[slides
3/page]
- MLE and MAP training
- Gradient descent
- Generative vs. Discriminative Classifiers
- Materials: M:1.4.6, M:8.1-8.2, M:8.3.1-8.3.2,
M:8.3.6-8.3.7, M:4.2.2-4.2.5, M:8.6-8.6.2
- 09/06: Linear Regression [slides]
[slides
3/page]
- Least squares training
- Ridge regression and lasso
- Polynomial and non-linear features
- Materials: M:1.4.5, M:7.1-7.3, M:7.5
- 09/08: Instance Based Learning [slides]
[slides
3/page]
- Density estimation, binning, Parzen window, smooth kernels
- k-Nearest Neighbor
- Watson-Nadaraya model, kernel regression, classification,
locally weighted regression
- Materials: M:1.4.1-1.4.2, M:6.4.4, M:14.7
- 09/13: Perceptron and Kernels [slides]
[slides
3/page]
- Perceptron criterion, algorithm, convergence
- Kernels, Mercer theorem, properties
- Materials: B:4.1.7, M:14.1-14.2
- 09/15: Optimization [slides]
[slides
3/page]
- Convexity
- Unconstrained optimization, gradient descent, Newton's
methods
- Constrained optimization, Lagrange duality, LP, QP
- Materials: optimization handouts Tutorial
1 and Tutorial
2
- 09/22: Support Vector Classification [slides]
[slides
3/page]
- Large margin separation, optimization, support vectors
- Soft margin formulation, dual problem, non-linear
separation, risk and loss
- Materials: M:14.5, B:7.1
- 09/29: Risk Minimization and Model Selection [slides]
[slides
3/page]
- Risk and loss, ERM, overfitting, estimation and
approximation errors
- Cross-validation, regularized risk minimization,
information criteria
- Materials: Sections 1 through 4.2, and 9 of handout,
M:6.5
- 10/04: Support Vector Regression [slides]
[slides
3/page]
- Representer theorem, kernel ridge regression
- Sparse vector formulation
- Materials: M:15.4.7, B:7.1.4
- 10/06: Dimensionality Reduction [slides]
[slides
3/page]
- 10/13: Clustering [slides]
[slides
3/page]
- Hierarchical Clustering
- Partitional Clustering, K-means, spectral clustering
- Materials: M:25.1, M:25.5, M:11.4.2.5-11.4.2.7, M:11.5.2,
handout Spectral Clustering
- 10/20: Mixture Models and Expectation Maximization [slides]
[slides
3/page]
- Mixture models, density estimation, clustering, parameter
estimation
- EM, general formulation, convergence, applications
- Materials: M:11.1, M:11.2.1, M:11.2.3, M:11.3,
M:11.4.1-11.4.2, M:11.4.7, M:21.1-21.2
- 10/28: Hidden Markov Models [slides]
[slides
3/page]
- Markov models
- Forward-Backward, Viterbi,
Baum-Welch algorithms
- Materials: M:17.1-17.2.1, M:17.3,
M:17.4-17.5, M:18.1-18.2
- 11/3: Decision Trees [slides]
[slides
3/page]
- Representation, prediction,
learning, pruning
- Materials: M:16.1, M:16.2
- 11/10: Graphical Models [slides]
[slides
3/page]
- BN, and MRF representation
- Inference, variable elimination,
learning tree structures
- Materials: M:10.1-10.2, M:10.5,
M:19.1-19.3, M:19.4.1, M:10.3, M:20.3, M:10.4, M:26.1-26.3
- 11/17: Ensemble Learning [slides]
[slides
3/page]
- Boosting, AdaBoost, analysis
- Bagging, random forests
- Materials: M:16., handout
AdaBoost, HTF:8.7, HTF:15.1-15.3
- 11/29: Neural Networks [slides]
[slides
3/page]
- Structure, representational power
- Backpropagation, overfitting
- Materials: M:16.5
- 12/6 Deep Learning [slides]
[slides
3/page]
- Convolutional Neural Networks
- Deep Belief Networks
- Materials: M:27.7, M:28.1-28.4
Assignments and Attendance
Homework assignments can be downloaded from eCampus.
All assignments are due at the beginning of the class on the
due date.
- Homework
1, due Thursday, September 29, at the beginning of the
class
- Homework
2, due Tuesday, October 27, at the beginning of the
class, download HW2_data
- Homework
3, due Tuesday, November 29, at the beginning of the
class, download HW3_data
- Homework
4, due Tuesday, December 6, at the beginning of the
class, download HW4_data
Programming Languages
In-class examples will be given in Matlab, which should provide
an effective way to complete the programming assignments and the
final project.
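As a small, hypothetical illustration (not taken from the course
materials) of the kind of Matlab usage assumed, the sketch below
fits a least-squares line to noisy synthetic data, one of the
topics covered in the Linear Regression lecture:

% Illustrative sketch only: ordinary least squares in Matlab.
rng(0);                            % make the toy data reproducible
x = linspace(0, 1, 50)';           % 50 inputs in [0, 1]
y = 2*x + 1 + 0.1*randn(50, 1);    % noisy targets, true slope 2, intercept 1
X = [x, ones(50, 1)];              % design matrix with a bias column
w = X \ y;                         % least-squares estimate of [slope; intercept]
fprintf('estimated slope %.2f, intercept %.2f\n', w(1), w(2));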
Assignment Policy
I expect you to try solving each assignment on your own.
However, when you are stuck on a problem, I encourage you to
collaborate with other students in the class, subject to the
following rules:
- You may discuss a problem with any student in this class, and work together on solving it. This can involve brainstorming and verbally discussing the problem, and going through possible solutions together, but it should not involve one student telling another a complete solution.
- Once you solve the homework, you must write up your solutions on your own, without looking at other people's write-ups or giving your write-up to others.
- In your solution for each problem, you must write down the names of anyone with whom you discussed it. This will not affect your grade.
- Do not consult solution manuals or other people's solutions from similar courses.
Late Assignment Policy
Each student may use three “late days” over the whole course.
The late-day count starts from the day and time of each
assignment's due date, and fractional late days are counted
(e.g., a submission 12 hours late uses half a late day). Each
additional late day beyond the first three costs 20% of the
assignment grade.
Attendance Policy
Consistent with WVU guidelines, students absent from regularly
scheduled examinations because of authorized University
activities will have the opportunity to take them at an
alternate time. Make-up exams for absences due to any other
reason will be at the discretion of the instructor.
Final Project
Final project directions
Grading
The course format will predominantly consist of lectures.
Students will have a few simple programming assignments, as well
as homework problems. Each student will complete a final
project and give a project presentation to the class.
Final grades will be based approximately on the following
distribution:
- 40%: Homework assignments (~4 assignments)
- 30%: Midterm exam
- 30%: Final project
Bonus points may be assigned based on class participation. In
addition, creative solutions to homework problems and
programming assignments are appreciated, and may be rewarded by
extra bonus points.
Grades will be assigned according to the following scale:
A=90-100; B=80-89; C=70-79; D=60-69; F= below 60.
Reference Material
Textbook
The main textbooks for the class are:
- Class notes and handouts
- M: Murphy, Machine Learning: A
Probabilistic Perspective, The MIT Press, 2012. (online)
Additional books that can serve as secondary reference on the
topics covered in class:
- B: Bishop, Pattern Recognition and
Machine Learning, Springer, 2006. (online)
- HTF: Hastie, Tibshirani, Friedman, The
Elements of Statistical Learning, Springer, 2009. (online)
- Duda, Hart, Stork, Pattern Classification, Wiley,
2000. (online)
- Mitchell, Machine Learning, McGraw-Hill, 1997. (online)
In addition, these are some books for further reading beyond
the scope of the course:
- Boyd, Vandenberghe, Convex Optimization, 2004 (online)
- Vapnik, The Nature of Statistical Learning Theory,
Springer, 2000. (online)
- Schoelkopf, Smola, Learning with Kernels, The MIT
Press, 2001. (online)
- Cristianini, Shawe-Taylor, Support Vector Machines and
other kernel-based learning methods, Cambridge
University Press, 2000. (online)
- Joachims, Learning to Classify Text using Support
Vector Machines, Kluwer, 2002. (online)
- Manning, Schuetze, Foundations of Statistical Natural
Language Processing, MIT Press, 1999. (online)
- Manning, Raghavan, Schuetze, Introduction to
Information Retrieval, Cambridge, 2008. (online)
- Koller, Friedman, Probabilistic Graphical Models:
Principles and Techniques, The MIT Press, 2009. (online)
Very Important Background Handouts
- Probability and Statistics
- Linear Algebra
- Optimization
- Matlab
Inclusivity and Academic Integrity
Inclusivity Statement
The West Virginia University community is committed to
creating and fostering a positive learning and working
environment based on open communication, mutual respect, and
inclusion.
If you are a person with a disability and anticipate needing any
type of accommodation in order to participate in this class,
please advise me and make appropriate arrangements with the
Office of Disability Services (293-6700). For more information
on West Virginia University's Diversity, Equity, and Inclusion
initiatives, please see http://diversity.wvu.edu.
Integrity Statement
The integrity of the classes offered by any academic
institution solidifies the foundation of its mission and cannot
be sacrificed to expediency, ignorance, or blatant fraud.
Therefore, I will enforce rigorous standards of academic
integrity in all aspects and assignments of this course. For the
detailed policy of West Virginia University regarding the
definitions of acts considered to fall under academic dishonesty
and possible ensuing sanctions, please see the Student Conduct
Code at http://www.arc.wvu.edu/admissions/integrity.html. Should
you have any questions about possibly improper research
citations or references, or any other activity that may be
interpreted as an attempt at academic dishonesty, please see me
before the assignment is due to discuss the matter.
Page layout by Thorsten Joachims