CS691A Machine Learning - Fall 2016

11 minute read

Machine Learning

CS 691A
Fall 2016
Prof. Gianfranco Doretto
West Virginia University, Department of Computer Science and Electrical Engineering


Shortcuts: [eCampus][Class Notes][Handouts][Assignments][Final Project]

Time and Place

First lecture: August 18, 2016
Last lecture: December 6, 2016

  • Tuesday, 2:00pm - 3:15pm in ESB-E G84
  • Thursday, 2:00pm - 3:15pm in ESB-E G84

Office Hours

Thursdays 3:30pm to 4:30pm or by appointment (send email)

Important Dates

  • Midterm Exam: December 1, 2016
  • Final Project: 5pm on Friday, December 9, 2016

Syllabus .pdf

What could be achieved by building automated systems that improve their performance through experience? Robots would learn how to navigate around based on experience gained by roaming their environment. Cellphones would get better at recognizing your voice, and executing and anticipating your commands. Physicians would prescribe personalized therapies that best cure diseases based on historical health records. Machine learning addresses those, and many more problems through the study and development of techniques enabling a system to automatically learn from observed data.

The purpose of this graduate course is to provide an introduction to the fundamental set of techniques and algorithms that constitute machine learning as of today, while providing a thorough grounding in the methodologies and theoretic foundations. The course will also discuss recent applications of machine learning, such as to computer vision, robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.

The topics covered can be roughly summarized as follows:

  • Basic statistics : Probabilities, MLE, MAP
  • Instance-based Learning : K-Nearest Neighbors, collaborative filtering
  • Generative Models : Naïve Bayes, linear discriminant analysis
  • Linear Rules : Linear regression, logistic regression, perceptron
  • Support Vector Machines : Optimal hyperplane, margin, kernels
  • Statistical Learning Theory : PAC learning, VC dimension
  • Component Anlysis : PCA, Kernel PCA
  • Clustering : HAC, k-means, mixture of Gaussians
  • Hidden Markov Models: probabilistic model, estimation, Viterbi
  • Graphical Models : directed, undirected, learning, inference
  • Decision Trees : attribute selection, pruning and overfitting
  • Ensemble Learning : Bagging, boosting

Students entering the class are expected to have a pre-existing working knowledge (at the level of an undergraduate class) of multivariate calculus, linear algebra, probability and statistics, and computer programming and algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate. Basic knowledge of optimization is a plus. Please ask the instructor to take the self-assessment exam to make sure you have the right background.


Schedule and Class Notes

  • 08/19: Introduction [slides] [slides 3/page]
    • Course policies
    • What is machine learning, and its applications
    • Machine learning problem types
    • Materials: M:1
  • 08/23: Basic statistics [slides] [slides 3/page]
    • Probabilities, independence
    • Parameter estimation: maximum likelyhood, and a posteriori
    • Hoeffding's bounds
    • Materials: M:2.1-2.5, M:3.1-3.4, M:4.6.1-4.6.2, M:6.4.2 or probability and statistics handout
  • 08/25: Naive Bayes [slides] [slides 3/page]
    • Zero-one Bayesian predictor
    • Naive Bayes assumption, learning
    • Bag of words model
    • Gaussian Naive Bayes
    • Materials: M:5.7-, M:3.5, M:4.2-4.2.1
  • 08/28: Logistic Regression [slides] [slides 3/page]
    • MLE and MAP training
    • Gradient descent
    • Generative vs. Discriminative Classifiers
    • Materials: M:1.4.6, M:8.1-8.2, M:8.3.1-8.3.2, M:8.3.6-8.3.7, M:4.2.2-4.2.5, M:8.6-8.6.2
  • 09/06: Linear Regression [slides] [slides 3/page]
    • Least squares training
    • Ridge regression and lasso
    • Polinomial and non-linear features
    • Materials: M:1.4.5, M:7.1-7.3, M:7.5
  • 09/08: Instance Based Learning [slides] [slides 3/page]
    • Density estimation, binning, Parzen window, smooth kernels
    • k-Nearest Neighbor
    • Watson-Nadaraya model, kernel regression, classification, locally weighted regression
    • Materials: M:1.4.1-1.4.2, M6.4.4, M14.7
  • 09/13: Perceptron and Kernels [slides] [slides 3/page]
    • Perceptron criterion, algorithm, convergence
    • Kernels, Mercer theorem, properties
    • Materials: B:4.1.7, M:14.1-14.2
  • 09/15: Optimization [slides] [slides 3/page]
    • Convexity
    • Unconstrained optimization, gradient descent, Newton's methods
    • Constrained optimization, Lagrange duality, LP, QP
    • Materials: optimization handouts Tutorial 1 and Tutorial 2
  • 09/22: Support Vector Classification [slides] [slides 3/pages]
    • Large margin separation, optimization, support vectors
    • Soft margin formulation, dual problem, non-linear separation, risk and loss
    • Materials: M:14.5, B:7.1
  • 09/29: Risk Minimization and Model Selection [slides] [slides 3/page]
    • Risk and loss, ERM, overfitting, estimation and approximation errors
    • Cross-validation, regularized risk minimization, information criteria
    • Materials: Sections 1 through 4.2, and 9 of handout, M:6.5
  • 10/04: Support Vector Regression [slides] [slides 3/pages]
    • Representer theorem, kernel ridge regression
    • Sparse vector formulation
    • Materials: M:15.4.7, B:7.1.4
  • 10/06: Dimensionality Reduction [slides] [slides 3/page]
  • 10/13: Clustering [slides] [slides 3/page]
    • Hierarchical Clustering
    • Partitional Clustering, K-mean, spectral clustering
    • Materials: M:25.1, M:25.5, M:, M11.5.2, M:25.5, handout Spectral Clustering
  • 10/20: Mixture Models and Expectation Maximization [slides] [slides 3/page]
    • Mixture models, density estimation, clustering, parameter estimation
    • EM, general formulation, convergence, applications
    • Materials: M:11.1, M:11.2.1, M:11.2.3, M:11.3, M:11.4.1-11.4.2, M:11.4.7, M:21.1-21.2
  • 10/28: Hidden Markov Models [slides] [slides 3/page]
    • Markov models
    • Forward-Backward, Viterbi, Baum-Welch algorithms
    • Materials: M:17.1-17.2.1, M:17.3, M:17.4-17.5, M:18.1-M18.2
  • 11/3: Decision Trees [slides] [slides 3/page]
    • Representation, prediction, learning, pruning
    • Materials: M:16.1, M:16.2
  • 11/10: Graphical Models [slides] [slides 3/page]
    • BN, and MRF representation
    • Inference, variable elimination, learning tree structures
    • Materials: M:10.1-10.2, M:10.5, M:19.1-19.3, M:19.4.1, M:10.3, M:20.3, M:10:4, M:26.1-26.3
  • 11/17: Ensemble Learning [slides] [slides 3/page]
    • Boosting, AdaBoost, analysis
    • Bagging, random forests
    • Materials: M:16., handout AdaBoost, HTF:8.7, HTF:15.1-15.3
  • 11/29: Neural Networks [slides] [slides 3/page]
    • Structure, representational power
    • Backpropagation, overfitting
    • Materials: M:16.5
  • 12/6 Deep Learning [slides] [slides 3/page]
    • Convolutional Neural Networks
    • Deep Belief Networks
    • Materials: M:27.7, M:28.1-28.4

Assignments and Attendance

Homework assignments can be downloaded from eCampus.

All assignments are due at the beginning of the class on the due date.

  • Homework 1, due Thursday, September 29, at the beginning of the class
  • Homework 2, due Tuesday, October 27, at the beginning of the class, download HW2_data
  • Homework 3, due Tuesday, November 29, at the beginning of the class, download HW3_data
  • Homework 4, due Tuesday, December 6, at the beginning of the class, download HW4_data

Programming Languages
In class examples will be given in Matlab, which should provide an effective way to address the programming assignments and final project needs.

Assignment Policy
I expect you to try solving each assignment on your own. However, when being stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules:
  1. You may discuss a problem with any student in this class, and work together on solving it. This can involve brainstorming and verbally discussing the problem, going together through possible solutions, but should not involve one student telling another a complete solution.
  2. Once you solve the homework, you must write up your solutions on your own, without looking at other people's write-ups or giving your write-up to others.
  3. In your solution for each problem, you must write down the names of any person with whom you discussed it. This will not affect your grade.
  4. Do not consult solution manuals or other people's solutions from similar courses.

Late Assignment Policy
Each student may use three “late days” for the whole course. The late day count starts from the day and time of the due date of each assignment. Fractional late days are taken into account. Additional “late days” after the first three come at a day cost of 20% of the assignment.

Attendance Policy
Consistent with WVU guidelines, students absent from regularly scheduled examinations because of authorized University activities will have the opportunity to take them at an alternate time. Make-up exams for absences due to any other reason will be at the discretion of the instructor


Final Project

Final project directions


The course format will predominantly consist of lectures. Students will have few simple programming assignments, as well as homework problems. Each student will complete a final project, and give a project presentation to the class.

Final grades will be based approximately on the following distribution (extra points may be assigned based on class participation):

  • 40%: Homework assignments (~4 assignments)
  • 30%: Midterm exam
  • 30%: Final project

Bonus points may be assigned based on class participation. In addition, creative solutions to homework problems and programming assignments are appreciated, and may be rewarded by extra bonus points.

Grades will be assigned according to the following scale: A=90-100; B=80-89; C=70-79; D=60-69; F= below 60.


Reference Material


The main textbooks for the class are:

  • Class notes and handouts
  • M: Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012. (online)

Additional books that can serve as secondary reference on the topics covered in class:

  • B: Bishop, Pattern Recognition and Machine Learning, Springer, 2006. (online)
  • HTF: Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Springer, 2009. (online)
  • Duda, Hart, Stork, Pattern Classification, Wiley, 2000. (online)
  • Mitchel, Machine Learning, McGraw-Hill, 1997. (online)

In addition, these are some books for further reading beyond the scope of the course:

  • Boyd, Vandenberghe, Convex Optimization, 2004 (online)
  • Vapnik, The Nature of Statistical Learning Theory, Springer, 2000. (online)
  • Schoelkopf, Smola, Learning with Kernels, The MIT Press, 2001. (online)
  • Cristianini, Shawe-Taylor, Support Vector Machines and other kernel-based learning methods, Cambridge University Press, 2000. (online)
  • Joachims, Learning to Classify Text using Support Vector Machines, Kluwer, 2002. (online)
  • Manning, Schuetze, Foundations of Statistical Natural Language Processing, MIT Press, 1999. (online)
  • Manning, Raghavan, Schuetze, Introduction to Information Retrieval, Cambridge, 2008. (online)
  • Koller, Friedman, Probabilistic Graphical Models: Principles and Techniques, The MIT Press, 2009. (online)

Very Important Background Handouts

Inclusivity and Academic Integrity

Inclusivity Statement
The West Virginia University community is committed to creating and fostering a positive learning and working environment based on open communication, mutual respect, and inclusion.

If you are a person with a disability and anticipate needing any type of accommodation in order to participate in this class, please advise me and make appropriate arrangements with the Office of Disability Services (293-6700). For more information on West Virginia University's Diversity, Equity, and Inclusion initiatives, please see http://diversity.wvu.edu.

Integrity Statement
The integrity of the classes offered by any academic institution solidifies the foundation of its mission and cannot be sacrificed to expediency, ignorance, or blatant fraud. Therefore, I will enforce rigorous standards of academic integrity in all aspects and assignments of this course. For the detailed policy of West Virginia University regarding the definitions of acts considered to fall under academic dishonesty and possible ensuing sanctions, please see the Student Conduct Code at http://www.arc.wvu.edu/admissions/integrity.html. Should you have any questions about possibly improper research citations or references, or any other activity that may be interpreted as an attempt at academic dishonesty, please see me before the assignment is due to discuss the matter.

Page layout by Thorsten Joachims