# CS696 Machine Learning - Fall 2020

## Who When Where

Instructor: Prof. Gianfranco Doretto

Department of Computer Science and Electrical Engineering

West Virginia University

First lecture: August 27, 2020

Last lecture: December 3, 2020 (remote)

Class meetings: Thursdays, 5:00pm - 7:50pm

Designated classroom: AERB 135

Class format: **Hybrid**

## Office Hours

Held virtually by appointment (send email)

## Important Dates

- Midterm Exam: *November 19, 2020*
- Final Project: *December 4, 2020*

## Syllabus

What could be achieved by building automated systems that improve their performance through experience? Robots would learn to navigate based on experience gained by roaming their environment. Cellphones would get better at recognizing your voice, and at executing and anticipating your commands. Physicians would prescribe personalized therapies that best cure diseases based on historical health records. Machine learning addresses these problems, and many more, through the study and development of techniques that enable a system to learn automatically from observed data.

The purpose of this graduate course is to provide an introduction to the fundamental techniques and algorithms that constitute machine learning today, while providing a thorough grounding in its methodologies and theoretical foundations. The course will also discuss recent applications of machine learning to areas such as computer vision, robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.

### Topics

The topics covered can be roughly summarized as follows:

- **Basic statistics:** Probabilities, MLE, MAP
- **Instance-based Learning:** K-Nearest Neighbors, collaborative filtering
- **Generative Models:** Naïve Bayes, linear discriminant analysis
- **Linear Rules:** Linear regression, logistic regression, perceptron
- **Support Vector Machines:** Optimal hyperplane, margin, kernels
- **Statistical Learning Theory:** PAC learning, VC dimension
- **Component Analysis:** PCA, Kernel PCA
- **Clustering:** HAC, k-means, mixture of Gaussians
- **Hidden Markov Models:** probabilistic model, estimation, Viterbi
- **Graphical Models:** directed, undirected, learning, inference
- **Decision Trees:** attribute selection, pruning and overfitting
- **Ensemble Learning:** Bagging, boosting

## Prerequisites

Students entering the class are expected to have a pre-existing working knowledge (at the level of an undergraduate class) of multivariate calculus, linear algebra, probability and statistics, and computer programming and algorithms, though the class has been designed to allow students with a strong quantitative background to catch up and fully participate. Basic knowledge of optimization is a plus. Please ask the instructor for the self-assessment exam to make sure you have the right background.

## Schedule and Class Notes

- 08/19: Introduction [slides] [slides 3/page]
- Course policies
- What is machine learning, and its applications
- Machine learning problem types
- Materials: M:1

- 08/23: Basic statistics [slides] [slides 3/page]
- Probabilities, independence
- Parameter estimation: maximum likelihood and maximum a posteriori
- Hoeffding’s bounds
- Materials: M:2.1-2.5, M:3.1-3.4, M:4.6.1-4.6.2, M:6.4.2 or probability and statistics handout
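
To make the MLE/MAP distinction concrete, here is a minimal Python sketch (in-class examples are in Matlab) estimating a coin's bias: the MLE is the raw frequency of heads, while a MAP estimate under a Beta(a, b) prior adds pseudo-counts that smooth the estimate. The data and prior parameters below are illustrative choices, not from the course.

```python
def bernoulli_mle(flips):
    # MLE of the heads probability: the fraction of heads observed.
    return sum(flips) / len(flips)

def bernoulli_map(flips, a=2, b=2):
    # MAP under a Beta(a, b) prior: (heads + a - 1) / (n + a + b - 2).
    return (sum(flips) + a - 1) / (len(flips) + a + b - 2)

flips = [1, 1, 1, 0]          # three heads, one tail
print(bernoulli_mle(flips))   # 0.75
print(bernoulli_map(flips))   # (3 + 1) / (4 + 2) = 0.666...
```

Note how the prior pulls the estimate toward 1/2, which matters most when the sample is small.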

- 08/25: Naive Bayes [slides] [slides 3/page]
- Zero-one Bayesian predictor
- Naive Bayes assumption, learning
- Bag of words model
- Gaussian Naive Bayes
- Materials: M:5.7-5.7.1.1, M:3.5, M:4.2-4.2.1

- 08/28: Logistic Regression [slides] [slides 3/page]
- MLE and MAP training
- Gradient descent
- Generative vs. Discriminative Classifiers
- Materials: M:1.4.6, M:8.1-8.2, M:8.3.1-8.3.2, M:8.3.6-8.3.7, M:4.2.2-4.2.5, M:8.6-8.6.2
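
MLE training of logistic regression amounts to batch gradient ascent on the log-likelihood, whose gradient is the sum of (label minus predicted probability) times the features. A minimal Python sketch (the class itself uses Matlab); the toy data and learning rate are arbitrary choices:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=0.5, iters=2000):
    """Batch gradient ascent on the log-likelihood.
    X includes a constant bias feature; labels y are in {0, 1}."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        # Full gradient: sum_i (y_i - sigma(w . x_i)) * x_i
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wk * xk for wk, xk in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w

def predict(w, x):
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) > 0 else 0

# Toy 1-D data with a bias feature [1, x]: negatives below 0, positives above.
X = [[1, -2], [1, -1], [1, 1], [1, 2]]
y = [0, 0, 1, 1]
w = train_logistic(X, y)
print([predict(w, xi) for xi in X])   # [0, 0, 1, 1]
```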

- 09/06: Linear Regression [slides] [slides 3/page]
- Least squares training
- Ridge regression and lasso
- Polynomial and non-linear features
- Materials: M:1.4.5, M:7.1-7.3, M:7.5
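
Least squares with an L2 penalty (ridge regression) has the closed form w = (XᵀX + λI)⁻¹Xᵀy. A short Python/NumPy sketch (in-class examples are in Matlab); the intercept is regularized too, purely for brevity, and the toy data are made up:

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    """Closed-form ridge regression: solve (X^T X + lam I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Noisy samples of y = 2x + 1, with a column of ones for the intercept.
X = np.array([[1.0, x] for x in [0, 1, 2, 3, 4]])
y = np.array([1.1, 2.9, 5.2, 6.9, 9.1])
w = ridge_fit(X, y, lam=1e-3)
print(w)   # close to [1, 2]
```

As λ grows, the weights shrink toward zero, trading a little bias for lower variance.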

- 09/08: Instance Based Learning [slides] [slides 3/page]
- Density estimation, binning, Parzen window, smooth kernels
- k-Nearest Neighbor
- Watson-Nadaraya model, kernel regression, classification, locally weighted regression
- Materials: M:1.4.1-1.4.2, M:6.4.4, M:14.7
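
A minimal Python sketch of k-Nearest Neighbor classification with Euclidean distance and majority vote (the class itself uses Matlab; the 2-D points and labels below are hypothetical):

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """k-NN classification: find the k training points closest to x
    and return the majority label among them.
    train: list of (point, label) pairs; x: query point."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    neighbors = sorted(train, key=lambda t: dist(t[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'),
         ((5, 5), 'b'), ((5, 6), 'b'), ((6, 5), 'b')]
print(knn_predict(train, (1, 1)))   # 'a'
print(knn_predict(train, (5, 4)))   # 'b'
```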

- 09/13: Perceptron and Kernels [slides] [slides 3/page]
- Perceptron criterion, algorithm, convergence
- Kernels, Mercer theorem, properties
- Materials: B:4.1.7, M:14.1-14.2
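
The perceptron algorithm reduces to a one-line update: whenever a point is misclassified, add y_i x_i to the weight vector. A minimal Python sketch on a toy separable set (in-class examples are in Matlab):

```python
def perceptron(X, y, epochs=10):
    """Perceptron learning: on each mistake, update w <- w + y_i * x_i.
    X includes a bias feature; labels y are in {-1, +1}. Converges in
    finitely many updates if the data are linearly separable."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * sum(wk * xk for wk, xk in zip(w, xi)) <= 0:
                w = [wk + yi * xk for wk, xk in zip(w, xi)]
    return w

# Toy 1-D data with a bias feature [1, x].
X = [[1, -2], [1, -1], [1, 1], [1, 2]]
y = [-1, -1, 1, 1]
w = perceptron(X, y)
print([1 if sum(a * b for a, b in zip(w, xi)) > 0 else -1 for xi in X])
# [-1, -1, 1, 1]
```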

- 09/15: Optimization [slides] [slides 3/page]
- Convexity
- Unconstrained optimization, gradient descent, Newton’s methods
- Constrained optimization, Lagrange duality, LP, QP
- Materials: optimization handouts Tutorial 1 and Tutorial 2

- 09/22: Support Vector Classification [slides] [slides 3/page]
- Large margin separation, optimization, support vectors
- Soft margin formulation, dual problem, non-linear separation, risk and loss
- Materials: M:14.5, B:7.1

- 09/29: Risk Minimization and Model Selection [slides] [slides 3/page]
- Risk and loss, ERM, overfitting, estimation and approximation errors
- Cross-validation, regularized risk minimization, information criteria
- Materials: Sections 1 through 4.2, and 9 of handout, M:6.5

- 10/04: Support Vector Regression [slides] [slides 3/page]
- Representer theorem, kernel ridge regression
- Sparse vector formulation
- Materials: M:15.4.7, B:7.1.4

- 10/06: Dimensionality Reduction [slides] [slides 3/page]
- Feature selection
- Latent linear and nonlinear models
- Materials: M:3.5.4, M:12.2-12.3, handout KPCA, handout Laplacian

- 10/13: Clustering [slides] [slides 3/page]
- Hierarchical Clustering
- Partitional Clustering, k-means, spectral clustering
- Materials: M:25.1, M:25.5, M:11.4.2.5-11.4.2.7, M:11.5.2, handout Spectral Clustering
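
Lloyd's algorithm for k-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. A minimal Python sketch (the class itself uses Matlab; the toy data are made up, and the random initialization is seeded for reproducibility):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid recomputation as cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                    for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step: move each centroid to its cluster mean.
        centroids = [tuple(sum(coord) / len(cl) for coord in zip(*cl))
                     if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents = sorted(kmeans(points, 2))
print(cents)   # roughly [(0.33, 0.33), (10.33, 10.33)]
```

Each iteration can only decrease the within-cluster sum of squares, so the procedure always converges, though possibly to a local optimum that depends on the initialization.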

- 10/20: Mixture Models and Expectation Maximization [slides] [slides 3/page]
- Mixture models, density estimation, clustering, parameter estimation
- EM, general formulation, convergence, applications
- Materials: M:11.1, M:11.2.1, M:11.2.3, M:11.3, M:11.4.1-11.4.2, M:11.4.7, M:21.1-21.2

- 10/28: Hidden Markov Models [slides] [slides 3/page]
- Markov models
- Forward-Backward, Viterbi, Baum-Welch algorithms
- Materials: M:17.1-17.2.1, M:17.3, M:17.4-17.5, M:18.1-18.2
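
The Viterbi algorithm finds the most probable hidden state sequence by dynamic programming: track the best-path probability into each state at each time step, then backtrack. A minimal Python sketch on a toy two-state weather HMM (in-class examples are in Matlab; the probabilities below are illustrative, not from the course):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence of an HMM given observations."""
    # V[t][s] = probability of the best path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] * trans_p[r][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy HMM: weather (hidden) vs. observed activity.
states = ['rain', 'sun']
start = {'rain': 0.5, 'sun': 0.5}
trans = {'rain': {'rain': 0.7, 'sun': 0.3}, 'sun': {'rain': 0.3, 'sun': 0.7}}
emit = {'rain': {'walk': 0.1, 'umbrella': 0.9},
        'sun': {'walk': 0.8, 'umbrella': 0.2}}
print(viterbi(['walk', 'umbrella', 'umbrella'], states, start, trans, emit))
# ['sun', 'rain', 'rain']
```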

- 11/3: Decision Trees [slides] [slides 3/page]
- Representation, prediction, learning, pruning
- Materials: M:16.1, M:16.2

- 11/10: Graphical Models [slides] [slides 3/page]
- BN and MRF representation
- Inference, variable elimination, learning tree structures
- Materials: M:10.1-10.2, M:10.5, M:19.1-19.3, M:19.4.1, M:10.3, M:20.3, M:10.4, M:26.1-26.3

- 11/17: Ensemble Learning [slides] [slides 3/page]
- Boosting, AdaBoost, analysis
- Bagging, random forests
- Materials: M:16., handout AdaBoost, HTF:8.7, HTF:15.1-15.3

- 11/29: Neural Networks [slides] [slides 3/page]
- Structure, representational power
- Backpropagation, overfitting
- Materials: M:16.5

- 12/6: Deep Learning [slides] [slides 3/page]
- Convolutional Neural Networks
- Deep Belief Networks
- Materials: M:27.7, M:28.1-28.4

## Homework

Homework assignments can be downloaded from eCampus.

All assignments are due at the beginning of the class on the due date.

- Homework 1, due Thursday, September 29
- Homework 2, due Tuesday, October 27, download HW2_data
- Homework 3, due Tuesday, November 29, download HW3_data
- Homework 4, due Tuesday, December 6, download HW4_data

### Programming Languages

In-class examples will be given in Matlab, which should provide an effective way to address the programming assignments and the final project.

## Assignment Policy

I expect you to try to solve each assignment on your own. However, when you are stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules:

- You may discuss a problem with any student in this class and work together on solving it. This can involve brainstorming, discussing the problem verbally, and going through possible solutions together, but it should not involve one student telling another a complete solution.
- Once you solve the homework, you must write up your solutions on your own, without looking at other people’s write-ups or giving your write-up to others.
- In your solution for each problem, you must write down the names of any person with whom you discussed it. This will not affect your grade.
- Do not consult solution manuals or other people’s solutions from similar courses.

### Late Assignment Policy

Each student may use three “late days” over the whole course. The late-day count starts from the day and time an assignment is due, and fractional late days are counted. Each additional late day beyond the first three costs 20% of the assignment's grade.

### Attendance Policy

Consistent with WVU guidelines, students absent from regularly scheduled examinations because of authorized University activities will have the opportunity to take them at an alternate time. Make-up exams for absences due to any other reason will be at the discretion of the instructor.

## Final Project

Final project directions

## Grading

The course format will predominantly consist of lectures. Students will complete a few simple programming assignments as well as homework problems. Each student will also complete a final project and give a project presentation to the class.

Final grades will be based approximately on the following distribution:

- 40%: Homework assignments (~4 assignments)
- 30%: Midterm exam
- 30%: Final project

Bonus points may be assigned based on class participation. In addition, creative solutions to homework problems and programming assignments are appreciated, and may be rewarded by extra bonus points.

Grades will be assigned according to the following scale: A=90-100; B=80-89; C=70-79; D=60-69; F= below 60.

## Reference Material

### Textbook

The main textbooks for the class are:

- Class notes and handouts
- **M:** Murphy, *Machine Learning: A Probabilistic Perspective*, The MIT Press, 2012. (online)

Additional books that can serve as secondary reference on the topics covered in class:

- **B:** Bishop, *Pattern Recognition and Machine Learning*, Springer, 2006. (online)
- **HTF:** Hastie, Tibshirani, Friedman, *The Elements of Statistical Learning*, Springer, 2009. (online)
- Duda, Hart, Stork, *Pattern Classification*, Wiley, 2000. (online)
- Mitchell, *Machine Learning*, McGraw-Hill, 1997. (online)

In addition, these are some books for further reading beyond the scope of the course:

- Boyd, Vandenberghe, *Convex Optimization*, 2004. (online)
- Vapnik, *The Nature of Statistical Learning Theory*, Springer, 2000. (online)
- Schoelkopf, Smola, *Learning with Kernels*, The MIT Press, 2001. (online)
- Cristianini, Shawe-Taylor, *Support Vector Machines and other kernel-based learning methods*, Cambridge University Press, 2000. (online)
- Joachims, *Learning to Classify Text using Support Vector Machines*, Kluwer, 2002. (online)
- Manning, Schuetze, *Foundations of Statistical Natural Language Processing*, MIT Press, 1999. (online)
- Manning, Raghavan, Schuetze, *Introduction to Information Retrieval*, Cambridge, 2008. (online)
- Koller, Friedman, *Probabilistic Graphical Models: Principles and Techniques*, The MIT Press, 2009. (online)

### Very Important Background Handouts

- Probability and Statistics
- Linear Algebra
  - B:xC
  - Tutorial from Stanford
  - Online course from MIT
- Optimization
  - B:xD and B:xE
  - Tutorial 1 and Tutorial 2 from Stanford
- Matlab
  - Tutorial from U. of Dundee

## Inclusivity and Academic Integrity

### Inclusivity Statement

The West Virginia University community is committed to creating and fostering a positive learning and working environment based on open communication, mutual respect, and inclusion. If you are a person with a disability and anticipate needing any type of accommodation in order to participate in this class, please advise me and make appropriate arrangements with the Office of Disability Services (293-6700). For more information on West Virginia University’s Diversity, Equity, and Inclusion initiatives, please see http://diversity.wvu.edu.

### Integrity Statement

The integrity of the classes offered by any academic institution solidifies the foundation of its mission and cannot be sacrificed to expediency, ignorance, or blatant fraud. Therefore, I will enforce rigorous standards of academic integrity in all aspects and assignments of this course. For the detailed policy of West Virginia University regarding the definitions of acts considered to fall under academic dishonesty and possible ensuing sanctions, please see the Student Conduct Code at http://www.arc.wvu.edu/admissions/integrity.html. Should you have any questions about possibly improper research citations or references, or any other activity that may be interpreted as an attempt at academic dishonesty, please see me before the assignment is due to discuss the matter.