ECE 512 - Data Science from a Signal Processing Perspective
Fall 2023
Instructor:
Dr. Dror Baron,
email: barondror AT ncsu DOT edu,
office hour (Zoom): Monday 2-3 pm.
Teaching assistant:
Hangjin Liu,
email: hliu25 AT ncsu DOT edu,
office hour (Zoom): Friday 11-12.
Classrooms:
Classes will be on Monday and Wednesday, 11:45-13:00, EB2 1226.
Modules have been recorded electronically and are available on
YouTube.
Announcements
 5 August 2023:
Started updating webpage for Fall 2023 semester.
 6 August 2023:
Moodle page;
Panopto videos;
Google mailing group.
 8 August 2023:
Tentative schedule;
continuity plan; and
updated Panopto info.
 9 August 2023:
Syllabus.
 10 August 2023:
Updated the
introduction slides.
 12 August 2023:
Updated the
Probability and Models slides,
including adjustments to how the video modules and supplements are arranged (below).
 15 August 2023:
Updated grade structure on course webpage;
Homework 1 is due on August 30.
 20 August 2023:
Our TA will be Ms. Hangjin Liu.
 24 August 2023:
Homework 2 is due on September 13;
Homework 3 is due on September 20.
 25 August 2023:
Updated
syllabus.
 26 August 2023:
Module 11
on computational complexity has been rerecorded.
 4 September 2023:
Homework 4 is due on September 27;
Homework 5 is due on October 11.
 6 September 2023:
You may use ChatGPT and similar software when working on homeworks
and tests, but must perform "quality control."
You may also volunteer to lead another class discussion
for 1% extra credit.
 12 September 2023:
Tentative schedule
(dates of some quizzes and homeworks have shifted, for example
Homework 3 is now due on September 27 and
Homework 4 is now due on October 4).
 17 September 2023:
Homework 6 is due on October 18; and
Homework 7 is due on October 25.
 19 September 2023:
Updated the
optimization slides.
Useful Links
About this Course
Prerequisites
The main prerequisite is eagerness to learn about data science.
Technical prerequisites include undergraduate signal processing (ECE 421),
probability (ST 371),
comfort in math (linear algebra, calculus, multidimensional spaces),
and comfort programming (we will be using Matlab and/or Python; see below).
Purpose
ECE 512 (Data Science from a Signal Processing Perspective) will acquaint
students with core topics in data science.
The specific topics covered are listed in the course outline.
Course Outline
The course will proceed as follows:
 Introduction.
 Scientific programming (including data structures and computational complexity).
 Optimization.
 Machine learning basics (classification, clustering, and regression).
 Sparse signal processing (including wavelets).
 Dimensionality reduction (including principal component analysis).
Course Materials
Textbook
The instructor will borrow from, and is inspired by, several textbooks (see below).
You need not purchase any of these.
Some references (to academic papers) will also be provided in the slides and assignments;
these are meant for your enrichment if you find a topic of special interest.
 C. M. Bishop, Pattern Recognition and Machine Learning, 2006.
 D. MacKay, Information Theory, Inference, and Learning Algorithms, 2003.
 M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, 2012.
 T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2001.
 T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, 1990.
 S. Mallat, A Wavelet Tour of Signal Processing, 1999.
Matlab/Python
We will be using the Matlab and/or Python languages during the course.
We will have some computer homework questions;
either language can be used to submit homeworks.
(Note that many other programming platforms can be, and often are, used.)
Here are some resources for these languages:
Slides and Modules
Course materials are in several slide decks, where each one covers a major topic.
Under each deck of slides, we organize and describe corresponding modules,
which have been recorded to YouTube (links below).
We also have some supplements, which provide details about some
of the more delicate course topics.

Projects - this set of slides summarizes material
about projects that will appear during the course.
(Note that in the past we used "projects" to mean programming-oriented assignments that revolved around an application theme. Starting in Fall 2022, these are bundled into the homework.)

Introduction.
 We will give a quick overview of the course, including some administrative matters, during the first class.
 Module 1 - Motivation for data science and applications. (Intro slides, pages 17-27, 15 minutes.)
 Module 2 - Polynomial curve fitting example. (Intro slides, pages 28-38, 21 minutes.)
 Matlab curve fitting example.

Probability and models.
 Linear_algebra module: recording from ECE 421 (signal processing)
reviews basic linear algebra concepts.
 Module 3 - Probability spaces and Bayes' rule. (Models slides, pages 1-9, 26 minutes.)
 Module 4 - Random variables, expectation, and variance. (Models slides, pages 10-17, 31 minutes.)
 Module 5 - Machine learning terminology. (Models slides, pages 18-23, 7 minutes.)
 Supplement about test and training data.
 Module 6 - Models and minimum description length. (Models slides, pages 24-33, 31 minutes.)
 Note: prior to Fall 2022, we had an (old) Module 7 with various supplements; these have all been retired. The material is now arranged in submodules
7A, 7B, and 7C.
 Module 7A - Model complexity, parametric models. (Models slides, pages 34-40, 11 minutes.)
 Module 7B - Penalty for learning. (Models slides, pages 41-49, 16 minutes.)
 Module 7C - Decoding perspective. (Models slides, pages 50-56, 11 minutes.)
 Comprehensive supplement on models, MDL, two-part codes, and model complexity.
 Supplement on norms.
 Module 8 - Kolmogorov complexity. (Models slides, pages 43b-45b, 8 minutes.
Note that these slides appear after page 56, which concludes Module 7C, because more slides were added in Fall 2022.)

Scientific programming.
 Module 9 - Resource consumption of algorithms. (Scientific programming slides, pages 1-10, 22 minutes.)
 Supplement on two example sorting algorithms.
 Module 10 - Orders of growth of resource consumption. (Scientific programming slides, pages 11-16, 17 minutes.)
 Module 11 - Computational complexity. (Scientific programming slides, pages 17-22, 14 minutes.
Note that this is a 2023 version; here is the
2020 recording.)
 Supplement on example of computational complexity.
 Module 12 - Algorithm selection. (Scientific programming slides, pages 23-29, 16 minutes.)
 Module 13 - Divide and conquer. (Scientific programming slides, pages 30-33, 11 minutes.)
 mergesort and merge routines developed in class.
 Module 14 - Computational architectures. (Scientific programming slides, pages 34-40, 15 minutes.)
 Module 15 - Parallel processing. (Scientific programming slides, pages 41-44, 6 minutes.)
 Module 16 - Stacks, queues, and linked lists. (Scientific programming slides, pages 45-55, 20 minutes.)
 Module 17 - Graphs. (Scientific programming slides, pages 56-61, 15 minutes.)
 Module 18 - Trees. (Scientific programming slides, pages 62-69, 15 minutes.)
 Module 19 - Profiling. (Scientific programming slides, pages 70-76, 11 minutes.)

Optimization.
 Module 20 - Motivation for optimization. (Optimization slides, pages 1-8, 10 minutes.)
 Supplement providing dynamic programming example.
 Module 21 - Dynamic programming. (Optimization slides, pages 9-22, 26 minutes.)
 Module 22 - Linear programming. (Optimization slides, pages 23-27, 15 minutes.)
 Line search example code.
 Module 23 - Convex programming. (Optimization slides, pages 28-36, 16 minutes.)
 Module 24 - Integer programming. (Optimization slides, pages 37-41, 10 minutes.)
 Module 25 - Nonconvex programming. (Optimization slides, pages 42-52, 21 minutes.)
 Annealing example code
(this resembles MCMC discussed for nonconvex programming).

Machine learning.
 Module 26 - Two classifiers. (Machine learning slides, pages 1-13, 27 minutes.)
 Matlab classification example.
 Supplement about the curse of dimensionality.
 Module 27 - Decision theory. (Machine learning slides, pages 14-16, 20 minutes.)
 Supplement on least squares.
 Module 28 - Clustering. (Machine learning slides, pages 17-20, 21 minutes.)
 Supplement on K-means algorithm.
 Supplement about loss functions.
 Module 29 - Linear regression. (Machine learning slides, pages 21-29, 23 minutes.)
 Module 30 - Subset selection. (Machine learning slides, pages 30-34, 13 minutes.)
 Supplement about subset selection.
 Module 31 - Shrinkage. (Machine learning slides, pages 35-45, 17 minutes.)
 Supplement about shrinkage.
 Module 32 - Decision trees. (Machine learning slides, pages 46-49, 4 minutes.)
 Module 33 - Linear classification. (Machine learning slides, pages 50-55, 14 minutes.)
 Module 34 - LDA and QDA. (Machine learning slides, pages 56-63, 29 minutes.)
 Supplement containing an example on Bayesian classification.
 Supplement on Bayesian distributions.
 Module 35 - Logistic regression. (Machine learning slides, pages 64-66, 8 minutes.)
 Module 36 - Basis expansions. (Machine learning slides, pages 67-73, 10 minutes.)
 Module 37 - Kernel methods. (Machine learning slides, pages 74-77, 5 minutes.)
 Module 38 - Support vector machines. (Machine learning slides, pages 78-82, 8 minutes.)
 Convolutional neural networks slides by Abhishek Jain (TA in 2017).

Sparse signal processing.
 Module 39 - Sparsity. (Sparse signal processing slides, pages 1-8, 13 minutes.)
 Module 40 - Bases. (Sparse signal processing slides, pages 9-19, 26 minutes.)
 Supplement on inner product spaces.
 Supplement on bases and LTI systems.
 Module 41 - Frames. (Sparse signal processing slides, pages 20-22, 8 minutes.)
 Module 42 - Wavelets. (Sparse signal processing slides, pages 23-33, 25 minutes.)
 Module 43 - Multiresolution approximation. (Sparse signal processing slides, pages 34-43, 22 minutes.)
 Supplement on direct sums.
 Module 44 - Compressed sensing. (Sparse signal processing slides, pages 44-51, 11 minutes.)
 Module 45 - Compressive signal acquisition. (Sparse signal processing slides, pages 52-64, 12 minutes.)
 Module 46 - Sparse recovery. (Sparse signal processing slides, pages 65-77, 31 minutes.)
 Supplement on LASSO.
 Supplement on machine learning vs. CS.
 Module 47 - Optimal sparse recovery. (Sparse signal processing slides, pages 78-82, 13 minutes.)
 Module 48 - Information-theoretic performance limits. (Sparse signal processing slides, pages 83-90, 14 minutes.)
 Supplement on single-letter bound for CS.
 Module 49 - Precise performance limits. (Sparse signal processing slides, pages 91-96, 34 minutes.)
 Supplement deriving precise performance limit for CS.
 Module 50 - Approximate message passing. (Sparse signal processing slides, pages 97-106, 16 minutes.)
 Supplement on AMP implementation; and the
AMP and denoise routines developed in class.
 Supplement on solving Tanaka's fixed-point equation numerically;
and Matlab for Tanaka's equation.

Dimensionality reduction.
 Module 51 - Dimensionality reduction. (Dimensionality reduction slides, pages 1-12, 14 minutes.)
 Supplement on deriving PCA.
Software
Below are Matlab and Python implementations for various examples provided
during the course. Many thanks to Dhananjai Ravindra, Jordan Miller, and
Deveshwar Hariharan for translating Matlab scripts to Python!
Assignments and Grading
Component                 % of Grade       Due Date
Tests:                    40% (3 tests)    See course schedule
Homework:                 30%              Throughout course
Final Project:            20%              Due last week end of course
Quizzes:                  5%               See course schedule
Lead class discussion:    5%               Schedule TBD
Up to 2-3% extra credit will be provided.
We encourage students to be proactive about their studies,
including class participation, office hours, emails to the instructor and TA,
spotting errors, and making suggestions.
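As a rough illustration of how the component weights in the grading table combine, here is a minimal Python sketch. The function name course_grade, the example scores, and the choice to add extra credit directly as percentage points are all assumptions made for illustration; the page does not state how extra credit is applied.

```python
# Component weights taken from the grading table (fractions of the final grade).
WEIGHTS = {
    "tests": 0.40,
    "homework": 0.30,
    "final_project": 0.20,
    "quizzes": 0.05,
    "lead_discussion": 0.05,
}


def course_grade(scores, extra_credit=0.0):
    """Return a weighted course grade on a 0-100 scale.

    scores maps each component name to a percentage in [0, 100].
    extra_credit is given in percentage points; adding it directly to
    the weighted total is an assumption, not a rule stated on this page.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total + extra_credit


# Hypothetical scores, for illustration only.
example = {
    "tests": 85,
    "homework": 90,
    "final_project": 80,
    "quizzes": 100,
    "lead_discussion": 100,
}
print(round(course_grade(example, extra_credit=1.0), 2))
```

Keeping the weights in a single dictionary makes it easy to check that they sum to 100% and to update them if the grade structure changes.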
Homework
We expect homeworks roughly every 1-2 weeks. They will be posted below, and your solutions
should be submitted electronically on
Moodle.
Some of these homeworks will be more theoretical in nature, while others
will be closer to applications, which we hope will help students
appreciate how data science is used in many real world settings.
Final Project
The final project will involve a topic that 2-3 students choose to work on.
This could involve reading a paper and presenting it to the class,
working on a data set using an algorithm that wasn't covered in depth in class,
or even (hopefully) presenting new results that you worked on.
A list of possible topics for projects appears here.
You will be submitting a final report (3-4 pages are expected; you do not need to attach code)
and a video recording of a presentation (5 minutes).
The reports and videos will be peer-graded by other students; each student will peer-grade
several reports and videos.
Overall, the objective of the final project is to provide
students a personalized learning experience and an opportunity to present
their findings to the class.
Regulations for individual projects can be found
here.
Based on our grading rubric
for the final project, you can see that 10% of the grade requires timely submission
of the project proposal; the video and report will each be 45%.
The project proposal will briefly describe how you envision your
project, in order to help us make sure that you are on track.
Tests
Below are past tests (and their solutions) from throughout the history of this course.
Feedback
Students are encouraged to send feedback to the instructional staff.