ECE 512 - Data Science from a Signal Processing Perspective
Dr. Dror Baron,
email: barondror AT ncsu DOT edu,
office hour (Zoom): Monday 2-3 pm.
TA: Ms. Hangjin Liu,
email: hliu25 AT ncsu DOT edu,
office hour (Zoom): Friday 11-12.
Classes will be on Monday and Wednesday, 11:45-13:00, EB2 1226.
Modules have been recorded electronically and are available online.
- 5 August 2023:
Started updating webpage for Fall 2023 semester.
- 6 August 2023:
Created Google mailing group.
- 8 August 2023:
Posted continuity plan; and
updated Panopto info.
- 12 August 2023:
Updated Probability and Models slides,
including adjustments to how the video modules and supplements are arranged (below).
- 15 August 2023:
Updated grade structure on course webpage;
Homework 1 is due on August 30.
- 20 August 2023:
Our TA will be Ms. Hangjin Liu.
- 24 August 2023:
Homework 2 is due on September 13;
Homework 3 is due on September 20.
- 26 August 2023:
The module on computational complexity has been re-recorded.
- 4 September 2023:
Homework 4 is due on September 27;
Homework 5 is due on October 11.
- 6 September 2023:
You may use ChatGPT and similar software when working on homeworks
and tests, but you must perform "quality control."
You may also volunteer to lead another class discussion
for 1% extra credit.
- 12 September 2023:
Dates of some quizzes and homeworks have shifted; for example,
Homework 3 is now due on September 27 and
Homework 4 is now due on October 4.
- 17 September 2023:
Homework 6 is due on October 18; and
Homework 7 is due on October 25.
About this Course
The main prerequisite is eagerness to learn about data science.
Technical prerequisites include undergraduate signal processing (ECE 421),
probability (ST 371),
comfort with math (linear algebra, calculus, multi-dimensional spaces),
and comfort with programming (we will be using Matlab and/or Python; see below).
ECE 512 (Data Science from a Signal Processing Perspective) will acquaint
students with core topics in data science.
The specific topics covered are described in the course outline.
The course will proceed as follows:
- Scientific programming (including data structures and computational complexity).
- Machine learning basics (classification, clustering, and regression).
- Sparse signal processing (including wavelets).
- Dimensionality reduction (including principal component analysis).
The instructor will borrow from, and be inspired by, several textbooks (see below).
You need not purchase any of these.
Some references to academic papers are also provided in the slides and assignments;
these are meant for your enrichment if you find a topic of special interest.
- C. M. Bishop, Pattern Recognition and Machine Learning, 2006.
- D. MacKay, Information Theory, Inference, and Learning Algorithms, 2003.
- M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, 2012.
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2001.
- T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, 1990.
- S. Mallat, A Wavelet Tour of Signal Processing, 1999.
We will be using the Matlab and/or Python languages during the course.
We will have some computer homework questions;
either language can be used to submit homeworks.
(Note that many other programming platforms can be, and often are, used.)
Here are some resources for these languages:
Slides and Modules
Course materials are in several slide decks, where each one covers a major topic.
Under each deck of slides, we organize and describe corresponding modules,
which have been recorded and posted to YouTube (links below).
We also have some supplements, which provide details about some
of the more delicate course topics.
Projects - this set of slides summarizes material
about projects that will appear during the course.
(Note that in the past we used "projects" to mean assignments with a programming nature that revolved around some application theme. Starting from Fall 2022, these are folded into the homework.)
- We will give a quick overview of the course, including some administrative details, during the first class.
- Module 1 - Motivation for data science and applications. (Intro slides, pages 17-27, 15 minutes.)
- Module 2 - Polynomial curve fitting example (Intro slides, pages 28-38, 21 minutes.)
- Matlab curve fitting example.
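In the same spirit as the Matlab curve fitting example above, here is a minimal Python/NumPy sketch of polynomial curve fitting; the sinusoidal data and the degrees tried are illustrative choices, not the in-class example.

```python
import numpy as np

# Noisy samples of sin(2*pi*x) -- a classic curve-fitting toy problem
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Fit polynomials of increasing degree via least squares
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    resid = y - np.polyval(coeffs, x)          # training residual
    print(degree, np.sqrt(np.mean(resid**2)))  # RMS training error
```

As in Module 2, the training error keeps shrinking as the degree grows (degree 9 nearly interpolates the 10 points), which is exactly why test data, not training error, should guide model selection.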
Probability and models.
- Linear_algebra module: recording from ECE 421 (signal processing)
reviews basic linear algebra concepts.
- Module 3 - Probability spaces and Bayes' rule (Models slides, pages 1-9, 26 minutes.)
- Module 4 - Random variables, expectation, and variance. (Models slides, pages 10-17, 31 minutes)
- Module 5 - Machine learning terminology. (Models slides, pages 18-23, 7 minutes.)
- Supplement about test and training data.
- Module 6 - Models and minimum description length. (Models slides, pages 24-33, 31 minutes.)
- Note: prior to Fall 2022, we had an (old) Module 7 with various supplements; these have all been retired. The material is now arranged in sub-modules
7a, 7b, and 7c.
- Module 7A - Model complexity, parametric models. (Models slides, pages 34-40, 11 minutes.)
- Module 7B - Penalty for learning. (Models slides, pages 41-49, 16 minutes.)
- Module 7C - Decoding perspective. (Models slides, pages 50-56, 11 minutes.)
- Comprehensive supplement on models, MDL, two-part codes, and model complexity.
- Supplement on norms.
- Module 8 - Kolmogorov complexity. (Models slides, pages 43b-45b, 8 minutes.
Note that these slides appear after page 56, which concludes Module 7c, because more slides were added in Fall 2022.)
- Module 9 - Resource consumption of algorithms. (Scientific programming slides, pages 1-10, 22 minutes.)
- Supplement on two example sorting algorithms.
- Module 10 - Orders of growth of resource consumption. (Scientific programming slides, pages 11-16, 17 minutes.)
- Module 11 - Computational complexity. (Scientific programming slides, pages 17-22, 14 minutes.
Note that this is a re-recorded 2023 version.)
- Supplement on example of computational complexity.
- Module 12 - Algorithm selection. (Scientific programming slides, pages 23-29, 16 minutes.)
- Module 13 - Divide and conquer. (Scientific programming slides, pages 30-33, 11 minutes.)
- mergesort and merge routines developed in class.
- Module 14 - Computational architectures. (Scientific programming slides, pages 34-40, 15 minutes.)
- Module 15 - Parallel processing. (Scientific programming slides, pages 41-44, 6 minutes.)
- Module 16 - Stacks, queues, and linked lists. (Scientific programming slides, pages 45-55, 20 minutes.)
- Module 17 - Graphs. (Scientific programming slides, pages 56-61, 15 minutes.)
- Module 18 - Trees. (Scientific programming slides, pages 62-69, 15 minutes.)
- Module 19 - Profiling. (Scientific programming slides, pages 70-76, 11 minutes.)
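To make Module 13's divide-and-conquer idea concrete, here is a short Python sketch of mergesort and merge; this is an illustrative version, not the routines developed in class.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in linear time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])   # at most one of these
    out.extend(right[j:])  # extends is non-empty
    return out

def mergesort(a):
    """Divide and conquer: split, sort halves recursively, merge."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))

print(mergesort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The recursion halves the input at each level and merging costs linear time per level, giving the O(n log n) behavior discussed in Modules 10-13.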
- Module 20 - Motivation for optimization. (Optimization slides, pages 1-8, 10 minutes.)
- Supplement providing dynamic programming example.
- Module 21 - Dynamic programming. (Optimization slides, pages 9-22, 26 minutes.)
- Module 22 - Linear programming. (Optimization slides, pages 23-27, 15 minutes.)
- Line search example code.
- Module 23 - Convex programming. (Optimization slides, pages 28-36, 16 minutes.)
- Module 24 - Integer programming. (Optimization slides, pages 37-41, 10 minutes.)
- Module 25 - Non-convex programming. (Optimization slides, pages 42-52, 21 minutes.)
- Annealing example code
(this resembles MCMC discussed for non-convex programming).
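As a taste of the non-convex programming material (Module 25) and the annealing example above, here is a minimal simulated annealing sketch in Python; the objective function, cooling schedule, and step size are all illustrative assumptions, not the in-class code.

```python
import math
import random

def anneal(f, x0, steps=20000, temp0=1.0, cooling=0.999, seed=0):
    """Minimize f by simulated annealing with Gaussian proposals."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    temp = temp0
    for _ in range(steps):
        x_new = x + rng.gauss(0, 0.5)  # random local proposal
        f_new = f(x_new)
        # Accept downhill moves always; uphill moves with prob e^{-dF/T}
        if f_new < fx or rng.random() < math.exp(-(f_new - fx) / temp):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling  # geometric cooling schedule
    return best_x, best_f

# A non-convex objective with several local minima; global minimum near x = -0.3
f = lambda x: x**2 + 2 * math.sin(5 * x)
x_star, f_star = anneal(f, x0=4.0)
print(x_star, f_star)
```

Early on, the high temperature lets the search accept uphill moves and escape local minima; as the temperature decays the dynamics approach greedy descent, which is the MCMC-flavored behavior mentioned above.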
- Module 26 - Two classifiers. (Machine learning slides, pages 1-13, 27 minutes.)
- Matlab classification example.
- Supplement about the curse of dimensionality.
- Module 27 - Decision theory. (Machine learning slides, pages 14-16, 20 minutes.)
- Supplement on least squares.
- Module 28 - Clustering. (Machine learning slides, pages 17-20, 21 minutes.)
- Supplement on K means algorithm.
- Supplement about loss functions.
- Module 29 - Linear regression. (Machine learning slides, pages 21-29, 23 minutes.)
- Module 30 - Subset selection. (Machine learning slides, pages 30-34, 13 minutes.)
- Supplement about subset selection.
- Module 31 - Shrinkage. (Machine learning slides, pages 35-45, 17 minutes.)
- Supplement about shrinkage.
- Module 32 - Decision trees. (Machine learning slides, pages 46-49, 4 minutes.)
- Module 33 - Linear classification. (Machine learning slides, pages 50-55, 14 minutes.)
- Module 34 - LDA and QDA. (Machine learning slides, pages 56-63, 29 minutes.)
- Supplement containing an example on Bayesian classification.
- Supplement on Bayesian distributions.
- Module 35 - Logistic regression. (Machine learning slides, pages 64-66, 8 minutes.)
- Module 36 - Basis expansions. (Machine learning slides, pages 67-73, 10 minutes.)
- Module 37 - Kernel methods. (Machine learning slides, pages 74-77, 5 minutes.)
- Module 38 - Support vector machines. (Machine learning slides, pages 78-82, 8 minutes.)
- Convolutional neural networks slides by Abhishek Jain (TA in 2017).
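As a concrete companion to the clustering material (Module 28 and the K-means supplement), here is a minimal NumPy sketch of Lloyd's K-means iterations; the two-blob data set and K = 2 are illustrative choices.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

# Two well-separated Gaussian blobs around (0,0) and (5,5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
print(sorted(np.round(centers[:, 0])))  # one center near x = 0, one near x = 5
```

Each iteration can only decrease the within-cluster sum of squares, so the loop converges; as discussed in class, the result depends on initialization, which is why the blobs here are chosen to be well separated.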
Sparse signal processing.
- Module 39 - Sparsity. (Sparse signal processing slides, pages 1-8, 13 minutes.)
- Module 40 - Bases. (Sparse signal processing slides, pages 9-19, 26 minutes.)
- Supplement on inner product spaces.
- Supplement on bases and LTI systems.
- Module 41 - Frames. (Sparse signal processing slides, pages 20-22, 8 minutes.)
- Module 42 - Wavelets. (Sparse signal processing slides, pages 23-33, 25 minutes.)
- Module 43 - Multiresolution approximation. (Sparse signal processing slides, pages 34-43, 22 minutes.)
- Supplement on direct sums.
- Module 44 - Compressed sensing. (Sparse signal processing slides, pages 44-51, 11 minutes.)
- Module 45 - Compressive signal acquisition. (Sparse signal processing slides, pages 52-64, 12 minutes.)
- Module 46 - Sparse recovery. (Sparse signal processing slides, pages 65-77, 31 minutes.)
- Supplement on LASSO.
- Supplement on machine learning vs. CS.
- Module 47 - Optimal sparse recovery. (Sparse signal processing slides, pages 78-82, 13 minutes.)
- Module 48 - Information theoretic performance limits. (Sparse signal processing slides, pages 83-90, 14 minutes.)
- Supplement on single letter bound for CS.
- Module 49 - Precise performance limits. (Sparse signal processing slides, pages 91-96, 34 minutes.)
- Supplement deriving precise performance limit for CS.
- Module 50 - Approximate message passing. (Sparse signal processing slides, pages 97-106, 16 minutes.)
- Supplement on AMP implementation; and the
AMP and denoise routines developed in class.
- Supplement on solving Tanaka's fixed point equation numerically;
and Matlab for Tanaka's equation.
- Module 51 - Dimensionality reduction. (Dimensionality reduction slides, pages 1-12, 14 minutes.)
- Supplement on deriving PCA.
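The PCA derivation in the supplement above can be sanity-checked numerically: for data whose variance is concentrated along one direction, the top right singular vector of the centered data matrix recovers that direction. A short illustrative sketch (synthetic data, with the direction (1,1)/sqrt(2) as an assumed ground truth):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data with most variance along the direction (1, 1)/sqrt(2)
n = 500
t = rng.normal(0, 3, n)                 # large-variance component
noise = rng.normal(0, 0.3, (n, 2))      # small isotropic noise
X = np.outer(t, [1, 1]) / np.sqrt(2) + noise

Xc = X - X.mean(axis=0)                 # center the data first
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                             # first principal direction
var_explained = s[0]**2 / np.sum(s**2)  # fraction of variance on PC1

print(np.abs(pc1), var_explained)       # pc1 close to (0.707, 0.707)
```

Note that pc1 is only defined up to sign, and that centering before the SVD is essential; both points come up in the derivation.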
Below are Matlab and Python implementations for various examples provided
during the course. Many thanks to Dhananjai Ravindra, Jordan Miller, and
Deveshwar Hariharan for translating Matlab scripts to Python!
Assignments and Grading
- Tests: 40% of the grade (3 tests); see course schedule for dates.
- Homework: see course schedule.
- Final project: due at the end of the last week of the course.
- Quizzes: see course schedule.
- Lead class discussion: up to 2-3% extra credit will be provided.
We encourage students to be proactive about their studies,
including class participation, office hours, emails to the instructor and TA,
spotting errors, and making suggestions.
We expect homeworks roughly every 1-2 weeks. They will be posted below, and solutions
will be submitted electronically.
Some of these homeworks will be more theoretical in nature, while others
will be closer to applications, which we hope will help students
appreciate how data science is used in many real world settings.
The final project will involve a topic that 2-3 students choose to work on.
This could involve reading a paper and presenting it to the class,
working on a data set using an algorithm that wasn't covered in depth in class,
or even (hopefully) presenting new results that you worked on.
A list of possible topics for projects appears here.
You will be submitting a final report (3-4 pages are expected; you do not need to attach code)
and video recording of a presentation (5 minutes).
The reports and videos will be peer-graded by other students; each student will peer-grade
several reports and videos.
Overall, the objective of the final project is to provide
students a personalized learning experience and an opportunity to present
their findings to the class.
Regulations for individual projects can be found here.
Based on our grading rubric
for the final project, 10% of the grade requires timely submission
of the project proposal; the video and report will each be worth 45%.
The project proposal will briefly describe how you envision your
project, in order to help us make sure that you are on track.
Below are past tests (and their solutions) from previous offerings of this course.
Students are encouraged to send feedback to the instructional staff.