ECE 512 - Data Science from a Signal Processing Perspective
Dr. Dror Baron,
email: barondror AT ncsu DOT edu,
office hour (Zoom): Monday 2-3 pm.
TA: Ms. Hangjin Liu,
email: hliu25 AT ncsu DOT edu,
office hour (Zoom): Friday 11-12.
Classes will be on Monday and Wednesday, 11:45-13:00, EB2 1226.
Modules have been recorded electronically and are available online.
- 5 August 2023:
Started updating webpage for Fall 2023 semester.
- 6 August 2023:
Created Google mailing group.
- 8 August 2023:
Posted continuity plan; and
updated Panopto info.
- 12 August 2023:
Updated Probability and Models slides,
including adjustments to how the video modules and supplements are arranged (below).
- 15 August 2023:
Updated grade structure on course webpage;
Homework 1 is due on August 30.
- 20 August 2023:
Our TA will be Ms. Hangjin Liu.
- 24 August 2023:
Homework 2 is due on September 13;
Homework 3 is due on September 20.
- 26 August 2023:
The module on computational complexity has been re-recorded.
- 4 September 2023:
Homework 4 is due on September 27;
Homework 5 is due on October 11.
- 6 September 2023:
You may use ChatGPT and similar software when working on homeworks
and tests, but you must perform "quality control."
You may also volunteer to lead another class discussion
for 1% extra credit.
- 12 September 2023:
Dates of some quizzes and homeworks have shifted; for example,
Homework 3 is now due on September 27 and
Homework 4 is now due on October 4.
- 17 September 2023:
Homework 6 is due on October 18; and
Homework 7 is due on October 25.
About this Course
The main prerequisite is eagerness to learn about data science.
Technical prerequisites include undergraduate signal processing (ECE 421),
probability (ST 371),
comfort with math (linear algebra, calculus, multi-dimensional spaces),
and comfort with programming (we will be using Matlab and/or Python; see below).
ECE 512 (Data Science from a Signal Processing Perspective) will acquaint
students with core topics in data science.
The specific topics covered are described in the course outline.
The course will proceed as follows:
- Scientific programming (including data structures and computational complexity).
- Machine learning basics (classification, clustering, and regression).
- Sparse signal processing (including wavelets).
- Dimensionality reduction (including principal component analysis).
The instructor will borrow from, and be inspired by, several textbooks (see below).
You need not purchase any of these.
Some references to academic papers are also provided in the slides and assignments;
these are meant for your enrichment if you find a topic of special interest.
- C. M. Bishop, Pattern Recognition and Machine Learning, 2006.
- D. MacKay, Information Theory, Inference, and Learning Algorithms, 2003.
- M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, 2012.
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2001.
- T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, 1990.
- S. Mallat, A Wavelet Tour of Signal Processing, 1999.
We will be using the Matlab and/or Python languages during the course.
We will have some computer homework questions;
either language can be used to submit homeworks.
(Note that many other programming platforms can be, and often are, used.)
Here are some resources for these languages:
Slides and Modules
Course materials are in several slide decks, where each one covers a major topic.
Under each deck of slides, we organize and describe corresponding modules,
which have been recorded and posted to YouTube (links below).
We also have some supplements, which provide details about some
of the more delicate course topics.
Projects - this set of slides summarizes material
about projects that will appear during the course.
(Note that in the past we used "projects" to mean assignments with a programming nature that revolved around some application theme. Starting from Fall 2022, these are folded into the homework.)
- We will give a quick overview of the course, including some administrative details, during the first class.
- Module 1 - Motivation for data science and applications. (Intro slides, pages 17-27, 15 minutes.)
- Module 2 - Polynomial curve fitting example (Intro slides, pages 28-38, 21 minutes.)
- Matlab curve fitting example.
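In the same spirit as the Matlab curve fitting example above, here is a minimal Python/NumPy sketch of polynomial curve fitting; the sinusoidal data and the degrees tried are illustrative choices, not the in-class example.

```python
import numpy as np

# Noisy samples of sin(2*pi*x) -- a classic curve-fitting toy problem
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Fit polynomials of increasing degree via least squares
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    resid = y - np.polyval(coeffs, x)          # training residual
    print(degree, np.sqrt(np.mean(resid**2)))  # RMS training error
```

As in Module 2, the training error keeps shrinking as the degree grows (degree 9 nearly interpolates the 10 points), which is exactly why test data, not training error, should guide model selection.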
Probability and models.
- Linear_algebra module: recording from ECE 421 (signal processing)
reviews basic linear algebra concepts.
- Module 3 - Probability spaces and Bayes' rule (Models slides, pages 1-9, 26 minutes.)
- Module 4 - Random variables, expectation, and variance. (Models slides, pages 10-17, 31 minutes)
- Module 5 - Machine learning terminology. (Models slides, pages 18-23, 7 minutes.)
- Supplement about test and training data.
- Module 6 - Models and minimum description length. (Models slides, pages 24-33, 31 minutes.)
- Note: prior to Fall 2022, we had an (old) Module 7 with various supplements; these have all been retired. The material is now arranged in sub-modules
7a, 7b, and 7c.
- Module 7A - Model complexity, parametric models. (Models slides, pages 34-40, 11 minutes.)
- Module 7B - Penalty for learning. (Models slides, pages 41-49, 16 minutes.)
- Module 7C - Decoding perspective. (Models slides, pages 50-56, 11 minutes.)
- Comprehensive supplement on models, MDL, two-part codes, and model complexity.
- Supplement on norms.
- Module 8 - Kolmogorov complexity. (Models slides, pages 43b-45b, 8 minutes.
Note that these slides appear after page 56, which concludes Module 7c, because more slides were added in Fall 2022.)
- Module 9 - Resource consumption of algorithms. (Scientific programming slides, pages 1-10, 22 minutes.)
- Supplement on two example sorting algorithms.
- Module 10 - Orders of growth of resource consumption. (Scientific programming slides, pages 11-16, 17 minutes.)
- Module 11 - Computational complexity. (Scientific programming slides, pages 17-22, 14 minutes.
Note that this is a re-recorded 2023 version.)
- Supplement on example of computational complexity.
- Module 12 - Algorithm selection. (Scientific programming slides, pages 23-29, 16 minutes.)
- Module 13 - Divide and conquer. (Scientific programming slides, pages 30-33, 11 minutes.)
- mergesort and merge routines developed in class.
- Module 14 - Computational architectures. (Scientific programming slides, pages 34-40, 15 minutes.)
- Module 15 - Parallel processing. (Scientific programming slides, pages 41-44, 6 minutes.)
- Module 16 - Stacks, queues, and linked lists. (Scientific programming slides, pages 45-55, 20 minutes.)
- Module 17 - Graphs. (Scientific programming slides, pages 56-61, 15 minutes.)
- Module 18 - Trees. (Scientific programming slides, pages 62-69, 15 minutes.)
- Module 19 - Profiling. (Scientific programming slides, pages 70-76, 11 minutes.)
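To make Module 13's divide-and-conquer idea concrete, here is a short Python sketch of mergesort and merge; this is an illustrative version, not the routines developed in class.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in linear time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])   # at most one of these
    out.extend(right[j:])  # extends is non-empty
    return out

def mergesort(a):
    """Divide and conquer: split, sort halves recursively, merge."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))

print(mergesort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The recursion halves the input at each level and merging costs linear time per level, giving the O(n log n) behavior discussed in Modules 10-13.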
- Module 20 - Motivation for optimization. (Optimization slides, pages 1-8, 10 minutes.)
- Supplement providing dynamic programming example.
- Module 21 - Dynamic programming. (Optimization slides, pages 9-22, 26 minutes.)
- Module 22 - Linear programming. (Optimization slides, pages 23-27, 15 minutes.)
- Line search example code.
- Module 23 - Convex programming. (Optimization slides, pages 28-36, 16 minutes.)
- Module 24 - Integer programming. (Optimization slides, pages 37-41, 10 minutes.)
- Module 25 - Non-convex programming. (Optimization slides, pages 42-52, 21 minutes.)
- Annealing example code
(this resembles MCMC discussed for non-convex programming).
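As a taste of the non-convex programming material (Module 25) and the annealing example above, here is a minimal simulated annealing sketch in Python; the objective function, cooling schedule, and step size are all illustrative assumptions, not the in-class code.

```python
import math
import random

def anneal(f, x0, steps=20000, temp0=1.0, cooling=0.999, seed=0):
    """Minimize f by simulated annealing with Gaussian proposals."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    temp = temp0
    for _ in range(steps):
        x_new = x + rng.gauss(0, 0.5)  # random local proposal
        f_new = f(x_new)
        # Accept downhill moves always; uphill moves with prob e^{-dF/T}
        if f_new < fx or rng.random() < math.exp(-(f_new - fx) / temp):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling  # geometric cooling schedule
    return best_x, best_f

# A non-convex objective with several local minima; global minimum near x = -0.3
f = lambda x: x**2 + 2 * math.sin(5 * x)
x_star, f_star = anneal(f, x0=4.0)
print(x_star, f_star)
```

Early on, the high temperature lets the search accept uphill moves and escape local minima; as the temperature decays the dynamics approach greedy descent, which is the MCMC-flavored behavior mentioned above.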
- Module 26 - Two classifiers. (Machine learning slides, pages 1-13, 27 minutes.)
- Matlab classification example.
- Supplement about the curse of dimensionality.
- Module 27 - Decision theory. (Machine learning slides, pages 14-16, 20 minutes.)
- Supplement on least squares.
- Module 28 - Clustering. (Machine learning slides, pages 17-20, 21 minutes.)
- Supplement on K means algorithm.
- Supplement about loss functions.
- Module 29 - Linear regression. (Machine learning slides, pages 21-29, 23 minutes.)
- Module 30 - Subset selection. (Machine learning slides, pages 30-34, 13 minutes.)
- Supplement about subset selection.
- Module 31 - Shrinkage. (Machine learning slides, pages 35-45, 17 minutes.)
- Supplement about shrinkage.
- Module 32 - Decision trees. (Machine learning slides, pages 46-49, 4 minutes.)
- Module 33 - Linear classification. (Machine learning slides, pages 50-55, 14 minutes.)
- Module 34 - LDA and QDA. (Machine learning slides, pages 56-63, 29 minutes.)
- Supplement containing an example on Bayesian classification.
- Supplement on Bayesian distributions.
- Module 35 - Logistic regression. (Machine learning slides, pages 64-66, 8 minutes.)
- Module 36 - Basis expansions. (Machine learning slides, pages 67-73, 10 minutes.)
- Module 37 - Kernel methods. (Machine learning slides, pages 74-77, 5 minutes.)
- Module 38 - Support vector machines. (Machine learning slides, pages 78-82, 8 minutes.)
- Convolutional neural networks slides by Abhishek Jain (TA in 2017).
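As a concrete companion to the clustering material (Module 28 and the K-means supplement), here is a minimal NumPy sketch of Lloyd's K-means iterations; the two-blob data set and K = 2 are illustrative choices.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

# Two well-separated Gaussian blobs around (0,0) and (5,5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
print(sorted(np.round(centers[:, 0])))  # one center near x = 0, one near x = 5
```

Each iteration can only decrease the within-cluster sum of squares, so the loop converges; as discussed in class, the result depends on initialization, which is why the blobs here are chosen to be well separated.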
Sparse signal processing.
- Module 39 - Sparsity. (Sparse signal processing slides, pages 1-8, 13 minutes.)
- Module 40 - Bases. (Sparse signal processing slides, pages 9-19, 26 minutes.)
- Supplement on inner product spaces.
- Supplement on bases and LTI systems.
- Module 41 - Frames. (Sparse signal processing slides, pages 20-22, 8 minutes.)
- Module 42 - Wavelets. (Sparse signal processing slides, pages 23-33, 25 minutes.)
- Module 43 - Multiresolution approximation. (Sparse signal processing slides, pages 34-43, 22 minutes.)
- Supplement on direct sums.
- Module 44 - Compressed sensing. (Sparse signal processing slides, pages 44-51, 11 minutes.)
- Module 45 - Compressive signal acquisition. (Sparse signal processing slides, pages 52-64, 12 minutes.)
- Module 46 - Sparse recovery. (Sparse signal processing slides, pages 65-77, 31 minutes.)
- Supplement on LASSO.
- Supplement on machine learning vs. CS.
- Module 47 - Optimal sparse recovery. (Sparse signal processing slides, pages 78-82, 13 minutes.)
- Module 48 - Information theoretic performance limits. (Sparse signal processing slides, pages 83-90, 14 minutes.)
- Supplement on single letter bound for CS.
- Module 49 - Precise performance limits. (Sparse signal processing slides, pages 91-96, 34 minutes.)
- Supplement deriving precise performance limit for CS.
- Module 50 - Approximate message passing. (Sparse signal processing slides, pages 97-106, 16 minutes.)
- Supplement on AMP implementation; and the
AMP and denoise routines developed in class.
- Supplement on solving Tanaka's fixed point equation numerically;
and Matlab for Tanaka's equation.
- Module 51 - Dimensionality reduction. (Dimensionality reduction slides, pages 1-12, 14 minutes.)
- Supplement on deriving PCA.
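The PCA derivation in the supplement above can be sanity-checked numerically: for data whose variance is concentrated along one direction, the top right singular vector of the centered data matrix recovers that direction. A short illustrative sketch (synthetic data, with the direction (1,1)/sqrt(2) as an assumed ground truth):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data with most variance along the direction (1, 1)/sqrt(2)
n = 500
t = rng.normal(0, 3, n)                 # large-variance component
noise = rng.normal(0, 0.3, (n, 2))      # small isotropic noise
X = np.outer(t, [1, 1]) / np.sqrt(2) + noise

Xc = X - X.mean(axis=0)                 # center the data first
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                             # first principal direction
var_explained = s[0]**2 / np.sum(s**2)  # fraction of variance on PC1

print(np.abs(pc1), var_explained)       # pc1 close to (0.707, 0.707)
```

Note that pc1 is only defined up to sign, and that centering before the SVD is essential; both points come up in the derivation.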
Below are Matlab and Python implementations for various examples provided
during the course. Many thanks to Dhananjai Ravindra, Jordan Miller, and
Deveshwar Hariharan for translating Matlab scripts to Python!
Assignments and Grading
- Tests: 40% of the grade (3 tests); see course schedule for dates.
- Homework: see course schedule.
- Final project: due at the end of the last week of the course.
- Quizzes: see course schedule.
- Lead class discussion: up to 2-3% extra credit will be provided.
We encourage students to be proactive about their studies,
including class participation, office hours, emails to the instructor and TA,
spotting errors, and making suggestions.
We expect homeworks roughly every 1-2 weeks. They will be posted below, and solutions
will be submitted electronically.
Some of these homeworks will be more theoretical in nature, while others
will be closer to applications, which we hope will help students
appreciate how data science is used in many real world settings.
The final project will involve a topic that 2-3 students choose to work on.
This could involve reading a paper and presenting it to the class,
working on a data set using an algorithm that wasn't covered in depth in class,
or even (hopefully) presenting new results that you worked on.
A list of possible topics for projects appears here.
You will be submitting a final report (3-4 pages are expected; you do not need to attach code)
and video recording of a presentation (5 minutes).
The reports and videos will be peer-graded by other students; each student will peer-grade
several reports and videos.
Overall, the objective of the final project is to provide
students a personalized learning experience and an opportunity to present
their findings to the class.
Regulations for individual projects can be found here.
Based on our grading rubric
for the final project, 10% of the grade requires timely submission
of the project proposal; the video and report will each be worth 45%.
The project proposal will briefly describe how you envision your
project, in order to help us make sure that you are on track.
Below are past tests (and their solutions) from previous offerings of this course.
Students are encouraged to send feedback to the instructional staff.