## Dror Baron - Information Theoretic Results in Compressed Sensing

Compressed sensing: Compressed sensing (CS) is a new framework for integrated sensing and compression. The fundamental revelation is that, if an N-sample signal x is sparse and has a good K-term approximation in some basis, then it can be reconstructed using M =O(K log(N/K)) << N linear projections of x onto another basis. Furthermore, x can be reconstructed using linear programming, which has polynomial complexity. Some of the CS projects I have worked on are described here, and links to numerous other papers appear on the Nuit Blanche blog and the compressed sensing resource page.

My early contribution to CS involved an information theoretic approach, where a probabilistic framework enables to leverage ideas from source and channel coding. The CS system resembles a communication system, where source and channel encoding are replaced by a matrix-vector multiplication, and channel and source decoding are replaced by CS reconstruction, which typically relies on optimization routines. I sometimes refer to these steps as CS encoding and decoding, respectively.
Information theoretic results on CS performance: By thinking about a linear measurement system in terms of joint source-channel coding, it is straightforward to employ the source-channel separation theorems of information theory to derive bounds on the number of CS measurements. The following paper introduced this interpretation, leading to a simple lower bound on the number of measurements required:
• S. Sarvotham, D. Baron, and R. G. Baraniuk, "Measurements vs. Bits: Compressed Sensing meets Information Theory," Proceedings of the 44th Allerton Conference on Communication, Control, and Computing, Monticello, IL, September 2006 (talk, pdf).
A few years later, Dongning Guo, Shlomo Shamai, and I established a precise characterizationof CS performance using replica methods, which have been used to analyze CDMA systems. Interestingly, we proved that sparse measurement matrices offer the same performance as dense ones, and that belief propagation (see below) is an asymptotically optimal algorithm. It is important to stress that our results are specific to Gaussian measurement noise, and a specific large-system limit. That said, the results can be extended to arbitrary (non-Gaussian) noise via techniques introduced by Dongning and his collaborator C. Wang.
• D. Guo, D. Baron, and S. Shamai, "A Single-letter Characterization of Optimal Noisy Compressed Sensing," Proceedings of the 47th Allerton Conference on Communication, Control, and Computing, Monticello, IL, September 2009 (ppt, pdf).
This work was also featured on Igor Carron's famous blog, and is discussed in detail in a talk I gave at Google Research, Mountain View, CA, October 2009. The results with Guo and Shamai are easily extended beyond compressed sensing to other linear measurement systems. In fact, many areas of science and engineering feature systems that acquire signals in a linear or approximately linear fashion. Applications include multiuser detection (e.g., CDMA), medical imaging, financial prediction, electromagnetic scattering, and seismics.
Another line of work by Yihong Wu and Sergio Verdu has explored the reconstruction quality of CS in the limit of high signal to noise ratio (SNR). Inspired by the analysis of Wu and Verdu, Junan Zhu and I studied the theoretically optimal performance for CS with finite SNR by evaluating Tanaka's fixed point equation, and we identified regions with different behaviors in the reconstruction performance.
• J. Zhu and D. Baron, "Performance Regions in Compressed Sensing from Noisy Measurements," Conf. Inf. Sciences Systems, Baltimore, MD, March 2013 (pdf, talk).
The following plot illustrates our main findings. The horizontal axes are the SNR, γ, and measurement rate, R; the vertical axis is the minimum mean square error (MMSE). Not surprisingly, the MMSE goes down as R and γ increase. Interestingly, the (R,γ) region can be separated into regions: (i) Region 1 is the high measurement rate region, Tanaka's equation has a single fixed point, and its solution offers good MMSE performance; (ii) Region 4 has a poor combination of R and γ, and Tanaka's equation has a single bad fixed point; and (iii) Tanaka's equation has multiple fixed points in the medium rate Regions 2 and 3, where in Region 2 (lower R) the bad fixed point has lower free energy (and thus drives the MMSE), whereas in Region 3 (higher R) the good fixed point has lower energy. Interestingly, belief propagation (BP) algorithms operate at the MSE-performance of the bad fixed point, and even in Region 3 the MSE-performance of BP is poor.

Information theoretic limits on arbitrary error metrics: In compressed sensing problems, the performance of the estimation process is often characterized by some error metric that quantifies the distance or distortion between the original and estimated signals. Some well-known metrics include squared error, absolute error, and support (nonzero locations) error. Existing reconstruction algorithms have targeted metrics such as squared error and support error, but to the best of our knowledge there was no previous work that could optimize the signal in a manner that minimizes an arbitrary error metric.

In this work, we developed a general algorithm that deals with arbitrary error metrics in noisy compressed sensing. The algorithm relies on the relaxed belief propagation method, which in the large system limit decouples the multiple linear mixing channels of the vector compressed sensing acquisition system into a parallel series of scalar Gaussian channels. We utilized the statistical information from those scalar Gaussian channels to obtain the expected error metric of interest, and then found the estimand that minimizes the expected error. We also gave the fundamental information-theoretic performance limit of an estimator for a given metric, and verified that our algorithm is metric-optimal.

In follow up work with Liyi Dai and Jin Tan, we asked whether our results on arbitrary error metrics can yield CS algorithms that minimize the worst case error. For Gaussian mixture signals x corrupted by Gaussian noise v, i.e., q = x + v, we proved that a Wiener filter denoiser (scaling the noisy observations q by a multiplicative constant) offers asymptotically optimal worst-case error. Because belief propagation (BP) decouples the CS problem into a scalar channel of the form q = x + v, Wiener filters are also useful in minimizing the worst case error in CS.

• J. Tan, D. Baron, and L. Dai, "Wiener Filters in Gaussian Mixture Signal Estimation with l∞-Norm Error," IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 6626-6635, October 2014 (pdf, arxiv).
Although the Wiener filter is asymptotically optimal, for finite-sized problems its worst case error is sub-optimal. To design practical worst-case solvers, we observed that the ellp error converges to the worst cases error (ell) in the limit of large p. Motivated by this observation, we decided to estimate x from q using an ellp estimator, which we denote by x̂p. In the figure below, we plot x̂p as a function of q. (Note that each entry xi of the vector x is estimated using its corresponding noisy observation qi.) The line marked by c*qi is linear in qi, it is the Wiener filter. For p = 1, 2, and 10, the functions coincide with the Wiener filter c*qi when |qi| is far from zero. Near zero, the values of x̂p,i are strictly zero and x̂p,i is quite different from the linear Wiener filter c*qi; it can be seen that x̂10,i is closer to the straight line than x̂2,i, which is closer than x̂1,i. These results suggest that, when p goes to infinity, x̂p converges to Wiener filtering of the outputs of the AWGN channel. This ellp estimator was applied to CS, and indeed it reduces the worst case error.
• J. Tan, D. Baron, and L. Dai, "Signal Estimation with Low Infinity-Norm Error by Minimizing the Mean p-Norm Error," Conf. Inf. Sciences Systems, Princeton, NJ, March 2014 (pdf, slides).

Mismatched estimation in compressed sensing: Many scientific and engineering problems can be approximated as linear systems, where the input signal is modeled as a vector, and each observation is a linear combination of the entries in the input vector corrupted by white Gaussian noise. In joint work with Yanting Ma and Ahmad Beirami, the input signal is modeled as a realization of a vector of independent and identically distributed random variables. The goal is to estimate the input signal such that the mean square error (MSE), which is the Euclidean distance between the estimated signal and the true input signal averaged over all possible realizations of the input and the observation, is minimized. It is well-known that the best possible MSE, minimum mean square error (MMSE), can be achieved by computing conditional expectation, which is the mean or average value of the input given the observation vector, where the true distribution of the input is used. However, the true distribution is usually not known exactly in practice, and so conditional expectation is computed with a postulated distribution that differs from the true distribution; we call this procedure mismatched estimation, and it yields an MSE that is higher than the MMSE. We are interested in characterizing the excess MSE (EMSE) above the MMSE due to mismatched estimation in large linear systems, where the length of the input and the number of observations grow to infinity together, and their ratio is fixed.

Earlier work by Verdu has investigated the EMSE in additive white Gaussian noise (AWGN) scalar channel estimation problems, where each observation is the noise corrupted version of one entry in the input vector, in contrast to a noisy linear combination of all entries in the linear systems, which we study. Our work derives the relationship between the EMSE in large linear systems and that in AWGN scalar channels. We show that the EMSE in large linear systems is larger than that in scalar channels using the same postulated distribution of the input; we call this an amplification effect. Moreover, closed form approximations of the relationship between the EMSE in large linear systems and in AWGN scalar channels are provided. For example, the figure above examines a compressed sensing system where the input signal is nonzero with probability 0.1. When the reconstruction algorithm uses an incorrect parameter that moves away from the correct value of 0.1, we can see that the EMSE increases. Additionally, our first order approximation is reasonable when the postulated parameter is close to 0.1; the second order closed form approximation is more precise. These results are described in detail in the following paper.

Back to my homepage.
Last updated May 2015.