Dror Baron -
Information Theoretic Results in Compressed Sensing
Compressed sensing:
Compressed sensing (CS) is a new framework for integrated sensing and
compression. The fundamental revelation is that, if an N-sample signal
x is sparse and has a good K-term approximation in some basis, then
it can be reconstructed using M = O(K log(N/K)) << N linear projections of x
onto another basis.
Furthermore, x can be reconstructed using linear
programming, which has polynomial complexity.
Some of the CS projects I have worked on are described here, and
links to numerous other papers appear on the
Nuit Blanche
blog and the
compressed sensing resource page.
My early contribution to CS involved an information theoretic
approach, where a probabilistic framework enables us to leverage ideas from source and
channel coding. The CS system resembles a communication system, where source and
channel encoding are replaced by a matrix-vector multiplication, and channel
and source decoding are replaced by CS reconstruction, which typically relies on
optimization routines. I sometimes refer to these steps as CS encoding and
decoding, respectively.
Information theoretic results on CS performance:
By thinking about a linear measurement system in terms of joint source-channel coding,
it is straightforward to employ the source-channel separation theorems
of information theory to derive bounds on the number of CS measurements.
The following paper introduced this interpretation, leading to a
simple lower bound on the number of measurements required:
S. Sarvotham,
D. Baron,
and R. G. Baraniuk,
"Measurements vs. Bits: Compressed Sensing meets Information Theory,"
Proceedings of the 44th Allerton Conference on Communication,
Control, and Computing,
Monticello, IL, September 2006
(talk,
pdf).
A few years later,
Dongning Guo,
Shlomo Shamai, and I established a precise characterization of CS
performance using replica methods, which have been used to analyze
CDMA systems. Interestingly, we proved that sparse measurement
matrices offer the same performance as dense ones, and that belief
propagation (see below) is an asymptotically optimal algorithm.
It is important to stress that our results are specific to Gaussian
measurement noise, and a specific large-system limit. That said,
the results can be extended to arbitrary (non-Gaussian) noise
via techniques introduced by Dongning and his collaborator C. Wang.
D. Guo,
D. Baron, and
S. Shamai,
"A Single-letter Characterization of Optimal
Noisy Compressed Sensing,"
Proceedings of the 47th Allerton Conference on Communication,
Control, and Computing,
Monticello, IL, September 2009
(ppt,
pdf).
The results with Guo and Shamai are easily extended beyond compressed sensing
to other linear measurement systems. In fact, many areas of
science and engineering feature systems that acquire signals
in a linear or approximately linear fashion. Applications
include multiuser detection (e.g., CDMA), medical imaging,
include multiuser detection (e.g., CDMA), medical imaging,
financial prediction, electromagnetic scattering, and seismic imaging.
Another line of work by
Yihong Wu
and
Sergio Verdu
has explored the reconstruction quality of
CS in the limit of high signal to noise ratio (SNR).
Inspired by the analysis of Wu and Verdu,
Junan Zhu
and I studied the theoretically optimal performance for CS
with finite SNR by evaluating Tanaka's
fixed point equation, and we identified regions with
different behaviors in the reconstruction performance.
J. Zhu and
D. Baron,
"Performance Regions in Compressed Sensing from Noisy Measurements,"
Conf. Inf. Sciences Systems, Baltimore, MD, March 2013
(pdf,
talk).
The following plot illustrates our main findings.
The horizontal axes are the SNR, γ, and measurement rate, R;
the vertical axis is the minimum mean square error (MMSE).
Not surprisingly, the MMSE goes down as R and γ increase.
Interestingly, the (R,γ) plane can be separated into four regions:
(i) Region 1 is the high measurement rate region,
Tanaka's equation has a single
fixed point, and its solution offers good MMSE performance;
(ii) Region 4 has a poor combination of R and γ, and
Tanaka's equation has a single bad fixed point; and
(iii) Tanaka's equation has multiple fixed points in
the medium rate Regions 2 and 3,
where in Region 2 (lower R) the bad fixed point
has lower free energy (and thus drives the MMSE),
whereas in Region 3 (higher R) the good fixed point has
lower free energy. Interestingly, belief propagation (BP) algorithms
operate at the MSE performance of the bad fixed point, so
even in Region 3 the MSE performance of BP is poor.
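To give a flavor of how such a fixed point can be evaluated numerically, here is a hedged sketch in a state-evolution style (my own normalization and parameter names, not the exact equation from the paper): iterate tau2 <- 1/γ + mmse(tau2)/R for a Bernoulli-Gaussian prior, where mmse(·) is the scalar MMSE estimated by Monte Carlo and 1/γ plays the role of the measurement-noise variance.

```python
# Illustrative fixed-point iteration for an assumed Bernoulli-Gaussian
# prior with sparsity eps; all parameter choices here are my own.
import numpy as np

def scalar_mmse(tau2, eps=0.1, n=200_000, seed=1):
    """Monte Carlo MMSE of a Bernoulli-Gaussian x from q = x + sqrt(tau2)*z."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) * (rng.random(n) < eps)
    q = x + np.sqrt(tau2) * rng.standard_normal(n)
    s2 = 1.0 + tau2                       # variance of q when x is "on"
    w_on = eps * np.exp(-q**2 / (2 * s2)) / np.sqrt(s2)
    w_off = (1 - eps) * np.exp(-q**2 / (2 * tau2)) / np.sqrt(tau2)
    xhat = w_on / (w_on + w_off) * q / s2  # conditional expectation E[x|q]
    return np.mean((x - xhat) ** 2)

def fixed_point(R, gamma, iters=30):
    """Iterate tau2 <- 1/gamma + mmse(tau2)/R from a cold start."""
    tau2 = 1.0
    for _ in range(iters):
        tau2 = 1.0 / gamma + scalar_mmse(tau2) / R
    return tau2

tau2 = fixed_point(R=0.4, gamma=100.0)
```

As expected, the effective noise level of the decoupled scalar channel decreases as the measurement rate R or the SNR γ grows; scanning (R, γ) and tracking which fixed point the iteration lands on reproduces the qualitative region structure described above.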
Information theoretic limits on arbitrary error metrics:
In compressed sensing problems, the performance of the estimation process is
often characterized by some error metric that quantifies the distance or distortion
between the original and estimated signals. Some well-known metrics
include squared error, absolute error, and support (nonzero locations) error.
Existing reconstruction algorithms have targeted metrics such as
squared error and support error, but to the best of our knowledge
no previous work could estimate the signal in a manner
that minimizes an arbitrary error metric.
In this work, we developed a general algorithm that deals with
arbitrary error metrics in noisy compressed sensing.
The algorithm relies on the relaxed belief propagation method, which in
the large system limit decouples the linear mixing channels of the vector
compressed sensing acquisition system into a bank of parallel
scalar Gaussian channels. We utilized the statistical information
from those scalar Gaussian channels to obtain the expected error metric of
interest, and then found the estimand that minimizes the expected error. We
also gave the fundamental information-theoretic performance limit of an
estimator for a given metric, and verified that our algorithm is
metric-optimal.
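The decoupling principle can be illustrated on a toy scalar channel (a sketch under an assumed Bernoulli-Gaussian prior and toy parameters, not the exact setup of the paper): once each entry is observed through q = x + v, the metric-optimal estimate minimizes the expected metric under the posterior p(x|q), which is the posterior mean for squared error and the posterior median for absolute error.

```python
# Metric-optimal scalar estimation sketch for an assumed sparse prior
# x ~ eps*N(0,1) + (1-eps)*delta_0, observed as q = x + N(0, tau2).
import numpy as np

def metric_optimal(q, tau2=0.25, eps=0.1):
    """Posterior mean (squared error) and median (absolute error)."""
    xs = np.linspace(-6, 6, 4001)
    dx = xs[1] - xs[0]
    # prior: continuous Gaussian part plus a point mass at zero
    p = eps * np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi) * dx
    p[np.argmin(np.abs(xs))] += 1 - eps
    # multiply by the Gaussian likelihood, normalize to the posterior
    p *= np.exp(-(q - xs)**2 / (2 * tau2))
    p /= p.sum()
    post_mean = np.sum(p * xs)                           # squared-error optimal
    post_median = xs[np.searchsorted(np.cumsum(p), 0.5)]  # abs-error optimal
    return post_mean, post_median

xhat_mse, xhat_mae = metric_optimal(0.4)
```

For a small observation like q = 0.4 the two metrics give visibly different answers: the median snaps to exactly zero (the spike carries most of the posterior mass), while the mean shrinks q toward zero without reaching it.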
J. Tan,
D. Carmon, and D. Baron,
"Signal Estimation with Additive Error Metrics in Compressed
Sensing,"
IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 150-158,
January 2014
(pdf,
arxiv,
software).
J. Tan,
D. Carmon,
and D. Baron,
"Optimal Estimation with Arbitrary Error Metrics in Compressed Sensing,"
IEEE Statistical Signal Processing workshop,
Ann Arbor, MI, August 2012
(pdf,
software).
You may also want to watch the tutorial video about
this research work by
Jin Tan.
In follow-up work with Liyi Dai and
Jin Tan,
we asked whether our results on arbitrary error metrics
can yield CS algorithms that minimize the worst case error.
For Gaussian mixture signals x corrupted by Gaussian noise v,
i.e., q = x + v, we proved that a Wiener filter denoiser
(scaling the noisy observations q by a multiplicative constant)
offers asymptotically optimal worst-case error.
Because belief propagation (BP) decouples the CS problem into
a scalar channel of the form q = x + v, Wiener filters are also
useful in minimizing the worst case error in CS.
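As a hedged illustration of why a single multiplicative constant helps the worst-case error (the mixture parameters, and the heuristic of matching c to the widest mixture component because the largest errors come from those entries, are my own choices, not the paper's derivation):

```python
# Worst-case (l-infinity) error of a Wiener-style scaling denoiser on a
# two-component Gaussian mixture signal; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
p_big, sig2_big, sig2_small, sig2_v = 0.3, 1.0, 0.01, 4.0

# Gaussian mixture signal x and its noisy observation q = x + v
big = rng.random(n) < p_big
x = np.where(big,
             np.sqrt(sig2_big) * rng.standard_normal(n),
             np.sqrt(sig2_small) * rng.standard_normal(n))
q = x + np.sqrt(sig2_v) * rng.standard_normal(n)

# Wiener constant matched to the widest mixture component (my heuristic)
c = sig2_big / (sig2_big + sig2_v)

linf_raw = np.max(np.abs(x - q))         # worst-case error without denoising
linf_wiener = np.max(np.abs(x - c * q))  # worst-case error after scaling
```

Even this crude scaling markedly reduces the largest per-entry error relative to using the raw observations.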
J. Tan, D. Baron, and L. Dai, "Wiener Filters in Gaussian
Mixture Signal Estimation with l∞-Norm Error," IEEE Trans. Inf.
Theory, vol. 60, no. 10, pp. 6626-6635, October 2014
(pdf,
arxiv).
Although the Wiener filter is asymptotically optimal,
for finite-sized problems its worst case error is sub-optimal.
To design practical worst-case solvers, we observed that
the ell_{p} error converges to the worst-case error
(ell_{∞}) in the limit of large p.
Motivated by this observation, we decided to estimate x from q
using an ell_{p} estimator, which we denote by
x̂_{p}.
In the figure below, we plot x̂_{p}
as a function of q. (Note that each entry x_{i} of the
vector x is estimated using its corresponding noisy observation
q_{i}.)
The line marked by c*q_{i} is linear in q_{i};
it is the Wiener filter.
For p = 1, 2, and 10, the functions coincide with the
Wiener filter c*q_{i} when |q_{i}| is far from zero.
Near zero, the values of x̂_{p,i} are
strictly zero and x̂_{p,i}
is quite different from the linear Wiener filter
c*q_{i}; it can be seen that x̂_{10,i}
is closer to the straight line than x̂_{2,i},
which is closer than x̂_{1,i}.
These results suggest that, when p goes to infinity,
x̂_{p} converges to Wiener filtering of the
outputs of the AWGN channel. This ell_{p}
estimator was applied to CS, and indeed it reduces the
worst case error.
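A brute-force version of the ell_{p} estimator can be sketched as follows (assumed Bernoulli-Gaussian prior and toy parameters of my own choosing): discretize the posterior of x given q and minimize the expected |x - s|^p over a grid of candidate estimates s.

```python
# Grid-search ell_p estimator for an assumed sparse prior
# x ~ eps*N(0,1) + (1-eps)*delta_0, observed as q = x + N(0, tau2).
import numpy as np

def xhat_p(q, p, tau2=0.25, eps=0.1):
    """argmin over s of E[|x - s|^p | q], computed by brute force."""
    xs = np.linspace(-6, 6, 1201)
    dx = xs[1] - xs[0]
    # discretized prior: Gaussian part plus a point mass at zero
    w = eps * np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi) * dx
    w[np.argmin(np.abs(xs))] += 1 - eps
    # Gaussian likelihood, then normalize to the posterior
    w *= np.exp(-(q - xs)**2 / (2 * tau2))
    w /= w.sum()
    cand = np.linspace(-6, 6, 1201)        # candidate estimates s
    cost = [np.sum(w * np.abs(xs - s)**p) for s in cand]
    return cand[np.argmin(cost)]
```

Consistent with the figure, for small |q| the p = 1 minimizer sits exactly at zero, while for large |q| the estimates approach the Wiener line q/(1 + tau2).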
J. Tan,
D. Baron,
and L. Dai,
"Signal Estimation with Low Infinity-Norm Error
by Minimizing the Mean p-Norm Error,"
Conf. Inf. Sciences Systems, Princeton, NJ, March 2014
(pdf,
slides).
Mismatched estimation in compressed sensing:
Many scientific and engineering problems can be approximated
as linear systems, where the input signal is modeled as a
vector, and each observation is a linear combination of the
entries in the input vector corrupted by white Gaussian noise.
In joint work with
Yanting Ma and
Ahmad Beirami,
the input signal is modeled as a realization of
a vector of independent and identically distributed random
variables. The goal is to estimate the input signal such that
the mean square error (MSE), which is the squared Euclidean
distance between the estimated signal and the true input signal,
averaged over all possible realizations of the input and the observation,
is minimized. It is well-known that the best possible MSE,
minimum mean square error (MMSE), can be achieved by computing
conditional expectation, which is the mean or average value of
the input given the observation vector, where the true
distribution of the input is used.
However, the true distribution is usually not known exactly in
practice, and so conditional expectation is computed with a
postulated distribution that differs from the true distribution;
we call this procedure mismatched estimation, and it
yields an MSE that is higher than the MMSE. We are
interested in characterizing the excess MSE (EMSE) above the
MMSE due to mismatched estimation in large linear systems,
where the length of the input and the number of observations
grow to infinity together, and their ratio is fixed.
Earlier work by Verdu investigated the EMSE in additive
white Gaussian noise (AWGN) scalar channel estimation problems,
where each observation is a noise-corrupted version of one
entry of the input vector, in contrast to the noisy linear
combinations of all entries that arise in the linear systems we study.
Our work derives the relationship between the EMSE in large
linear systems and that in AWGN scalar channels. We show that
the EMSE in large linear systems is larger than that in scalar
channels using the same postulated distribution of the input;
we call this an amplification effect. Moreover, closed form
approximations of the relationship between the EMSE in large
linear systems and in AWGN scalar channels are provided.
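The effect of mismatch is easiest to see on the scalar channel itself. The following Monte Carlo sketch (assumed Bernoulli-Gaussian input with true sparsity 0.1; all parameters are illustrative) computes the empirical MSE of a conditional-mean estimator that postulates the wrong sparsity:

```python
# Mismatched estimation on a scalar AWGN channel q = x + v, where the
# estimator postulates sparsity eps_post instead of the true eps_true.
import numpy as np

def mse_with_postulated(eps_post, eps_true=0.1, tau2=0.25, n=400_000, seed=3):
    """Empirical MSE of the conditional mean computed under eps_post."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) * (rng.random(n) < eps_true)
    q = x + np.sqrt(tau2) * rng.standard_normal(n)
    s2 = 1.0 + tau2
    w_on = eps_post * np.exp(-q**2 / (2 * s2)) / np.sqrt(s2)
    w_off = (1 - eps_post) * np.exp(-q**2 / (2 * tau2)) / np.sqrt(tau2)
    xhat = w_on / (w_on + w_off) * q / s2  # postulated conditional mean
    return np.mean((x - xhat) ** 2)

mmse = mse_with_postulated(0.1)         # matched prior: (empirical) MMSE
emse = mse_with_postulated(0.3) - mmse  # excess MSE due to mismatch
```

Evaluating mse_with_postulated over a range of postulated sparsities traces out an EMSE curve that grows as the postulated parameter moves away from the true value 0.1.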
For example, the figure above examines a compressed sensing system
where the input signal is nonzero with probability 0.1. When the
reconstruction algorithm uses an incorrect parameter that moves away
from the correct value of 0.1, we can see that the EMSE increases.
Additionally, our first order approximation is reasonable
when the postulated parameter is close to 0.1; the second
order closed form approximation is more precise.
These results are described in detail in the following paper.
Y. Ma,
D. Baron, and
A. Beirami,
"Mismatched Estimation in Large Linear Systems,"
Proc. IEEE Int. Symp. Inf. Theory,
Hong Kong, June 2015
(pdf,
talk,
arxiv).