Denoising, lossy compression, and signal estimation:
There are fundamental connections between denoising, lossy compression, signal estimation
(including in the linear setting of compressed sensing),
and other statistical inference problems.
For example, a denoising problem, where an input x is observed with noise z
as y = x + z, can be solved by applying a lossy compressor to y: we seek the
portion of y that is compressible and thus has low entropy.
Over the last several years, we have been exploring a framework based on the
minimum description length (MDL) principle to solve these sorts
of problems. This line of work started when
Tsachy Weissman
and I took his previous work on universal lossy compression of binary vectors
and extended it to lossy compression of continuous-valued signals.
The main idea is to examine a quantized version of the signal,
Q(w), where Q is a quantizer and w is a discrete-valued vector.
We then quantify the information complexity of Q(w) by evaluating
the length required by a universal data compressor to compress w,
which we denote by U(w). The goal becomes to find a good trade-off
between U(w) and the quality that Q(w) provides in describing the data.
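To see why a universal coding length is a useful complexity measure, one can use an off-the-shelf compressor such as zlib as a crude stand-in for the universal compressor; the helper below is our own illustrative sketch, not the coding scheme from the papers.

```python
import zlib

import numpy as np

def universal_length(w):
    """Proxy for U(w): bits needed by a universal compressor
    (here zlib) to describe the discrete-valued vector w."""
    return 8 * len(zlib.compress(np.asarray(w, dtype=np.int32).tobytes()))

rng = np.random.default_rng(0)
piecewise = np.repeat(rng.integers(0, 4, size=16), 64)   # structured: long runs
iid = rng.integers(0, 4, size=piecewise.size)            # unstructured: i.i.d.

# A structured sequence compresses to far fewer bits than an i.i.d. one,
# so U(w) acts as a data-adaptive complexity measure.
print(universal_length(piecewise) < universal_length(iid))  # True
```

No distribution needs to be specified in advance; the compressor itself adapts to whatever structure w has.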
For example:
Lossy compression:
The original signal is x, we want to compress it lossily,
the quality provided by Q(w) can be quantified by D(Q(w),x),
where D is a distortion function (e.g., square error),
and the optimization becomes
ŵ = argmin_w {U(w) + λD(Q(w), x)},
where λ is a Lagrangian parameter, which relates to
the slope of the rate-distortion function, and the reconstruction is
x̂ = Q(ŵ).
Denoising:
The goal is to estimate x from noisy observations, y=x+z,
and the optimization problem resembles that of lossy compression.
Compressed sensing:
In compressed sensing (CS),
the measurements are derived in a linear fashion with
additive noise, y=Φx+z. Similar to the denoising setting,
the quality provided by Q(w) relates to the log likelihood
of the residual, y-ΦQ(w). Therefore, the resulting
optimization problem becomes
ŵ = argmin_w {U(w) + λ||y − ΦQ(w)||²}.
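As a concrete sketch of this objective, the snippet below evaluates U(w) + λ||y − ΦQ(w)||² for candidate vectors, again with zlib as a stand-in for the universal compressor; the function names, the identity quantizer, and the parameter choices are ours, for illustration only.

```python
import zlib

import numpy as np

def U(w):
    # Proxy for the universal coding length (bits): zlib stands in
    # for a universal compressor of the discrete vector w.
    return 8 * len(zlib.compress(w.astype(np.int8).tobytes()))

def cs_objective(w, Phi, y, lam, Q=lambda w: w.astype(float)):
    # U(w) + lambda * ||y - Phi Q(w)||^2 for a candidate discrete w.
    r = y - Phi @ Q(w)
    return U(w) + lam * float(r @ r)

rng = np.random.default_rng(1)
n, m = 128, 48
x = np.zeros(n, dtype=np.int8)
x[rng.choice(n, size=8, replace=False)] = 1          # sparse {0,1} signal
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x.astype(float) + 0.01 * rng.standard_normal(m)

# The true sparse signal scores a lower objective than a dense random guess:
# it both compresses better (smaller U) and explains y (small residual).
guess = rng.integers(0, 2, size=n).astype(np.int8)
print(cs_objective(x, Phi, y, lam=100.0) < cs_objective(guess, Phi, y, lam=100.0))
```

The objective thus favors candidates that are simultaneously simple to describe and consistent with the measurements.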
The reader may wonder why we use a universal coding length U(w)
instead of an entropy H(w) or some other complexity measure.
The key issue is that in many practical problems the input is generated
by an unknown statistical mechanism, and a universal compressor
allows us to quantify, in a data-adaptive manner, the complexity of
inputs generated by a wide variety of possible statistical distributions.
This philosophy is inspired by the MDL principle.
This MDL approach can be generalized to a system
(illustrated in the figure below) that
measures an input x using a measurement operator J, and then passes
w=J(x) through a noise operator Z. As before, the goal is to reconstruct
or estimate the input signal x from the noisy measurements y.
The input x of this nonlinear
system can be estimated as follows,
ŵ = argmin_w {U(w) − log f_Z(Y=y | W=Q(w))},
where we highlight that different noise mechanisms f_Z(Y|W)
can be used.
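To see how the noise operator enters, note that for additive Gaussian noise the −log f_Z term reduces to the familiar squared-error penalty, while other channels yield different fidelity terms. The functions below are our own minimal illustration, with additive constants dropped.

```python
import numpy as np

def neg_loglik_gaussian(y, w, sigma=0.1):
    # Additive Gaussian noise, y = w + z: -log f_Z recovers the
    # squared-error penalty (constants independent of w dropped).
    return float(np.sum((y - w) ** 2)) / (2 * sigma ** 2)

def neg_loglik_poisson(y, w):
    # Poisson channel with rate w > 0: y_i ~ Poisson(w_i).
    # The log(y_i!) terms, independent of w, are dropped.
    return float(np.sum(w - y * np.log(w)))

rng = np.random.default_rng(2)
w = np.full(100, 4.0)
y = rng.poisson(w).astype(float)

# The Poisson fidelity term is smaller at the true rate than at a
# badly mismatched rate, as a likelihood term should be.
print(neg_loglik_poisson(y, w) < neg_loglik_poisson(y, np.full(100, 20.0)))
```

Swapping the fidelity term is all that changes between noise models; the U(w) complexity term stays the same.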
These observations and generalizations of MDL appear in the
following paper.
An MDL approach to compressed sensing:
Recall the above optimization problem for compressed sensing (CS),
ŵ = argmin_w {U(w) + λ||y − ΦQ(w)||²}.
A brute force optimization over w is computationally intractable. Instead, we propose
an algorithmic approach based on Markov chain Monte Carlo (MCMC). This approach is
reminiscent of our framework for lossy data compression.
In our objective function, we first need to quantize the estimate w.
A judicious design of our quantization levels is crucial.
On the one hand, with a small number of fixed quantization levels, w
suffers from large quantization errors.
On the other hand, a large number of levels makes it difficult for the
algorithm to converge and slows it down considerably.
To balance this trade-off, we developed a four-stage framework
(size- and level-adaptive MCMC, SLA-MCMC) that adaptively matches the
complexity of w to that of the true underlying alphabet of the signal.
A flowchart for SLA-MCMC is shown in the following figure,
where L(r) denotes running r iterations of a level-adaptive MCMC
algorithm that uses a fixed number of levels while adaptively adjusting
the quantization levels; MERGE, ADD-out, and ADD-in are three
procedures that remove or add levels to the current quantizer; D1-D3
are criteria that determine when to carry out these three procedures
(details appear in our papers).
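The full SLA-MCMC algorithm is considerably more elaborate; as a rough illustration of the core idea, the sketch below implements a single fixed-level annealed Gibbs stage, with zlib standing in for the universal compressor. All names, parameter values, and the cooling schedule here are our own simplifications, not the settings from the papers.

```python
import zlib

import numpy as np

def U(w):
    # zlib coding length (bits) as a crude stand-in for a universal compressor.
    return 8 * len(zlib.compress(w.astype(np.int8).tobytes()))

def energy(w, levels, Phi, y, lam):
    # The objective being minimized: coding length plus weighted residual energy.
    r = y - Phi @ levels[w]
    return U(w) + lam * float(r @ r)

def mcmc_estimate(Phi, y, levels, lam=50.0, sweeps=40, seed=0):
    # Fixed-level annealed Gibbs sampling: at each coordinate, sample a level
    # with probability proportional to exp(-beta * energy), raising beta
    # (cooling) so that late sweeps become nearly greedy.
    rng = np.random.default_rng(seed)
    n = Phi.shape[1]
    w = np.zeros(n, dtype=np.int8)                 # indices into `levels`
    for t in range(sweeps):
        beta = 0.2 * (t + 1)                       # inverse temperature
        for i in rng.permutation(n):
            cand = np.empty(len(levels))
            for k in range(len(levels)):
                w[i] = k
                cand[k] = energy(w, levels, Phi, y, lam)
            p = np.exp(-beta * (cand - cand.min()))
            w[i] = rng.choice(len(levels), p=p / p.sum())
    return levels[w]

# Hypothetical small instance: a sparse {0,1} signal measured by a random matrix.
rng = np.random.default_rng(3)
n, m = 64, 32
x = np.zeros(n)
x[rng.choice(n, size=6, replace=False)] = 1.0
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x + 0.01 * rng.standard_normal(m)
xhat = mcmc_estimate(Phi, y, levels=np.array([0.0, 1.0]), lam=200.0)
```

In this toy setting the sampler often locates the sparse support; what SLA-MCMC adds on top is the adaptive choice of the number and placement of the quantization levels.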
The estimation quality of SLA-MCMC is comparable to, and in many cases better than, that of
existing algorithms. The following figure illustrates the estimation quality of SLA-MCMC and several
existing algorithms. The quality is quantified by the mean signal-to-distortion ratio
(defined as the signal energy divided by the mean squared error; the higher the better)
for a two-state Markov source with a zero state and a non-zero state,
where non-zeros follow a uniform distribution, U[0,1].
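For reference, the signal-to-distortion ratio used in the figure can be computed with a small helper of our own:

```python
import numpy as np

def sdr_db(x, x_hat):
    # Signal-to-distortion ratio in dB: signal energy over mean squared
    # error (higher is better).
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    mse = np.mean((x - x_hat) ** 2)
    return 10 * np.log10(np.mean(x ** 2) / mse)

x = np.array([1.0, 0.0, 0.5, 0.0])
print(round(sdr_db(x, x + 0.01), 1))  # 34.9
```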
J. Zhu,
D. Baron, and
M. F. Duarte,
"Recovery from Linear Measurements with Complexity-Matching
Universal Signal Estimation,"
IEEE Trans. Signal Process., vol. 63, no. 6, pp. 1512-1527, March 2015
(arxiv,
pdf,
Matlab).
J. Zhu,
D. Baron, and
M. F. Duarte,
"Complexity-Adaptive Universal Signal Estimation for Compressed Sensing,"
Proc. Stat. Signal Process. Workshop, Gold Coast, Australia, June 2014
(pdf,
poster).