Dror Baron - Universal Denoising


In recent years, we have been exploring denoising problems, where a signal x is measured with additive noise, y = x + z. The goal is to estimate x from y. This framework applies directly to many areas where signals are measured noisily, including imaging systems, audio, and medical data. When a statistical characterization of the input x and the noise z is available, a Bayesian approach can be used. However, in many problems these statistics are unavailable, and we must use a universal approach that adapts to the data at hand. Below we describe two denoising approaches that adapt to unknown input statistics.
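
As a point of reference, the Bayesian route is simple when the statistics are known. For instance, if x and z are zero-mean Gaussian, the MMSE estimator reduces to linear (Wiener) shrinkage; the sketch below (with names of our own choosing) is the baseline that universal denoisers try to match without knowing the variances.

```python
def wiener_denoise(y, sx2, sz2):
    """MMSE estimate of x from y = x + z when x ~ N(0, sx2) and z ~ N(0, sz2).
    Works elementwise on scalars or NumPy arrays."""
    return (sx2 / (sx2 + sz2)) * y
```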


Universal denoising based on context quantization and Gaussian mixture learning: Our approach builds on a universal denoiser for stationary ergodic inputs that performs context quantization, proposed by Sivaramakrishnan and Weissman. The key idea is to partition the stationary ergodic signal denoising problem into multiple denoising problems involving subsequences that are conditionally independent and identically distributed (i.i.d.). This denoiser has been proved to asymptotically achieve the minimum mean square error (MMSE) for signals with bounded components. We overcome the boundedness limitation by replacing the density estimation approach of Sivaramakrishnan and Weissman with a Gaussian mixture (GM) learning algorithm. Specifically, a GM model is learned for each noisy subsequence, and we obtain an estimate of the distribution of the corresponding clean subsequence by subtracting the noise variance from each Gaussian component of the learned GM model. This density estimate is then used to denoise the subsequence with a standard Bayesian technique.
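
A minimal sketch of the GM step on a single conditionally i.i.d. subsequence appears below; the context quantization that forms the subsequences is omitted, scikit-learn's GaussianMixture stands in for the GM learning algorithm, and the function names are ours.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def gm_denoise(y, sigma, n_components=3):
    """Sketch: learn a GM for the noisy samples y = x + N(0, sigma^2),
    deduce a GM for the clean signal by subtracting the noise variance
    from each component, then apply the posterior-mean (MMSE) denoiser."""
    gm = GaussianMixture(n_components=n_components).fit(y.reshape(-1, 1))
    w = gm.weights_                           # component weights
    mu = gm.means_.ravel()                    # component means
    s2 = gm.covariances_.ravel()              # noisy-component variances
    # Clean-component variances: subtract the noise variance (clip at 0).
    v2 = np.maximum(s2 - sigma**2, 0.0)
    # Responsibility of each component for each sample, under the noisy GM.
    resp = w * norm.pdf(y[:, None], loc=mu, scale=np.sqrt(s2))
    resp /= resp.sum(axis=1, keepdims=True)
    # Per-component posterior mean of x given y (Gaussian-Gaussian shrinkage),
    # mixed with the responsibilities.
    cond_mean = mu + (v2 / s2) * (y[:, None] - mu)
    return (resp * cond_mean).sum(axis=1)
```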

We have applied our universal denoiser within the approximate message passing (AMP) recovery framework for linear inverse problems, which leads to a promising universal recovery algorithm, AMP-UD (AMP with a universal denoiser). A block diagram of our approach appears in the following figure.
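
For intuition, here is a bare-bones sketch of an AMP iteration with a pluggable denoiser; the variable names are ours, and the Onsager correction uses a finite-difference divergence estimate as a simplification. In AMP-UD, the denoiser slot is filled by the universal GM denoiser sketched above, e.g., amp(A, y, gm_denoise).

```python
import numpy as np

def amp(A, y, denoise, n_iter=30, eps=1e-3):
    """Sketch of AMP for y = A x + z with a pluggable scalar denoiser
    denoise(v, sigma), applied to the pseudo-data v = x_t + A^T r_t."""
    m, n = A.shape
    x = np.zeros(n)
    r = y.copy()
    for _ in range(n_iter):
        sigma = np.linalg.norm(r) / np.sqrt(m)   # effective noise level
        v = x + A.T @ r                          # pseudo-data: roughly x + N(0, sigma^2)
        x_new = denoise(v, sigma)
        # Divergence of the denoiser, estimated by finite differences.
        div = np.mean((denoise(v + eps, sigma) - x_new) / eps)
        r = y - A @ x_new + (n / m) * r * div    # Onsager-corrected residual
        x = x_new
    return x
```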

Flow Chart of AMP-UD.
Numerical results show that AMP-UD compares favorably with existing reconstruction algorithms in terms of both reconstruction quality and runtime, despite not knowing the input statistics of the stationary ergodic signal. Example numerical results are shown in the following figure for signals generated by a two-state Markov source with a zero state and a nonzero state, where the nonzeros follow a Rademacher distribution (−1 and +1 with equal probability). As the measurement rate R (the ratio between the number of measurements and the signal dimension) increases, the signal-to-distortion ratio improves. AMP-UD reconstructs better than the other algorithms, except in the low-R, low-SNR setting.
AMP-UD for dense Markov-Rademacher signal.
Note that this specific plot shows results for measurement rates that vary from 0.6 to 1.6. In other problems, especially when the input is sparse (the signals used to generate the above plot were non-sparse), compressed sensing (CS) techniques have shown that it is possible to sample and later reconstruct the input at greatly reduced measurement rates R. We have also considered CS-style settings in our work, which appears in the following publication. You may also want to view the following tutorial video about this research by Yanting Ma and Junan Zhu.
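
For concreteness, the kind of two-state Markov-Rademacher source used in the experiment above can be sketched as follows; the transition probabilities here are illustrative placeholders, not the values used to generate the plot.

```python
import numpy as np

def markov_rademacher(n, p01=0.3, p10=0.3, rng=None):
    """Two-state Markov source: state 0 emits zero, state 1 emits a
    Rademacher value (+1 or -1 with equal probability). p01 and p10 are
    the zero->nonzero and nonzero->zero transition probabilities
    (illustrative placeholders)."""
    rng = np.random.default_rng(rng)
    x = np.zeros(n)
    state = 0
    for i in range(n):
        if state == 0:
            state = 1 if rng.random() < p01 else 0
        else:
            state = 0 if rng.random() < p10 else 1
        if state == 1:
            x[i] = rng.choice([-1.0, 1.0])
    return x
```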

Universality within a model class (a.k.a. the power of mixing): To estimate a parametric signal x with unknown parameters from the noisy observations y, one may first find the best parameters θ*, e.g., via maximum likelihood (ML), and then plug θ* into the MMSE estimator, E[x|y,θ*]. This approach is known as a plug-in denoiser. The plug-in is often useful, especially when the signal dimension is large, so that θ* is likely to match the signal well. For low-dimensional problems, however, one may not have sufficient data to obtain an accurate θ*. The problem in the low-dimensional setting is that we over-commit to a single parameter value.
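
As a toy instance of the plug-in approach, consider an i.i.d. Bernoulli signal (each x_i is 0 or 1) in Gaussian noise; the sketch below estimates θ* on a grid by ML and plugs it into the per-sample MMSE estimator. The function names and the grid are our illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def mmse_bernoulli(y, theta, sigma):
    """MMSE estimate of x from y = x + N(0, sigma^2) when x ~ Bernoulli(theta).
    Equals the posterior probability that x = 1."""
    p1 = theta * norm.pdf(y, loc=1.0, scale=sigma)
    p0 = (1.0 - theta) * norm.pdf(y, loc=0.0, scale=sigma)
    return p1 / (p0 + p1)

def plugin_denoise(y, sigma, grid=np.linspace(0.01, 0.99, 99)):
    """Plug-in: pick the ML parameter theta* from the marginal likelihood
    of y, then apply the MMSE denoiser for that single theta."""
    loglik = [np.sum(np.log(t * norm.pdf(y, 1.0, sigma)
                            + (1 - t) * norm.pdf(y, 0.0, sigma)))
              for t in grid]
    theta_star = grid[int(np.argmax(loglik))]
    return mmse_bernoulli(y, theta_star, sigma)
```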

To address this challenge, we propose a mixture denoiser (MixD), x̂ = ∫ E[x|y,θ] p(θ|y) dθ, which mixes over the MMSE estimators with respect to each possible θ, where the mixing weight is the posterior of θ given y. The following figure compares the excess mean square error (MSE) of MixD beyond that of the plug-in as a function of the signal dimension N, where the input signal is Bernoulli distributed. It can be seen that MixD has lower excess MSE than the plug-in for small N, and that the advantage vanishes as N grows. Both MixD and the plug-in approach the MMSE for large N.

MixD vs. Plug-in for Bernoulli signal.
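
A matching sketch of MixD for the same Bernoulli toy setting: the per-θ MMSE estimators are averaged with weights p(θ|y), computed on a discrete grid under a uniform prior (our illustrative choices). For small N the posterior over θ stays spread out, which is exactly where mixing helps relative to committing to a single θ*.

```python
import numpy as np
from scipy.stats import norm

def mixd_bernoulli(y, sigma, grid=np.linspace(0.01, 0.99, 99)):
    """MixD sketch for x ~ Bernoulli(theta), y = x + N(0, sigma^2):
    average the per-theta MMSE estimators, weighted by the posterior
    p(theta | y) on a discrete grid with a uniform prior over theta."""
    # log p(y | theta) for each grid point (i.i.d. samples).
    loglik = np.array([np.sum(np.log(t * norm.pdf(y, 1.0, sigma)
                                     + (1 - t) * norm.pdf(y, 0.0, sigma)))
                       for t in grid])
    w = np.exp(loglik - loglik.max())        # unnormalized posterior weights
    w /= w.sum()
    # Mix the per-theta MMSE estimators E[x | y, theta].
    est = np.zeros_like(y)
    for t, wt in zip(grid, w):
        p1 = t * norm.pdf(y, 1.0, sigma)
        p0 = (1 - t) * norm.pdf(y, 0.0, sigma)
        est += wt * (p1 / (p0 + p1))
    return est
```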

Last updated July 2014.