Underwater Dereverberation Tutorial

This page outlines how to estimate the impulse response (IR) of an underwater recording channel, invert it, and adjust source spectral levels (SSL) of an unknown source recorded in the same channel. The published paper can be found here: Source characterization. Matlab code is available upon request.

Here is an introduction to the problem:
The figure below (Figure 1(a)) shows a diagram of the inverse problem in an underwater recording environment. The recording process can be modeled as the convolution of individual IRs. Here, the input signal of the source d_i(t) is recorded in a reverberant channel g(t) with a hydrophone r_2(t). The hydrophone is connected to an analog to digital converter (ADC, denoted by r_1(t)) and the recorded output signal d_o(t) is stored on a hard-drive.

1pool-eps-converted-to

Figure 1: Pool diagram showing schematics of (a) the inverse problem with an unknown source and (b) the forward problem with a known source.

The problem of interest here is to estimate the input signal which is not immediately possible since both the source and the IR of the channel are unknown. To estimate the IR, the source signal is replaced by a known signal, shown in Fig. 1(b). For the forward problem, the source signal s(t) is fed through a playback and pre-amp device p_1(t) which is connected to a transmitting transducer p_2(t). It is assumed that both the unknown source and the transmitting transducer have similar directionality and are of similar shape. The channel and the recording equipment is the same as in the inverse problem and the recorded signal is denoted by o(t). The total IR combining the playback and recording devices with the channel is abbreviated by the filter h(t):

  h(t) = r_{1}(t) \ast r_{2}(t) \ast g(t) \ast p_2(t) \ast p_1(t)

Our first task is to identify the IR of the stochastic system h(t) and use an incoherent equations to estimate SSL of the unknown signal d_i(t):

  SSL[f] = 10\log_{10}(S_{d_o}) - 20\log_{10}(\mathbb{E}[|H|]) + 20\log_{10}(|P_2|) + 20\log_{10}(|P_1|)

Note that we take an ensemble average over the amplitude response of h(t) (|H|) because (a) of the pressure fluctuations in a reverberant enclosure and (b) the system might not be entirely time invariant. We will however assume that the system is ergodic, which means that that we can take averages in time (successive measurements) to get an average in space (i.e., the pressure distribution). We will also assumed that the system is linear, i.e., that there exists adequate acoustic support for the lowest frequency of interest. In the above equation, the power spectral density of the recorded signal S_{d_o} is known as well as the amplitude responses of the electrical equipment (measured or factory calibrated). The only unknown is the the channel h(t).

Figure 2 is an overview how h(t) can be estimated and the experiment can be conducted:

decay_time

Figure 2: Flowchart showing procedure to obtain the impulse response h(t) in the forward problem to estimate the unknown source in the inverse problem.

In practice, the IR is obtained by linear (overlap and add method) deconvolution. We want to have a long excitation signal to ensure transducers have sufficient excitation time at lower frequencies (delayed low-frequency components). However, since we want to play the excitation signal multiple times to deconcolve multiple realizations of the IR, we need to make sure that the playback period of the excitation signals is longer than the IR. If the period is shorter, than the tail of the previous IR is still decaying while we play the next signal. We therefore need a way to approximate the lenght of the IR s.t. we can design an appropriate pause between realizations.

We therefore need to know the time required for the reverberant energy to decay to the noise floor (Fig. 2(a)): this time is approximated by the reverberation time (room acoustics). An equation to approximate the energy to decay to the noise floor in seconds (T_{60}, which is the time corresponding to a decay of 60 dB) is given by:

  T_{60} = \frac{24 ln(10)}{c} \frac{V}{-S\log( 1-\sum_{i=1}^6 \alpha_i A_i/S)},

where c denotes soundspeed (1500 m/s for the freshwater pool here), V denotes the volume of the pool (in m^3), {\alpha}_i is the absorption coefficient of each surface area A_i (in m^2) and S (in m^2) represents the combined area of all underwater walls and water/air boundary of the rectangular enclosure. The absorption coefficient for each boundary can be estimated using

  \alpha_i = 1 - \left|\frac{z_w-z_i}{z_w+z_i}\right|,

where z_w represents the acoustic impedance of water and z_i the acoustic impedance of the boundary. Standard values for z are 415 Ns/m^{-3} (air), 8 \times 10^{6} Ns/m^{-3} (concrete) and 1.5 \times 10^{6} Ns/m^{-3} (water).

The theoretical T_{60} can subsequently be used to design an excitation signal (e.g. a logarithmic sweep) s(t) that is approximately 5 times longer (Fig. 2(b)) to increase SNR. In addition, no preliminary experiment is required to estimate the RT. The frequency range of the excitation signal should exceed the frequency range of interest to minimize transducer transients and its maximum sampling rate is given by the minimum sampling rate of either the playback or recording system. The mean of s(t) should be removed (eliminates uncorrelated noise).

The next step is to conduct the experiment (Fig. 2(c)) using a channel length of 1 m such that results correspond to SSL. If possible, noise sources such as pool pumps and water overflow mechanics should be eliminated. I recommend to record 100 successive realizations of the excitation signal each separated by a time sufficiently longer than T_{60} to allow energy to decay between recordings. In addition, recording a control signal is recommended (a word of caution: it is very easy to be off by a constant when computing the transfer function). Once all excitation and control signals are recorded, the unknown source can be recorded in the same channel.

The IR h(t) can now be deconvolved (Fig. 2(e)) by correlating the recorded response (Fig. 2(d)) with the inverse of the excitation signal f(t):

  s(t) \ast f(t) = \delta(t) \\  o(t) \ast f(t) = h(t) \ast s(t) \ast f(t) \label{equ:pool_reverb2} \\  o(t) \ast f(t) = h(t) \label{equ:pool_reverb3}

The inverse filter f(t) needs to be appropriately designed to avoid a squaring of the magnitude spectrum. The intruded pure delay (conjugate multiplication with the inverse filter) can be easily ignored. This method ensures that h(t) is invariant to the amplitude of the input signal. Temporal inversion causes a phase inversion and the cross-correlation results in a pure delay of h(t). Hence h(t) has no amplitude or phase contributions due to the excitation method (and it reflects the IR of the system alone).

Afterwards, the IR is filtered to remove resonance frequencies. The room must be large relative to the lowest frequency of interest. The lowest frequency for which this applies is called the Schroeder frequency f_s and is given by:

  f_s = 0.6 \sqrt{\frac{c^3 T_{60}}{V}}.

In practise, the IR is band-pass filtered at this step: while the lower bound is given by the Schroeder frequency, actual cutoff frequencies should correspond to the bandwidth of interest or might be dictated by the frequency response of the equipment.

Once the IR is filtered, its length needs to be estimated (Fig. 2(f)) which is accomplished using the either the measure of echo density or Schroeders method of backwards integration. Echo density is function of time and can be plotted concurringly with the IR to identify the transition time (t_{end}) between early reflections and late reverberation before the IR is truncated using a window (Fig. 1(g)). Schroders method can be used to estimate when the IR decays in the noise floor. While this method is more accurate and includes more energy of the IR, it seems that the energy within the late reverberant IR tail is insignificant.

Now we have obtained 100 realizations of h(t) which are properly filtered and windowed. All realizations can subsequently be averaged incoherently to obtain an estimate of |H| as given in the equation at the beginning of this tutorial. For a more in depth discussion, please read the paper.

Hope this helps!

~kai