Description
While working with the DFN3 training and model implementation, I came across a couple of points regarding the local SNR-based switching mechanism that I'd like to ask for clarification on.
1. About the lsnr_dropout Configuration Flag
I noticed the lsnr_dropout flag in the training configuration. This setting defaults to False and is also explicitly set to False in all of the provided DFN3 training config.ini files.
Was this feature actively used during the final experiments reported in the paper?
2. Discrepancy in LSNR-based Decoder Switching Logic
The DFN3 paper describes a multi-stage switching mechanism based on the predicted local SNR. Specifically, the paper states:
> We further predict the local SNR ξ ∈ [−15, 35] dB within the encoder network on frame level. This allows to completely disable the ERB or the DF decoder depending on the current noise conditions. That is, we define the following criteria:
>
> - ξ < −10 dB: Disable both decoders, return silent spectrum.
> - ξ > 20 dB: Disable DF decoder. Only low noise condition, enhancing the periodicity is not necessary.
> - else: Run all stages for best noise reduction.
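For reference, the paper's three-way criterion could be sketched as follows. This is a minimal illustration of the decision logic only, not the repository's implementation; the function name and tensor layout are my own.

```python
import torch

def lsnr_gate(lsnr: torch.Tensor):
    """Per-frame decoder gating as described in the DFN3 paper.

    Returns three boolean masks over the time axis:
      silence  - frames where both decoders are disabled (xi < -10 dB)
      erb_only - frames where the DF decoder is disabled (xi > 20 dB)
      full     - frames where both the ERB and DF stages run
    """
    silence = lsnr < -10.0
    erb_only = lsnr > 20.0
    full = ~(silence | erb_only)
    return silence, erb_only, full

# Example: three frames, one in each regime
lsnr = torch.tensor([-12.0, 5.0, 25.0])
silence, erb_only, full = lsnr_gate(lsnr)
```

Note that every frame falls into exactly one of the three masks, which is what I would expect the inference-time switching to look like.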
When reviewing the implementation, I was able to find the -10 dB threshold being used to create a boolean mask for active frames. Here is the relevant code snippet I'm looking at:
In `deepfilternet3.DfNet.forward()`:

```python
if self.lsnr_droput:
    idcs = lsnr.squeeze() > -10.0
    b, t = (spec.shape[0], spec.shape[2])
    m = torch.zeros((b, 1, t, self.erb_bins), device=spec.device)
    df_coefs = torch.zeros((b, t, self.nb_df, self.df_order * 2))
    spec_m = spec.clone()
    emb = emb[:, idcs]
    e0 = e0[:, :, idcs]
    e1 = e1[:, :, idcs]
    e2 = e2[:, :, idcs]
    e3 = e3[:, :, idcs]
    c0 = c0[:, :, idcs]
if self.run_erb:
    if self.lsnr_droput:
        m[:, :, idcs] = self.erb_dec(emb, e3, e2, e1, e0)
    else:
        m = self.erb_dec(emb, e3, e2, e1, e0)
    spec_m = self.mask(spec, m)
else:
    m = torch.zeros((), device=spec.device)
    spec_m = torch.zeros_like(spec)
if self.run_df:
    if self.lsnr_droput:
        df_coefs[:, idcs] = self.df_dec(emb, c0)
    else:
        df_coefs = self.df_dec(emb, c0)
    df_coefs = self.df_out_transform(df_coefs)
    spec_e = self.df_op(spec.clone(), df_coefs)
    spec_e[..., self.nb_df :, :] = spec_m[..., self.nb_df :, :]
else:
    df_coefs = torch.zeros((), device=spec.device)
    spec_e = spec_m
```
`idcs = lsnr.squeeze() > -10.0` creates a boolean mask over the time axis that selects every frame whose LSNR is greater than −10 dB. However, the same `idcs` is then used to scatter the DF coefficients, so all frames above −10 dB are sent through both decoders, while frames below the threshold are zeroed out. The DF decoder is therefore never disabled for high-SNR frames; the ξ > 20 dB criterion from the paper does not appear anywhere in the code.
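To make the discrepancy concrete, here is a minimal check (with a hypothetical per-frame LSNR tensor) showing that the single −10 dB mask leaves the DF branch active even for frames above 20 dB:

```python
import torch

# Hypothetical per-frame local SNR values in dB
lsnr = torch.tensor([-12.0, 5.0, 25.0])

# Mask as computed in the snippet above: one threshold shared by both decoders
idcs = lsnr > -10.0

# Frames that actually get DF coefficients under the current code
df_active = idcs

# Frames that should get DF coefficients under the paper's criteria
# (active only between -10 dB and 20 dB)
df_per_paper = (lsnr > -10.0) & (lsnr <= 20.0)
```

The 25 dB frame is selected by `idcs` and thus processed by the DF decoder, even though the paper states the DF stage should be disabled above 20 dB.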
I assume this could cause some artefacts, perhaps like the one mentioned here: #657
Also, if lsnr_dropout isn't used, why train the lsnr_fc layer at all? The LSNR loss only serves to improve lsnr_fc, but if its output is never used at inference time, what is the purpose of training it?