Question about target units normalization

I wonder whether DiffNorm is designed to normalize target speech units. If so, why does the provided script require src-feat when training the VAE and the diffusion model? I read the paper and didn't see any mention of using source speech to train either the VAE or the diffusion model.

I assumed src-feat was just a dummy argument, so I set it to the same value as tgt-feat to proceed with training. However, when I run unit generation (inference), the normalized units are exactly the same as the original units. Any ideas?

Note that I got fairly high accuracy on both tasks: ~99% on the VAE task and 82% on the diffusion task.
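To quantify how much normalization actually changes the units, here is a minimal sketch that compares an original and a normalized unit sequence by edit distance. It assumes each utterance's units are stored as space-separated integer IDs; the function name `unit_change_stats` is hypothetical and not part of DiffNorm.

```python
def unit_change_stats(original: str, normalized: str) -> dict:
    """Compare two space-separated unit sequences (assumed format, hypothetical helper)."""
    src = original.split()
    tgt = normalized.split()
    # Levenshtein distance via a rolling dynamic-programming row.
    prev = list(range(len(tgt) + 1))
    for i, a in enumerate(src, 1):
        cur = [i]
        for j, b in enumerate(tgt, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (a != b)))  # substitution or match
        prev = cur
    dist = prev[-1]
    return {
        "identical": src == tgt,
        "edit_distance": dist,
        # Fraction of original units that differ after normalization.
        "unit_change_rate": dist / max(len(src), 1),
    }


if __name__ == "__main__":
    print(unit_change_stats("12 12 45 7", "12 45 45 7"))
```

Running this over every utterance and counting how many come back with `identical` set to True would show whether the model is changing any units at all.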
