Open
Description
As I understand it, DiffNorm is designed to normalize target speech units. Why, then, is src-feat required when training the VAE and the diffusion model in the provided script? I read the paper and didn't see any mention of source speech being used to train either model.
I assumed src-feat is just a dummy argument, so I set it to the same value as tgt-feat in order to proceed with training. When I then run unit generation (inference), the normalized units come out exactly the same as the original units. Any ideas?
Note that I got fairly high accuracy (~99%) on the VAE task and 82% accuracy on the diffusion task.
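To make "exactly the same" concrete, the comparison between original and normalized units could be quantified with a minimal sketch like the one below. It is not part of the DiffNorm code: the helper names `parse_units` and `unit_change_rate` are hypothetical, and it assumes each utterance is stored as one line of space-separated integer unit IDs, which may not match the actual output format.

```python
from typing import List, Tuple


def parse_units(line: str) -> List[int]:
    """Parse one utterance: space-separated integer unit IDs (assumed format)."""
    return [int(tok) for tok in line.split()]


def unit_change_rate(original: List[str], normalized: List[str]) -> Tuple[int, float]:
    """Return (number of identical utterances, fraction of changed units)."""
    if len(original) != len(normalized):
        raise ValueError("both inputs must contain the same number of utterances")
    identical = 0
    changed = 0
    total = 0
    for orig_line, norm_line in zip(original, normalized):
        orig_units = parse_units(orig_line)
        norm_units = parse_units(norm_line)
        if orig_units == norm_units:
            identical += 1
        # Position-wise comparison; any length difference counts as changed units.
        changed += sum(a != b for a, b in zip(orig_units, norm_units))
        changed += abs(len(orig_units) - len(norm_units))
        total += max(len(orig_units), len(norm_units))
    rate = changed / total if total else 0.0
    return identical, rate


if __name__ == "__main__":
    # Toy data only; real inputs would be read from the unit files.
    same, rate = unit_change_rate(["1 2 3", "4 5"], ["1 2 3", "4 6"])
    print(f"identical utterances: {same}, changed-unit rate: {rate:.2f}")
```

A change rate of 0.0 across the whole set would confirm that inference is returning the input units unmodified.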