Question about target units normalization #2

@mailong25

I wonder whether DiffNorm is designed to normalize target speech units. If so, why is src-feat required when training the VAE and the diffusion model in the provided script? I read the paper and didn't see any mention of using source speech when training either of them.

I assumed src-feat is just a dummy argument, so I set src-feat to the same value as tgt-feat in order to proceed with training. When I then run unit generation (inference), the normalized units come out exactly the same as the original units. Any ideas?

Note that I got fairly high accuracy: ~99% on the VAE task and 82% on the diffusion task.
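
For reference, "exactly the same" can be quantified with a check along these lines. This is only a minimal sketch: the function name `unit_change_rate` is hypothetical and not part of the DiffNorm scripts, and it assumes each utterance is a line of space-separated unit IDs, which may not match the actual output format.

```python
def unit_change_rate(original_lines, normalized_lines):
    """Return the fraction of unit positions that differ between two unit sets.

    Assumption: each element is one utterance written as space-separated unit IDs.
    """
    changed = 0
    total = 0
    for orig, norm in zip(original_lines, normalized_lines):
        o, n = orig.split(), norm.split()
        # Count up to the longer sequence so that length changes count as differences.
        length = max(len(o), len(n))
        total += length
        changed += sum(
            1
            for i in range(length)
            if i >= len(o) or i >= len(n) or o[i] != n[i]
        )
    return changed / total if total else 0.0


# Made-up unit sequences for illustration only.
original = ["12 12 45 7", "3 3 3 88"]
normalized = ["12 12 45 7", "3 3 3 88"]
print(unit_change_rate(original, normalized))
```

A rate of 0.0 over the whole set corresponds to what I am seeing: no unit is changed by normalization.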
