How can I change the duration of the output audio? #21

Open
ChloeL19 opened this issue Jun 13, 2023 · 3 comments

Comments

@ChloeL19

Hi, this is really a lovely repository. But how can I change the duration of the generated audio?
Thanks!!

@ChloeL19
Author

Okay, so I just forced a different latent size. I wanted one second of output, so I divided the original latent length (256) by 10 and rounded up, which gives 26.

def prepare_latents(self, batch_size, inference_scheduler, num_channels_latents, dtype, device):
    # EDIT: the latent length is hardcoded to 256 here. This is what I want to change.
    # shape = (batch_size, num_channels_latents, 256, 16)  # original
    shape = (batch_size, num_channels_latents, 26, 16)  # ceil(256 / 10), scaled to one second?
    # (rest of the function omitted)

Indeed, the inference script now outputs audio files that are one second long. Is this... okay?
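
For reference, the arithmetic above can be sketched as a small helper. It assumes, as the divide-by-10 step implies, that the default 256 latent frames correspond to roughly 10 seconds of audio. The function name `latent_length_for_duration` and both constants are hypothetical and not part of the repository.

```python
import math

DEFAULT_LATENT_FRAMES = 256  # value hardcoded in prepare_latents
DEFAULT_DURATION_S = 10.0    # assumption: 256 frames ~ 10 seconds of audio


def latent_length_for_duration(duration_s,
                               frames=DEFAULT_LATENT_FRAMES,
                               full_duration_s=DEFAULT_DURATION_S):
    """Scale the latent length linearly with the requested duration, rounding up."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return math.ceil(frames * duration_s / full_duration_s)


if __name__ == "__main__":
    # One second reproduces the hand-computed value from the comment above.
    print(latent_length_for_duration(1.0))
```

This only reproduces the shape calculation. Whether the model generates good audio at lengths it was not trained on is a separate question that this sketch does not answer.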

@ChloeL19
Author

I suppose duration could be introduced as a training argument, and then saved as part of the training config and used in this way to adjust the lengths of the audio generated during the inference process...
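
A rough sketch of that idea, assuming the duration is stored in a small JSON config at training time and read back at inference to build the latent shape. `DurationConfig`, `save_config`, and `load_config` are hypothetical names, and the 256-frames-per-10-seconds ratio is the same assumption as in the hack above, not something confirmed by the repository.

```python
import json
import math
from dataclasses import asdict, dataclass


@dataclass
class DurationConfig:
    duration_s: float = 10.0            # target audio length in seconds
    reference_frames: int = 256         # latent length hardcoded in the repo
    reference_duration_s: float = 10.0  # assumption: 256 frames ~ 10 seconds

    def latent_shape(self, batch_size, num_channels_latents, latent_width=16):
        """Return the latent shape for the configured duration."""
        length = math.ceil(
            self.reference_frames * self.duration_s / self.reference_duration_s
        )
        return (batch_size, num_channels_latents, length, latent_width)


def save_config(cfg, path):
    """Write the config next to the training checkpoint."""
    with open(path, "w") as f:
        json.dump(asdict(cfg), f)


def load_config(path):
    """Read the config back at inference time."""
    with open(path) as f:
        return DurationConfig(**json.load(f))
```

At inference, `prepare_latents` could then call `cfg.latent_shape(batch_size, num_channels_latents)` instead of using the hardcoded tuple.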

@cvillela

cvillela commented Jul 3, 2023

Would really like an audio sample duration feature as well!
