
Is it possible to adjust the 512x512 to a different height/width? #22

Open
DustinBrett opened this issue Apr 19, 2023 · 3 comments


DustinBrett commented Apr 19, 2023

I have been adding this amazing software to my personal website, and currently I have it generating new wallpapers every few minutes. What I would love is to sync the height/width of the viewport with the image generation. I see some hardcoded 512s, but I am unable to build this without CUDA (at least in WSL). Is it possible on the JavaScript side to set these values? Thanks!

@MasterJH5574
Collaborator

Hi @DustinBrett, thanks for the suggestion! We're glad to see you using our work to generate new wallpapers, and adjustable image sizes would definitely make things cooler. However, this is still an ongoing effort, and it may take some time to ship. Sorry for the inconvenience. We will post an update once we support it.

@DustinBrett
Author

Thanks, and no problem about the inconvenience; I am just happy to use it. I've also made an "app" within my side project daedalOS to create images at the correct resolution.

[screenshot of the app in daedalOS]


matbee-eth commented Apr 28, 2023

Hey, I'm currently working through the code and trying to figure out where it differs from the current Diffusers implementation described here: https://huggingface.co/blog/stable_diffusion

The docs use the following example to set the width and height in torch.randn when creating the latent tensor:

latents = torch.randn(
    (batch_size, unet.in_channels, height // 8, width // 8),
    generator=generator,
)
latents = latents.to(torch_device)
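
For reference, the // 8 is the VAE's downsampling factor and the 4 latent channels (unet.in_channels) are fixed by the model, so 512x512 corresponds to a (1, 4, 64, 64) latent. A minimal sketch of deriving the latent shape for an arbitrary size, assuming plain torch and dimensions that are multiples of 8:

import torch

# Sketch: derive the latent shape for an arbitrary target size.
# The VAE downsamples by 8 and the UNet consumes 4 latent channels,
# so 512x512 -> (1, 4, 64, 64) and 512x768 -> (1, 4, 64, 96).
def make_latents(height, width, batch_size=1, seed=0):
    assert height % 8 == 0 and width % 8 == 0, "dims must be multiples of 8"
    generator = torch.Generator("cpu").manual_seed(seed)
    return torch.randn(
        (batch_size, 4, height // 8, width // 8),
        generator=generator,
    )

latents = make_latents(512, 768)  # shape: (1, 4, 64, 96)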

I don't see a similar step in the code. How is the latent noise currently being created, and where should I implement this? It's quite a bit different.

I see some references to this.vm.getFunction, but this loses me, as it doesn't point to any location I can find.
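
For what it's worth, the Python-side counterpart in TVM exposes compiled functions by name through the relax VirtualMachine, which appears to be what this.vm.getFunction mirrors in the web runtime. A minimal sketch, assuming TVM's relax API; the artifact path and function name are hypothetical stand-ins:

import tvm
from tvm import relax

# Sketch: load a compiled TVM artifact and fetch a function by name.
# "stable_diffusion.so" and "unet" are hypothetical placeholders for
# whatever deploy.py actually builds and names.
ex = tvm.runtime.load_module("stable_diffusion.so")
vm = relax.VirtualMachine(ex, tvm.cpu())
unet_fn = vm["unet"]  # analogous to this.vm.getFunction("unet") in JS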

There is a deploy.py that contains a section of interest, but again, I don't see how it's connected to the implementation.

        latents = torch.randn(
            (1, 4, 64, 64),
            device="cpu",
            dtype=torch.float32,
        )
        latents = tvm.nd.array(latents.numpy(), self.tvm_device)

        for i in tqdm(range(len(self.scheduler.timesteps))):
            t = self.scheduler.timesteps[i]
            self.debug_dump(f"unet_input_{i}", latents)
            self.debug_dump(f"timestep_{i}", t)
            noise_pred = self.unet_latents_to_noise_pred(latents, t, text_embeddings)
            self.debug_dump(f"unet_output_{i}", noise_pred)
            latents = self.scheduler.step(self.vm, noise_pred, latents, i)

        self.debug_dump("vae_input", latents)
        image = self.vae_to_image(latents)
        self.debug_dump("vae_output", image)
        image = self.image_to_rgba(image)
        return Image.fromarray(image.numpy().view("uint8").reshape(512, 512, 4))

Is this used in the deployed WebGL version?
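
To connect the two snippets: the (1, 4, 64, 64) above is just (1, 4, 512 // 8, 512 // 8), and the final reshape(512, 512, 4) is the RGBA output at full resolution. A minimal sketch of how those constants might be parametrized, assuming the compiled UNet/VAE modules were also built for the new shape (which is what currently pins everything to 512):

import torch
import tvm

# Sketch: parametrize deploy.py's hard-coded shapes. This only helps if
# the UNet/VAE were compiled for the same size; the hard-coded 512s
# exist because the current build fixes that shape at compile time.
def make_initial_latents(height, width, tvm_device):
    assert height % 8 == 0 and width % 8 == 0
    latents = torch.randn(
        (1, 4, height // 8, width // 8),  # was (1, 4, 64, 64) for 512x512
        device="cpu",
        dtype=torch.float32,
    )
    return tvm.nd.array(latents.numpy(), tvm_device)

# ...and the final image reshape would become:
#   Image.fromarray(image.numpy().view("uint8").reshape(height, width, 4))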
