diff --git a/.idea/.gitignore b/.idea/.gitignore deleted file mode 100644 index 9c6244e..0000000 --- a/.idea/.gitignore +++ /dev/null @@ -1,11 +0,0 @@ -# Default ignored files -/shelf/ -/workspace.xml -# Editor-based HTTP Client requests -/httpRequests/ -# Datasource local storage ignored files -/dataSources/ -/dataSources.local.xml - -*.pyc -.idea diff --git a/.streamlit/config.toml b/.streamlit/config.toml index 0cd5f70..0b2a88c 100644 --- a/.streamlit/config.toml +++ b/.streamlit/config.toml @@ -34,7 +34,6 @@ maxMessageSize = 200 enableWebsocketCompression = false [browser] -serverAddress = "localhost" gatherUsageStats = false serverPort = 8501 diff --git a/README.md b/README.md index ccb49e2..cfb4cdf 100644 --- a/README.md +++ b/README.md @@ -1,41 +1,45 @@ #
Web-based UI for Stable Diffusion
-## Created by [sd-webui](https://github.com/sd-webui) +## Created by [Sygil.Dev](https://github.com/sygil-dev) -## [Visit sd-webui's Discord Server](https://discord.gg/gyXNe4NySY) [![Discord Server](https://user-images.githubusercontent.com/5977640/190528254-9b5b4423-47ee-4f24-b4f9-fd13fba37518.png)](https://discord.gg/gyXNe4NySY) +## [Join us at Sygil.Dev's Discord Server](https://discord.gg/gyXNe4NySY) [![Discord Server](https://user-images.githubusercontent.com/5977640/190528254-9b5b4423-47ee-4f24-b4f9-fd13fba37518.png)](https://discord.gg/gyXNe4NySY) ## Installation instructions for: -- **[Windows](https://sd-webui.github.io/stable-diffusion-webui/docs/1.windows-installation.html)** -- **[Linux](https://sd-webui.github.io/stable-diffusion-webui/docs/2.linux-installation.html)** + +- **[Windows](https://sygil-dev.github.io/sygil-webui/docs/1.windows-installation.html)** +- **[Linux](https://sygil-dev.github.io/sygil-webui/docs/2.linux-installation.html)** ### Want to ask a question or request a feature? -Come to our [Discord Server](https://discord.gg/gyXNe4NySY) or use [Discussions](https://github.com/sd-webui/stable-diffusion-webui/discussions). +Come to our [Discord Server](https://discord.gg/gyXNe4NySY) or use [Discussions](https://github.com/sygil-dev/sygil-webui/discussions). ## Documentation -[Documentation is located here](https://sd-webui.github.io/stable-diffusion-webui/) +[Documentation is located here](https://sygil-dev.github.io/sygil-webui/) ## Want to contribute? Check the [Contribution Guide](CONTRIBUTING.md) -[sd-webui](https://github.com/sd-webui) is: +[Sygil-Dev](https://github.com/Sygil-Dev) main devs: + * ![hlky's avatar](https://avatars.githubusercontent.com/u/106811348?s=40&v=4) [hlky](https://github.com/hlky) * ![ZeroCool940711's avatar](https://avatars.githubusercontent.com/u/5977640?s=40&v=4)[ZeroCool940711](https://github.com/ZeroCool940711) * ![codedealer's avatar](https://avatars.githubusercontent.com/u/4258136?s=40&v=4)[codedealer](https://github.com/codedealer) ### Project Features: -* Two great Web UI's to choose from: Streamlit or Gradio -* No more manually typing parameters, now all you have to do is write your prompt and adjust sliders * Built-in image enhancers and upscalers, including GFPGAN and realESRGAN + +* Generator Preview: See your image as it's being made + * Run additional upscaling models on CPU to save VRAM -* Textual inversion 🔥: [info](https://textual-inversion.github.io/) - requires enabling, see [here](https://github.com/hlky/sd-enable-textual-inversion), script works as usual without it enabled -* Advanced img2img editor with Mask and crop capabilities -* Mask painting 🖌️: Powerful tool for re-generating only specific parts of an image you want to change (currently Gradio only) -* More diffusion samplers 🔥🔥: A great collection of samplers to use, including: - - `k_euler` (Default) + +* Textual inversion: [Research Paper](https://textual-inversion.github.io/) + +* K-Diffusion Samplers: A great collection of samplers to use, including: + + - `k_euler` - `k_lms` - `k_euler_a` - `k_dpm_2` @@ -44,57 +48,78 @@ Check the [Contribution Guide](CONTRIBUTING.md) - `PLMS` - `DDIM` -* Loopback ➿: Automatically feed the last generated sample back into img2img -* Prompt Weighting 🏋️: Adjust the strength of different terms in your prompt -* Selectable GPU usage with `--gpu ` -* Memory Monitoring 🔥: Shows VRAM usage and generation time after outputting -* Word Seeds 🔥: Use words instead of seed numbers -* CFG: Classifier free guidance scale, a feature 
for fine-tuning your output -* Automatic Launcher: Activate conda and run Stable Diffusion with a single command -* Lighter on VRAM: 512x512 Text2Image & Image2Image tested working on 4GB +* Loopback: Automatically feed the last generated sample back into img2img + +* Prompt Weighting & Negative Prompts: Gain more control over your creations + +* Selectable GPU usage from Settings tab + +* Word Seeds: Use words instead of seed numbers + +* Automated Launcher: Activate conda and run Stable Diffusion with a single command + +* Lighter on VRAM: 512x512 Text2Image & Image2Image tested working on 4GB (with *optimized* mode enabled in Settings) + * Prompt validation: If your prompt is too long, you will get a warning in the text output field -* Copy-paste generation parameters: A text output provides generation parameters in an easy to copy-paste form for easy sharing. -* Correct seeds for batches: If you use a seed of 1000 to generate two batches of two images each, four generated images will have seeds: `1000, 1001, 1002, 1003`. + +* Sequential seeds for batches: If you use a seed of 1000 to generate two batches of two images each, four generated images will have seeds: `1000, 1001, 1002, 1003`. + * Prompt matrix: Separate multiple prompts using the `|` character, and the system will produce an image for every combination of them. -* Loopback for Image2Image: A checkbox for img2img allowing to automatically feed output image as input for the next batch. Equivalent to saving output image, and replacing input image with it. +* [Gradio] Advanced img2img editor with Mask and crop capabilities -# Stable Diffusion Web UI -A fully-integrated and easy way to work with Stable Diffusion right from a browser window. +* [Gradio] Mask painting 🖌️: Powerful tool for re-generating only specific parts of an image you want to change + +# SD WebUI + +An easy way to work with Stable Diffusion right from your browser. ## Streamlit ![](images/streamlit/streamlit-t2i.png) **Features:** -- Clean UI with an easy to use design, with support for widescreen displays. -- Dynamic live preview of your generations -- Easily customizable presets right from the WebUI (Coming Soon!) -- An integrated gallery to show the generations for a prompt or session (Coming soon!) -- Better optimization VRAM usage optimization, less errors for bigger generations. -- Text2Video - Generate video clips from text prompts right from the WEb UI (WIP) -- Concepts Library - Run custom embeddings others have made via textual inversion. -- Actively being developed with new features being added and planned - Stay Tuned! -- Streamlit is now the new primary UI for the project moving forward. -- *Currently in active development and still missing some of the features present in the Gradio Interface.* + +- Clean UI with an easy-to-use design, with support for widescreen displays +- *Dynamic live preview* of your generations +- Easily customizable defaults, right from the WebUI's Settings tab +- An integrated gallery to show the generations for a prompt +- *Optimized VRAM* usage for bigger generations or usage on lower-end GPUs +- *Text to Video:* Generate video clips from text prompts right from the WebUI (WIP) +- Image to Text: Use [CLIP Interrogator](https://github.com/pharmapsychotic/clip-interrogator) to interrogate an image and get a prompt you can use to generate a similar image with Stable Diffusion. +- *Concepts Library:* Run custom embeddings others have made via textual inversion. 
+- Textual Inversion training: Train your own embeddings on any photo you want and use them in your prompt. +- **Currently in development:** [Stable Horde](https://stablehorde.net/) integration; ImgLab, batch inputs, & mask editor from Gradio + +**Prompt Weights & Negative Prompts:** + +To give a token (tag recognized by the AI) a specific or increased weight (emphasis), add `:0.##` to the prompt, where `0.##` is a decimal that will specify the weight of all tokens before the colon. +Ex: `cat:0.30, dog:0.70` or `guy riding a bicycle :0.7, incoming car :0.30` + +Negative prompts can be added by using `###`, after which any tokens will be seen as negative. +Ex: `cat playing with string ### yarn` will negate `yarn` from the generated image. + +Negatives are a very powerful tool to get rid of contextually similar or related topics, but **be careful when adding them since the AI might see connections you can't**, and end up outputting gibberish. + +**Tip:** Try using the same seed with different prompt configurations or weight values to see how the AI understands them; it can lead to prompts that are better tuned and less prone to error. Please see the [Streamlit Documentation](docs/4.streamlit-interface.md) to learn more. -## Gradio +## Gradio [Legacy] ![](images/gradio/gradio-t2i.png) **Features:** + +- Older UI that is functional and feature complete. - Has access to all upscaling models, including LDSR. - Dynamic prompt entry automatically changes your generation settings based on `--params` in a prompt. - Includes quick and easy ways to send generations to Image2Image or the Image Lab for upscaling. + +**Note: the Gradio interface is no longer being actively developed by Sygil.Dev and is only receiving bug fixes.** Please see the [Gradio Documentation](docs/5.gradio-interface.md) to learn more. - ## Image Upscalers --- @@ -106,8 +131,8 @@ Please see the [Gradio Documentation](docs/5.gradio-interface.md) to learn more. Lets you improve faces in pictures using the GFPGAN model. There is a checkbox in every tab to use GFPGAN at 100%, and also a separate tab that just allows you to use GFPGAN on any picture, with a slider that controls how strong the effect is. If you want to use GFPGAN to improve generated faces, you need to install it separately. -Download [GFPGANv1.3.pth](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth) and put it -into the `/stable-diffusion-webui/models/gfpgan` directory. +Download [GFPGANv1.4.pth](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.4/GFPGANv1.4.pth) and put it +into the `/sygil-webui/models/gfpgan` directory. ### RealESRGAN @@ -117,20 +142,24 @@ Lets you double the resolution of generated images. There is a checkbox in every There is also a separate tab for using RealESRGAN on any picture. Download [RealESRGAN_x4plus.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth) and [RealESRGAN_x4plus_anime_6B.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth). -Put them into the `stable-diffusion-webui/models/realesrgan` directory. +Put them into the `sygil-webui/models/realesrgan` directory. 
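+For reference, the snippet below is a minimal sketch of the same download steps done from Python rather than by hand. It assumes the release URLs linked above, a working directory at the root of your `sygil-webui` checkout, and that `requests` is installed; it is only an illustration, not part of the project's launcher:
+
+```python
+import os
+import requests
+
+# Upscaler checkpoints and their target directories (URLs as linked above).
+MODELS = {
+    "models/gfpgan/GFPGANv1.4.pth":
+        "https://github.com/TencentARC/GFPGAN/releases/download/v1.3.4/GFPGANv1.4.pth",
+    "models/realesrgan/RealESRGAN_x4plus.pth":
+        "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth",
+    "models/realesrgan/RealESRGAN_x4plus_anime_6B.pth":
+        "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth",
+}
+
+for path, url in MODELS.items():
+    os.makedirs(os.path.dirname(path), exist_ok=True)
+    if os.path.exists(path):
+        continue  # skip checkpoints that are already in place
+    with requests.get(url, stream=True) as r:
+        r.raise_for_status()
+        with open(path, "wb") as f:
+            for chunk in r.iter_content(chunk_size=8192):
+                f.write(chunk)  # stream to disk to avoid holding the whole file in memory
+```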
-### GoBig, LSDR, and GoLatent *(Currently Gradio Only)* + +### LDSR + +Download **LDSR** [project.yaml](https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1) and [model last.ckpt](https://heibox.uni-heidelberg.de/f/578df07c8fc04ffbadf3/?dl=1). Rename last.ckpt to model.ckpt and place both under `sygil-webui/models/ldsr/`. + +### GoBig and GoLatent *(Currently on the Gradio version Only)* More powerful upscalers that use a separate Latent Diffusion model to more cleanly upscale images. -Download **LDSR** [project.yaml](https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1) and [ model last.cpkt](https://heibox.uni-heidelberg.de/f/578df07c8fc04ffbadf3/?dl=1). Rename last.ckpt to model.ckpt and place both under stable-diffusion-webui/models/ldsr/ - -Please see the [Image Enhancers Documentation](docs/5.image_enhancers.md) to learn more. +Please see the [Image Enhancers Documentation](docs/6.image_enhancers.md) to learn more. ----- -### *Original Information From The Stable Diffusion Repo* +### *Original Information From The Stable Diffusion Repo:* + # Stable Diffusion + *Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:* [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)
@@ -144,7 +173,6 @@ Please see the [Image Enhancers Documentation](docs/5.image_enhancers.md) to lea which is available on [GitHub](https://github.com/CompVis/latent-diffusion). PDF at [arXiv](https://arxiv.org/abs/2112.10752). Please also visit our [Project page](https://ommer-lab.com/research/latent-diffusion-models/). - [Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion model. Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. @@ -164,15 +192,14 @@ then finetuned on 512x512 images. in its training data. Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion). -## Comments +## Comments - Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion) -and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch). -Thanks for open-sourcing! + and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch). + Thanks for open-sourcing! - The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories). - ## BibTeX ``` @@ -184,7 +211,4 @@ Thanks for open-sourcing! archivePrefix={arXiv}, primaryClass={cs.CV} } - ``` - - diff --git a/Web_based_UI_for_Stable_Diffusion_colab.ipynb b/Web_based_UI_for_Stable_Diffusion_colab.ipynb new file mode 100644 index 0000000..c0a1500 --- /dev/null +++ b/Web_based_UI_for_Stable_Diffusion_colab.ipynb @@ -0,0 +1,554 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "private_outputs": true, + "provenance": [], + "collapsed_sections": [ + "5-Bx4AsEoPU-", + "xMWVQOg0G1Pj" + ] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Sygil-Dev/sygil-webui/blob/dev/Web_based_UI_for_Stable_Diffusion_colab.ipynb)" + ], + "metadata": { + "id": "S5RoIM-5IPZJ" + } + }, + { + "cell_type": "markdown", + "source": [ + "# README" + ], + "metadata": { + "id": "5-Bx4AsEoPU-" + } + }, + { + "cell_type": "markdown", + "source": [ + "###
Web-based UI for Stable Diffusion
\n", + "\n", + "## Created by [Sygil-Dev](https://github.com/Sygil-Dev)\n", + "\n", + "## [Visit Sygil-Dev's Discord Server](https://discord.gg/gyXNe4NySY) [![Discord Server](https://user-images.githubusercontent.com/5977640/190528254-9b5b4423-47ee-4f24-b4f9-fd13fba37518.png)](https://discord.gg/gyXNe4NySY)\n", + "\n", + "## Installation instructions for:\n", + "\n", + "- **[Windows](https://sygil-dev.github.io/sygil-webui/docs/1.windows-installation.html)** \n", + "- **[Linux](https://sygil-dev.github.io/sygil-webui/docs/2.linux-installation.html)**\n", + "\n", + "### Want to ask a question or request a feature?\n", + "\n", + "Come to our [Discord Server](https://discord.gg/gyXNe4NySY) or use [Discussions](https://github.com/Sygil-Dev/sygil-webui/discussions).\n", + "\n", + "## Documentation\n", + "\n", + "[Documentation is located here](https://sygil-dev.github.io/sygil-webui/)\n", + "\n", + "## Want to contribute?\n", + "\n", + "Check the [Contribution Guide](CONTRIBUTING.md)\n", + "\n", + "[Sygil-Dev](https://github.com/Sygil-Dev) main devs:\n", + "\n", + "* ![hlky's avatar](https://avatars.githubusercontent.com/u/106811348?s=40&v=4) [hlky](https://github.com/hlky)\n", + "* ![ZeroCool940711's avatar](https://avatars.githubusercontent.com/u/5977640?s=40&v=4)[ZeroCool940711](https://github.com/ZeroCool940711)\n", + "* ![codedealer's avatar](https://avatars.githubusercontent.com/u/4258136?s=40&v=4)[codedealer](https://github.com/codedealer)\n", + "\n", + "### Project Features:\n", + "\n", + "* Two great Web UI's to choose from: Streamlit or Gradio\n", + "\n", + "* No more manually typing parameters, now all you have to do is write your prompt and adjust sliders\n", + "\n", + "* Built-in image enhancers and upscalers, including GFPGAN and realESRGAN\n", + "\n", + "* Run additional upscaling models on CPU to save VRAM\n", + "\n", + "* Textual inversion 🔥: [info](https://textual-inversion.github.io/) - requires enabling, see [here](https://github.com/hlky/sd-enable-textual-inversion), script works as usual without it enabled\n", + "\n", + "* Advanced img2img editor with Mask and crop capabilities\n", + "\n", + "* Mask painting 🖌️: Powerful tool for re-generating only specific parts of an image you want to change (currently Gradio only)\n", + "\n", + "* More diffusion samplers 🔥🔥: A great collection of samplers to use, including:\n", + " \n", + " - `k_euler` (Default)\n", + " - `k_lms`\n", + " - `k_euler_a`\n", + " - `k_dpm_2`\n", + " - `k_dpm_2_a`\n", + " - `k_heun`\n", + " - `PLMS`\n", + " - `DDIM`\n", + "\n", + "* Loopback ➿: Automatically feed the last generated sample back into img2img\n", + "\n", + "* Prompt Weighting 🏋️: Adjust the strength of different terms in your prompt\n", + "\n", + "* Selectable GPU usage with `--gpu `\n", + "\n", + "* Memory Monitoring 🔥: Shows VRAM usage and generation time after outputting\n", + "\n", + "* Word Seeds 🔥: Use words instead of seed numbers\n", + "\n", + "* CFG: Classifier free guidance scale, a feature for fine-tuning your output\n", + "\n", + "* Automatic Launcher: Activate conda and run Stable Diffusion with a single command\n", + "\n", + "* Lighter on VRAM: 512x512 Text2Image & Image2Image tested working on 4GB\n", + "\n", + "* Prompt validation: If your prompt is too long, you will get a warning in the text output field\n", + "\n", + "* Copy-paste generation parameters: A text output provides generation parameters in an easy to copy-paste form for easy sharing.\n", + "\n", + "* Correct seeds for batches: If you use a seed of 1000 to generate 
two batches of two images each, four generated images will have seeds: `1000, 1001, 1002, 1003`.\n", + "\n", + "* Prompt matrix: Separate multiple prompts using the `|` character, and the system will produce an image for every combination of them.\n", + "\n", + "* Loopback for Image2Image: A checkbox for img2img allowing you to automatically feed the output image as input for the next batch. Equivalent to saving the output image and replacing the input image with it.\n", + "\n", + "# Stable Diffusion Web UI\n", + "\n", + "A fully integrated and easy way to work with Stable Diffusion right from a browser window.\n", + "\n", + "## Streamlit\n", + "\n", + "![](https://github.com/aedhcarrick/sygil-webui/blob/patch-2/images/streamlit/streamlit-t2i.png?raw=1)\n", + "\n", + "**Features:**\n", + "\n", + "- Clean UI with an easy-to-use design, with support for widescreen displays.\n", + "- Dynamic live preview of your generations\n", + "- Easily customizable presets right from the WebUI (Coming Soon!)\n", + "- An integrated gallery to show the generations for a prompt or session (Coming soon!)\n", + "- Better VRAM usage optimization, fewer errors for bigger generations.\n", + "- Text2Video - Generate video clips from text prompts right from the Web UI (WIP)\n", + "- Concepts Library - Run custom embeddings others have made via textual inversion.\n", + "- Actively being developed with new features being added and planned - Stay Tuned!\n", + "- Streamlit is now the new primary UI for the project moving forward.\n", + "- *Currently in active development and still missing some of the features present in the Gradio Interface.*\n", + "\n", + "Please see the [Streamlit Documentation](docs/4.streamlit-interface.md) to learn more.\n", + "\n", + "## Gradio\n", + "\n", + "![](https://github.com/aedhcarrick/sygil-webui/blob/patch-2/images/gradio/gradio-t2i.png?raw=1)\n", + "\n", + "**Features:**\n", + "\n", + "- Older UI design that is fully functional and feature complete.\n", + "- Has access to all upscaling models, including LDSR.\n", + "- Dynamic prompt entry automatically changes your generation settings based on `--params` in a prompt.\n", + "- Includes quick and easy ways to send generations to Image2Image or the Image Lab for upscaling.\n", + "- *Note, the Gradio interface is no longer being actively developed and is only receiving bug fixes.*\n", + "\n", + "Please see the [Gradio Documentation](docs/5.gradio-interface.md) to learn more.\n", + "\n", + "## Image Upscalers\n", + "\n", + "---\n", + "\n", + "### GFPGAN\n", + "\n", + "![](https://github.com/aedhcarrick/sygil-webui/blob/patch-2/images/GFPGAN.png?raw=1)\n", + "\n", + "Lets you improve faces in pictures using the GFPGAN model. There is a checkbox in every tab to use GFPGAN at 100%, and also a separate tab that just allows you to use GFPGAN on any picture, with a slider that controls how strong the effect is.\n", + "\n", + "If you want to use GFPGAN to improve generated faces, you need to install it separately.\n", + "Download [GFPGANv1.4.pth](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.4/GFPGANv1.4.pth) and put it\n", + "into the `/sygil-webui/models/gfpgan` directory. \n", + "\n", + "### RealESRGAN\n", + "\n", + "![](https://github.com/aedhcarrick/sygil-webui/blob/patch-2/images/RealESRGAN.png?raw=1)\n", + "\n", + "Lets you double the resolution of generated images. 
There is a checkbox in every tab to use RealESRGAN, and you can choose between the regular upscaler and the anime version.\n", + "There is also a separate tab for using RealESRGAN on any picture.\n", + "\n", + "Download [RealESRGAN_x4plus.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth) and [RealESRGAN_x4plus_anime_6B.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth).\n", + "Put them into the `sygil-webui/models/realesrgan` directory. \n", + "\n", + "\n", + "\n", + "### LDSR\n", + "\n", + "Download **LDSR** [project.yaml](https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1) and [model last.ckpt](https://heibox.uni-heidelberg.de/f/578df07c8fc04ffbadf3/?dl=1). Rename last.ckpt to model.ckpt and place both under `sygil-webui/models/ldsr/`.\n", + "\n", + "### GoBig and GoLatent *(Currently on the Gradio version Only)*\n", + "\n", + "More powerful upscalers that use a separate Latent Diffusion model to more cleanly upscale images.\n", + "\n", + "\n", + "\n", + "Please see the [Image Enhancers Documentation](docs/6.image_enhancers.md) to learn more.\n", + "\n", + "-----\n", + "\n", + "### *Original Information From The Stable Diffusion Repo*\n", + "\n", + "# Stable Diffusion\n", + "\n", + "*Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:*\n", + "\n", + "[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)
\n", + "[Robin Rombach](https://github.com/rromb)\\*,\n", + "[Andreas Blattmann](https://github.com/ablattmann)\\*,\n", + "[Dominik Lorenz](https://github.com/qp-qp)\\,\n", + "[Patrick Esser](https://github.com/pesser),\n", + "[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)
\n", + "\n", + "**CVPR '22 Oral**\n", + "\n", + "which is available on [GitHub](https://github.com/CompVis/latent-diffusion). PDF at [arXiv](https://arxiv.org/abs/2112.10752). Please also visit our [Project page](https://ommer-lab.com/research/latent-diffusion-models/).\n", + "\n", + "[Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion\n", + "model.\n", + "Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. \n", + "Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487), \n", + "this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.\n", + "With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.\n", + "See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).\n", + "\n", + "## Stable Diffusion v1\n", + "\n", + "Stable Diffusion v1 refers to a specific configuration of the model\n", + "architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet\n", + "and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and \n", + "then finetuned on 512x512 images.\n", + "\n", + "*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present\n", + "in its training data. \n", + "Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion).\n", + "\n", + "## Comments\n", + "\n", + "- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)\n", + " and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch). \n", + " Thanks for open-sourcing!\n", + "\n", + "- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories). \n", + "\n", + "## BibTeX\n", + "\n", + "```\n", + "@misc{rombach2021highresolution,\n", + " title={High-Resolution Image Synthesis with Latent Diffusion Models}, \n", + " author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},\n", + " year={2021},\n", + " eprint={2112.10752},\n", + " archivePrefix={arXiv},\n", + " primaryClass={cs.CV}\n", + "}\n", + "\n", + "```" + ], + "metadata": { + "id": "z4kQYMPQn4d-" + } + }, + { + "cell_type": "markdown", + "source": [ + "# Config options for Colab instance\n", + "> Before running, make sure GPU backend is enabled. 
(Unless you plan on generating with Stable Horde)\n", + ">> Runtime -> Change runtime type -> Hardware Accelerator -> GPU (Make sure to save)" + ], + "metadata": { + "id": "iegma7yteERV" + } + }, + { + "cell_type": "code", + "source": [ + "#@markdown WebUI repo (and branch)\n", + "repo_name = \"Sygil-Dev/sygil-webui\" #@param {type:\"string\"}\n", + "repo_branch = \"dev\" #@param {type:\"string\"}\n", + "\n", + "#@markdown Mount Google Drive\n", + "mount_google_drive = True #@param {type:\"boolean\"}\n", + "save_outputs_to_drive = True #@param {type:\"boolean\"}\n", + "#@markdown Folder in Google Drive to search for custom models\n", + "MODEL_DIR = \"\" #@param {type:\"string\"}\n", + "\n", + "#@markdown Enter auth token from Huggingface.co\n", + "#@markdown >(required for downloading the Stable Diffusion model)\n", + "HF_TOKEN = \"\" #@param {type:\"string\"}\n", + "\n", + "#@markdown Select which models to prefetch\n", + "STABLE_DIFFUSION = True #@param {type:\"boolean\"}\n", + "WAIFU_DIFFUSION = False #@param {type:\"boolean\"}\n", + "TRINART_SD = False #@param {type:\"boolean\"}\n", + "SD_WD_LD_TRINART_MERGED = False #@param {type:\"boolean\"}\n", + "GFPGAN = True #@param {type:\"boolean\"}\n", + "REALESRGAN = True #@param {type:\"boolean\"}\n", + "LDSR = True #@param {type:\"boolean\"}\n", + "BLIP_MODEL = False #@param {type:\"boolean\"}\n", + "\n" + ], + "metadata": { + "id": "OXn96M9deVtF" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Setup\n", + "\n", + ">Runtime will crash when installing conda. This is normal as we are forcing a restart of the runtime from code.\n", + "\n", + ">Just hit \"Run All\" again. 😑" + ], + "metadata": { + "id": "IZjJSr-WPNxB" + } + }, + { + "cell_type": "code", + "metadata": { + "id": "eq0-E5mjSpmP" + }, + "source": [ + "#@title Make sure we have access to a GPU backend\n", + "!nvidia-smi -L" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "#@title Install Miniconda (Mamba)\n", + "!pip install condacolab\n", + "import condacolab\n", + "condacolab.install_from_url(\"https://github.com/conda-forge/miniforge/releases/download/4.14.0-0/Mambaforge-4.14.0-0-Linux-x86_64.sh\")\n", + "\n", + "import condacolab\n", + "condacolab.check()\n", + "# The runtime will crash here!!! Don't panic! We planned for this, remember?" 
+ ], + "metadata": { + "id": "cDu33xkdJ5mD" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "#@title Clone webUI repo and download font\n", + "import os\n", + "REPO_URL = os.path.join('https://github.com', repo_name)\n", + "PATH_TO_REPO = os.path.join('/content', repo_name.split('/')[1])\n", + "!git clone {REPO_URL}\n", + "%cd {PATH_TO_REPO}\n", + "!git checkout {repo_branch}\n", + "!git pull\n", + "!wget -O arial.ttf https://github.com/matomo-org/travis-scripts/blob/master/fonts/Arial.ttf?raw=true" + ], + "metadata": { + "id": "pZHGf03Vp305" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "#@title Install dependencies\n", + "!mamba install cudatoolkit=11.3 git numpy=1.22.3 pip=20.3 python=3.8.5 pytorch=1.11.0 scikit-image=0.19.2 torchvision=0.12.0 -y\n", + "!python --version\n", + "!pip install -r requirements.txt" + ], + "metadata": { + "id": "dmN2igp5Yk3z" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "#@title Install localtunnel to open Google's ports\n", + "!npm install localtunnel" + ], + "metadata": { + "id": "Nxaxfgo_F8Am" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "#@title Mount Google Drive (if selected)\n", + "if mount_google_drive:\n", + " # Mount google drive to store outputs.\n", + " from google.colab import drive\n", + " drive.mount('/content/drive/', force_remount=True)\n", + "\n", + "if save_outputs_to_drive:\n", + " # Make symlink to redirect downloads\n", + " OUTPUT_PATH = os.path.join('/content/drive/MyDrive', repo_name.split('/')[1], 'outputs')\n", + " os.makedirs(OUTPUT_PATH, exist_ok=True)\n", + " os.symlink(OUTPUT_PATH, os.path.join(PATH_TO_REPO, 'outputs'), target_is_directory=True)\n" + ], + "metadata": { + "id": "pcSWo9Zkzbsf" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "#@title Pre-fetch models\n", + "%cd {PATH_TO_REPO}\n", + "# make list of models we want to download\n", + "model_list = {\n", + " 'stable_diffusion': f'{STABLE_DIFFUSION}',\n", + " 'waifu_diffusion': f'{WAIFU_DIFFUSION}',\n", + " 'trinart_stable_diffusion': f'{TRINART_SD}',\n", + " 'sd_wd_ld_trinart_merged': f'{SD_WD_LD_TRINART_MERGED}',\n", + " 'gfpgan': f'{GFPGAN}',\n", + " 'realesrgan': f'{REALESRGAN}',\n", + " 'ldsr': f'{LDSR}',\n", + " 'blip_model': f'{BLIP_MODEL}'}\n", + "download_list = {k for (k,v) in model_list.items() if v == 'True'}\n", + "\n", + "# get model info (file name, download link, save location)\n", + "import yaml\n", + "from pprint import pprint\n", + "with open('configs/webui/webui_streamlit.yaml') as f:\n", + " dataMap = yaml.safe_load(f)\n", + "models = dataMap['model_manager']['models']\n", + "\n", + "# copy script from model manager\n", + "import requests, time\n", + "from requests.auth import HTTPBasicAuth\n", + "\n", + "def download_file(file_name, file_path, file_url):\n", + " os.makedirs(file_path, exist_ok=True)\n", + " if os.path.exists(os.path.join(MODEL_DIR , file_name)):\n", + " print( file_name + \" found in Google Drive\")\n", + " print( \"Creating symlink...\")\n", + " os.symlink(os.path.join(MODEL_DIR , file_name), os.path.join(file_path, file_name))\n", + " elif not os.path.exists(os.path.join(file_path , file_name)):\n", + " print( \"Downloading \" + file_name + \"...\", end=\"\" )\n", + " token = None\n", + " if \"huggingface.co\" in file_url:\n", + " token = HTTPBasicAuth('token', HF_TOKEN)\n", + " try:\n", +
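 " # Stream the download in 8 KB chunks so large checkpoints are never held fully in memory.\n", +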
" with requests.get(file_url, auth = token, stream=True) as r:\n", + " starttime = time.time()\n", + " r.raise_for_status()\n", + " with open(os.path.join(file_path, file_name), 'wb') as f:\n", + " for chunk in r.iter_content(chunk_size=8192):\n", + " f.write(chunk)\n", + " if ((time.time() - starttime) % 60.0) > 2:\n", + " starttime = time.time()\n", + " print( \".\", end=\"\" )\n", + " print( \"done\" )\n", + " print( \" \" + file_name + \" downloaded to \\'\" + file_path + \"\\'\" )\n", + " except Exception:\n", + " print( \"Failed to download \" + file_name + \".\" )\n", + " else:\n", + " print( file_name + \" already exists.\" )\n", + "\n", + "# download models in list\n", + "for model in download_list:\n", + " model_name = models[model]['model_name']\n", + " file_info = models[model]['files']\n", + " for file in file_info:\n", + " file_name = file_info[file]['file_name']\n", + " file_url = file_info[file]['download_link']\n", + " if 'save_location' in file_info[file]:\n", + " file_path = file_info[file]['save_location']\n", + " else: \n", + " file_path = models[model]['save_location']\n", + " download_file(file_name, file_path, file_url)\n", + "\n", + "# add custom models not in list\n", + "CUSTOM_MODEL_DIR = os.path.join(PATH_TO_REPO, 'models/custom')\n", + "if MODEL_DIR != \"\":\n", + " MODEL_DIR = os.path.join('/content/drive/MyDrive', MODEL_DIR)\n", + " if os.path.exists(MODEL_DIR):\n", + " custom_models = os.listdir(MODEL_DIR)\n", + " custom_models = [m for m in custom_models if os.path.isfile(MODEL_DIR + '/' + m)]\n", + " os.makedirs(CUSTOM_MODEL_DIR, exist_ok=True)\n", + " print( \"Custom model(s) found: \" )\n", + " for m in custom_models:\n", + " print( \" \" + m )\n", + " os.symlink(os.path.join(MODEL_DIR , m), os.path.join(CUSTOM_MODEL_DIR, m))\n", + "\n" + ], + "metadata": { + "id": "vMdmh81J70yA" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Launch the web UI server\n", + "### (optional) JS to prevent idle timeout:\n", + "Press 'F12' OR ('CTRL' + 'SHIFT' + 'I') OR right-click on this website -> inspect. Then click on the console tab and paste in the following code.\n", + "```js\n", + "function ClickConnect(){\n", + "console.log(\"Working\");\n", + "document.querySelector(\"colab-toolbar-button#connect\").click()\n", + "}\n", + "setInterval(ClickConnect,60000)\n", + "```" + ], + "metadata": { + "id": "pjIjiCuJysJI" + } + }, + { + "cell_type": "code", + "source": [ + "#@title Press play on the music player to keep the tab alive (Uses only 13MB of data)\n", + "%%html\n", + "Press play on the music player to keep the tab alive, then start your generation below (Uses only 13MB of data)
\n", + "