Initial overhaul of documentation (#1262)

Added documents for Streamlit and Gradio, with new screenshots for each respective version of the UI. Changed `upscalers` to `image enhancers`.

Documents currently finished:
- linux-installation.md
- image-enhancers.md
- ReadMe

Documents currently still WIP:
- streamlit-interface.md
- gradio-interface.md

Documents still awaiting changes:
- windows-installation.md (I'll need help for this one)
- custom-models.md
- cli.md

Feedback is welcome and needed on any changes that should happen for the documents already finished. (Also fixed a small bug with the `webui_streamlit.yaml` file.)

# Checklist:
- [x] I have changed the base branch to `dev`
- [x] I have performed a self-review of my own code
- [x] I have commented my code in hard-to-understand areas
- [x] I have made corresponding changes to the documentation

Co-authored-by: ZeroCool <ZeroCool940711@users.noreply.github.com>
Co-authored-by: Thomas Mello <work.mello@gmail.com>
README.md

![](images/sd-wui_logo.png)

# <center>Web-based UI for Stable Diffusion by [sd-webui](https://github.com/sd-webui)</center>

## [Visit sd-webui's Discord Server](https://discord.gg/gyXNe4NySY) [![Discord Server](https://user-images.githubusercontent.com/5977640/190528254-9b5b4423-47ee-4f24-b4f9-fd13fba37518.png)](https://discord.gg/gyXNe4NySY)

## Installation instructions for:

- **[Windows](https://sd-webui.github.io/stable-diffusion-webui/docs/1.installation.html)**
- **[Linux](https://sd-webui.github.io/stable-diffusion-webui/docs/2.linux-installation.html)**

### Want to ask a question or request a feature?
Come to our [Discord Server](https://discord.gg/gyXNe4NySY) or use [Discussions]
## Documentation

[Documentation is located here](https://sd-webui.github.io/stable-diffusion-webui/)

## Want to contribute?

Check the [Contribution Guide](CONTRIBUTING.md)

[sd-webui](https://github.com/sd-webui) is:

* ![hlky's avatar](https://avatars.githubusercontent.com/u/106811348?s=40&v=4) [hlky](https://github.com/hlky)
* ![ZeroCool940711's avatar](https://avatars.githubusercontent.com/u/5977640?s=40&v=4) [ZeroCool940711](https://github.com/ZeroCool940711)
* ![codedealer's avatar](https://avatars.githubusercontent.com/u/4258136?s=40&v=4) [codedealer](https://github.com/codedealer)
### Project Features:

* Two great Web UIs to choose from: Streamlit or Gradio
* No more manually typing parameters; just write your prompt and adjust the sliders
* Built-in image enhancers and upscalers, including GFPGAN and RealESRGAN
* Run additional upscaling models on CPU to save VRAM
* Textual inversion 🔥: [info](https://textual-inversion.github.io/) - requires enabling, see [here](https://github.com/hlky/sd-enable-textual-inversion); the script works as usual without it enabled
* Advanced img2img editor with mask and crop capabilities
* Mask painting 🖌️: A powerful tool for re-generating only the specific parts of an image you want to change (currently Gradio only)
* More diffusion samplers 🔥🔥: A great collection of samplers to use, including:
  - `k_euler` (Default)
  - `k_lms`
  - `k_euler_a`
  - `k_dpm_2`
  - `k_dpm_2_a`
  - `k_heun`
  - `PLMS`
  - `DDIM`
* Loopback ➿: Automatically feed the last generated sample back into img2img
* Prompt weighting 🏋️: Adjust the strength of different terms in your prompt
* Selectable GPU usage with `--gpu <id>`
* Memory monitoring 🔥: Shows VRAM usage and generation time after outputting
* Word seeds 🔥: Use words instead of seed numbers
* CFG: Classifier-free guidance scale, a feature for fine-tuning your output
* Automatic launcher: Activate conda and run Stable Diffusion with a single command
* Lighter on VRAM: 512x512 Text2Image and Image2Image tested working on 4GB
* Prompt validation: If your prompt is too long, you will get a warning in the text output field
* Copy-paste generation parameters: A text output provides generation parameters in an easy-to-copy-paste form for easy sharing.
* Correct seeds for batches: If you use a seed of 1000 to generate two batches of two images each, the four generated images will have seeds `1000, 1001, 1002, 1003`.
* Prompt matrix: Separate multiple prompts using the `|` character, and the system will produce an image for every combination of them.
* Loopback for Image2Image: A checkbox for img2img that automatically feeds the output image back in as the input for the next batch. Equivalent to saving the output image and replacing the input image with it.
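The batch-seed and prompt-matrix behaviors described above can be illustrated with a short sketch (this is an illustration of the documented behavior, not the webui's actual code):

```python
from itertools import combinations

# Correct seeds for batches: a base seed of 1000 with two batches of two
# images yields consecutively numbered seeds.
base_seed = 1000
seeds = [base_seed + i for i in range(2 * 2)]
print(seeds)  # [1000, 1001, 1002, 1003]

# Prompt matrix: the first |-separated part is always kept; every subset of
# the remaining parts produces one image.
prompt = "a busy city street|illustration|cinematic lighting"
head, *options = prompt.split("|")
variants = [
    ", ".join([head] + list(picks))
    for n in range(len(options) + 1)
    for picks in combinations(options, n)
]
print(len(variants))  # 4 combinations, so 4 images
```

With two optional parts after the `|` characters, you get 2² = 4 images: the base prompt alone, each optional part added individually, and both together.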
# Stable Diffusion Web UI

A fully-integrated and easy way to work with Stable Diffusion right from a browser window.

## Streamlit

![](images/streamlit/streamlit-t2i.png)

**Features:**

- Clean UI with an easy-to-use design, with support for widescreen displays.
- Dynamic live preview of your generations
- Easily customizable presets right from the WebUI (Coming Soon!)
- An integrated gallery to show the generations for a prompt or session (Coming soon!)
- Better VRAM usage optimization and fewer errors for bigger generations.
- Text2Video - Generate video clips from text prompts right from the Web UI (WIP)
- Concepts Library - Run custom embeddings others have made via textual inversion.
- Actively being developed, with new features being added and planned - Stay tuned!
- Streamlit is now the primary UI for the project moving forward.
- *Currently in active development and still missing some of the features present in the Gradio interface.*

Please see the [Streamlit Documentation](docs/4.streamlit-interface.md) to learn more.
## Gradio

![](images/gradio/gradio-t2i.png)

**Features:**

- Older UI design that is fully functional and feature complete.
- Has access to all upscaling models, including LDSR.
- Dynamic prompt entry automatically changes your generation settings based on `--params` in a prompt.
- Includes quick and easy ways to send generations to Image2Image or the Image Lab for upscaling.
- *Note: the Gradio interface is no longer being actively developed and is only receiving bug fixes.*

Please see the [Gradio Documentation](docs/5.gradio-interface.md) to learn more.
## Image Upscalers

---

### GFPGAN

![](images/GFPGAN.png)

Lets you improve faces in pictures using the GFPGAN model. There is a checkbox in every tab to use GFPGAN at 100%, and also a separate tab that just allows you to use GFPGAN on any picture, with a slider that controls how strong the effect is.

If you want to use GFPGAN to improve generated faces, you need to install it separately.
Download [GFPGANv1.3.pth](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth) and put it
into the `/stable-diffusion-webui/src/gfpgan/experiments/pretrained_models` directory.
### RealESRGAN

![](images/RealESRGAN.png)

Lets you double the resolution of generated images. There is a checkbox in every tab to use RealESRGAN, and you can choose between the regular upscaler and the anime version.
There is also a separate tab for using RealESRGAN on any picture.

Download [RealESRGAN_x4plus.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth) and [RealESRGAN_x4plus_anime_6B.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth).
Put them into the `stable-diffusion-webui/src/realesrgan/experiments/pretrained_models` directory.
### GoBig, LDSR, and GoLatent *(Currently Gradio Only)*

More powerful upscalers that use a separate Latent Diffusion model to more cleanly upscale images.

Download the **LDSR** [project.yaml](https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1) and [model last.ckpt](https://heibox.uni-heidelberg.de/f/578df07c8fc04ffbadf3/?dl=1). Rename `last.ckpt` to `model.ckpt` and place both under `stable-diffusion-webui/src/latent-diffusion/experiments/pretrained_models/`.
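The download-and-place steps for the RealESRGAN and LDSR weights above can be sketched as a small helper script. URLs and destination directories are copied from the text; run it from the repo root. This is an illustrative sketch, not a script shipped with the repo:

```python
import os
import urllib.request

# Destination path -> download URL, as given in the instructions above.
# LDSR's last.ckpt is saved directly under its final name, model.ckpt.
WEIGHTS = {
    "src/realesrgan/experiments/pretrained_models/RealESRGAN_x4plus.pth":
        "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth",
    "src/realesrgan/experiments/pretrained_models/RealESRGAN_x4plus_anime_6B.pth":
        "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth",
    "src/latent-diffusion/experiments/pretrained_models/project.yaml":
        "https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1",
    "src/latent-diffusion/experiments/pretrained_models/model.ckpt":
        "https://heibox.uni-heidelberg.de/f/578df07c8fc04ffbadf3/?dl=1",
}

def fetch_missing(weights=WEIGHTS):
    """Download any weight file that is not already in place (needs network)."""
    for dest, url in weights.items():
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        if not os.path.exists(dest):
            urllib.request.urlretrieve(url, dest)
```

Calling `fetch_missing()` skips files that already exist, so it is safe to re-run.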
Please see the [Image Enhancers Documentation](docs/5.image_enhancers.md) to learn more.

-----
### *Original Information From The Stable Diffusion Repo*

# Stable Diffusion

*Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:*

[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)<br/>

This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).
## Stable Diffusion v1

Stable Diffusion v1 refers to a specific configuration of the model
architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet
and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and
then finetuned on 512x512 images.

*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
in its training data.*
Details on the training procedure and data, as well as the intended use of the model, can be found in the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion).
## Comments

- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
  and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
  Thanks for open-sourcing!

- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).

## BibTeX

```
@misc{rombach2021highresolution,
```
`webui_streamlit.yaml`:

    general:
      optimized_turbo: False
      optimized_config: "optimizedSD/v1-inference.yaml"
      enable_attention_slicing: False
      enable_minimal_memory_usage: False
      update_preview: True
      update_preview_frequency: 10
<!--
This file is part of stable-diffusion-webui (https://github.com/sd-webui/stable-diffusion-webui/).

Copyright 2022 sd-webui team.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
-->

# Setting command line options

Edit `scripts/relauncher.py` so that `python scripts/webui.py` becomes `python scripts/webui.py --no-half --precision=full`.
# List of command line options

```
optional arguments:
  -h, --help            show this help message and exit
  --ckpt CKPT           path to checkpoint of model (default: models/ldm/stable-diffusion-v1/model.ckpt)
  --cli CLI             don't launch web server, take Python function kwargs from this file. (default: None)
  --config CONFIG       path to config which constructs model (default: configs/stable-diffusion/v1-inference.yaml)
  --defaults DEFAULTS   path to configuration file providing UI defaults, uses same format as cli parameter (default:
                        configs/webui/webui.yaml)
  --esrgan-cpu          run ESRGAN on cpu (default: False)
  --esrgan-gpu ESRGAN_GPU
                        run ESRGAN on specific gpu (overrides --gpu) (default: 0)
  --extra-models-cpu    run extra models (GFPGAN/ESRGAN) on cpu (default: False)
  --extra-models-gpu    run extra models (GFPGAN/ESRGAN) on gpu (default: False)
  --gfpgan-cpu          run GFPGAN on cpu (default: False)
  --gfpgan-dir GFPGAN_DIR
                        GFPGAN directory (default: ./src/gfpgan)
  --gfpgan-gpu GFPGAN_GPU
                        run GFPGAN on specific gpu (overrides --gpu) (default: 0)
  --gpu GPU             choose which GPU to use if you have multiple (default: 0)
  --grid-format GRID_FORMAT
                        png for lossless png files; jpg:quality for lossy jpeg; webp:quality for lossy webp, or
                        webp:-compression for lossless webp (default: jpg:95)
  --inbrowser           automatically launch the interface in a new tab on the default browser (default: False)
  --ldsr-dir LDSR_DIR   LDSR directory (default: ./src/latent-diffusion)
  --n_rows N_ROWS       rows in the grid; use -1 for autodetect and 0 for n_rows to be same as batch_size (default:
                        -1) (default: -1)
  --no-half             do not switch the model to 16-bit floats (default: False)
  --no-progressbar-hiding
                        do not hide progressbar in gradio UI (we hide it because it slows down ML if you have hardware
                        acceleration in browser) (default: False)
  --no-verify-input     do not verify input to check if it's too long (default: False)
  --optimized-turbo     alternative optimization mode that does not save as much VRAM but runs significantly faster
                        (default: False)
  --optimized           load the model onto the device piecemeal instead of all at once to reduce VRAM usage at the
                        cost of performance (default: False)
  --outdir_img2img [OUTDIR_IMG2IMG]
                        dir to write img2img results to (overrides --outdir) (default: None)
  --outdir_imglab [OUTDIR_IMGLAB]
                        dir to write imglab results to (overrides --outdir) (default: None)
  --outdir_txt2img [OUTDIR_TXT2IMG]
                        dir to write txt2img results to (overrides --outdir) (default: None)
  --outdir [OUTDIR]     dir to write results to (default: None)
  --filename_format [FILENAME_FORMAT]
                        filenames format (default: None)
  --port PORT           choose the port for the gradio webserver to use (default: 7860)
  --precision {full,autocast}
                        evaluate at this precision (default: autocast)
  --realesrgan-dir REALESRGAN_DIR
                        RealESRGAN directory (default: ./src/realesrgan)
  --realesrgan-model REALESRGAN_MODEL
                        Upscaling model for RealESRGAN (default: RealESRGAN_x4plus)
  --save-metadata       Store generation parameters in the output png. Drop saved png into Image Lab to read
                        parameters (default: False)
  --share-password SHARE_PASSWORD
                        Sharing is open by default, use this to set a password. Username: webui (default: None)
  --share               Should share your server on gradio.app, this allows you to use the UI from your mobile app
                        (default: False)
  --skip-grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples (default:
                        False)
  --skip-save           do not save individual samples. For speed measurements. (default: False)
  --no-job-manager      Don't use the experimental job manager on top of gradio (default: False)
  --max-jobs MAX_JOBS   Maximum number of concurrent 'generate' commands (default: 1)
  --tiling              Generate tiling images (default: False)
```
docs/4.streamlit-interface.md
# Streamlit Web UI Interface

**Features:**
- Clean UI with an easy-to-use design, with support for widescreen displays.
- Dynamic live preview of your generations
- Easily customizable presets right from the WebUI (Coming Soon!)
- An integrated gallery to show the generations for a prompt or session (Coming soon!)
- Better VRAM usage optimization and fewer errors for bigger generations.
- Text2Video - Generate video clips from text prompts right from the Web UI (WIP)
- Concepts Library - Run custom embeddings others have made via textual inversion.
- Actively being developed, with new features being added and planned - Stay tuned!
- Streamlit is now the primary UI for the project moving forward.
- *Currently in active development and still missing some of the features present in the Gradio interface.*

### Launching The Streamlit Web UI

To launch the Streamlit Web UI, you will need to do the following:

- Windows:
    - Open your command line in the repo folder and run the `webui_streamlit.cmd` file.
- Linux:
    - Open your terminal to the repo folder and run `webui.sh`, then press `1` when prompted.
- Manually:
    - Open your terminal to the repo folder.
    - Activate the conda environment using `conda activate ldm`
    - Run the command `python -m streamlit run scripts/webui_streamlit.py`

Once the Streamlit Web UI launches, a new browser tab will open with the interface. A link will also appear in your terminal so you can copy and paste it as needed.
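The manual launch steps can also be wrapped in a tiny script, e.g. for automation (a sketch that assumes the conda environment is already active):

```python
import subprocess  # used when you actually launch

# The manual launch command from the steps above, as an argument list.
cmd = ["python", "-m", "streamlit", "run", "scripts/webui_streamlit.py"]
# subprocess.run(cmd, check=True)  # uncomment to launch from the repo root
print(" ".join(cmd))  # python -m streamlit run scripts/webui_streamlit.py
```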
## Text2Image
---

![](../images/streamlit/streamlit-t2i.png)

Streamlit Text2Image allows for a modern, but well known, Stable Diffusion text-to-image generation experience. Here is a quick description of some of the features of Text2Image and what they do:

- Width and Height: Control the size of the generated image (Default is 512px)
- Classifier Free Guidance (CFG): How closely the final image should follow your prompt (Default is 7.5)
- Seed: The number (or word) used to generate an image with
- Images Per Batch: The number of images to generate consecutively (Does not affect VRAM)
- Number of Batches: How many images to generate at once (Very VRAM Intensive)
- Sampling Steps: The quality of the final output; higher is better, with diminishing returns (Default is 30)
- Sampling Method: Which sampler to use to generate the image (Default is `k_euler`)
## Image2Image
---

![](../images/streamlit/streamlit-i2i.png)

Streamlit Image2Image allows you to take an image, be it generated by Stable Diffusion or otherwise, and use it as a base for another generation. This has the potential to really enhance images and fix issues with initial Text2Image generations. It also includes some built-in drawing and masking tools to help create custom generations. Some notable features of Streamlit Image2Image are:

- Image Editor Mode: Choose whether you wish to mask, crop, or uncrop the image
- Mask Mode: Allows you to decide if a drawn mask should be generated or kept
- Denoising Strength: How much of the generated image should replace the original image (Default is 75%)
- Width and Height: Control the size of the generated image (Default is 512px)
- Classifier Free Guidance (CFG): How closely the final image should follow your prompt (Default is 7.5)
- Seed: The number (or word) used to generate an image with
- Images Per Batch: The number of images to generate consecutively (Does not affect VRAM)
- Number of Batches: How many images to generate at once (Very VRAM Intensive)
- Sampling Steps: The quality of the final output; higher is better, with diminishing returns (Default is 30)
- Sampling Method: Which sampler to use to generate the image (Default is `k_euler`)
## Text2Video
---

![](../images/streamlit/streamlit-t2v.png)

*Insert details of how to use T2V here*
(ZeroCool needs to fill in details here of how Text2Video works)
## SD Concepts Library
---

![](../images/streamlit/streamlit-concepts.png)

The Concepts Library allows for the easy usage of custom textual inversion models. These models may be loaded into `models/custom/sd-concepts-library` and will appear in the Concepts Library in Streamlit. To use one of these custom models in a prompt, either copy it using the button on the model, or type `<model-name>` in the prompt where you wish to use it.
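For example, splicing a concept token into a prompt looks like this (`birb-style` is a hypothetical concept folder name, standing in for whatever you placed under `models/custom/sd-concepts-library`):

```python
# Sketch: embedding a Concepts Library token in a prompt. "birb-style" is a
# hypothetical concept name, not one shipped with the repo.
concept = "birb-style"
prompt = f"a watercolor painting of a castle in the style of <{concept}>"
print(prompt)  # a watercolor painting of a castle in the style of <birb-style>
```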
Please see the [Concepts Library](https://github.com/sd-webui/stable-diffusion-webui/blob/master/docs/7.concepts-library.md) section to learn more about how to use these tools.
## Textual Inversion
---

TBD

## Model Manager
---

TBD

## Settings
---

*This section of the Web UI is still in development*

This area allows you to customize how you want Streamlit to run. These changes will be saved to `configs/webui/userconfig_streamlit.yaml`.
docs/5.gradio-interface.md
# Gradio Web UI Interface

### Gradio Web UI Features:
- Older UI design that is fully functional and feature complete.
- Has access to all upscaling models, including LDSR.
- Dynamic prompt entry automatically changes your generation settings based on `--params` in a prompt.
- Includes quick and easy ways to send generations to Image2Image or the Image Lab for upscaling.
- *Note: the Gradio interface is no longer being actively developed and is only receiving bug fixes.*

### Launching The Gradio Web UI

To launch the Gradio Web UI, you will need to do the following:

- Windows:
    - Open your command line in the repo folder and run the `webui.cmd` file.
- Linux:
    - Open your terminal to the repo folder and run `webui.sh`, then press `2` when prompted.
- Manually:
    - Open your terminal to the repo folder.
    - Activate the conda environment using `conda activate ldm`
    - Run the command `python scripts/relauncher.py`

Once the Gradio Web UI launches, a link will appear in your command line or terminal; click it or copy and paste it into your browser to access the interface.
## Text2Image
---

![](../images/gradio/gradio-t2i.png)

Gradio Text2Image allows for the classic and well known Stable Diffusion text-to-image generation. Here is a quick description of some of the features of Text2Image and what they do:

- Width and Height: Control the size of the generated image (Default is 512px)
- Classifier Free Guidance (CFG): How closely the final image should follow your prompt (Default is 7.5)
- Seed: The number (or word) used to generate an image with
- Images Per Batch: The number of images to generate consecutively (Does not affect VRAM)
- Number of Batches: How many images to generate at once (Very VRAM Intensive)
- Sampling Steps: The quality of the final output; higher is better, with diminishing returns (Default is 50)
- Sampling Method: Which sampler to use to generate the image (Default is `k_lms`)
- Push to Img2Img: Send the image to the Image2Image tool to continue working with it via Stable Diffusion
- Send to Image Lab: Send the image to the Image Lab for enhancement and upscaling.
## Image2Image
---

![](../images/gradio/gradio-i2i.png)

Gradio Image2Image allows you to take an image, be it generated by Stable Diffusion or otherwise, and use it as a base for another generation. This has the potential to really enhance images and fix issues with initial Text2Image generations. It also includes some built-in drawing and masking tools to help create custom generations. Some notable features of Gradio Image2Image are:

- Image Editor Mode: Choose whether you wish to mask, crop, or uncrop the image
- Mask Mode: Allows you to decide if a drawn mask should be generated or kept
- Denoising Strength: How much of the generated image should replace the original image (Default is 70%)
- Width and Height: Control the size of the generated image (Default is 512px)
- Classifier Free Guidance (CFG): How closely the final image should follow your prompt (Default is 7.5)
- Seed: The number (or word) used to generate an image with
- Images Per Batch: The number of images to generate consecutively (Does not affect VRAM)
- Number of Batches: How many images to generate at once (Very VRAM Intensive)
- Sampling Steps: The quality of the final output; higher is better, with diminishing returns (Default is 50)
- Sampling Method: Which sampler to use to generate the image (Default is `k_lms`)
## Image Lab
---

![](../images/gradio/gradio-upscale.png)

The Gradio Image Lab is a central location to access image enhancers and upscalers. Though some options are available in all tabs (GFPGAN and RealESRGAN), the Image Lab is where all of these tools may be easily accessed. These upscalers can be used on generated images sent to the lab, or on other images uploaded to it. The tools included here are:

- GFPGAN: Fixes and enhances faces
- RealESRGAN: A 4x upscaler that uses a GAN to achieve its results
- GoBig: A 2x upscaler that uses RealESRGAN, but preserves more detail
- LDSR: A 4x upscaler that uses Latent Diffusion, preserving a lot more detail at the cost of speed and VRAM
- GoLatent: Uses LDSR to do a 4x upscale, then GoBig to make a final 8x upscale with great detail preservation.

Please see the [Image Enhancers](https://github.com/sd-webui/stable-diffusion-webui/blob/master/docs/6.image_enhancers.md) section to learn more about how to use these tools.
## Gradio Optional Customizations
---

Gradio allows for a number of possible customizations via command line arguments/terminal parameters. If you are running these manually, they would need to be run like this: `python scripts/webui.py --param`. Otherwise, you may add your own parameter customizations to `scripts/relauncher.py`, the program that automatically relaunches the Gradio interface should a crash happen.

Inside of `relauncher.py` are a few preset defaults most people would likely access:
```
# Run upscaling models on the CPU
extra_models_cpu = False

# Automatically open a new browser window or tab on first launch
open_in_browser = False

# Run Stable Diffusion in Optimized Mode - only requires 4GB of VRAM, but is significantly slower
optimized = False

# Run in Optimized Turbo Mode - needs more VRAM than regular optimized mode, but is faster
optimized_turbo = False

# Creates a public xxxxx.gradio.app share link to allow others to use your interface (requires properly forwarded ports to work correctly)
share = False

# Generate tiling images
tiling = False
```
Setting any of these to `True` will enable those parameters on every launch. Alternatively, if you wish to enable a `--parameter` not listed here, you can enter your own custom ones in this field inside of `scripts/relauncher.py`:

```
# Enter other `--arguments` you wish to use - Must be entered as a `--argument ` syntax
additional_arguments = ""
```
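For example, one plausible value for that field, using flags from the documented option list (these particular flags are illustrative; adjust them to your own setup):

```python
# Sketch: a possible custom value for the relauncher.py field shown above.
additional_arguments = "--no-half --precision=full --save-metadata"
```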
|
||||
|
||||
## List of command line options
|
||||
---
|
||||
|
||||
This is a list of the full set of optional parameters you can launch the Gradio Interface with.
|
||||
|
||||
```
usage: webui.py [-h] [--ckpt CKPT] [--cli CLI] [--config CONFIG] [--defaults DEFAULTS] [--esrgan-cpu] [--esrgan-gpu ESRGAN_GPU] [--extra-models-cpu] [--extra-models-gpu] [--gfpgan-cpu] [--gfpgan-dir GFPGAN_DIR] [--gfpgan-gpu GFPGAN_GPU] [--gpu GPU]
                [--grid-format GRID_FORMAT] [--inbrowser] [--ldsr-dir LDSR_DIR] [--n_rows N_ROWS] [--no-half] [--no-progressbar-hiding] [--no-verify-input] [--optimized-turbo] [--optimized] [--outdir_img2img [OUTDIR_IMG2IMG]] [--outdir_imglab [OUTDIR_IMGLAB]]
                [--outdir_txt2img [OUTDIR_TXT2IMG]] [--outdir [OUTDIR]] [--filename_format [FILENAME_FORMAT]] [--port PORT] [--precision {full,autocast}] [--realesrgan-dir REALESRGAN_DIR] [--realesrgan-model REALESRGAN_MODEL] [--save-metadata]
                [--share-password SHARE_PASSWORD] [--share] [--skip-grid] [--skip-save] [--no-job-manager] [--max-jobs MAX_JOBS] [--tiling]

optional arguments:
  -h, --help            show this help message and exit
  --ckpt CKPT           path to checkpoint of model (default: models/ldm/stable-diffusion-v1/model.ckpt)
  --cli CLI             don't launch web server, take Python function kwargs from this file (default: None)
  --config CONFIG       path to config which constructs model (default: configs/stable-diffusion/v1-inference.yaml)
  --defaults DEFAULTS   path to configuration file providing UI defaults, uses same format as cli parameter (default: configs/webui/webui.yaml)
  --esrgan-cpu          run ESRGAN on cpu (default: False)
  --esrgan-gpu ESRGAN_GPU
                        run ESRGAN on specific gpu (overrides --gpu) (default: 0)
  --extra-models-cpu    run extra models (GFPGAN/ESRGAN) on cpu (default: False)
  --extra-models-gpu    run extra models (GFPGAN/ESRGAN) on gpu (default: False)
  --gfpgan-cpu          run GFPGAN on cpu (default: False)
  --gfpgan-dir GFPGAN_DIR
                        GFPGAN directory (default: ./GFPGAN)
  --gfpgan-gpu GFPGAN_GPU
                        run GFPGAN on specific gpu (overrides --gpu) (default: 0)
  --gpu GPU             choose which GPU to use if you have multiple (default: 0)
  --grid-format GRID_FORMAT
                        png for lossless png files; jpg:quality for lossy jpeg; webp:quality for lossy webp, or webp:-compression for lossless webp (default: jpg:95)
  --inbrowser           automatically launch the interface in a new tab on the default browser (default: False)
  --ldsr-dir LDSR_DIR   LDSR directory (default: ./LDSR)
  --n_rows N_ROWS       rows in the grid; use -1 for autodetect and 0 for n_rows to be same as batch_size (default: -1)
  --no-half             do not switch the model to 16-bit floats (default: False)
  --no-progressbar-hiding
                        do not hide progressbar in gradio UI (we hide it because it slows down ML if you have hardware acceleration in browser) (default: False)
  --no-verify-input     do not verify input to check if it's too long (default: False)
  --optimized-turbo     alternative optimization mode that does not save as much VRAM but runs significantly faster (default: False)
  --optimized           load the model onto the device piecemeal instead of all at once to reduce VRAM usage at the cost of performance (default: False)
  --outdir_img2img [OUTDIR_IMG2IMG]
                        dir to write img2img results to (overrides --outdir) (default: None)
  --outdir_imglab [OUTDIR_IMGLAB]
                        dir to write imglab results to (overrides --outdir) (default: None)
  --outdir_txt2img [OUTDIR_TXT2IMG]
                        dir to write txt2img results to (overrides --outdir) (default: None)
  --outdir [OUTDIR]     dir to write results to (default: None)
  --filename_format [FILENAME_FORMAT]
                        filenames format (default: None)
  --port PORT           choose the port for the gradio webserver to use (default: 7860)
  --precision {full,autocast}
                        evaluate at this precision (default: autocast)
  --realesrgan-dir REALESRGAN_DIR
                        RealESRGAN directory (default: ./RealESRGAN)
  --realesrgan-model REALESRGAN_MODEL
                        upscaling model for RealESRGAN (default: RealESRGAN_x4plus)
  --save-metadata       store generation parameters in the output png; drop a saved png into Image Lab to read its parameters (default: False)
  --share-password SHARE_PASSWORD
                        sharing is open by default, use this to set a password; username: webui (default: None)
  --share               share your server on gradio.app, allowing you to use the UI from your mobile app (default: False)
  --skip-grid           do not save a grid, only individual samples; helpful when evaluating lots of samples (default: False)
  --skip-save           do not save individual samples; for speed measurements (default: False)
  --no-job-manager      don't use the experimental job manager on top of gradio (default: False)
  --max-jobs MAX_JOBS   maximum number of concurrent 'generate' commands (default: 1)
  --tiling              generate tiling images (default: False)
```
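As a rough illustration of how a handful of these options behave, the sketch below declares a small subset with `argparse`; this mirrors the documented defaults but is not the project's actual code:

```python
import argparse

# Illustrative subset of the options listed above; the real webui.py
# defines many more arguments than shown here.
parser = argparse.ArgumentParser(prog="webui.py")
parser.add_argument("--ckpt", default="models/ldm/stable-diffusion-v1/model.ckpt",
                    help="path to checkpoint of model")
parser.add_argument("--gpu", type=int, default=0,
                    help="choose which GPU to use if you have multiple")
parser.add_argument("--port", type=int, default=7860,
                    help="choose the port for the gradio webserver to use")
parser.add_argument("--precision", choices=["full", "autocast"], default="autocast",
                    help="evaluate at this precision")
parser.add_argument("--optimized", action="store_true",
                    help="load the model piecemeal to reduce VRAM usage")

# With no flags given, every option falls back to its documented default.
opt = parser.parse_args([])
print(opt.port, opt.precision)  # → 7860 autocast
```

Boolean switches such as `--optimized` take no value (they are `store_true` flags), while options such as `--port` expect one; this matches the `--argument value` syntax noted for `additional_arguments` earlier.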

---

**Deleted file** (41 lines):
<!--
This file is part of stable-diffusion-webui (https://github.com/sd-webui/stable-diffusion-webui/).

Copyright 2022 sd-webui team.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
-->

# **Upscalers**

### It is currently open to discussion whether all these different **upscalers** should only be usable on their respective standalone tabs.

### _**Why?**_

* When you generate a large batch of images, will every image be good enough to deserve upscaling?
* Images are upscaled after generation, which delays the generation of the next image in the batch; it is better to wait for the batch to finish and then decide which images need upscaling.
* Clutter: more upscalers = more GUI options.

### What if I _need_ upscaling after generation and can't just do it as post-processing?

* One solution would be to hide the options by default and provide a CLI switch to enable them.
* One issue with that solution is the need to still maintain GUI options for all new upscalers.
* Another issue is how to ensure that people who need it know about it, without people who don't need it accidentally activating it and getting unexpected results.

### Which upscalers are planned?

* **goBIG**: this was implemented but reverted due to a bug; the decision was then made to wait until other upscalers were added before reimplementing it.
* **SwinIR-L 4x**
* **CodeFormer**
* _**more, suggestions welcome**_
* If the idea of keeping all upscalers as post-processing is accepted, a single **'Upscalers'** tab containing all of them could simplify both the process and the UI.

---

**docs/6.image_enhancers.md** (new file, 71 lines):

# **Image Enhancers**

Included with both versions of the Web UI interface is a series of image restorers and upscalers. They are included to help users create outputs with restored features, such as better faces, or at larger resolutions than Stable Diffusion can natively output.

## GFPGAN

---

![](../images/GFPGAN.png)

GFPGAN is designed to help restore faces in Stable Diffusion outputs. If you have ever tried to generate images with people in them, you know why having a face restorer comes in handy. GFPGAN uses its own GAN to detect and restore the faces of subjects within an image, greatly enhancing the detail in human faces while also fixing issues with asymmetry or awkward-looking eyes.

If you want to use GFPGAN to improve generated faces, you need to download its model separately if you are on Windows, or do so manually on Linux.
Download [GFPGANv1.3.pth](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth) and put it
into the `/stable-diffusion-webui/src/gfpgan/experiments/pretrained_models` directory after you have set up the conda environment for the first time.
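To confirm the model landed in the right place, a quick check along these lines can help; the path is the one documented above, and the repo root is an assumption you should adjust to your install location:

```python
from pathlib import Path

# Repo root is assumed to be the current directory's "stable-diffusion-webui";
# adjust this to wherever you cloned the project.
repo = Path("stable-diffusion-webui")
model = repo / "src" / "gfpgan" / "experiments" / "pretrained_models" / "GFPGANv1.3.pth"

if model.is_file():
    print("GFPGAN model found:", model)
else:
    print("GFPGAN model missing - download GFPGANv1.3.pth into", model.parent)
```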

## RealESRGAN

---

![](../images/RealESRGAN.png)

RealESRGAN is a 4x upscaler built into both versions of the Web UI interface. It uses its own GAN to upscale images while retaining their details. Two different versions of RealESRGAN can be used, `RealESRGAN 4X` and `RealESRGAN 4X Anime`. Despite the names, don't hesitate to try either version when upscaling an image to see which works best for a given output.

If you want to use RealESRGAN to upscale your images, you need to download its models separately if you are on Windows, or do so manually on Linux.
Download [RealESRGAN_x4plus.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth) and [RealESRGAN_x4plus_anime_6B.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth).
Put them into the `stable-diffusion-webui/src/realesrgan/experiments/pretrained_models` directory after you have set up the conda environment for the first time.

## GoBig (Gradio only currently)

---

GoBig is a 2x upscaler that uses RealESRGAN to upscale the image and then slices it into small parts; each part is diffused further by SD to create more detail. It is great for adding and increasing detail, but it will change the composition, and it may also fix issues such as eyes. The settings are similar to Image2Image with regard to the strength and seed of the generation.

To use GoBig, you will need to download the RealESRGAN models as directed above.
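The slice-and-diffuse idea behind GoBig can be sketched roughly as follows; this is only an illustration of the tiling arithmetic (the tile size, overlap, and function are hypothetical, not the project's implementation):

```python
def tile_coords(width, height, tile=512, overlap=64):
    """Top-left corners of overlapping tiles covering a width x height image."""
    step = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, step))
    ys = list(range(0, max(height - tile, 0) + 1, step))
    # Ensure the final row/column of tiles reaches the image edge.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]

# After the 2x upscale, a 512x512 output becomes 1024x1024, which is then
# covered with overlapping tiles; each tile gets an img2img-style pass.
print(len(tile_coords(1024, 1024)))  # → 9
```

The overlap between tiles is what lets the diffused parts blend back together without visible seams, which is also why GoBig can shift the composition slightly.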

## Latent Diffusion Super Resolution - LDSR (Gradio only currently)

---

LDSR is a 4x upscaler with high VRAM usage that uses a Latent Diffusion model to upscale the image. It accentuates the details of an image without changing the composition. It might introduce sharpening, and it is great for textures or compositions with plenty of detail. However, it is slower and uses more VRAM.

If you want to use LDSR to upscale your images, you need to download its models separately if you are on Windows, or do so manually on Linux.
Download the LDSR [project.yaml](https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1) and [last.ckpt](https://heibox.uni-heidelberg.de/f/578df07c8fc04ffbadf3/?dl=1). Rename `last.ckpt` to `model.ckpt` and place both in the `stable-diffusion-webui/src/latent-diffusion/experiments/pretrained_models` directory after you have set up the conda environment for the first time.

## GoLatent (Gradio only currently)

---

GoLatent is an 8x upscaler with high VRAM usage. It uses GoBig to add detail and then uses a Latent Diffusion (LDSR) model to upscale the image, resulting in less artifacting and sharpening. Use the settings to feed GoBig parameters that will contribute to the result. Please note that this mode is considerably slower and uses significantly more VRAM.

To use GoLatent, you will need to download the appropriate LDSR models as described above.

---

## Future Additions

Currently, these are the four main enhancers and upscalers used in the project, but more may be implemented in the future. Stay tuned!

---

**docs/7.concepts-library.md** (new file, 20 lines):

TBD

**New images added:**

- `images/gradio/gradio-i2i.png` (1.3 MiB)
- `images/gradio/gradio-t2i.png` (840 KiB)
- `images/gradio/gradio-upscale.png` (2.4 MiB)
- `images/sd-wui_logo.png` (454 KiB)
- `images/streamlit/streamlit-concepts.png` (2.4 MiB)
- `images/streamlit/streamlit-i2i.png` (1.8 MiB)
- `images/streamlit/streamlit-t2i.png` (962 KiB)
- `images/streamlit/streamlit-t2v.png` (83 KiB)