-Changes wording on Memory Monitor to not alarm users (happened a few
times)
-Adds config settings "use_upscaling" and "upscaling_method" to the
settings menu; they were already in the yaml
Allows the worker to specify a word blacklist; it will refuse to pick up
requests containing those words.
Allows the worker to specify a word censorlist, against which it will
always apply the NSFW filter, even if the worker accepts NSFW.
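
A minimal sketch of how the two lists could interact, not the bridge's
actual code. The function names, the whole-word matching, and the rule
that the filter is applied when the worker does not accept NSFW are all
assumptions made for illustration:

```python
import re


def _contains_any(prompt: str, words: list[str]) -> bool:
    """Return True if any listed word occurs in the prompt as a whole word."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", prompt, re.IGNORECASE)
        for word in words
    )


def triage_job(prompt, blacklist, censorlist, worker_allows_nsfw):
    """Hypothetical helper: return (accept_job, apply_nsfw_filter)."""
    # Blacklisted words: the worker refuses the job outright.
    if _contains_any(prompt, blacklist):
        return False, False
    # Censorlisted words: force the filter even if the worker accepts NSFW.
    forced = _contains_any(prompt, censorlist)
    return True, forced or not worker_allows_nsfw
```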
Co-authored-by: hlky <106811348+hlky@users.noreply.github.com>
Co-authored-by: lukas5450 <46075099+lukas5450@users.noreply.github.com>
Co-authored-by: JamDon2 <hello@jamdon2.dev>
The bridge would keep looping on the same generation because Python
treats 0 as falsy, so the "while not seed" loop never exits when the
seed is 0 (or 00000000 etc).
This fixes that. It also allows requesting more verbosity on the webui
command.
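
The pitfall can be illustrated with a small sketch. The helper names
below are hypothetical and are not taken from the bridge code; they only
contrast a truthiness check with an explicit `None` check:

```python
def has_seed_buggy(seed):
    # Truthiness check: 0 is falsy, so a seed of 0 looks like "no seed yet".
    return bool(seed)


def has_seed_fixed(seed):
    # Explicit None check: 0 is accepted as a valid seed.
    return seed is not None
```

With this, `has_seed_fixed(int("00000000"))` is true, while the
truthiness-based check rejects the same value.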
Co-authored-by: hlky <106811348+hlky@users.noreply.github.com>
Co-authored-by: lukas5450 <46075099+lukas5450@users.noreply.github.com>
- Removed the checkbox to disable the preview image. Users with performance issues should instead increase the frequency at which it is displayed; after a certain point it no longer affects performance.
# Description
Intermediate image saving in scn2img tries to save metadata which is not
set. This results in a warning in the console: "Couldn't find metadata
on image", originally reported by @codedealer in
https://github.com/sd-webui/stable-diffusion-webui/pull/1179#pullrequestreview-1120015859
Metadata for intermediate images is added to fix the warning.
The following metadata is written:
- "prompt" contains the representation of the SceneObject corresponding
to the intermediate image
- "seed" contains the seed at the start of the function that generated
this intermediate image
- "width" and "height" contain the size of the image.
To get the seed at the start of the render function without using it, a
class SeedGenerator is added and used instead of the Python generator
functions.
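
As a rough illustration of why a class helps here: a plain generator
cannot report its next value without consuming it, whereas a small class
can. The sketch below is an assumption about the shape of such a class,
not the code from this PR; the method names and the "increment by one"
rule are hypothetical:

```python
class SeedGenerator:
    """Hypothetical sketch: hands out seeds and allows peeking at the next one."""

    def __init__(self, seed: int):
        self._next = seed

    def peek(self) -> int:
        # Report the next seed without consuming it (usable for metadata).
        return self._next

    def next_seed(self) -> int:
        # Consume and return the next seed, then advance (assumed: +1).
        seed = self._next
        self._next += 1
        return seed
```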
Fixes warning thrown in console: "> Couldn't find metadata on image",
originally reported by @codedealer in
https://github.com/sd-webui/stable-diffusion-webui/pull/1179#pullrequestreview-1120015859
# Checklist:
- [x] I have changed the base branch to `dev`
- [x] I have performed a self-review of my own code
- [x] I have commented my code in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
This change helps with starting the stable horde bridge, without having
to change the relauncher.py every time. It also allows one to start
multiple bridges (for multiple GPUs) by passing the `-n` argument to the
.cmd/.sh
# Summary of the change
- new Scene-to-Image tab
- new scn2img function
- functions for loading and running monocular_depth_estimation with
tensorflow
# Description
Related to discussion #925
> Would it be possible to have a layers system where we could do have
foreground, mid, and background objects which relate to one another and
share the style? So we could say generate a landscape, one another layer
generate a castle, and on another layer generate a crowd of people.
To make this work I made a prompt-based layering system in a new
"Scene-to-Image" tab.
You write a multi-line prompt that looks like markdown, where each
section declares one layer.
It is hierarchical, so each layer can have its own child layers.
Examples: https://imgur.com/a/eUxd5qn
![](https://i.imgur.com/L61w00Q.png)
In the frontend you can find a brief documentation for the syntax,
examples and reference for the various arguments.
Here is a short summary:
Sections with "prompt" and child layers are img2img, without child
layers they are txt2img.
Without "prompt" they are just images, useful for mask selection, image
composition, etc.
Images can be initialized with "color", resized with "resize" and their
position specified with "pos".
Rotation and rotation center are "rotation" and "center".
Masks can be selected automatically by color or by estimated depth, based
on https://huggingface.co/spaces/atsantiago/Monocular_Depth_Filter.
![](https://i.imgur.com/8rMHWmZ.png)
# Additional dependencies that are required for this change
For mask selection by monocular depth estimation, tensorflow is required
and the model must be cloned to ./src/monocular_depth_estimation/
Changes in environment.yaml:
- einops>=0.3.0
- tensorflow>=2.10.0
Einops must be allowed to be newer for tensorflow to work.
# Checklist:
- [x] I have changed the base branch to `dev`
- [x] I have performed a self-review of my own code
- [x] I have commented my code in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
Co-authored-by: hlky <106811348+hlky@users.noreply.github.com>
# Adds the bridge code which when enabled turns the webui into a
headless [Stable Horde](https://stablehorde.net) instance
It adds a few new command-line args to be able to pass variables to the
bridge, as well as the possibility to set them via a variables file,
`bridgeData.py`.
To start the bridge, one needs to add the `--bridge` argument to their
relauncher.py as well as any horde vars they want to specify.
On top of that this adds the loguru module as well as my tuned loguru
config. This provides a much nicer logging output and provides the
capability to save output to files for issue reports etc. For now only
the bridge is utilizing the nice format, but once it's merged, you can
start replacing `print()` with `logger.xxx()` where appropriate
To make the bridge work, I've had to add defaults to txt2img but this
should not affect anything.
# Checklist:
- [x] I have changed the base branch to `dev`
- [x] I have performed a self-review of my own code
- [x] I have commented my code in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
Co-authored-by: hlky <106811348+hlky@users.noreply.github.com>
Co-authored-by: Thomas Mello <work.mello@gmail.com>
Co-authored-by: Joshua Kimsey <jkimsey95@gmail.com>
Co-authored-by: ZeroCool <ZeroCool940711@users.noreply.github.com>
Added the Txt2Img settings from the `configs/webui/webui_streamlit.yaml` file. All are working except `separate_prompts`, which throws a `Missing key separate_prompts` error for some reason.
# Checklist:
- [x] I have changed the base branch to `dev`
- [x] I have performed a self-review of my own code
- [x] I have commented my code in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
* Restore --config. This will be useful when you have an init config
that you don't want overwritten.
* Cache the individual transformed images in TextualInversionDataset.
This gains speed by avoiding reading and reprocessing the image each
time it's used for training.
* Turn on no_grad for inference and clean up tensors during
checkpointing. This reduces memory usage slightly.
* Set the sample output size to 384x384. We just need them large enough
for manual evaluation, and this gains us a decent chunk of speed.
* (breaking change) Custom templates are now semicolon-delimited.
Additionally, custom templates are properly passed through to
TextualInversionDataset to generate input_ids for your images. Using
custom templates which accurately describe your input images seems to
improve training fidelity.
* Cache autoencoding of image pixel data. This substantially increases
the speed of training, upwards of 40% for me.
* Clean up a little bit of cruft.
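
For the semicolon-separated custom templates mentioned above, parsing
could look roughly like the sketch below. The function name and the `{}`
placeholder convention are assumptions for illustration, not taken from
this change:

```python
def parse_templates(raw: str) -> list[str]:
    """Hypothetical helper: split a semicolon-separated template string."""
    # Strip whitespace around each template and drop empty entries,
    # so a trailing semicolon does not produce an empty template.
    return [part.strip() for part in raw.split(";") if part.strip()]
```

For example, `parse_templates("a photo of {}; a painting of {} ;")`
yields two templates, each of which can then be formatted with the
placeholder token.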
There was a safety check that did not allow RealESRGAN and loopback
to be on at the same time, to prevent rapidly growing images (I think,
at least). This was causing confusion in the UI since there was no
indication that this wasn't allowed. Using ESRGAN just on the final
iteration should be safe, so this commit enables that.
- Improved txt2vid so it's now possible to generate high resolution images with less VRAM than before.
- Added condition to ensure that the pipe model from txt2vid is removed from memory when switching from the txt2vid tab to txt2img and vice versa.
* Some options on the Streamlit txt2img page now follow the defaults from the relevant config files.
* Fixed a copy-paste gone wrong in my previous commit.
* st.session_state["defaults"] fix
Co-authored-by: hlky <106811348+hlky@users.noreply.github.com>
This is a collection of several changes to enhance image display:
* When using GFPGAN or RealESRGAN, only the final output will be
displayed.
* In batch>1 mode, each final image will be collected into an image grid
for display
* The image is constrained to a reasonable size to ensure that batch
grids of RealESRGAN'd images don't end up spitting out a massive image
that the browser then has to handle.
* Additionally, the progress bar indicator is updated as each image is
post-processed.
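
The size constraint can be sketched as a simple aspect-preserving
downscale. This is only an illustration under assumptions: the function
name and the `max_side` limit of 1024 are hypothetical and not taken
from this commit:

```python
def constrain_size(width: int, height: int, max_side: int = 1024):
    """Hypothetical helper: shrink (width, height) so neither exceeds max_side."""
    # Never upscale: the scale factor is capped at 1.0.
    scale = min(1.0, max_side / max(width, height))
    # Keep the aspect ratio and avoid zero-sized dimensions.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting size could then be passed to whatever resize call the
display code uses before handing the grid to the browser.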