Merge remote-tracking branch 'origin/dev' into dev

ZeroCool940711 2022-10-02 16:27:43 -07:00
commit 8239c328fd
3 changed files with 53 additions and 0 deletions


@@ -92,6 +92,58 @@ The Gradio Image Lab is a central location to access image enhancers and upscale
Please see the [Image Enhancers](6.image_enhancers.md) section to learn more about how to use these tools.
## Scene2Image
---
![](../images/gradio/gradio-s2i.png)
Gradio Scene2Image allows you to define layers of images in a markdown-like syntax.
> Would it be possible to have a layers system where we could have
> foreground, mid, and background objects which relate to one another and
> share the style? So we could say generate a landscape, on another layer
> generate a castle, and on another layer generate a crowd of people.
You write a multi-line prompt that looks like markdown, where each section declares one layer.
It is hierarchical, so each layer can have its own child layers.
In the frontend you can find brief documentation for the syntax, examples, and a reference for the various arguments.
Here is a summary:
- Markdown headings, e.g. `# layer0`, define layers.
- The content of a section defines the arguments for image generation; arguments are given as lines of the form `arg:value` or `arg=value`.
- Layers are hierarchical, i.e. each layer can contain more layers. The number of `#` increases in the headings of child layers.
- Child layers are blended together by their image masks, like layers in image editors. By default alpha composition is used for blending; other blend modes from [ImageChops](https://pillow.readthedocs.io/en/stable/reference/ImageChops.html) can also be used.
- Sections with a "prompt" and child layers invoke Image2Image, and the result of blending the child layers is the input for Image2Image; without child layers they invoke Text2Image. Sections without a "prompt" are just images, useful for mask selection, image composition, etc.
- Images can be initialized with "color", resized with "resize", and positioned with "pos". Rotation and the rotation center are set with "rotation" and "center".
- Masks can be selected automatically by color, by the color at given pixels of the image, or by estimated depth.
- You can choose between two depth estimation models; see the frontend reference for the argument names. [Monocular depth estimation](https://huggingface.co/spaces/atsantiago/Monocular_Depth_Filter) can be selected as depth model `0`; [MiDaS depth estimation](https://huggingface.co/spaces/pytorch/MiDaS), the default, can be selected as depth model `1`.
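For illustration, a minimal layered prompt might look like the sketch below. The layer names and argument values are arbitrary, and only arguments mentioned above are used; see the frontend reference for the full list:

```
# scene
prompt: a stone castle on a hill with a crowd of people in front of it, fantasy painting

## background
prompt: a wide hilly landscape at sunset

## castle
prompt: a stone castle on a hill
resize: 384,384
pos: 128,64
```

Here `# scene` has both a prompt and child layers, so each child layer is generated with Text2Image, and the blended `background` and `castle` layers become the Image2Image input for the final pass.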
Depth estimation can be used for traditional 3D reconstruction.
With `transform3d=True`, the pixels of an image can be re-rendered from another perspective or with a different field of view.
For this you specify the pose and field of view that correspond to the input image, as well as the desired output pose and field of view.
Poses describe the camera position and orientation as an `x,y,z,rotate_x,rotate_y,rotate_z` tuple, with the angles giving rotations around the axes in degrees.
The camera coordinate system is the pinhole camera described and pictured in the [OpenCV "Camera Calibration and 3D Reconstruction" documentation](https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html).
When `transform3d_from_pose`, the camera pose at which the input image was taken, is not specified, the target pose `transform3d_to_pose` is interpreted in the input camera's coordinate system: walking forward one depth unit in the input image corresponds to position `0,0,1`, walking to the right is roughly `1,0,0`, and going downwards is `0,1,0`.
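As a sketch, re-rendering a generated image from a camera moved one unit to the right and rotated slightly around the y-axis could look like this. The values are illustrative, and the field-of-view arguments are omitted; see the frontend reference for their names:

```
# landscape
prompt: a mountain landscape, matte painting
transform3d: True
transform3d_from_pose: 0,0,0,0,0,0
transform3d_to_pose: 1,0,0,0,-10,0
```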
## Gradio Optional Customizations
---


@@ -47,6 +47,7 @@ mkdir -p $MODEL_DIR
MODEL_FILES=(
'model.ckpt models/ldm/stable-diffusion-v1 https://www.googleapis.com/storage/v1/b/aai-blog-files/o/sd-v1-4.ckpt?alt=media fe4efff1e174c627256e44ec2991ba279b3816e364b49f9be2abc0b3ff3f8556'
'GFPGANv1.3.pth src/gfpgan/experiments/pretrained_models https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth c953a88f2727c85c3d9ae72e2bd4846bbaf59fe6972ad94130e23e7017524a70'
'GFPGANv1.4.pth src/gfpgan/experiments/pretrained_models https://github.com/TencentARC/GFPGAN/releases/download/v1.3.4/GFPGANv1.4.pth e2cd4703ab14f4d01fd1383a8a8b266f9a5833dacee8e6a79d3bf21a1b6be5ad'
'RealESRGAN_x4plus.pth src/realesrgan/experiments/pretrained_models https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth 4fa0d38905f75ac06eb49a7951b426670021be3018265fd191d2125df9d682f1'
'RealESRGAN_x4plus_anime_6B.pth src/realesrgan/experiments/pretrained_models https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth f872d837d3c90ed2e05227bed711af5671a6fd1c9f7d7e91c911a61f155e99da'
'project.yaml src/latent-diffusion/experiments/pretrained_models https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1 9d6ad53c5dafeb07200fb712db14b813b527edd262bc80ea136777bdb41be2ba'

Binary file not shown (new image, 2.0 MiB).