mirror of
https://github.com/Sygil-Dev/sygil-webui.git
synced 2024-12-14 22:13:41 +03:00
Merge remote-tracking branch 'origin/dev' into dev
This commit is contained in:
commit
8239c328fd
@ -92,6 +92,58 @@ The Gradio Image Lab is a central location to access image enhancers and upscale
Please see the [Image Enhancers](6.image_enhancers.md) section to learn more about how to use these tools.
## Scene2Image
---
![](../images/gradio/gradio-s2i.png)
Gradio Scene2Image allows you to define layers of images in a markdown-like syntax.

> Would it be possible to have a layers system where we could have
> foreground, mid, and background objects which relate to one another and
> share the style? So we could say generate a landscape, on another layer
> generate a castle, and on another layer generate a crowd of people.

You write a multi-line prompt that looks like markdown, where each section declares one layer.
It is hierarchical, so each layer can have its own child layers.
In the frontend you can find brief documentation for the syntax, examples, and a reference for the various arguments.
Here is a summary:

Markdown headings, e.g. '# layer0', define layers.
The content of a section defines the arguments for image generation.
Arguments are given on lines of the form 'arg:value' or 'arg=value'.

Layers are hierarchical, i.e. each layer can contain further layers.
The number of '#' characters increases in the headings of child layers.

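A minimal sketch of this syntax (layer names and prompt text are illustrative; both argument separators are shown):

```
# layer0
prompt: a wide mountain landscape

## layer1
prompt=a medieval castle on a hill
```
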
Child layers are blended together via their image masks, like layers in an image editor.
By default, alpha compositing is used for blending.
Other blend modes from [ImageChops](https://pillow.readthedocs.io/en/stable/reference/ImageChops.html) can also be used.

Sections with a "prompt" and child layers invoke Image2Image; without child layers they invoke Text2Image.
The result of blending the child layers becomes the input image for Image2Image.

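For instance, the distinction could look like this (structure and prompts are illustrative):

```
# scene
prompt: a fantasy landscape with a castle and a crowd of people

## background
prompt: a wide mountain landscape at sunset

## castle
prompt: a medieval castle on a hill
```

Here `background` and `castle` have no children, so each invokes Text2Image; their blended result is then fed to Image2Image for `scene`.
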
Sections without a "prompt" are plain images, useful for mask selection, image composition, etc.
Images can be initialized with "color", resized with "resize", and positioned with "pos".
Rotation and the rotation center are set with "rotation" and "center".

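A sketch using these image arguments (the value formats shown are assumptions; consult the frontend reference for the exact syntax):

```
# canvas
color: #1a1a2e
resize: 512, 512

## balloon
prompt: a red balloon
pos: 256, 128
rotation: 15
center: 256, 128
```
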
Masks can be selected automatically by color, by the color at given pixels of the image, or by estimated depth.

You can choose between two depth estimation models; see the frontend reference for the argument names.
[Monocular depth estimation](https://huggingface.co/spaces/atsantiago/Monocular_Depth_Filter) can be selected as depth model `0`.
[MiDaS depth estimation](https://huggingface.co/spaces/pytorch/MiDaS), used by default, can be selected as depth model `1`.

Depth estimation can also be used for traditional 3D reconstruction.
With `transform3d=True`, the pixels of an image can be rendered from another perspective or with a different field of view.
For this you specify the pose and field of view that correspond to the input image, as well as the desired output pose and field of view.
A pose describes the camera position and orientation as an x,y,z,rotate_x,rotate_y,rotate_z tuple, with the angles describing rotations around the axes in degrees.
The camera coordinate system is the pinhole camera described and pictured in the [OpenCV "Camera Calibration and 3D Reconstruction" documentation](https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html).

When `transform3d_from_pose`, the camera pose at which the input image was taken, is not specified, the target pose `transform3d_to_pose` is interpreted in the input camera's coordinate system:
walking forward one depth unit in the input image corresponds to the position `0,0,1`,
walking to the right corresponds to `1,0,0`,
and going downwards corresponds to `0,1,0`.

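A hedged sketch of a 3D transform, assuming pose values are given as the comma-separated tuple described above (the layer structure is illustrative, and the field-of-view argument names are not covered in this excerpt, so they are omitted):

```
# reprojected
transform3d: True
transform3d_from_pose: 0, 0, 0, 0, 0, 0
transform3d_to_pose: 0, 0, 1, 0, 30, 0

## photo
prompt: a narrow alley in an old town
```

The target pose here moves the camera one depth unit forward and rotates it 30 degrees around the y axis relative to the input view.
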
## Gradio Optional Customizations

---

@ -47,6 +47,7 @@ mkdir -p $MODEL_DIR
MODEL_FILES=(
'model.ckpt models/ldm/stable-diffusion-v1 https://www.googleapis.com/storage/v1/b/aai-blog-files/o/sd-v1-4.ckpt?alt=media fe4efff1e174c627256e44ec2991ba279b3816e364b49f9be2abc0b3ff3f8556'
'GFPGANv1.3.pth src/gfpgan/experiments/pretrained_models https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth c953a88f2727c85c3d9ae72e2bd4846bbaf59fe6972ad94130e23e7017524a70'
'GFPGANv1.4.pth src/gfpgan/experiments/pretrained_models https://github.com/TencentARC/GFPGAN/releases/download/v1.3.4/GFPGANv1.4.pth e2cd4703ab14f4d01fd1383a8a8b266f9a5833dacee8e6a79d3bf21a1b6be5ad'
'RealESRGAN_x4plus.pth src/realesrgan/experiments/pretrained_models https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth 4fa0d38905f75ac06eb49a7951b426670021be3018265fd191d2125df9d682f1'
'RealESRGAN_x4plus_anime_6B.pth src/realesrgan/experiments/pretrained_models https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth f872d837d3c90ed2e05227bed711af5671a6fd1c9f7d7e91c911a61f155e99da'
'project.yaml src/latent-diffusion/experiments/pretrained_models https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1 9d6ad53c5dafeb07200fb712db14b813b527edd262bc80ea136777bdb41be2ba'
BIN
images/gradio/gradio-s2i.png
Normal file