diff --git a/docs/5.gradio-interface.md b/docs/5.gradio-interface.md
index 3bfdb6f..6026ed3 100644
--- a/docs/5.gradio-interface.md
+++ b/docs/5.gradio-interface.md
@@ -92,6 +92,58 @@ The Gradio Image Lab is a central location to access image enhancers and upscale
 
 Please see the [Image Enhancers](6.image_enhancers.md) section to learn more about how to use these tools.
 
+## Scene2Image
+---
+
+![](../images/gradio/gradio-s2i.png)
+
+Gradio Scene2Image allows you to define layers of images in a markdown-like syntax.
+
+> Would it be possible to have a layers system where we could have
+foreground, mid, and background objects which relate to one another and
+share the style? So we could say generate a landscape, on another layer
+generate a castle, and on another layer generate a crowd of people.
+
+You write a multi-line prompt that looks like markdown, where each section declares one layer.
+It is hierarchical, so each layer can have its own child layers.
+In the frontend you can find brief documentation of the syntax, examples, and a reference for the various arguments.
+Here is a summary:
+
+Markdown headings, e.g. '# layer0', define layers.
+The content of a section defines the arguments for image generation.
+Arguments are defined by lines of the form 'arg:value' or 'arg=value'.
+
+Layers are hierarchical, i.e. each layer can contain more layers.
+The number of '#' increases in the headings of child layers.
+Child layers are blended together by their image masks, like layers in image editors.
+By default alpha composition is used for blending.
+Other blend modes from [ImageChops](https://pillow.readthedocs.io/en/stable/reference/ImageChops.html) can also be used.
+
+Sections with "prompt" and child layers invoke Image2Image; without child layers they invoke Text2Image.
+The result of blending the child layers is used as the input for Image2Image.
+
+Sections without "prompt" are just images, useful for mask selection, image composition, etc.
+Images can be initialized with "color", resized with "resize", and positioned with "pos".
+Rotation and the rotation center are set with "rotation" and "center".
+
+Masks can be selected automatically by color, by the color at given pixels of the image, or by estimated depth.
+
+You can choose between two depth estimation models; see the frontend reference for the argument names.
+[Monocular depth estimation](https://huggingface.co/spaces/atsantiago/Monocular_Depth_Filter) can be selected as depth model `0`.
+[MiDaS depth estimation](https://huggingface.co/spaces/pytorch/MiDaS), used by default, can be selected as depth model `1`.
+
+Depth estimation can be used for traditional 3D reconstruction.
+Using `transform3d=True`, the pixels of an image can be rendered from another perspective or with a different field of view.
+For this you specify the pose and field of view that correspond to the input image, and the desired output pose and field of view.
+Poses describe the camera position and orientation as an x,y,z,rotate_x,rotate_y,rotate_z tuple, with the angles describing rotations around the axes in degrees.
+The camera coordinate system is the pinhole camera model described and pictured in the [OpenCV "Camera Calibration and 3D Reconstruction" documentation](https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html).
+
+When `transform3d_from_pose`, the camera pose at which the input image was taken, is not specified, then `transform3d_to_pose`, the pose to which the image is to be transformed, is interpreted in the input camera's coordinate system:
+walking forwards one depth unit in the input image corresponds to the position `0,0,1`,
+walking to the right corresponds to `1,0,0`,
+and going downwards corresponds to `0,1,0`.
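+
+As a rough illustration of the layer syntax, a prompt for the quoted request could look like the sketch below. The layer names are arbitrary, and the value formats used here for `resize` and `pos` are assumptions, so consult the frontend reference for the exact forms:
+
+```
+# landscape
+prompt: a wide fantasy landscape with rolling green hills, matte painting
+
+## castle
+prompt: an ancient stone castle with tall towers
+resize: 256,256
+pos: 384,64
+
+## crowd
+prompt: a crowd of people in medieval clothing
+resize: 320,128
+pos: 256,320
+```
+
+Here `castle` and `crowd` would each be generated with Text2Image, blended together by their masks, and the blend would then be used as the Image2Image input for the `landscape` prompt, which is what ties the style of the layers together.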
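+
+In the same spirit, here is a minimal sketch of a 3D transform; only `transform3d` and the pose arguments appear above, and the exact value format is an assumption:
+
+```
+# street
+prompt: a narrow cobblestone street at sunset
+transform3d = True
+transform3d_to_pose: 0.5,0,1,0,10,0
+```
+
+Because `transform3d_from_pose` is omitted, the target pose is relative to the input camera: half a unit to the right, one depth unit forward, and a 10 degree rotation around the y axis.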
+
 
 ## Gradio Optional Customizations
 ---
diff --git a/entrypoint.sh b/entrypoint.sh
index c1d9043..b595a82 100755
--- a/entrypoint.sh
+++ b/entrypoint.sh
@@ -47,6 +47,7 @@ mkdir -p $MODEL_DIR
 MODEL_FILES=(
     'model.ckpt models/ldm/stable-diffusion-v1 https://www.googleapis.com/storage/v1/b/aai-blog-files/o/sd-v1-4.ckpt?alt=media fe4efff1e174c627256e44ec2991ba279b3816e364b49f9be2abc0b3ff3f8556'
     'GFPGANv1.3.pth src/gfpgan/experiments/pretrained_models https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth c953a88f2727c85c3d9ae72e2bd4846bbaf59fe6972ad94130e23e7017524a70'
+    'GFPGANv1.4.pth src/gfpgan/experiments/pretrained_models https://github.com/TencentARC/GFPGAN/releases/download/v1.3.4/GFPGANv1.4.pth e2cd4703ab14f4d01fd1383a8a8b266f9a5833dacee8e6a79d3bf21a1b6be5ad'
     'RealESRGAN_x4plus.pth src/realesrgan/experiments/pretrained_models https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth 4fa0d38905f75ac06eb49a7951b426670021be3018265fd191d2125df9d682f1'
     'RealESRGAN_x4plus_anime_6B.pth src/realesrgan/experiments/pretrained_models https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth f872d837d3c90ed2e05227bed711af5671a6fd1c9f7d7e91c911a61f155e99da'
     'project.yaml src/latent-diffusion/experiments/pretrained_models https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1 9d6ad53c5dafeb07200fb712db14b813b527edd262bc80ea136777bdb41be2ba'
diff --git a/images/gradio/gradio-s2i.png b/images/gradio/gradio-s2i.png
new file mode 100644
index 0000000..84dbab0
Binary files /dev/null and b/images/gradio/gradio-s2i.png differ