Edit images

The Imagen API lets you edit images in seconds, using text prompts, masks, and existing images to guide the edits.

For details about the model, see the Imagen for Editing and Customization model card.

Supported model versions

The Imagen API supports the following model:

  • imagen-3.0-capability-001

For more information about the features that the model supports, see Imagen models.

HTTP request

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict \
  -d '{
  "instances": [
    {
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_RAW",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": string
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_MASK",
          "referenceId": 2,
          "referenceImage": {
            "bytesBase64Encoded": string
          },
          "maskImageConfig": {
            "maskMode": "MASK_MODE_USER_PROVIDED"
          }
        }
      ],
      "prompt": string
    }
  ],
  "parameters": {
    "addWatermark": boolean,
    "baseSteps": integer,
    "editMode": string,
    "guidanceScale": integer,
    "includeRaiReason": boolean,
    "includeSafetyAttributes": boolean,
    "language": string,
    "negativePrompt": string,
    "outputOptions": {
      "mimeType": string,
      "compressionQuality": integer
    },
    "personGeneration": string,
    "safetySetting": string,
    "sampleCount": integer,
    "seed": integer,
    "storageUri": string
  }
}'
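If you want to call the endpoint from code instead of curl, you can build the same POST request with Python's standard library. This is a minimal sketch under stated assumptions. `build_predict_request` is a hypothetical helper name. The access token is assumed to come from `gcloud auth print-access-token`. The function only constructs the request and does not send it.

```python
import json
import urllib.request

MODEL = "imagen-3.0-capability-001"


def build_predict_request(project_id: str, location: str,
                          access_token: str, body: dict) -> urllib.request.Request:
    """Build (but don't send) a POST request to the model's :predict endpoint.

    Hypothetical helper for illustration; not part of any Google client library.
    """
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{MODEL}:predict"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # JSON request body as bytes
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the JSON response body.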

Instances

prompt

string

Optional. The text prompt for the image. If a prompt isn't specified, the model fills in content from the image context.

referenceImages

List of ReferenceImage objects.

Required. For mask editing, you must specify exactly two reference images: one with REFERENCE_TYPE_RAW and one with REFERENCE_TYPE_MASK.
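A client can check the two-image rule before sending a mask-editing request. The following is a minimal sketch. `validate_mask_editing_refs` is a hypothetical helper, and the API performs its own validation regardless of any client-side check.

```python
from collections import Counter

# Exactly one base image and one mask image are expected for mask editing.
REQUIRED_MASK_EDIT_TYPES = Counter(["REFERENCE_TYPE_RAW", "REFERENCE_TYPE_MASK"])


def validate_mask_editing_refs(reference_images: list) -> None:
    """Raise ValueError unless there is exactly one RAW and one MASK reference image.

    Hypothetical client-side helper; not part of the Imagen API.
    """
    found = Counter(ref.get("referenceType") for ref in reference_images)
    if found != REQUIRED_MASK_EDIT_TYPES:
        raise ValueError(
            "mask editing needs exactly one REFERENCE_TYPE_RAW and one "
            f"REFERENCE_TYPE_MASK reference image, got {dict(found)}"
        )
```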

referenceImages object

The referenceImages object describes the image assets for Imagen to edit.

Parameters
referenceType

string

Required. The type of reference image. One of the following:

  • REFERENCE_TYPE_RAW: The base image to edit.
  • REFERENCE_TYPE_MASK: The mask image, whose non-zero values indicate where to edit the base image.
referenceId

integer

Required. A unique identifier for the reference image. Not used for masked editing.

referenceImage.bytesBase64Encoded

string

Required. Base64-encoded image bytes. Accepts PNG, JPEG, GIF, and BMP files. The maximum size is 20MB after transcoding to PNG. If you provide a mask image, it must be the same dimensions as the base image.
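A mask image must have the same dimensions as its base image, and you can check this before base64-encoding both files. The sketch below is a minimal example that assumes both files are PNGs. It reads the width and height from the PNG IHDR chunk, so JPEG, GIF, or BMP input would need a different reader or an imaging library. The helper names are hypothetical, and the sketch does not check the 20 MB limit.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple:
    """Return (width, height) from the IHDR chunk that starts every PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and
    # height as big-endian unsigned 32-bit integers.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def encode_base_and_mask(base_png: bytes, mask_png: bytes) -> tuple:
    """Hypothetical helper: check matching dimensions, then base64-encode both images."""
    if png_size(base_png) != png_size(mask_png):
        raise ValueError("the mask image must have the same dimensions as the base image")
    return (base64.b64encode(base_png).decode("ascii"),
            base64.b64encode(mask_png).decode("ascii"))
```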

maskImageConfig.maskMode

string

Required when referenceType is REFERENCE_TYPE_MASK. Must be one of the following:

  • MASK_MODE_USER_PROVIDED: Use the mask from referenceImage.bytesBase64Encoded.
  • MASK_MODE_BACKGROUND: Use an auto-generated mask from background segmentation.
  • MASK_MODE_FOREGROUND: Use an auto-generated mask from foreground segmentation.
  • MASK_MODE_SEMANTIC: Use an auto-generated mask from semantic segmentation, using the classes listed in maskImageConfig.maskClasses.
maskImageConfig.dilation

float

Optional. Range: [0, 1]. The percentage of the image width by which to dilate (grow) the mask. Dilation can help compensate for imprecise masks. For best results, we recommend the following values for each editMode setting:

  • EDIT_MODE_INPAINT_INSERTION: 0.01
  • EDIT_MODE_INPAINT_REMOVAL: 0.01
  • EDIT_MODE_BGSWAP: 0.0
  • EDIT_MODE_OUTPAINT: 0.01-0.03
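The recommendations above can be encoded as defaults when building a maskImageConfig object. This is a minimal sketch. `mask_image_config` is a hypothetical helper. For EDIT_MODE_OUTPAINT, the sketch picks the low end of the recommended 0.01 to 0.03 range, which is an arbitrary choice.

```python
from typing import Optional

# Recommended dilation per editMode, copied from the list above.
RECOMMENDED_DILATION = {
    "EDIT_MODE_INPAINT_INSERTION": 0.01,
    "EDIT_MODE_INPAINT_REMOVAL": 0.01,
    "EDIT_MODE_BGSWAP": 0.0,
    "EDIT_MODE_OUTPAINT": 0.01,  # recommended range is 0.01-0.03; start at the low end
}


def mask_image_config(mask_mode: str, edit_mode: str,
                      dilation: Optional[float] = None) -> dict:
    """Hypothetical helper: build maskImageConfig, defaulting dilation to the recommendation."""
    if dilation is None:
        dilation = RECOMMENDED_DILATION[edit_mode]
    if not 0.0 <= dilation <= 1.0:
        raise ValueError("dilation must be in the range [0, 1]")
    return {"maskMode": mask_mode, "dilation": dilation}
```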
maskImageConfig.maskClasses

list[integer]

Optional. The mask classes to segment when maskMode is MASK_MODE_SEMANTIC. For the available values, see Class IDs.
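The following sketch builds a mask reference image that selects objects by class for MASK_MODE_SEMANTIC. It relies on several assumptions. `semantic_mask_reference` is a hypothetical helper, and `CLASS_IDS` is only a small subset copied from the Class IDs table. The sketch also omits `referenceImage.bytesBase64Encoded` because the mask is auto-generated. However, the field table lists that field as required, so verify this against the API before relying on it.

```python
# A few IDs copied from the Class IDs table (subset only).
CLASS_IDS = {"cat": 7, "dog": 8, "person": 125, "sky": 142, "car": 176}


def semantic_mask_reference(class_names: list, reference_id: int = 2) -> dict:
    """Hypothetical helper: a REFERENCE_TYPE_MASK entry using semantic segmentation.

    Assumption: no mask bytes are sent because the mask is generated from the base image.
    """
    return {
        "referenceType": "REFERENCE_TYPE_MASK",
        "referenceId": reference_id,
        "maskImageConfig": {
            "maskMode": "MASK_MODE_SEMANTIC",
            "maskClasses": [CLASS_IDS[name] for name in class_names],
        },
    }
```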

Parameters

addWatermark

boolean

Optional. Add an invisible watermark to the generated images.

The default value is true.

baseSteps

integer

Optional. The number of sampling steps. A higher value improves image quality, while a lower value reduces latency. Defaults to 75.

For small mask areas, or for the removal and insertion modes, use 16 to 35 steps to reduce latency while keeping a similar level of quality.

editMode

string

Required for mask editing.

An enum with one of the following values:

  • EDIT_MODE_INPAINT_REMOVAL: Remove objects and fill in the image background in the mask area.
  • EDIT_MODE_INPAINT_INSERTION: Add objects described by the prompt.
  • EDIT_MODE_BGSWAP: Add background content in the mask area, while preserving the object content in the unmasked area. Useful for product editing.
  • EDIT_MODE_OUTPAINT: Extend the image into the mask area. Unlike EDIT_MODE_BGSWAP, this mode completes partial objects at the image boundary.
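Putting the instance and editMode together, a mask-editing request body with a user-provided mask can be assembled as follows. This is a minimal sketch. `build_mask_edit_body` is a hypothetical helper, and it sets only the fields shown here. Other parameters, such as sampleCount or guidanceScale, can be added to the returned `parameters` dictionary.

```python
from typing import Optional

EDIT_MODES = {
    "EDIT_MODE_INPAINT_REMOVAL",
    "EDIT_MODE_INPAINT_INSERTION",
    "EDIT_MODE_BGSWAP",
    "EDIT_MODE_OUTPAINT",
}


def build_mask_edit_body(base_b64: str, mask_b64: str, edit_mode: str,
                         prompt: Optional[str] = None) -> dict:
    """Hypothetical helper: request body for mask editing with a user-provided mask."""
    if edit_mode not in EDIT_MODES:
        raise ValueError(f"unknown editMode: {edit_mode}")
    instance = {
        "referenceImages": [
            {"referenceType": "REFERENCE_TYPE_RAW", "referenceId": 1,
             "referenceImage": {"bytesBase64Encoded": base_b64}},
            {"referenceType": "REFERENCE_TYPE_MASK", "referenceId": 2,
             "referenceImage": {"bytesBase64Encoded": mask_b64},
             "maskImageConfig": {"maskMode": "MASK_MODE_USER_PROVIDED"}},
        ],
    }
    if prompt is not None:
        # The prompt is optional; without it the model fills in from image context.
        instance["prompt"] = prompt
    return {"instances": [instance], "parameters": {"editMode": edit_mode}}
```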
guidanceScale

integer

Optional. Controls how closely the model adheres to the text prompt. Larger values increase alignment between the output and the prompt, but might compromise image quality.

Accepted range: 0 - 500

Default: 60 for EDIT_MODE_INPAINT_INSERTION; 75 for EDIT_MODE_INPAINT_REMOVAL, EDIT_MODE_BGSWAP, and EDIT_MODE_OUTPAINT.
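When you log or compare requests, it can help to resolve the effective guidanceScale on the client. This sketch applies the per-mode defaults and the 0 to 500 range listed above. `resolve_guidance_scale` is a hypothetical helper, and the server applies its own defaults when the field is omitted.

```python
from typing import Optional

# Per-mode defaults copied from the description above.
DEFAULT_GUIDANCE_SCALE = {
    "EDIT_MODE_INPAINT_INSERTION": 60,
    "EDIT_MODE_INPAINT_REMOVAL": 75,
    "EDIT_MODE_BGSWAP": 75,
    "EDIT_MODE_OUTPAINT": 75,
}


def resolve_guidance_scale(edit_mode: str, guidance_scale: Optional[int] = None) -> int:
    """Hypothetical helper: return the explicit value or the documented default."""
    if guidance_scale is None:
        return DEFAULT_GUIDANCE_SCALE[edit_mode]
    if not 0 <= guidance_scale <= 500:
        raise ValueError("guidanceScale must be between 0 and 500")
    return guidance_scale
```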

includeRaiReason

boolean

Optional. Whether to include a safety reason for filtered images in the response. The default value is false.

includeSafetyAttributes

boolean

Optional. Whether to report the safety scores of each image in the response. The default value is false.

language

string

Optional. The language code that corresponds to your text prompt language. The following values are supported:

  • "auto": Automatic detection. If Imagen detects a supported language, the prompt and an optional negative prompt are translated to English. If the language detected isn't supported, Imagen uses the input text verbatim, which might result in an unexpected output. No error code is returned.
  • "en": English (if omitted, the default value)
  • "zh" or "zh-CN": Chinese (simplified)
  • "zh-TW": Chinese (traditional)
  • "hi": Hindi
  • "ja": Japanese
  • "ko": Korean
  • "pt": Portuguese
  • "es": Spanish

language is supported only by imagen-3.0-capability-001.

negativePrompt

string

Optional. A description of what to discourage in the generated images.

outputOptions

outputOptions

Optional. Describes the output image format in an outputOptions object.

personGeneration

string

Optional. Allow generation of people by the model. The following values are supported:

  • "dont_allow": Disallow the inclusion of people or faces in images.
  • "allow_adult": Allow generation of adults only.
  • "allow_all": Allow generation of people of all ages.

For both mask-based and mask-free editing, personGeneration defaults to allow_adult.

sampleCount

integer

Optional. The number of images to generate. The default value is 4.

seed

Uint32

Optional. The random seed for image generation. This isn't available when addWatermark is set to true.
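Because addWatermark defaults to true and seed isn't available while the watermark is on, a reproducible request has to turn the watermark off explicitly. This is a minimal sketch. `parameters_with_seed` is a hypothetical helper, and it rejects an explicit addWatermark of true rather than silently overriding it.

```python
UINT32_MAX = 2**32 - 1


def parameters_with_seed(seed: int, **parameters) -> dict:
    """Hypothetical helper: add a seed and disable the watermark it conflicts with."""
    if not 0 <= seed <= UINT32_MAX:
        raise ValueError("seed must be an unsigned 32-bit integer")
    if parameters.get("addWatermark") is True:
        raise ValueError("seed isn't available when addWatermark is true")
    # addWatermark defaults to true, so set it to false explicitly.
    return {**parameters, "addWatermark": False, "seed": seed}
```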

safetySetting

string

Optional. Adds a filter level to safety filtering. The following values are supported:

  • "block_low_and_above": Strongest filtering level, most strict blocking. Deprecated value: "block_most".
  • "block_medium_and_above": Block some problematic prompts and responses. Deprecated value: "block_some".
  • "block_only_high": Reduces the number of requests blocked due to safety filters. This might increase the amount of objectionable content that Imagen generates. Deprecated value: "block_few".
  • "block_none": Block very few problematic prompts and responses. Access to this feature is restricted. Deprecated value: "block_fewest".

The default value is "block_medium_and_above".

safetySetting is supported only by imagen-3.0-capability-001.
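Older code might still send the deprecated safetySetting values. This sketch maps those values to their current names, using the pairs listed above, and rejects anything else. `normalize_safety_setting` is a hypothetical helper.

```python
# Deprecated safetySetting values mapped to their current names.
DEPRECATED_SAFETY_SETTINGS = {
    "block_most": "block_low_and_above",
    "block_some": "block_medium_and_above",
    "block_few": "block_only_high",
    "block_fewest": "block_none",
}
SUPPORTED_SAFETY_SETTINGS = set(DEPRECATED_SAFETY_SETTINGS.values())


def normalize_safety_setting(value: str = "block_medium_and_above") -> str:
    """Hypothetical helper: translate deprecated names and validate the result."""
    current = DEPRECATED_SAFETY_SETTINGS.get(value, value)
    if current not in SUPPORTED_SAFETY_SETTINGS:
        raise ValueError(f"unsupported safetySetting: {value!r}")
    return current
```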

storageUri

string

Optional. The Cloud Storage URI to store the generated images.

Output options object

The outputOptions object describes the image output.

Parameters
outputOptions.mimeType

string

Optional. The image format that the output should be saved as. The following values are supported:

  • "image/png": Save as a PNG image
  • "image/jpeg": Save as a JPEG image

The default value is "image/png".

outputOptions.compressionQuality

integer

Optional. The level of compression if the output type is "image/jpeg". Accepted values are 0 through 100. The default value is 75.
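The following minimal sketch builds an outputOptions object. `output_options` is a hypothetical helper. Rejecting compressionQuality for PNG output is a client-side design choice that makes the JPEG-only rule visible. This page does not say whether the API ignores or rejects the field for PNG.

```python
from typing import Optional


def output_options(mime_type: str = "image/png",
                   compression_quality: Optional[int] = None) -> dict:
    """Hypothetical helper: build outputOptions, enforcing the documented ranges."""
    if mime_type not in ("image/png", "image/jpeg"):
        raise ValueError(f"unsupported mimeType: {mime_type}")
    options = {"mimeType": mime_type}
    if compression_quality is not None:
        if mime_type != "image/jpeg":
            raise ValueError("compressionQuality applies only to image/jpeg output")
        if not 0 <= compression_quality <= 100:
            raise ValueError("compressionQuality must be between 0 and 100")
        options["compressionQuality"] = compression_quality
    return options
```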

Sample request

REST

Before using any of the request data, make the following replacements:

  • REGION: The region that your project is located in. For more information about supported regions, see Generative AI on Vertex AI locations.
  • PROJECT_ID: Your Google Cloud project ID.
  • TEXT_PROMPT: Optional. A text prompt to guide the images that the model generates. For best results, use a description of the masked area and avoid single-word prompts. For example, use "a cute corgi" instead of "corgi".
  • B64_BASE_IMAGE: The base64-encoded image to edit, 10 MB or less in size. For more information about base64-encoding, see Base64 encode and decode files.
  • B64_MASK_IMAGE: A base64-encoded black and white mask image, 10 MB or less in size.
  • MASK_DILATION: Optional. A float value between 0 and 1, inclusive, that represents the percentage of the image width to grow the mask by. Using dilation helps compensate for imprecise masks. We recommend a value of 0.01.
  • EDIT_STEPS: Optional. An integer that represents the number of sampling steps. A higher value offers better image quality, a lower value offers better latency.

    We recommend starting with 35 steps. If the quality doesn't meet your requirements, increase the value toward an upper limit of 75.

  • SAMPLE_COUNT: Optional. An integer that describes the number of images to generate. The accepted range of values is 1-4. The default value is 4.
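To produce the B64_BASE_IMAGE and B64_MASK_IMAGE values from files, you can use Python's base64 module. This minimal sketch rests on two assumptions. It applies the 10 MB limit to the encoded string, not the raw file, which is the more conservative reading. It also treats 1 MB as 1024 × 1024 bytes. The helper names are hypothetical.

```python
import base64
from pathlib import Path

# Assumption: the 10 MB limit is checked against the encoded string, 1 MB = 1024 * 1024 bytes.
MAX_ENCODED_SIZE = 10 * 1024 * 1024


def encode_image_bytes(data: bytes, limit: int = MAX_ENCODED_SIZE) -> str:
    """Hypothetical helper: base64-encode image bytes and enforce a size limit."""
    encoded = base64.b64encode(data).decode("ascii")
    if len(encoded) > limit:
        raise ValueError(f"encoded image is {len(encoded)} bytes; the limit is {limit}")
    return encoded


def encode_image_file(path) -> str:
    """Hypothetical helper: read an image file from disk and encode it."""
    return encode_image_bytes(Path(path).read_bytes())
```

For example, `encode_image_file("base.png")` returns a string that you can paste in place of B64_BASE_IMAGE.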

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/publishers/google/models/imagen-3.0-capability-001:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_RAW",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "B64_BASE_IMAGE"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_MASK",
          "referenceImage": {
            "bytesBase64Encoded": "B64_MASK_IMAGE"
          },
          "maskImageConfig": {
            "maskMode": "MASK_MODE_USER_PROVIDED",
            "dilation": MASK_DILATION
          }
        }
      ]
    }
  ],
  "parameters": {
    "editConfig": {
      "baseSteps": EDIT_STEPS
    },
    "editMode": "EDIT_MODE_INPAINT_INSERTION",
    "sampleCount": SAMPLE_COUNT
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/publishers/google/models/imagen-3.0-capability-001:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/publishers/google/models/imagen-3.0-capability-001:predict" | Select-Object -Expand Content

The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.

{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}
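To turn a response like this back into image files, decode each prediction's bytesBase64Encoded value. This is a minimal sketch with two assumptions. Predictions without image bytes are skipped, and the MIME type falls back to the documented default of "image/png" when missing. `decode_predictions` is a hypothetical helper.

```python
import base64
import json


def decode_predictions(response_text: str) -> list:
    """Hypothetical helper: return (mimeType, image bytes) pairs from a predict response."""
    images = []
    for prediction in json.loads(response_text).get("predictions", []):
        if "bytesBase64Encoded" not in prediction:
            continue  # assumption: skip entries that carry no image bytes
        images.append((prediction.get("mimeType", "image/png"),
                       base64.b64decode(prediction["bytesBase64Encoded"])))
    return images
```

Each returned pair can then be written to disk, for example with `pathlib.Path("edited_0.png").write_bytes(image_bytes)`.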

Class IDs

Use the following object class IDs with MASK_MODE_SEMANTIC to automatically create an image mask based on specific objects. Pass the IDs in maskImageConfig.maskClasses.

Class ID (class_id) Object
0 backpack
1 umbrella
2 bag
3 tie
4 suitcase
5 case
6 bird
7 cat
8 dog
9 horse
10 sheep
11 cow
12 elephant
13 bear
14 zebra
15 giraffe
16 animal (other)
17 microwave
18 radiator
19 oven
20 toaster
21 storage tank
22 conveyor belt
23 sink
24 refrigerator
25 washer dryer
26 fan
27 dishwasher
28 toilet
29 bathtub
30 shower
31 tunnel
32 bridge
33 pier wharf
34 tent
35 building
36 ceiling
37 laptop
38 keyboard
39 mouse
40 remote
41 cell phone
42 television
43 floor
44 stage
45 banana
46 apple
47 sandwich
48 orange
49 broccoli
50 carrot
51 hot dog
52 pizza
53 donut
54 cake
55 fruit (other)
56 food (other)
57 chair (other)
58 armchair
59 swivel chair
60 stool
61 seat
62 couch
63 trash can
64 potted plant
65 nightstand
66 bed
67 table
68 pool table
69 barrel
70 desk
71 ottoman
72 wardrobe
73 crib
74 basket
75 chest of drawers
76 bookshelf
77 counter (other)
78 bathroom counter
79 kitchen island
80 door
81 light (other)
82 lamp
83 sconce
84 chandelier
85 mirror
86 whiteboard
87 shelf
88 stairs
89 escalator
90 cabinet
91 fireplace
92 stove
93 arcade machine
94 gravel
95 platform
96 playingfield
97 railroad
98 road
99 snow
100 sidewalk pavement
101 runway
102 terrain
103 book
104 box
105 clock
106 vase
107 scissors
108 plaything (other)
109 teddy bear
110 hair dryer
111 toothbrush
112 painting
113 poster
114 bulletin board
115 bottle
116 cup
117 wine glass
118 knife
119 fork
120 spoon
121 bowl
122 tray
123 range hood
124 plate
125 person
126 rider (other)
127 bicyclist
128 motorcyclist
129 paper
130 streetlight
131 road barrier
132 mailbox
133 cctv camera
134 junction box
135 traffic sign
136 traffic light
137 fire hydrant
138 parking meter
139 bench
140 bike rack
141 billboard
142 sky
143 pole
144 fence
145 railing banister
146 guard rail
147 mountain hill
148 rock
149 frisbee
150 skis
151 snowboard
152 sports ball
153 kite
154 baseball bat
155 baseball glove
156 skateboard
157 surfboard
158 tennis racket
159 net
160 base
161 sculpture
162 column
163 fountain
164 awning
165 apparel
166 banner
167 flag
168 blanket
169 curtain (other)
170 shower curtain
171 pillow
172 towel
173 rug floormat
174 vegetation
175 bicycle
176 car
177 autorickshaw
178 motorcycle
179 airplane
180 bus
181 train
182 truck
183 trailer
184 boat ship
185 slow wheeled object
186 river lake
187 sea
188 water (other)
189 swimming pool
190 waterfall
191 wall
192 window
193 window blind
