Gemini 可以生成及處理圖片,並以對話方式提供相關資訊。你可以使用文字、圖片或兩者組合提示 Gemini,以前所未有的控制程度建立、編輯及反覆調整圖像:
- Text-to-Image:根據簡單或複雜的文字描述生成高品質圖片。
- 圖片 + 圖片生成文字 (編輯):提供圖片並使用文字提示新增、移除或修改元素、變更風格,或調整色彩分級。
- 多張圖片生成圖片 (合成和風格轉換):使用多張輸入圖片合成新場景,或將一張圖片的風格套用至另一張圖片。
- 反覆修正:透過對話逐步修正圖片,進行微調,直到滿意為止。
- 高擬真度文字算繪:準確生成含有清晰易讀文字的圖片,並將文字妥善放置在適當位置,非常適合用於標誌、圖表和海報。
所有生成的圖像都會加上 SynthID 浮水印。
圖像生成 (文字轉圖像)
以下程式碼示範如何根據描述性提示生成圖片。
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
prompt = (
"Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"
)
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents=[prompt],
)
for part in response.candidates[0].content.parts:
if part.text is not None:
print(part.text)
elif part.inline_data is not None:
image = Image.open(BytesIO(part.inline_data.data))
image.save("generated_image.png")
JavaScript
import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";
async function main() {
const ai = new GoogleGenAI({});
const prompt =
"Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme";
const response = await ai.models.generateContent({
model: "gemini-2.5-flash-image-preview",
contents: prompt,
});
for (const part of response.candidates[0].content.parts) {
if (part.text) {
console.log(part.text);
} else if (part.inlineData) {
const imageData = part.inlineData.data;
const buffer = Buffer.from(imageData, "base64");
fs.writeFileSync("gemini-native-image.png", buffer);
console.log("Image saved as gemini-native-image.png");
}
}
}
main();
Go
package main
import (
"context"
"fmt"
"os"
"google.golang.org/genai"
)
func main() {
ctx := context.Background()
client, err := genai.NewClient(ctx, nil)
if err != nil {
log.Fatal(err)
}
result, _ := client.Models.GenerateContent(
ctx,
"gemini-2.5-flash-image-preview",
genai.Text("Create a picture of a nano banana dish in a " +
" fancy restaurant with a Gemini theme"),
)
for _, part := range result.Candidates[0].Content.Parts {
if part.Text != "" {
fmt.Println(part.Text)
} else if part.InlineData != nil {
imageBytes := part.InlineData.Data
outputFilename := "gemini_generated_image.png"
_ = os.WriteFile(outputFilename, imageBytes, 0644)
}
}
}
REST
curl -s -X POST
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [
{"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"}
]
}]
}' \
| grep -o '"data": "[^"]*"' \
| cut -d'"' -f4 \
| base64 --decode > gemini-native-image.png

圖像編輯 (文字和圖像轉圖像)
提醒:對於上傳的所有圖片,請確認您擁有必要權限。 請勿生成侵犯他人權利的內容,包括會造成欺騙、騷擾或傷害的影片或圖片。使用這項生成式 AI 服務時,須遵守《使用限制政策》。
如要編輯圖片,請先加入圖片做為輸入內容。 以下範例示範如何上傳 Base64 編碼的圖片。 如要瞭解多張圖片、較大的酬載和支援的 MIME 類型,請參閱「圖片理解」頁面。
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
prompt = (
"Create a picture of my cat eating a nano-banana in a "
"fancy restaurant under the Gemini constellation",
)
image = Image.open("/path/to/cat_image.png")
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents=[prompt, image],
)
for part in response.candidates[0].content.parts:
if part.text is not None:
print(part.text)
elif part.inline_data is not None:
image = Image.open(BytesIO(part.inline_data.data))
image.save("generated_image.png")
JavaScript
import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";
async function main() {
const ai = new GoogleGenAI({});
const imagePath = "path/to/cat_image.png";
const imageData = fs.readFileSync(imagePath);
const base64Image = imageData.toString("base64");
const prompt = [
{ text: "Create a picture of my cat eating a nano-banana in a" +
"fancy restaurant under the Gemini constellation" },
{
inlineData: {
mimeType: "image/png",
data: base64Image,
},
},
];
const response = await ai.models.generateContent({
model: "gemini-2.5-flash-image-preview",
contents: prompt,
});
for (const part of response.candidates[0].content.parts) {
if (part.text) {
console.log(part.text);
} else if (part.inlineData) {
const imageData = part.inlineData.data;
const buffer = Buffer.from(imageData, "base64");
fs.writeFileSync("gemini-native-image.png", buffer);
console.log("Image saved as gemini-native-image.png");
}
}
}
main();
Go
package main
import (
"context"
"fmt"
"os"
"google.golang.org/genai"
)
func main() {
ctx := context.Background()
client, err := genai.NewClient(ctx, nil)
if err != nil {
log.Fatal(err)
}
imagePath := "/path/to/cat_image.png"
imgData, _ := os.ReadFile(imagePath)
parts := []*genai.Part{
genai.NewPartFromText("Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation"),
&genai.Part{
InlineData: &genai.Blob{
MIMEType: "image/png",
Data: imgData,
},
},
}
contents := []*genai.Content{
genai.NewContentFromParts(parts, genai.RoleUser),
}
result, _ := client.Models.GenerateContent(
ctx,
"gemini-2.5-flash-image-preview",
contents,
)
for _, part := range result.Candidates[0].Content.Parts {
if part.Text != "" {
fmt.Println(part.Text)
} else if part.InlineData != nil {
imageBytes := part.InlineData.Data
outputFilename := "gemini_generated_image.png"
_ = os.WriteFile(outputFilename, imageBytes, 0644)
}
}
}
REST
IMG_PATH=/path/to/cat_image.jpeg
if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
B64FLAGS="--input"
else
B64FLAGS="-w0"
fi
IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)
curl -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-d "{
\"contents\": [{
\"parts\":[
{\"text\": \"'Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation\"},
{
\"inline_data\": {
\"mime_type\":\"image/jpeg\",
\"data\": \"$IMG_BASE64\"
}
}
]
}]
}" \
| grep -o '"data": "[^"]*"' \
| cut -d'"' -f4 \
| base64 --decode > gemini-edited-image.png

其他圖像生成模式
Gemini 支援其他圖像互動模式,具體取決於提示結構和脈絡,包括:
- 文字生成圖片和文字 (交錯):輸出圖片和相關文字。
- 提示範例:「Generate an illustrated recipe for a paella.」(生成西班牙海鮮飯的插圖食譜)。
- 圖片和文字轉圖片和文字 (交錯):使用輸入的圖片和文字,建立新的相關圖片和文字。
- 提示範例:(附上附家具的房間圖片)「我的空間適合什麼顏色的沙發?可以更新圖片嗎?」
- 多輪圖像編輯 (對話):以對話方式持續生成及編輯圖像。
- 提示範例:[上傳藍色汽車的圖片。] 「將這輛車變成敞篷車。」「Now change the color to yellow.」(現在將顏色改為黃色)。
提示撰寫指南和策略
如要精通 Gemini 2.5 Flash 圖片生成功能,請先掌握一項基本原則:
描述場景,不要只列出關鍵字。 這項模型的核心優勢在於深入理解語言,相較於不相關的字詞清單,敘事性描述段落幾乎都能產生更優質、更連貫的圖片。
生成圖像的提示
下列策略可協助您建立有效的提示,生成所需圖片。
1. 擬真場景
如要生成真實感十足的圖片,請使用攝影術語。提及攝影機角度、鏡頭類型、照明和細節,引導模型生成逼真的結果。
範本
A photorealistic [shot type] of [subject], [action or expression], set in
[environment]. The scene is illuminated by [lighting description], creating
a [mood] atmosphere. Captured with a [camera/lens details], emphasizing
[key textures and details]. The image should be in a [aspect ratio] format.
提示詞
A photorealistic close-up portrait of an elderly Japanese ceramicist with
deep, sun-etched wrinkles and a warm, knowing smile. He is carefully
inspecting a freshly glazed tea bowl. The setting is his rustic,
sun-drenched workshop. The scene is illuminated by soft, golden hour light
streaming through a window, highlighting the fine texture of the clay.
Captured with an 85mm portrait lens, resulting in a soft, blurred background
(bokeh). The overall mood is serene and masterful. Vertical portrait
orientation.
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents="A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful.",
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('photorealistic_example.png')
image.show()

2. 風格化插圖和貼紙
如要建立貼紙、圖示或素材資源,請明確指定樣式,並要求透明背景。
範本
A [style] sticker of a [subject], featuring [key characteristics] and a
[color palette]. The design should have [line style] and [shading style].
The background must be transparent.
提示詞
A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's
munching on a green bamboo leaf. The design features bold, clean outlines,
simple cel-shading, and a vibrant color palette. The background must be white.
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents="A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white.",
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('red_panda_sticker.png')
image.show()

3. 圖片中的文字準確
Gemini 擅長顯示文字,清楚說明文字、字型樣式 (描述性) 和整體設計。
範本
Create a [image type] for [brand/concept] with the text "[text to render]"
in a [font style]. The design should be [style description], with a
[color scheme].
提示詞
Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'.
The text should be in a clean, bold, sans-serif font. The design should
feature a simple, stylized icon of a a coffee bean seamlessly integrated
with the text. The color scheme is black and white.
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents="Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white.",
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('logo_example.png')
image.show()

4. 產品模型和商業攝影
非常適合為電子商務、廣告或品牌宣傳製作乾淨俐落的專業產品照。
範本
A high-resolution, studio-lit product photograph of a [product description]
on a [background surface/description]. The lighting is a [lighting setup,
e.g., three-point softbox setup] to [lighting purpose]. The camera angle is
a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp
focus on [key detail]. [Aspect ratio].
提示詞
A high-resolution, studio-lit product photograph of a minimalist ceramic
coffee mug in matte black, presented on a polished concrete surface. The
lighting is a three-point softbox setup designed to create soft, diffused
highlights and eliminate harsh shadows. The camera angle is a slightly
elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with
sharp focus on the steam rising from the coffee. Square image.
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents="A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image.",
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('product_mockup.png')
image.show()

5. 極簡風和負空間設計
非常適合用於建立網站、簡報或行銷素材的背景,並在上面疊加文字。
範本
A minimalist composition featuring a single [subject] positioned in the
[bottom-right/top-left/etc.] of the frame. The background is a vast, empty
[color] canvas, creating significant negative space. Soft, subtle lighting.
[Aspect ratio].
提示詞
A minimalist composition featuring a single, delicate red maple leaf
positioned in the bottom-right of the frame. The background is a vast, empty
off-white canvas, creating significant negative space for text. Soft,
diffused lighting from the top left. Square image.
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents="A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image.",
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('minimalist_design.png')
image.show()

6. 連續圖像 (漫畫格 / 分鏡腳本)
以角色一致性和場景描述為基礎,為視覺敘事建立面板。
範本
A single comic book panel in a [art style] style. In the foreground,
[character description and action]. In the background, [setting details].
The panel has a [dialogue/caption box] with the text "[Text]". The lighting
creates a [mood] mood. [Aspect ratio].
提示詞
A single comic book panel in a gritty, noir art style with high-contrast
black and white inks. In the foreground, a detective in a trench coat stands
under a flickering streetlamp, rain soaking his shoulders. In the
background, the neon sign of a desolate bar reflects in a puddle. A caption
box at the top reads "The city was a tough place to keep secrets." The
lighting is harsh, creating a dramatic, somber mood. Landscape.
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents="A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape.",
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('comic_panel.png')
image.show()

編輯圖片的提示
這些範例說明如何提供圖片和文字提示,進行編輯、合成和風格轉移。
1. 新增及移除元素
提供圖片並說明變更內容。模型會與原始圖片的風格、光線和透視效果相符。
範本
Using the provided image of [subject], please [add/remove/modify] [element]
to/from the scene. Ensure the change is [description of how the change should
integrate].
提示詞
"Using the provided image of my cat, please add a small, knitted wizard hat
on its head. Make it look like it's sitting comfortably and matches the soft
lighting of the photo."
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Base image prompt: "A photorealistic picture of a fluffy ginger cat sitting on a wooden floor, looking directly at the camera. Soft, natural light from a window."
image_input = Image.open('/path/to/your/cat_photo.png')
text_input = """Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off."""
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents=[text_input, image_input],
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('cat_with_hat.png')
image.show()
輸入 |
輸出 |
![]() |
![]() |
2. 局部重繪 (語意遮蓋)
以對話方式定義「遮罩」,編輯圖片的特定部分,同時保留其他部分。
範本
Using the provided image, change only the [specific element] to [new
element/description]. Keep everything else in the image exactly the same,
preserving the original style, lighting, and composition.
提示詞
"Using the provided image of a living room, change only the blue sofa to be
a vintage, brown leather chesterfield sofa. Keep the rest of the room,
including the pillows on the sofa and the lighting, unchanged."
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Base image prompt: "A wide shot of a modern, well-lit living room with a prominent blue sofa in the center. A coffee table is in front of it and a large window is in the background."
living_room_image = Image.open('/path/to/your/living_room.png')
text_input = """Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged."""
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents=[living_room_image, text_input],
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('living_room_edited.png')
image.show()
輸入 |
輸出 |
![]() |
![]() |
3. 風格轉換
提供圖片,要求模型以不同藝術風格重新創作圖片內容。
範本
Transform the provided photograph of [subject] into the artistic style of [artist/art style]. Preserve the original composition but render it with [description of stylistic elements].
提示詞
"Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows."
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Base image prompt: "A photorealistic, high-resolution photograph of a busy city street in New York at night, with bright neon signs, yellow taxis, and tall skyscrapers."
city_image = Image.open('/path/to/your/city.png')
text_input = """Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows."""
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents=[city_image, text_input],
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('city_style_transfer.png')
image.show()
輸入 |
輸出 |
![]() |
![]() |
4. 進階合成:合併多張圖片
提供多張圖片做為情境,建立新的複合場景。非常適合製作產品模型或創意拼貼。
範本
Create a new image by combining the elements from the provided images. Take
the [element from image 1] and place it with/on the [element from image 2].
The final image should be a [description of the final scene].
提示詞
"Create a professional e-commerce fashion photo. Take the blue floral dress
from the first image and let the woman from the second image wear it.
Generate a realistic, full-body shot of the woman wearing the dress, with
the lighting and shadows adjusted to match the outdoor environment."
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Base image prompts:
# 1. Dress: "A professionally shot photo of a blue floral summer dress on a plain white background, ghost mannequin style."
# 2. Model: "Full-body shot of a woman with her hair in a bun, smiling, standing against a neutral grey studio background."
dress_image = Image.open('/path/to/your/dress.png')
model_image = Image.open('/path/to/your/model.png')
text_input = """Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment."""
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents=[dress_image, model_image, text_input],
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('fashion_ecommerce_shot.png')
image.show()
輸入 1 |
輸入 2 |
輸出 |
![]() |
![]() |
![]() |
5. 保留高保真細節
為確保編輯時保留重要細節 (例如臉部或標誌),請詳細描述這些細節,並一併提出編輯要求。
範本
Using the provided images, place [element from image 2] onto [element from
image 1]. Ensure that the features of [element from image 1] remain
completely unchanged. The added element should [description of how the
element should integrate].
提示詞
"Take the first image of the woman with brown hair, blue eyes, and a neutral
expression. Add the logo from the second image onto her black t-shirt.
Ensure the woman's face and features remain completely unchanged. The logo
should look like it's naturally printed on the fabric, following the folds
of the shirt."
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
client = genai.Client()
# Base image prompts:
# 1. Woman: "A professional headshot of a woman with brown hair and blue eyes, wearing a plain black t-shirt, against a neutral studio background."
# 2. Logo: "A simple, modern logo with the letters 'G' and 'A' in a white circle."
woman_image = Image.open('/path/to/your/woman.png')
logo_image = Image.open('/path/to/your/logo.png')
text_input = """Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt."""
# Generate an image from a text prompt
response = client.models.generate_content(
model="gemini-2.5-flash-image-preview",
contents=[woman_image, logo_image, text_input],
)
image_parts = [
part.inline_data.data
for part in response.candidates[0].content.parts
if part.inline_data
]
if image_parts:
image = Image.open(BytesIO(image_parts[0]))
image.save('woman_with_logo.png')
image.show()
輸入 1 |
輸入 2 |
輸出 |
![]() |
![]() |
![]() |
最佳做法
如要讓成果從良好提升至優異,請將下列專業策略納入工作流程。
- 具體說明:提供的詳細資訊越多,您就越能掌控生成結果。請改為詳細描述,例如「精緻的精靈板甲,刻有銀葉圖案,高領,肩甲形狀像獵鷹翅膀」,而不是「奇幻盔甲」。
- 提供背景資訊和意圖:說明圖片的用途。模型對背景資訊的理解程度會影響最終輸出內容。舉例來說,「為高檔簡約護膚品牌設計標誌」比「設計標誌」更能產生理想結果。
- 反覆測試及修正:請勿期待第一次就能生成完美圖片。運用模型的對話性質進行小幅變更。接著輸入「這很棒,但可以讓光線暖一點嗎?」或「維持所有設定,但將角色的表情改為更嚴肅。」等提示。
- 使用逐步指示:如果場景複雜且包含許多元素,請將提示分成多個步驟。「首先,請在黎明時分製作寧靜的霧林背景。接著,在前景中加入長滿青苔的古老石祭壇。 最後,將一把發光的劍放在祭壇上。」
- 使用「語意負面提示」:不要說「沒有車輛」,而是正面描述想要的場景:「空蕩蕩的荒涼街道,沒有任何交通跡象」。
- 控制攝影機:使用攝影和電影語言控制構圖。例如
wide-angle shot
、macro shot
、low-angle perspective
。
限制
- 為獲得最佳成效,請使用下列語言:英文、西班牙文 (墨西哥)、日文、中文 (中國)、印地文。
- 圖像生成功能不支援音訊或影片輸入內容。
- 模型不一定會輸出使用者明確要求的圖片數量。
- 模型最多可接受 3 張圖片做為輸入內容,效果最佳。
- 為圖片生成文字時,建議先生成文字,然後要求 Gemini 根據文字生成圖片。
- 目前歐洲經濟區、瑞士和英國不支援上傳兒童圖片。
- 所有生成的圖像都會加上 SynthID 浮水印。
Imagen 的適用時機
除了使用 Gemini 內建的圖像生成功能,你也可以透過 Gemini API 存取專門的圖像生成模型 Imagen。
屬性 | Imagen | Gemini 原生圖片 |
---|---|---|
優勢 | 這是目前最強大的圖像生成模型,建議用於生成逼真圖像、提升清晰度,以及改善拼字和排版。 | 預設建議。 提供無與倫比的彈性、情境理解能力,以及簡單的無遮罩編輯功能。獨家支援多輪對話式編輯。 |
可用性 | 正式發布版 | 預覽版 (可供正式版使用) |
延遲時間 | 低。針對近乎即時的效能進行最佳化。 | 較高。進階功能需要更多運算資源。 |
費用 | 適合執行特定工作,且成本效益高。每張圖片 $0.02 美元至 $0.12 美元 | 以權杖為基礎的定價方式。圖片輸出:每 100 萬個權杖 $30 美元 (每張圖片的權杖數為 1290 個,大小上限為 1024x1024 像素) |
建議工作 |
|
|
Imagen 4 是生成圖像的首選模型。如要處理進階用途或需要最佳圖片品質,請選擇 Imagen 4 Ultra (請注意,一次只能生成一張圖片)。