Commit 706471d

Add multiple modalities in a single message

committed · 1 parent f784983 · commit 706471d

File tree: 1 file changed (+60 −6 lines)


README.md

Lines changed: 60 additions & 6 deletions
````diff
@@ -232,6 +232,41 @@ Details:
 
 * As described [above](#customizing-the-role-per-prompt), you can also supply a `role` value in these objects, so that the full form is `{ role, type, content }`. However, for now, using any role besides the default `"user"` role with an image or audio prompt will reject with a `"NotSupportedError"` `DOMException`. (As we explore multimodal outputs, this restriction might be lifted in the future.)
 
+### Multiple modalities in a single message
+
+Consider a prompt such as `Here is an image: <<<image>>>. Please describe it.`. This is intended to be a single prompt from the user role. To express this, you can use an array for the `content` value:
+
+```js
+const response = await session.prompt({
+  role: "user",
+  content: [
+    "Here is an image: ",
+    { type: "image", content: imageBytes },
+    ". Please describe it."
+  ]
+});
+```
+
+This has _different semantics_ than prompting with multiple user messages:
+
+```js
+// THESE ARE PROBABLY NOT WHAT YOU WANT
+const probablyWrongResponse = await session.prompt([
+  "Here is an image: ",
+  { type: "image", content: imageBytes },
+  ". Please describe it."
+]);
+
+// Equivalent (and also probably wrong)
+const probablyWrongResponse2 = await session.prompt([
+  { role: "user", type: "text", content: "Here is an image: " },
+  { role: "user", type: "image", content: imageBytes },
+  { role: "user", type: "text", content: ". Please describe it." }
+]);
+```
+
+Those examples involve three separate user messages, which the underlying model will likely interpret differently. (To see this, compare with [our above multi-user example](#customizing-the-role-per-prompt), or with how you react when someone texts you three messages in a row vs. a single message.)
+
 ### Structured output or JSON output
 
 To help with programmatic processing of language model responses, the prompt API supports structured outputs defined by a JSON schema.
````
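Per the "interpreted as" comments in the IDL changes below, each bare string inside a `content` array is treated as a text chunk, so the single-message example above should be equivalent to writing out the canonical form by hand. A minimal sketch, reusing the same `session` and `imageBytes` as the examples above:

```js
// Sketch: the canonical-form spelling of the flattened single-message example,
// assuming strings inside `content` are normalized to { type: "text", ... }
// chunks as the IDL comments below describe.
const equivalentResponse = await session.prompt({
  role: "user",
  content: [
    { type: "text", content: "Here is an image: " },
    { type: "image", content: imageBytes },
    { type: "text", content: ". Please describe it." }
  ]
});
```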
````diff
@@ -618,17 +653,36 @@ dictionary LanguageModelExpectedInput {
 
 typedef (LanguageModelPrompt or sequence<LanguageModelPrompt>) LanguageModelPromptInput;
 
-// Prompt lines
-
 typedef (
-  DOMString // interpreted as { role: "user", type: "text", content: providedValue }
-  or LanguageModelPromptDict // canonical form
+  // canonical form
+  LanguageModelPromptDict
+  // interpreted as { role: providedValue.role, content: [{ type: providedValue.type, content: providedValue.content }] }
+  or LanguageModelPromptDictFlattened
+  // interpreted as { role: "user", content: [{ type: "text", content: providedValue }] }
+  or DOMString
 ) LanguageModelPrompt;
 
+typedef (
+  // canonical form
+  LanguageModelPromptContentDict
+  // interpreted as { type: "text", content: providedValue }
+  or DOMString
+) LanguageModelPromptContent;
+
 dictionary LanguageModelPromptDict {
+  LanguageModelPromptRole role = "user";
+  required (LanguageModelPromptContent or sequence<LanguageModelPromptContent>) content;
+};
+
+dictionary LanguageModelPromptDictFlattened {
   LanguageModelPromptRole role = "user";
   LanguageModelPromptType type = "text";
-  required LanguageModelPromptContent content;
+  required LanguageModelPromptContentValue content;
+};
+
+dictionary LanguageModelPromptContentDict {
+  LanguageModelPromptType type = "text";
+  required LanguageModelPromptContentValue content;
 };
 
 enum LanguageModelPromptRole { "system", "user", "assistant" };
@@ -640,7 +694,7 @@ typedef (
   or AudioBuffer
   or BufferSource
   or DOMString
-) LanguageModelPromptContent;
+) LanguageModelPromptContentValue;
 ```
 
 ### Instruction-tuned versus base models
````
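Taken together, the updated typedefs let a single prompt arrive as a bare string, a flattened dictionary, or a canonical dictionary, with the "interpreted as" comments describing how the first two normalize into the third. A brief illustrative sketch of calls that should express the same prompt (the prompt text is just a placeholder):

```js
// Bare string — interpreted as { role: "user", content: [{ type: "text", content: ... }] }
await session.prompt("Tell me a joke.");

// Flattened dictionary (LanguageModelPromptDictFlattened)
await session.prompt({ role: "user", type: "text", content: "Tell me a joke." });

// Canonical dictionary (LanguageModelPromptDict)
await session.prompt({
  role: "user",
  content: [{ type: "text", content: "Tell me a joke." }]
});
```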
