
Commit afde1c4

dsfaccini and claude authored
Support FileUrl.force_download in AnthropicModel and OpenAIResponsesModel (#3694)
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 848e381 commit afde1c4

20 files changed: +1906 −103 lines

docs/input.md

Lines changed: 23 additions & 28 deletions
@@ -104,39 +104,34 @@ print(result.output)

 ## User-side download vs. direct file URL

-When you provide a URL using any of `ImageUrl`, `AudioUrl`, `VideoUrl` or `DocumentUrl`, Pydantic AI will typically send the URL directly to the model API so that the download happens on their side.
+When using one of `ImageUrl`, `AudioUrl`, `VideoUrl` or `DocumentUrl`, Pydantic AI will default to sending the URL to the model provider, so the file is downloaded on their side.

-Some model APIs do not support file URLs at all or for specific file types. In the following cases, Pydantic AI will download the file content and send it as part of the API request instead:
+Support for file URLs varies depending on type and provider:

-- [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel]: `AudioUrl` and `DocumentUrl`
-- [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel]: All URLs
-- [`AnthropicModel`][pydantic_ai.models.anthropic.AnthropicModel]: `DocumentUrl` with media type `text/plain`
-- [`GoogleModel`][pydantic_ai.models.google.GoogleModel] using GLA (Gemini Developer API): All URLs except YouTube video URLs and files uploaded to the [Files API](https://ai.google.dev/gemini-api/docs/files).
-- [`BedrockConverseModel`][pydantic_ai.models.bedrock.BedrockConverseModel]: All URLs except S3 URLs, specifically starting with `s3://`.
+| Model | Send URL directly | Download and send bytes | Unsupported |
+|-------|-------------------|-------------------------|-------------|
+| [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel] | `ImageUrl` | `AudioUrl`, `DocumentUrl` | `VideoUrl` |
+| [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel] | `ImageUrl`, `AudioUrl`, `DocumentUrl` || `VideoUrl` |
+| [`AnthropicModel`][pydantic_ai.models.anthropic.AnthropicModel] | `ImageUrl`, `DocumentUrl` (PDF) | `DocumentUrl` (`text/plain`) | `AudioUrl`, `VideoUrl` |
+| [`GoogleModel`][pydantic_ai.models.google.GoogleModel] (Vertex) | All URL types |||
+| [`GoogleModel`][pydantic_ai.models.google.GoogleModel] (GLA) | [YouTube](models/google.md#document-image-audio-and-video-input), [Files API](models/google.md#document-image-audio-and-video-input) | All other URLs ||
+| [`MistralModel`][pydantic_ai.models.mistral.MistralModel] | `ImageUrl`, `DocumentUrl` (PDF) || `AudioUrl`, `VideoUrl`, `DocumentUrl` (non-PDF) |
+| [`BedrockConverseModel`][pydantic_ai.models.bedrock.BedrockConverseModel] | S3 URLs (`s3://`) | `ImageUrl`, `DocumentUrl`, `VideoUrl` | `AudioUrl` |

-If the model API supports file URLs but may not be able to download a file because of crawling or access restrictions, you can instruct Pydantic AI to download the file content and send that instead of the URL by enabling the `force_download` flag on the URL object. For example, [`GoogleModel`][pydantic_ai.models.google.GoogleModel] on Vertex AI limits YouTube video URLs to one URL per request.
+A model API may be unable to download a file (e.g., because of crawling or access restrictions) even if it supports file URLs. For example, [`GoogleModel`][pydantic_ai.models.google.GoogleModel] on Vertex AI limits YouTube video URLs to one URL per request. In such cases, you can instruct Pydantic AI to download the file content locally and send that instead of the URL by setting `force_download` on the URL object:

-## Uploaded Files
-
-Some model providers like Google's Gemini API support [uploading files](https://ai.google.dev/gemini-api/docs/files). You can upload a file to the model API using the client you can get from the provider and use the resulting URL as input:
+```py {title="force_download.py" test="skip" lint="skip"}
+from pydantic_ai import ImageUrl, AudioUrl, VideoUrl, DocumentUrl

-```py {title="file_upload.py" test="skip"}
-from pydantic_ai import Agent, DocumentUrl
-from pydantic_ai.models.google import GoogleModel
-from pydantic_ai.providers.google import GoogleProvider
+ImageUrl(url='https://example.com/image.png', force_download=True)
+AudioUrl(url='https://example.com/audio.mp3', force_download=True)
+VideoUrl(url='https://example.com/video.mp4', force_download=True)
+DocumentUrl(url='https://example.com/doc.pdf', force_download=True)
+```

-provider = GoogleProvider()
-file = provider.client.files.upload(file='pydantic-ai-logo.png')
-assert file.uri is not None
+## Uploaded Files

-agent = Agent(GoogleModel('gemini-2.5-flash', provider=provider))
-result = agent.run_sync(
-    [
-        'What company is this logo from?',
-        DocumentUrl(url=file.uri, media_type=file.mime_type),
-    ]
-)
-print(result.output)
-```
+Some model providers support passing URLs to files hosted on their platform:

-`BedrockConverseModel` supports `s3://<bucket-name>/<object-key>` URIs, provided that the assumed role has the `s3:GetObject` permission. An optional `bucketOwner` query parameter must be specified if the bucket is not owned by the account making the request. For example: `s3://my-bucket/my-file.png?bucketOwner=123456789012`.
+- [`GoogleModel`][pydantic_ai.models.google.GoogleModel] supports the [Files API](models/google.md#document-image-audio-and-video-input) for uploading and referencing files.
+- [`BedrockConverseModel`][pydantic_ai.models.bedrock.BedrockConverseModel] supports `s3://<bucket-name>/<object-key>` URIs, provided that the assumed role has the `s3:GetObject` permission. An optional `bucketOwner` query parameter must be specified if the bucket is not owned by the account making the request. For example: `s3://my-bucket/my-file.png?bucketOwner=123456789012`.
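
The docs change above introduces the `force_download` flag that this commit wires up for `AnthropicModel` and `OpenAIResponsesModel`. As a rough usage sketch (the model name and document URL below are placeholders, not taken from this commit):

```python
# Hedged sketch: make Pydantic AI download the file itself and send the bytes
# to Anthropic, instead of passing the URL through to the API.
from pydantic_ai import Agent, DocumentUrl
from pydantic_ai.models.anthropic import AnthropicModel

agent = Agent(AnthropicModel('claude-sonnet-4-5'))  # placeholder model name
result = agent.run_sync(
    [
        'Summarize this document.',
        # With force_download=True, the PDF is fetched client-side and sent
        # as an inline document block rather than as a URL.
        DocumentUrl(url='https://example.com/report.pdf', force_download=True),
    ]
)
print(result.output)
```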

docs/models/google.md

Lines changed: 40 additions & 1 deletion
@@ -199,7 +199,46 @@ agent = Agent(model)

 ## Document, Image, Audio, and Video Input

-`GoogleModel` supports multi-modal input, including documents, images, audio, and video. See the [input documentation](../input.md) for details and examples.
+`GoogleModel` supports multi-modal input, including documents, images, audio, and video.
+
+YouTube video URLs can be passed directly to Google models:
+
+```py {title="youtube_input.py" test="skip" lint="skip"}
+from pydantic_ai import Agent, VideoUrl
+from pydantic_ai.models.google import GoogleModel
+
+agent = Agent(GoogleModel('gemini-2.5-flash'))
+result = agent.run_sync(
+    [
+        'What is this video about?',
+        VideoUrl(url='https://www.youtube.com/watch?v=dQw4w9WgXcQ'),
+    ]
+)
+print(result.output)
+```
+
+Files can be uploaded via the [Files API](https://ai.google.dev/gemini-api/docs/files) and passed as URLs:
+
+```py {title="file_upload.py" test="skip"}
+from pydantic_ai import Agent, DocumentUrl
+from pydantic_ai.models.google import GoogleModel
+from pydantic_ai.providers.google import GoogleProvider
+
+provider = GoogleProvider()
+file = provider.client.files.upload(file='pydantic-ai-logo.png')
+assert file.uri is not None
+
+agent = Agent(GoogleModel('gemini-2.5-flash', provider=provider))
+result = agent.run_sync(
+    [
+        'What company is this logo from?',
+        DocumentUrl(url=file.uri, media_type=file.mime_type),
+    ]
+)
+print(result.output)
+```
+
+See the [input documentation](../input.md) for more details and examples.

 ## Model settings


pydantic_ai_slim/pydantic_ai/_mcp.py

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ def add_msg(
             'user',
             mcp_types.ImageContent(
                 type='image',
-                data=base64.b64encode(chunk.data).decode(),
+                data=chunk.base64,
                 mimeType=chunk.media_type,
             ),
         )

pydantic_ai_slim/pydantic_ai/messages.py

Lines changed: 13 additions & 5 deletions
@@ -474,7 +474,10 @@ class BinaryContent:
     """Binary content, e.g. an audio or image file."""

     data: bytes
-    """The binary data."""
+    """The binary file data.
+
+    Use `.base64` to get the base64-encoded string.
+    """

     _: KW_ONLY

@@ -574,7 +577,12 @@ def identifier(self) -> str:
     @property
     def data_uri(self) -> str:
         """Convert the `BinaryContent` to a data URI."""
-        return f'data:{self.media_type};base64,{base64.b64encode(self.data).decode()}'
+        return f'data:{self.media_type};base64,{self.base64}'
+
+    @property
+    def base64(self) -> str:
+        """Return the binary data as a base64-encoded string. Default encoding is UTF-8."""
+        return base64.b64encode(self.data).decode()

     @property
     def is_audio(self) -> bool:
@@ -776,7 +784,7 @@ def otel_message_parts(self, settings: InstrumentationSettings) -> list[_otel_me
             elif isinstance(part, BinaryContent):
                 converted_part = _otel_messages.BinaryDataPart(type='binary', media_type=part.media_type)
                 if settings.include_content and settings.include_binary_content:
-                    converted_part['content'] = base64.b64encode(part.data).decode()
+                    converted_part['content'] = part.base64
                 parts.append(converted_part)
             elif isinstance(part, CachePoint):
                 # CachePoint is a marker, not actual content - skip it for otel
@@ -1396,7 +1404,7 @@ def new_event_body():
                     'kind': 'binary',
                     'media_type': part.content.media_type,
                     **(
-                        {'binary_content': base64.b64encode(part.content.data).decode()}
+                        {'binary_content': part.content.base64}
                         if settings.include_content and settings.include_binary_content
                         else {}
                    ),
@@ -1430,7 +1438,7 @@ def otel_message_parts(self, settings: InstrumentationSettings) -> list[_otel_me
             elif isinstance(part, FilePart):
                 converted_part = _otel_messages.BinaryDataPart(type='binary', media_type=part.content.media_type)
                 if settings.include_content and settings.include_binary_content:
-                    converted_part['content'] = base64.b64encode(part.content.data).decode()
+                    converted_part['content'] = part.content.base64
                 parts.append(converted_part)
             elif isinstance(part, BaseToolCallPart):
                 call_part = _otel_messages.ToolCallPart(type='tool_call', id=part.tool_call_id, name=part.tool_name)
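
The diff above adds a `base64` property on `BinaryContent`, so callers no longer base64-encode `data` by hand. A minimal sketch of how the property behaves (the sample bytes are made up):

```python
from pydantic_ai import BinaryContent

content = BinaryContent(data=b'hello', media_type='text/plain')
# base64 encodes the raw bytes once; data_uri reuses the same encoded string.
print(content.base64)    # 'aGVsbG8='
print(content.data_uri)  # 'data:text/plain;base64,aGVsbG8='
```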

pydantic_ai_slim/pydantic_ai/models/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@
 import base64
 import warnings
 from abc import ABC, abstractmethod
-from collections.abc import AsyncIterator, Callable, Iterator
+from collections.abc import AsyncIterator, Callable, Iterator, Sequence
 from contextlib import asynccontextmanager, contextmanager
 from dataclasses import dataclass, field, replace
 from datetime import datetime
@@ -797,7 +797,7 @@ def base_url(self) -> str | None:

     @staticmethod
     def _get_instructions(
-        messages: list[ModelMessage], model_request_parameters: ModelRequestParameters | None = None
+        messages: Sequence[ModelMessage], model_request_parameters: ModelRequestParameters | None = None
     ) -> str | None:
         """Get instructions from the first ModelRequest found when iterating messages in reverse.

pydantic_ai_slim/pydantic_ai/models/anthropic.py

Lines changed: 40 additions & 20 deletions
@@ -71,7 +71,6 @@
     omit as OMIT,
 )
 from anthropic.types.beta import (
-    BetaBase64PDFBlockParam,
     BetaBase64PDFSourceParam,
     BetaCacheControlEphemeralParam,
     BetaCitationsConfigParam,
@@ -105,6 +104,7 @@
     BetaRawMessageStreamEvent,
     BetaRedactedThinkingBlock,
     BetaRedactedThinkingBlockParam,
+    BetaRequestDocumentBlockParam,
     BetaRequestMCPServerToolConfigurationParam,
     BetaRequestMCPServerURLDefinitionParam,
     BetaServerToolUseBlock,
@@ -1047,6 +1047,31 @@ def _add_cache_control_to_last_param(
         # Add cache_control to the last param
         last_param['cache_control'] = self._build_cache_control(ttl)

+    @staticmethod
+    def _map_binary_data(data: bytes, media_type: str) -> BetaContentBlockParam:
+        # Anthropic SDK accepts file-like objects (IO[bytes]) and handles base64 encoding internally
+        if media_type.startswith('image/'):
+            return BetaImageBlockParam(
+                source={'data': io.BytesIO(data), 'media_type': media_type, 'type': 'base64'},  # type: ignore
+                type='image',
+            )
+        elif media_type == 'application/pdf':
+            return BetaRequestDocumentBlockParam(
+                source=BetaBase64PDFSourceParam(
+                    data=io.BytesIO(data),
+                    media_type='application/pdf',
+                    type='base64',
+                ),
+                type='document',
+            )
+        elif media_type == 'text/plain':
+            return BetaRequestDocumentBlockParam(
+                source=BetaPlainTextSourceParam(data=data.decode('utf-8'), media_type=media_type, type='text'),
+                type='document',
+            )
+        else:
+            raise RuntimeError(f'Unsupported binary content media type for Anthropic: {media_type}')
+
     @staticmethod
     async def _map_user_prompt(
         part: UserPromptPart,
@@ -1062,30 +1087,25 @@ async def _map_user_prompt(
                 elif isinstance(item, CachePoint):
                     yield item
                 elif isinstance(item, BinaryContent):
-                    if item.is_image:
-                        yield BetaImageBlockParam(
-                            source={'data': io.BytesIO(item.data), 'media_type': item.media_type, 'type': 'base64'},  # type: ignore
-                            type='image',
-                        )
-                    elif item.media_type == 'application/pdf':
-                        yield BetaBase64PDFBlockParam(
-                            source=BetaBase64PDFSourceParam(
-                                data=io.BytesIO(item.data),
-                                media_type='application/pdf',
-                                type='base64',
-                            ),
-                            type='document',
-                        )
-                    else:
-                        raise RuntimeError('Only images and PDFs are supported for binary content')
+                    yield AnthropicModel._map_binary_data(item.data, item.media_type)
                 elif isinstance(item, ImageUrl):
-                    yield BetaImageBlockParam(source={'type': 'url', 'url': item.url}, type='image')
+                    if item.force_download:
+                        downloaded = await download_item(item, data_format='bytes')
+                        yield AnthropicModel._map_binary_data(downloaded['data'], item.media_type)
+                    else:
+                        yield BetaImageBlockParam(source={'type': 'url', 'url': item.url}, type='image')
                 elif isinstance(item, DocumentUrl):
                     if item.media_type == 'application/pdf':
-                        yield BetaBase64PDFBlockParam(source={'url': item.url, 'type': 'url'}, type='document')
+                        if item.force_download:
+                            downloaded = await download_item(item, data_format='bytes')
+                            yield AnthropicModel._map_binary_data(downloaded['data'], item.media_type)
+                        else:
+                            yield BetaRequestDocumentBlockParam(
+                                source={'url': item.url, 'type': 'url'}, type='document'
+                            )
                     elif item.media_type == 'text/plain':
                         downloaded_item = await download_item(item, data_format='text')
-                        yield BetaBase64PDFBlockParam(
+                        yield BetaRequestDocumentBlockParam(
                             source=BetaPlainTextSourceParam(
                                 data=downloaded_item['data'], media_type=item.media_type, type='text'
                             ),
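
Besides the `force_download` branches, the new `_map_binary_data` helper also covers `text/plain` binary content, which previously raised `RuntimeError('Only images and PDFs are supported for binary content')`. A hedged sketch of what that now allows (the model name is a placeholder, not part of this diff):

```python
from pydantic_ai import Agent, BinaryContent
from pydantic_ai.models.anthropic import AnthropicModel

agent = Agent(AnthropicModel('claude-sonnet-4-5'))  # placeholder model name
result = agent.run_sync(
    [
        'What does this file do?',
        # text/plain bytes are now mapped to a plain-text document block
        # via _map_binary_data instead of raising.
        BinaryContent(data=b'print("hello world")\n', media_type='text/plain'),
    ]
)
print(result.output)
```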

pydantic_ai_slim/pydantic_ai/models/bedrock.py

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@

 import functools
 import typing
-from collections.abc import AsyncIterator, Iterable, Iterator, Mapping
+from collections.abc import AsyncIterator, Iterable, Iterator, Mapping, Sequence
 from contextlib import asynccontextmanager
 from dataclasses import dataclass, field
 from datetime import datetime
@@ -545,7 +545,7 @@ def _map_tool_config(

     async def _map_messages(  # noqa: C901
         self,
-        messages: list[ModelMessage],
+        messages: Sequence[ModelMessage],
         model_request_parameters: ModelRequestParameters,
         model_settings: BedrockModelSettings | None,
     ) -> tuple[list[SystemContentBlockTypeDef], list[MessageUnionTypeDef]]:

pydantic_ai_slim/pydantic_ai/models/gemini.py

Lines changed: 1 addition & 3 deletions
@@ -1,6 +1,5 @@
 from __future__ import annotations as _annotations

-import base64
 from collections.abc import AsyncIterator, Sequence
 from contextlib import asynccontextmanager
 from dataclasses import dataclass, field
@@ -377,9 +376,8 @@ async def _map_user_prompt(self, part: UserPromptPart) -> list[_GeminiPartUnion]
             if isinstance(item, str):
                 content.append({'text': item})
             elif isinstance(item, BinaryContent):
-                base64_encoded = base64.b64encode(item.data).decode('utf-8')
                 content.append(
-                    _GeminiInlineDataPart(inline_data={'data': base64_encoded, 'mime_type': item.media_type})
+                    _GeminiInlineDataPart(inline_data={'data': item.base64, 'mime_type': item.media_type})
                 )
             elif isinstance(item, VideoUrl) and item.is_youtube:
                 file_data = _GeminiFileDataPart(file_data={'file_uri': item.url, 'mime_type': item.media_type})
