Conversation

@Danipulok (Contributor) commented Nov 22, 2025

Closes #3485

Changes:

  • Changed the logic for handling output tools with end_strategy='exhaustive';
  • Added tests;
  • Updated docstrings;
  • Updated docs.

How I verified:

MRE:

```python
import asyncio
import os

from pydantic import BaseModel

from pydantic_ai import Agent, ModelSettings, ToolOutput
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider


class TextMessage(BaseModel):
    text: str | None = None


async def send_text_message(
    message: TextMessage,
) -> None:
    """Send a text message."""
    print(f"\nText Message: {message}")


class QuickRepliesMessage(BaseModel):
    text: str | None = None
    quick_replies: list[str] | None = None


async def send_quick_replies_message(
    message: QuickRepliesMessage,
) -> None:
    """Send a quick replies message."""
    print(f"\nQuick Replies Message: {message}")


class TestDate(BaseModel):
    days_of_sunshine: int
    info: str


async def main() -> None:
    api_key = os.environ["OPENAI_API_KEY"]
    model = OpenAIChatModel(
        "gpt-4o",
        provider=OpenAIProvider(api_key=api_key),
        settings=ModelSettings(
            temperature=0.1,
        ),
    )
    output_type = [
        ToolOutput(send_text_message, name="send_text_message"),
        ToolOutput(send_quick_replies_message, name="send_quick_replies_message"),
    ]
    agent = Agent(
        model,
        output_type=output_type,
        instructions="For response, call at first `send_quick_replies_message` and then `send_text_message`, both in parallel",
        end_strategy="exhaustive",  # comment to use the default "early" strategy
    )

    user_prompt = "Tell me about Python"
    async with agent.run_stream(user_prompt) as run:
        async for output in run.stream_responses():
            model_response, is_last_message = output
            print(model_response, is_last_message, end="\n\n")


if __name__ == "__main__":
    asyncio.run(main())
```

Old output:

```
ModelResponse(parts=[ToolCallPart(tool_name='send_text_message', args='{"text": "Python is a high-level, interpreted programming language known for its readability and simplicity. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is widely used in web development, data analysis, artificial intelligence, scientific computing, and more."}', tool_call_id='call_iqyle7PEgwRrKPIB9uhMaEQ9'), ToolCallPart(tool_name='send_quick_replies_message', args='{"text": "Would you like to know more about Python?", "quick_replies": ["History of Python", "Python Features", "Python Applications", "Learning Resources"]}', tool_call_id='call_50K33vhEJ7lyfk6cjulek3hZ')], usage=RequestUsage(input_tokens=124, output_tokens=127, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}), model_name='gpt-4o-2024-08-06', timestamp=datetime.datetime(2025, 11, 20, 16, 42, 7, tzinfo=TzInfo(0)), provider_name='openai', provider_details={'finish_reason': 'tool_calls'}, provider_response_id='chatcmpl-Ce21PzXA1xj1yLIcXxGBJyZR5KrEy', finish_reason='tool_call') True

Text Message: text='Python is a high-level, interpreted programming language known for its readability and simplicity. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is widely used in web development, data analysis, artificial intelligence, scientific computing, and more.'

Process finished with exit code 0
```

New output:

```
ModelResponse(parts=[ToolCallPart(tool_name='send_quick_replies_message', args='{"text": "What would you like to know about Python?", "quick_replies": ["History", "Features", "Applications", "Learning Resources"]}', tool_call_id='call_mmWS2iYDasr3rqosyxozy1UG'), ToolCallPart(tool_name='send_text_message', args='{"text": "Python is a high-level, interpreted programming language known for its readability and versatility. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is widely used in web development, data analysis, artificial intelligence, scientific computing, and more."}', tool_call_id='call_IEmqlH3CxF8vgGDrEhfU6fpQ')], usage=RequestUsage(input_tokens=126, output_tokens=123, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}), model_name='gpt-4o-2024-08-06', timestamp=datetime.datetime(2025, 11, 22, 22, 12, 54, tzinfo=TzInfo(0)), provider_name='openai', provider_details={'finish_reason': 'tool_calls'}, provider_response_id='chatcmpl-Ceq8cWafgeCMWija35zEkUqeHmK7b', finish_reason='tool_call') True

Quick Replies Message: text='What would you like to know about Python?' quick_replies=['History', 'Features', 'Applications', 'Learning Resources']

Text Message: text='Python is a high-level, interpreted programming language known for its readability and versatility. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is widely used in web development, data analysis, artificial intelligence, scientific computing, and more.'

Process finished with exit code 0
```

@Danipulok (Contributor Author)

@DouweM, hey!
I have finished the PR we discussed in #3485; please take a look.
I'm not sure about the new logic, so it may need some changes.

@DouweM (Collaborator) left a comment

@Danipulok Thank you! I'll likely make some tweaks to the docs before merging, but first please have a look at my code comments.

@Danipulok (Contributor Author)

@DouweM, I have updated the PR.
Please review.

@DouweM (Collaborator) left a comment

@Danipulok Thanks!

Addresses PR pydantic#3523 comments:

- Execute all output tool functions for side effects in exhaustive mode
- Use first valid output result regardless of order
- Don't fail run if already have valid result and output_retries=0
- Add tests for invalid-first/valid-second and valid-first/invalid-second scenarios
- Update documentation to clarify 'first valid' behavior
- Create dedicated End Strategy section in tools-advanced.md

Changes:

- _agent_graph.py: Handle UnexpectedModelBehavior/ToolRetryError when final_result exists
- tests/test_agent.py: Add 2 critical exhaustive strategy tests
- docs/output.md: Convert to note block, clarify first valid result
- docs/tools-advanced.md: Add ### End Strategy section for ToC visibility
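
A minimal sketch of the control flow that commit describes, under simplified assumptions (plain callables and a stand-in ToolRetryError; this is not the actual _agent_graph.py code):

```python
from typing import Any, Callable


class ToolRetryError(Exception):
    """Stand-in for the internal 'retry this tool call' signal (assumption)."""


def process_output_tool_calls(calls: list[tuple[Callable[..., Any], dict[str, Any]]]) -> Any:
    """Run every output tool call; keep the first valid result."""
    final_result: Any = None
    have_result = False
    for func, args in calls:
        try:
            # Every call is executed, even after a valid result, so side effects still happen.
            result = func(**args)
        except ToolRetryError:
            if have_result:
                # A valid output already exists, so a later invalid call doesn't fail the run.
                continue
            raise  # No valid output yet: propagate the retry/error as before.
        if not have_result:
            final_result, have_result = result, True  # First valid result wins.
    return final_result
```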

@Danipulok (Contributor Author)

Hey @DouweM!

While I was writing tests for this PR, I discovered that output tools are called twice (or more) in streaming mode (but only once in sync mode).

The first call happens at _agent_graph.py:861 during tool call processing in the graph execution (the part that I'm changing in this PR), and the second call happens at result.py:179 when validating the response in get_output().

Details

Output tool processors are invoked:

  • 2 times when using get_output() alone
  • 2 times when using stream_output() alone
  • 4 times when using both stream_output() + get_output()
  • 0 times when the stream isn't consumed at all (I think this is kind of okay, but I'm not sure)

Here's an MRE, written by an LLM: gist
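
For illustration only (this is not the linked gist), a minimal counting tool that could be wired in as an output tool, e.g. via output_type=[ToolOutput(counting_output_tool)] as in the MRE above, to make the duplicate invocations visible:

```python
call_count = 0


async def counting_output_tool(text: str) -> str:
    """Output tool whose only job is to record how many times it is invoked."""
    global call_count
    call_count += 1
    print(f"output tool invoked {call_count} time(s)")
    return text
```

Consuming the stream and then calling get_output() on the same run increments the counter more than once, matching the counts listed above.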

Question

This causes issues for output tools, which should be called only once. What's the appropriate way to handle this: should the result be cached after the first call, or is there a better solution? And should this be changed at all, and if so, in this PR, in the PR for #3473, or in some other PR? (I'm just asking for better understanding.)
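
One possible direction, sketched as a hypothetical wrapper (the class and method names here are made up and are not pydantic-ai API): cache the processed output so that repeated consumption doesn't re-run the side-effecting output tool processor.

```python
from typing import Any, Awaitable, Callable


class CachedOutput:
    """Hypothetical helper: runs an async output processor at most once and caches the result."""

    def __init__(self, process: Callable[[], Awaitable[Any]]) -> None:
        self._process = process
        self._has_value = False
        self._value: Any = None

    async def get(self) -> Any:
        if not self._has_value:
            # The (potentially side-effecting) output tool processor runs only here.
            self._value = await self._process()
            self._has_value = True
        return self._value
```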

About current PR

Should I add tests for this behavior at all? I want to mirror all the tests across test_streaming.py and test_agent.py, but because of the difference in behavior I'm not sure what I should do. I think for now I will add the missing tests against the current behavior.

@DouweM (Collaborator) commented Dec 3, 2025

@Danipulok Hmm, I agree that's wrong, or at the least unexpected. Can you file a new issue for that, please? We'll want to discuss a solution separately from these PRs.

> I think for now I will add the missing tests against the current behavior

Sounds good

@Danipulok (Contributor Author)

@DouweM, thank you!
I have created an issue for this: #3624

@Danipulok (Contributor Author) commented Dec 9, 2025

@DouweM, I'm proud to announce that I have finished everything I wanted to.
I'm sorry it took so long; I really didn't intend for it to take this long. Thank you for your patience!

I think most of the time was spent organizing and understanding the tests, but it was worth it.
I have updated the docs and everything; please take a look.
The tests are now mirrored and, IMHO, much easier to understand and use.

@Danipulok requested a review from DouweM December 9, 2025 23:08

@Danipulok (Contributor Author)

@DouweM, I have updated the PR; please take a look.
I would really appreciate it if you could review it today, so I could maybe start working on other PRs over the weekend (but I'm not 100% sure I will).

The only question left for me is: #3523 (comment)
Did I understand it correctly?
I think a new sentence should be used, or maybe the RetryPart is kind of okay...

@Danipulok requested a review from DouweM December 12, 2025 03:25

@DouweM (Collaborator) commented Dec 12, 2025

@Danipulok I'll review and hopefully merge today; appreciate the patience!

By the way, are you on our Slack already? It could be useful to have a more direct line of communication than only GitHub :)

@DouweM enabled auto-merge (squash) December 12, 2025 19:33
auto-merge was automatically disabled December 12, 2025 19:36

Head branch was pushed to by a user without write access

@DouweM merged commit b2cbbea into pydantic:main Dec 12, 2025
30 checks passed
@Danipulok deleted the feat/end-strategy branch December 12, 2025 20:32