🐞 Bug description
When calling a model in streaming mode (Flux) and the response contains tool calls, the aggregated AssistantMessage constructed by MessageAggregator does not contain the toolCalls property. As a result, tool call information cannot be retrieved from previous assistant messages stored in memory.
This behavior becomes particularly problematic when internalToolExecutionEnabled=false, where tool execution is intended to be controlled manually by the user. In such workflows, it's necessary to retrieve tool call information from the last assistant message in memory, but that data is missing due to the above issue.
Note: This issue is not caused by setting internalToolExecutionEnabled=false. Instead, the issue is exacerbated by it, since downstream components rely on consistent toolCalls data across both streaming and non-streaming modes.
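To make the symptom concrete, here is a minimal sketch (hand-written for this report, not from Spring AI itself; `chatModel` is assumed to be the OpenAiChatModel bean configured below):

```java
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.MessageAggregator;
import org.springframework.ai.chat.prompt.Prompt;
import reactor.core.publisher.Flux;

void inspectAggregatedToolCalls(Prompt prompt) {
    Flux<ChatResponse> stream = chatModel.stream(prompt);
    new MessageAggregator().aggregate(stream, aggregated -> {
        AssistantMessage assistant = aggregated.getResult().getOutput();
        // Non-streaming call(): this list contains the model's tool calls.
        // Streaming + aggregation: the list comes back empty (the bug).
        System.out.println(assistant.getToolCalls());
    }).subscribe();
}
```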
💻 Environment
Spring AI Version: 1.0.0
Java Version: 17
Model: Qwen2.5-72B-Instruct
Usage Mode: Streaming (Flux)
Tool Execution Mode: internalToolExecutionEnabled=false
Vector store: Not involved
🪜 Steps to reproduce

1. Configure a Spring AI chat client with streaming mode enabled.
2. Ensure that the response from the model includes tool calls.
3. Use `org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor`.
4. Inspect the aggregated `AssistantMessage` – `toolCalls` is missing.
5. Attempt to retrieve the tool calls from memory (e.g., `ChatMemory#get`) – they are gone; see the sketch below.
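A sketch of step 5 (`chatMemory` is the ChatMemory bean from the configuration below; `conversationId` is whatever id was used for the conversation):

```java
import java.util.List;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;

void inspectMemory(String conversationId) {
    // Pull the conversation back out of memory and check the last assistant message.
    List<Message> history = chatMemory.get(conversationId);
    Message last = history.get(history.size() - 1);
    if (last instanceof AssistantMessage assistant) {
        // Expected: the tool calls the model emitted. Actual (streaming): empty.
        System.out.println(assistant.getToolCalls());
    }
}
```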
✅ Expected behavior
The toolCalls property from the GenerationMetadata should be correctly propagated to the resulting AssistantMessage, regardless of whether streaming or non-streaming mode is used. This ensures consistent memory behavior and supports downstream workflows such as manual tool execution confirmation.
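In other words, the aggregation should carry the accumulated tool calls into the final message. A hedged sketch of the expected behavior, not the actual MessageAggregator implementation – `fullText`, `chunks`, and the `collectToolCallsFromChunks` helper are hypothetical names used here for illustration:

```java
import java.util.List;
import java.util.Map;
import org.springframework.ai.chat.messages.AssistantMessage;

// Hypothetical: tool-call deltas accumulated across the streamed chunks.
List<AssistantMessage.ToolCall> toolCalls = collectToolCallsFromChunks(chunks);

// AssistantMessage has a constructor that accepts tool calls; the aggregated
// message should be built with them instead of dropping them.
AssistantMessage aggregated = new AssistantMessage(fullText, Map.of(), toolCalls);
```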
🧪 Minimal Complete Reproducible example
My AI Config:
```java
@Bean
public ChatMemoryRepository chatMemoryRepository() {
    return new InMemoryChatMemoryRepository();
}

@Bean
public ChatMemory chatMemory(ChatMemoryRepository chatMemoryRepository) {
    return MessageWindowChatMemory.builder()
        .maxMessages(10)
        .chatMemoryRepository(chatMemoryRepository)
        .build();
}

@Bean
public OpenAiChatModel chatModel(OpenAiApi openAiApi, ToolCallingManager toolCallingManager,
        List<AgentToolsProvider> agentToolsProviders) {
    AgentToolsProvider[] providers = agentToolsProviders.toArray(new AgentToolsProvider[0]);
    ToolCallback[] toolCallbacks = ToolCallbacks.from((Object[]) providers);
    OpenAiChatOptions chatOptions = OpenAiChatOptions.builder()
        .temperature(0.6)
        .model("qwen2.5-72b-instruct")
        .internalToolExecutionEnabled(false)
        .toolCallbacks(toolCallbacks)
        .build();
    return OpenAiChatModel.builder()
        .defaultOptions(chatOptions)
        .toolCallingManager(toolCallingManager)
        .openAiApi(openAiApi)
        .build();
}

@Bean
ChatClient chatClient(OpenAiChatModel chatModel, ChatMemory chatMemory) {
    return ChatClient.builder(chatModel)
        .defaultAdvisors(
            new SimpleLoggerAdvisor(),
            MessageChatMemoryAdvisor.builder(chatMemory).build()
        )
        .defaultSystem(systemResource)
        .build();
}
```
Just chat with the AI:

```java
private Flux<ChatResponse> callWithMemory(String conversationId, String userText) {
    Prompt promptWithMemory = new Prompt(chatMemory.get(conversationId), chatModel.getDefaultOptions());
    return client.prompt(promptWithMemory)
        .user(userText)
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
        .stream()
        .chatResponse();
}
```
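For completeness, a sketch of the manual tool-execution flow this bug blocks. It assumes `toolCallingManager`, `chatMemory`, and `chatModel` are the beans from the config above, and blocks on the stream purely for illustration:

```java
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.model.tool.ToolExecutionResult;

void confirmAndRunTools(String conversationId, String userText) {
    // Illustrative only: block for the final streamed response.
    ChatResponse response = callWithMemory(conversationId, userText).blockLast();

    // With internalToolExecutionEnabled=false, the application decides when to
    // run the tools. This branch is only reachable if toolCalls survive the
    // streaming aggregation, which is exactly what this issue asks for.
    if (response != null && response.hasToolCalls()) {
        Prompt prompt = new Prompt(chatMemory.get(conversationId), chatModel.getDefaultOptions());
        ToolExecutionResult result = toolCallingManager.executeToolCalls(prompt, response);
        // Send the tool results back to the model for the final answer.
        chatModel.call(new Prompt(result.conversationHistory(), chatModel.getDefaultOptions()));
    }
}
```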