Middleware
langchain.agents.middleware ¶
Entrypoint to using Middleware plugins with Agents.
Reference docs
This page contains reference documentation for Middleware. See the docs for conceptual guides, tutorials, and examples on using Middleware.
| CLASS | DESCRIPTION |
|---|---|
| ContextEditingMiddleware | Automatically prunes tool results to manage context size. |
| HumanInTheLoopMiddleware | Pauses agent execution for human approval of selected tool calls. |
| LLMToolSelectorMiddleware | Uses an LLM to select relevant tools before calling the main model. |
| LLMToolEmulator | Emulates specified tools using an LLM instead of executing them. |
| ModelCallLimitMiddleware | Tracks model call counts and enforces limits. |
| ModelFallbackMiddleware | Automatic fallback to alternative models on errors. |
| PIIMiddleware | Detects and handles Personally Identifiable Information (PII) in agent conversations. |
| PIIDetectionError | Raised when configured to block on detected sensitive values. |
| SummarizationMiddleware | Summarizes conversation history when token limits are approached. |
| ToolCallLimitMiddleware | Tracks tool call counts and enforces limits. |
| AgentMiddleware | Base middleware class for an agent. |
| AgentState | State schema for the agent. |
| ClearToolUsesEdit | Configuration for clearing tool outputs when token limits are exceeded. |
| InterruptOnConfig | Configuration for an action requiring human in the loop. |
| ModelRequest | Model request information for the agent. |
| ModelResponse | Response from model execution, including messages and optional structured output. |
ContextEditingMiddleware ¶
Bases: AgentMiddleware
Automatically prunes tool results to manage context size.
The middleware applies a sequence of edits when the total input token count exceeds configured thresholds. Currently the ClearToolUsesEdit strategy is supported, aligning with Anthropic's clear_tool_uses_20250919 behaviour.
state_schema class-attribute instance-attribute ¶

```python
state_schema: type[StateT] = cast('type[StateT]', AgentState)
```

The schema for state passed to the middleware nodes.

name property ¶

```python
name: str
```

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.
before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
before_model ¶
Logic to run before the model is called.
abefore_model async ¶
Async logic to run before the model is called.
after_model ¶
Logic to run after the model is called.
aafter_model async ¶
Async logic to run after the model is called.
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```

Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost). Exceptions propagate unless handle_tool_errors is configured on ToolNode.
| PARAMETER | DESCRIPTION |
|---|---|
| request | Tool call request to execute. TYPE: ToolCallRequest |
| handler | Callable to execute the tool (can be called multiple times). TYPE: Callable[[ToolCallRequest], ToolMessage \| Command] |

| RETURNS | DESCRIPTION |
|---|---|
| ToolMessage \| Command | The tool result, either a ToolMessage or a Command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:

```python
def wrap_tool_call(self, request, handler):
    request.tool_call["args"]["value"] *= 2
    return handler(request)
```

Retry on error (call handler multiple times):

```python
def wrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            result = handler(request)
            if is_valid(result):
                return result
        except Exception:
            if attempt == 2:
                raise
    return result
```

Conditional retry based on response:
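A minimal sketch of this last case, assuming the ToolMessage `status` field distinguishes errors:

```python
def wrap_tool_call(self, request, handler):
    for attempt in range(3):
        result = handler(request)
        # Retry while the tool reports an error status; Command results
        # and successful ToolMessages are returned as-is.
        if isinstance(result, ToolMessage) and result.status == "error" and attempt < 2:
            continue
        return result
```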
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```

Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a ToolMessage or Command. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.
| PARAMETER | DESCRIPTION |
|---|---|
| request | Tool call request to execute. TYPE: ToolCallRequest |
| handler | Async callable to execute the tool and return its result. TYPE: Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]] |

| RETURNS | DESCRIPTION |
|---|---|
| ToolMessage \| Command | The tool result, either a ToolMessage or a Command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
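A minimal async sketch, mirroring the sync retry example above:

```python
async def awrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:
                raise  # re-raise after the final attempt
```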
__init__ ¶
```python
__init__(
    *,
    edits: Iterable[ContextEdit] | None = None,
    token_count_method: Literal["approximate", "model"] = "approximate",
) -> None
```

Initializes a context editing middleware instance.
| PARAMETER | DESCRIPTION |
|---|---|
| edits | Sequence of edit strategies to apply. Defaults to a single ClearToolUsesEdit with default settings. TYPE: Iterable[ContextEdit] \| None |
| token_count_method | Whether to use approximate token counting (faster, less accurate) or exact counting implemented by the chat model (potentially slower, more accurate). TYPE: Literal["approximate", "model"] |
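A minimal construction sketch, assuming ClearToolUsesEdit (listed in the class table above) is importable from the same module and usable with its defaults:

```python
from langchain.agents.middleware import ClearToolUsesEdit, ContextEditingMiddleware

middleware = ContextEditingMiddleware(
    edits=[ClearToolUsesEdit()],  # assumed default-configured edit strategy
    token_count_method="approximate",
)
```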
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```

Apply context edits before invoking the model via handler.
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```

Apply context edits before invoking the model via handler (async version).
HumanInTheLoopMiddleware ¶
Bases: AgentMiddleware
Middleware that pauses agent execution so a human can review selected tool calls before they run, based on a per-tool interrupt configuration.
state_schema class-attribute instance-attribute ¶

```python
state_schema: type[StateT] = cast('type[StateT]', AgentState)
```

The schema for state passed to the middleware nodes.

name property ¶

```python
name: str
```

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.
before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
before_model ¶
Logic to run before the model is called.
abefore_model async ¶
Async logic to run before the model is called.
aafter_model async ¶
Async logic to run after the model is called.
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```

Intercept and control model execution via handler callback.
The handler callback executes the model request and returns a ModelResponse. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.
| PARAMETER | DESCRIPTION |
|---|---|
| request | Model request to execute (includes state and runtime). TYPE: ModelRequest |
| handler | Callback that executes the model request and returns a ModelResponse. TYPE: Callable[[ModelRequest], ModelResponse] |

| RETURNS | DESCRIPTION |
|---|---|
| ModelCallResult | The model call result. |
Examples:
Retry on error:
```python
def wrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return handler(request)
        except Exception:
            if attempt == 2:
                raise
```

Rewrite response:

```python
def wrap_model_call(self, request, handler):
    response = handler(request)
    ai_msg = response.result[0]
    return ModelResponse(
        result=[AIMessage(content=f"[{ai_msg.content}]")],
        structured_response=response.structured_response,
    )
```

Error to fallback:

```python
def wrap_model_call(self, request, handler):
    try:
        return handler(request)
    except Exception:
        return ModelResponse(result=[AIMessage(content="Service unavailable")])
```

Cache/short-circuit:

```python
def wrap_model_call(self, request, handler):
    if cached := get_cache(request):
        return cached  # Short-circuit with cached result
    response = handler(request)
    save_cache(request, response)
    return response
```

Simple AIMessage return (converted automatically):
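A minimal sketch of this last case, assuming (per the description above) that a bare AIMessage is converted to a ModelCallResult automatically:

```python
def wrap_model_call(self, request, handler):
    try:
        return handler(request)
    except Exception:
        # Returning an AIMessage directly (instead of a ModelResponse)
        # is converted automatically.
        return AIMessage(content="Model call failed; please try again later.")
```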
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```

Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a ModelResponse. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.
| PARAMETER | DESCRIPTION |
|---|---|
| request | Model request to execute (includes state and runtime). TYPE: ModelRequest |
| handler | Async callback that executes the model request and returns a ModelResponse. TYPE: Callable[[ModelRequest], Awaitable[ModelResponse]] |

| RETURNS | DESCRIPTION |
|---|---|
| ModelCallResult | The model call result. |
Examples:
Retry on error:
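A minimal async sketch of the retry pattern, mirroring the sync example above:

```python
async def awrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:
                raise  # re-raise after the final attempt
```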
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```

Intercept tool execution for retries, monitoring, or modification. Parameters, return type, and examples are identical to ContextEditingMiddleware.wrap_tool_call above.
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```

Intercept and control async tool execution via handler callback. Parameters, return type, and examples are identical to ContextEditingMiddleware.awrap_tool_call above.
__init__ ¶
```python
__init__(
    interrupt_on: dict[str, bool | InterruptOnConfig],
    *,
    description_prefix: str = "Tool execution requires approval",
) -> None
```

Initialize the human in the loop middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| interrupt_on | Mapping of tool name to allowed actions. If a tool doesn't have an entry, it's auto-approved by default. TYPE: dict[str, bool \| InterruptOnConfig] |
| description_prefix | The prefix to use when constructing action requests. This is used to provide context about the tool call and the action being requested. Not used if a tool has a description configured in its InterruptOnConfig. TYPE: str |
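A minimal construction sketch; the tool names here are hypothetical:

```python
from langchain.agents.middleware import HumanInTheLoopMiddleware

middleware = HumanInTheLoopMiddleware(
    interrupt_on={
        "send_email": True,   # hypothetical tool that requires approval
        "search_docs": False, # hypothetical tool, explicitly auto-approved
    },
    description_prefix="Tool execution requires approval",
)
```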
LLMToolSelectorMiddleware ¶
Bases: AgentMiddleware
Uses an LLM to select relevant tools before calling the main model.
When an agent has many tools available, this middleware filters them down to only the most relevant ones for the user's query. This reduces token usage and helps the main model focus on the right tools.
Examples:
Limit to 3 tools:
```python
from langchain.agents.middleware import LLMToolSelectorMiddleware

middleware = LLMToolSelectorMiddleware(max_tools=3)

agent = create_agent(
    model="openai:gpt-4o",
    tools=[tool1, tool2, tool3, tool4, tool5],
    middleware=[middleware],
)
```

Use a smaller model for selection:
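A minimal sketch of this case; the smaller model identifier is illustrative:

```python
middleware = LLMToolSelectorMiddleware(
    model="openai:gpt-4o-mini",  # assumed smaller-model identifier
    max_tools=3,
)
```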
state_schema class-attribute instance-attribute ¶

```python
state_schema: type[StateT] = cast('type[StateT]', AgentState)
```

The schema for state passed to the middleware nodes.

name property ¶

```python
name: str
```

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.
before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
before_model ¶
Logic to run before the model is called.
abefore_model async ¶
Async logic to run before the model is called.
after_model ¶
Logic to run after the model is called.
aafter_model async ¶
Async logic to run after the model is called.
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```

Intercept tool execution for retries, monitoring, or modification. Parameters, return type, and examples are identical to ContextEditingMiddleware.wrap_tool_call above.
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```

Intercept and control async tool execution via handler callback. Parameters, return type, and examples are identical to ContextEditingMiddleware.awrap_tool_call above.
__init__ ¶
```python
__init__(
    *,
    model: str | BaseChatModel | None = None,
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
    max_tools: int | None = None,
    always_include: list[str] | None = None,
) -> None
```

Initialize the tool selector.
| PARAMETER | DESCRIPTION |
|---|---|
| model | Model to use for selection. If not provided, uses the agent's main model. Can be a model identifier string or BaseChatModel instance. TYPE: str \| BaseChatModel \| None |
| system_prompt | Instructions for the selection model. TYPE: str |
| max_tools | Maximum number of tools to select. If the model selects more, only the first max_tools will be used. No limit if not specified. TYPE: int \| None |
| always_include | Tool names to always include regardless of selection. These do not count against the max_tools limit. TYPE: list[str] \| None |
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```

Filter tools based on LLM selection before invoking the model via handler.
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```

Filter tools based on LLM selection before invoking the model via handler (async version).
LLMToolEmulator ¶
Bases: AgentMiddleware
Emulates specified tools using an LLM instead of executing them.
This middleware allows selective emulation of tools for testing purposes. By default (when tools=None), all tools are emulated. You can specify which tools to emulate by passing a list of tool names or BaseTool instances.
Examples:
Emulate all tools (default behavior):
```python
from langchain.agents.middleware import LLMToolEmulator

middleware = LLMToolEmulator()

agent = create_agent(
    model="openai:gpt-4o",
    tools=[get_weather, get_user_location, calculator],
    middleware=[middleware],
)
```

You can also emulate specific tools by name, use a custom model for emulation, or emulate specific tools by passing tool instances, as sketched below.
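Minimal sketches for those three cases, reusing the tool names from the example above:

```python
# Emulate specific tools by name:
middleware = LLMToolEmulator(tools=["get_weather", "get_user_location"])

# Use a custom model for emulation:
middleware = LLMToolEmulator(model="anthropic:claude-sonnet-4-5-20250929")

# Emulate specific tools by passing tool instances:
middleware = LLMToolEmulator(tools=[get_weather])
```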
state_schema class-attribute instance-attribute ¶

```python
state_schema: type[StateT] = cast('type[StateT]', AgentState)
```

The schema for state passed to the middleware nodes.

name property ¶

```python
name: str
```

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.
before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
before_model ¶
Logic to run before the model is called.
abefore_model async ¶
Async logic to run before the model is called.
after_model ¶
Logic to run after the model is called.
aafter_model async ¶
Async logic to run after the model is called.
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```

Intercept and control model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.wrap_model_call above.
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```

Intercept and control async model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.awrap_model_call above.
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
__init__ ¶
```python
__init__(
    *,
    tools: list[str | BaseTool] | None = None,
    model: str | BaseChatModel | None = None,
) -> None
```

Initialize the tool emulator.
| PARAMETER | DESCRIPTION |
|---|---|
| tools | List of tool names (str) or BaseTool instances to emulate. If None (default), ALL tools will be emulated. If an empty list, no tools will be emulated. TYPE: list[str \| BaseTool] \| None |
| model | Model to use for emulation. Defaults to "anthropic:claude-sonnet-4-5-20250929". Can be a model identifier string or BaseChatModel instance. TYPE: str \| BaseChatModel \| None |
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```

Emulate tool execution using an LLM if the tool should be emulated.
| PARAMETER | DESCRIPTION |
|---|---|
| request | Tool call request to potentially emulate. TYPE: ToolCallRequest |
| handler | Callback to execute the tool (can be called multiple times). TYPE: Callable[[ToolCallRequest], ToolMessage \| Command] |

| RETURNS | DESCRIPTION |
|---|---|
| ToolMessage \| Command | A ToolMessage with the emulated response if the tool should be emulated; otherwise the result of calling handler for normal execution. |
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```

Async version of wrap_tool_call.
Emulate tool execution using LLM if tool should be emulated.
| PARAMETER | DESCRIPTION |
|---|---|
| request | Tool call request to potentially emulate. TYPE: ToolCallRequest |
| handler | Async callback to execute the tool (can be called multiple times). TYPE: Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]] |

| RETURNS | DESCRIPTION |
|---|---|
| ToolMessage \| Command | A ToolMessage with the emulated response if the tool should be emulated; otherwise the result of calling handler for normal execution. |
ModelCallLimitMiddleware ¶
Bases: AgentMiddleware[ModelCallLimitState, Any]
Tracks model call counts and enforces limits.
This middleware monitors the number of model calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.
Thread-level: The middleware tracks the number of model calls and persists call count across multiple runs (invocations) of the agent.
Run-level: The middleware tracks the number of model calls made during a single run (invocation) of the agent.
Example
```python
from langchain.agents.middleware.call_tracking import ModelCallLimitMiddleware
from langchain.agents import create_agent
from langchain_core.messages import HumanMessage

# Create middleware with limits
call_tracker = ModelCallLimitMiddleware(thread_limit=10, run_limit=5, exit_behavior="end")

agent = create_agent("openai:gpt-4o", middleware=[call_tracker])

# The agent will automatically jump to the end when limits are exceeded
result = agent.invoke({"messages": [HumanMessage("Help me with a task")]})
```

name property ¶

```python
name: str
```

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.
before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
abefore_model async ¶
Async logic to run before the model is called.
aafter_model async ¶
Async logic to run after the model is called.
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```

Intercept and control model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.wrap_model_call above.
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```

Intercept and control async model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.awrap_model_call above.
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```

Intercept tool execution for retries, monitoring, or modification. Parameters, return type, and examples are identical to ContextEditingMiddleware.wrap_tool_call above.
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```

Intercept and control async tool execution via handler callback. Parameters, return type, and examples are identical to ContextEditingMiddleware.awrap_tool_call above.
state_schema class-attribute instance-attribute ¶
The schema for state passed to the middleware nodes.
__init__ ¶
```python
__init__(
    *,
    thread_limit: int | None = None,
    run_limit: int | None = None,
    exit_behavior: Literal["end", "error"] = "end",
) -> None
```

Initialize the call tracking middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| thread_limit | Maximum number of model calls allowed per thread. None means no limit. TYPE: int \| None |
| run_limit | Maximum number of model calls allowed per run. None means no limit. TYPE: int \| None |
| exit_behavior | What to do when limits are exceeded. "end": jump to the end of the agent execution and inject an artificial AI message indicating that the limit was exceeded. "error": raise a ModelCallLimitExceededError. TYPE: Literal["end", "error"] |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If both limits are None. |
before_model ¶
Check model call limits before making a model call.
| PARAMETER | DESCRIPTION |
|---|---|
| state | The current agent state containing call counts. TYPE: ModelCallLimitState |
| runtime | The langgraph runtime. TYPE: Runtime |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] \| None | If limits are exceeded and exit_behavior is "end", a Command to jump to the end with a limit-exceeded message; otherwise None. |

| RAISES | DESCRIPTION |
|---|---|
| ModelCallLimitExceededError | If limits are exceeded and exit_behavior is "error". |
after_model ¶

Increment model call counts after a model call.
ModelFallbackMiddleware ¶
Bases: AgentMiddleware
Automatic fallback to alternative models on errors.
Retries failed model calls with alternative models in sequence until one succeeds or all models are exhausted. The primary model is specified in create_agent().
Example
```python
from langchain.agents.middleware.model_fallback import ModelFallbackMiddleware
from langchain.agents import create_agent
from langchain_core.messages import HumanMessage

fallback = ModelFallbackMiddleware(
    "openai:gpt-4o-mini",  # Try first on error
    "anthropic:claude-sonnet-4-5-20250929",  # Then this
)

agent = create_agent(
    model="openai:gpt-4o",  # Primary model
    middleware=[fallback],
)

# If the primary fails: tries gpt-4o-mini, then claude-sonnet-4-5-20250929
result = agent.invoke({"messages": [HumanMessage("Hello")]})
```

state_schema class-attribute instance-attribute ¶

```python
state_schema: type[StateT] = cast('type[StateT]', AgentState)
```

The schema for state passed to the middleware nodes.

name property ¶

```python
name: str
```

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.
before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
before_model ¶
Logic to run before the model is called.
abefore_model async ¶
Async logic to run before the model is called.
after_model ¶
Logic to run after the model is called.
aafter_model async ¶
Async logic to run after the model is called.
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```

Intercept tool execution for retries, monitoring, or modification. Parameters, return type, and examples are identical to ContextEditingMiddleware.wrap_tool_call above.
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```

Intercept and control async tool execution via handler callback. Parameters, return type, and examples are identical to ContextEditingMiddleware.awrap_tool_call above.
__init__ ¶
```python
__init__(
    first_model: str | BaseChatModel, *additional_models: str | BaseChatModel
) -> None
```

Initialize model fallback middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| first_model | First fallback model (string name or instance). TYPE: str \| BaseChatModel |
| *additional_models | Additional fallbacks in order. TYPE: str \| BaseChatModel |
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```

Try fallback models in sequence on errors.
| PARAMETER | DESCRIPTION |
|---|---|
| request | Initial model request. TYPE: ModelRequest |
| handler | Callback to execute the model. TYPE: Callable[[ModelRequest], ModelResponse] |

| RETURNS | DESCRIPTION |
|---|---|
| ModelCallResult | The AIMessage from the successful model call. |

| RAISES | DESCRIPTION |
|---|---|
| Exception | If all models fail, the last exception is re-raised. |
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```

Try fallback models in sequence on errors (async version).
| PARAMETER | DESCRIPTION |
|---|---|
| request | Initial model request. TYPE: ModelRequest |
| handler | Async callback to execute the model. TYPE: Callable[[ModelRequest], Awaitable[ModelResponse]] |

| RETURNS | DESCRIPTION |
|---|---|
| ModelCallResult | The AIMessage from the successful model call. |

| RAISES | DESCRIPTION |
|---|---|
| Exception | If all models fail, the last exception is re-raised. |
PIIMiddleware ¶
Bases: AgentMiddleware
Detect and handle Personally Identifiable Information (PII) in agent conversations.
This middleware detects common PII types and applies configurable strategies to handle them. It can detect emails, credit cards, IP addresses, MAC addresses, and URLs in both user input and agent output.
Built-in PII types
- email: Email addresses
- credit_card: Credit card numbers (validated with the Luhn algorithm)
- ip: IP addresses (validated with the stdlib)
- mac_address: MAC addresses
- url: URLs (both http/https and bare URLs)
Strategies
- block: Raise an exception when PII is detected
- redact: Replace PII with [REDACTED_TYPE] placeholders
- mask: Partially mask PII (e.g., ****-****-****-1234 for a credit card)
- hash: Replace PII with a deterministic hash (e.g., <email_hash:a1b2c3d4>)
Strategy Selection Guide:
| Strategy | Preserves Identity? | Best For |
|---|---|---|
block | N/A | Avoid PII completely |
redact | No | General compliance, log sanitization |
mask | No | Human readability, customer service UIs |
hash | Yes (pseudonymous) | Analytics, debugging |
Example
```python
from langchain.agents.middleware import PIIMiddleware
from langchain.agents import create_agent

# Redact all emails in user input
agent = create_agent(
    "openai:gpt-5",
    middleware=[
        PIIMiddleware("email", strategy="redact"),
    ],
)

# Use different strategies for different PII types
agent = create_agent(
    "openai:gpt-4o",
    middleware=[
        PIIMiddleware("credit_card", strategy="mask"),
        PIIMiddleware("url", strategy="redact"),
        PIIMiddleware("ip", strategy="hash"),
    ],
)

# Custom PII type with regex
agent = create_agent(
    "openai:gpt-5",
    middleware=[
        PIIMiddleware("api_key", detector=r"sk-[a-zA-Z0-9]{32}", strategy="block"),
    ],
)
```

state_schema class-attribute instance-attribute ¶

```python
state_schema: type[StateT] = cast('type[StateT]', AgentState)
```

The schema for state passed to the middleware nodes.
before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
abefore_model async ¶
Async logic to run before the model is called.
aafter_model async ¶
Async logic to run after the model is called.
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```

Intercept and control model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.wrap_model_call above.
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```

Intercept and control async model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.awrap_model_call above.
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```

Intercept tool execution for retries, monitoring, or modification. Parameters, return type, and examples are identical to ContextEditingMiddleware.wrap_tool_call above.
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```

Intercept and control async tool execution via handler callback. Parameters, return type, and examples are identical to ContextEditingMiddleware.awrap_tool_call above.
__init__ ¶
```python
__init__(
    pii_type: Literal["email", "credit_card", "ip", "mac_address", "url"] | str,
    *,
    strategy: Literal["block", "redact", "mask", "hash"] = "redact",
    detector: Callable[[str], list[PIIMatch]] | str | None = None,
    apply_to_input: bool = True,
    apply_to_output: bool = False,
    apply_to_tool_results: bool = False,
) -> None
```

Initialize the PII detection middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| pii_type | Type of PII to detect. Can be a built-in type (email, credit_card, ip, mac_address, url) or a custom type name (requires a detector). TYPE: str |
| strategy | How to handle detected PII: block, redact, mask, or hash (see Strategies above). TYPE: Literal["block", "redact", "mask", "hash"] |
| detector | Custom detector function or regex pattern. A callable should take the text and return a list of PIIMatch; a string is treated as a regex pattern. Required for custom PII types. TYPE: Callable[[str], list[PIIMatch]] \| str \| None |
| apply_to_input | Whether to check user messages before the model call. TYPE: bool |
| apply_to_output | Whether to check AI messages after the model call. TYPE: bool |
| apply_to_tool_results | Whether to check tool result messages after tool execution. TYPE: bool |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If pii_type is not built-in and no detector is provided. |
before_model ¶
```python
before_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None
```

Check user messages and tool results for PII before model invocation.
| PARAMETER | DESCRIPTION |
|---|---|
| state | The current agent state. TYPE: AgentState |
| runtime | The langgraph runtime. TYPE: Runtime |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] \| None | Updated state with PII handled according to the strategy, or None if no PII was detected. |

| RAISES | DESCRIPTION |
|---|---|
| PIIDetectionError | If PII is detected and the strategy is "block". |
after_model ¶
```python
after_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None
```

Check AI messages for PII after model invocation.
| PARAMETER | DESCRIPTION |
|---|---|
| state | The current agent state. TYPE: AgentState |
| runtime | The langgraph runtime. TYPE: Runtime |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] \| None | Updated state with PII handled according to the strategy, or None if no PII was detected. |

| RAISES | DESCRIPTION |
|---|---|
| PIIDetectionError | If PII is detected and the strategy is "block". |
SummarizationMiddleware ¶
Bases: AgentMiddleware
Summarizes conversation history when token limits are approached.
This middleware monitors message token counts and automatically summarizes older messages when a threshold is reached, preserving recent messages and maintaining context continuity by ensuring AI/Tool message pairs remain together.
state_schema class-attribute instance-attribute ¶

```python
state_schema: type[StateT] = cast('type[StateT]', AgentState)
```

The schema for state passed to the middleware nodes.

name property ¶

```python
name: str
```

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.
before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
abefore_model async ¶
Async logic to run before the model is called.
after_model ¶
Logic to run after the model is called.
aafter_model async ¶
Async logic to run after the model is called.
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```

Intercept and control model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.wrap_model_call above.
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```

Intercept and control async model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.awrap_model_call above.
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```

Intercept tool execution for retries, monitoring, or modification. Parameters, return type, and examples are identical to ContextEditingMiddleware.wrap_tool_call above.
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```

Intercept and control async tool execution via handler callback. Parameters, return type, and examples are identical to ContextEditingMiddleware.awrap_tool_call above.
__init__ ¶
```python
__init__(
    model: str | BaseChatModel,
    max_tokens_before_summary: int | None = None,
    messages_to_keep: int = _DEFAULT_MESSAGES_TO_KEEP,
    token_counter: TokenCounter = count_tokens_approximately,
    summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
    summary_prefix: str = SUMMARY_PREFIX,
) -> None
```

Initialize the summarization middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| model | The language model to use for generating summaries. TYPE: str \| BaseChatModel |
| max_tokens_before_summary | Token threshold that triggers summarization. If None, summarization is never triggered. TYPE: int \| None |
| messages_to_keep | Number of recent messages to preserve after summarization. TYPE: int |
| token_counter | Function to count tokens in messages. TYPE: TokenCounter |
| summary_prompt | Prompt template for generating summaries. TYPE: str |
| summary_prefix | Prefix added to the system message when including the summary. TYPE: str |
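A minimal construction sketch; the model identifier and thresholds are illustrative:

```python
from langchain.agents.middleware import SummarizationMiddleware

middleware = SummarizationMiddleware(
    model="openai:gpt-4o-mini",        # illustrative summarization model
    max_tokens_before_summary=4000,    # illustrative trigger threshold
    messages_to_keep=20,
)
```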
ToolCallLimitMiddleware ¶
Bases: AgentMiddleware[ToolCallLimitState, Any]
Tracks tool call counts and enforces limits.
This middleware monitors the number of tool calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.
Thread-level: The middleware tracks the total number of tool calls and persists call count across multiple runs (invocations) of the agent.
Run-level: The middleware tracks the number of tool calls made during a single run (invocation) of the agent.
Example
```python
from langchain.agents.middleware.tool_call_limit import ToolCallLimitMiddleware
from langchain.agents import create_agent
from langchain_core.messages import HumanMessage

# Limit all tool calls globally
global_limiter = ToolCallLimitMiddleware(thread_limit=20, run_limit=10, exit_behavior="end")

# Limit a specific tool
search_limiter = ToolCallLimitMiddleware(
    tool_name="search", thread_limit=5, run_limit=3, exit_behavior="end"
)

# Use both in the same agent
agent = create_agent("openai:gpt-4o", middleware=[global_limiter, search_limiter])

result = agent.invoke({"messages": [HumanMessage("Help me with a task")]})
```

before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
abefore_model async ¶
Async logic to run before the model is called.
aafter_model async ¶
Async logic to run after the model is called.
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```

Intercept and control model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.wrap_model_call above.
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```

Intercept and control async model execution via handler callback. Parameters, return type, and examples are identical to HumanInTheLoopMiddleware.awrap_model_call above.
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```

Intercept tool execution for retries, monitoring, or modification. Parameters, return type, and examples are identical to ContextEditingMiddleware.wrap_tool_call above.
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```

Intercept and control async tool execution via handler callback. Parameters, return type, and examples are identical to ContextEditingMiddleware.awrap_tool_call above.
state_schema class-attribute instance-attribute ¶
The schema for state passed to the middleware nodes.
__init__ ¶
```python
__init__(
    *,
    tool_name: str | None = None,
    thread_limit: int | None = None,
    run_limit: int | None = None,
    exit_behavior: Literal["end", "error"] = "end",
) -> None
```

Initialize the tool call limit middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| tool_name | Name of the specific tool to limit. If None, the limits apply to all tool calls. TYPE: str \| None |
| thread_limit | Maximum number of tool calls allowed per thread. None means no limit. Defaults to None. TYPE: int \| None |
| run_limit | Maximum number of tool calls allowed per run. None means no limit. Defaults to None. TYPE: int \| None |
| exit_behavior | What to do when limits are exceeded. "end": jump to the end of the agent execution and inject an artificial AI message indicating that the limit was exceeded. "error": raise a ToolCallLimitExceededError. Defaults to "end". TYPE: Literal["end", "error"] |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If both limits are None. |
name property ¶
```python
name: str
```

The name of the middleware instance.
Includes the tool name if specified to allow multiple instances of this middleware with different tool names.
before_model ¶
Check tool call limits before making a model call.
| PARAMETER | DESCRIPTION |
|---|---|
| state | The current agent state containing tool call counts. TYPE: ToolCallLimitState |
| runtime | The langgraph runtime. TYPE: Runtime |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] \| None | If limits are exceeded and exit_behavior is "end", a Command to jump to the end with a limit-exceeded message; otherwise None. |

| RAISES | DESCRIPTION |
|---|---|
| ToolCallLimitExceededError | If limits are exceeded and exit_behavior is "error". |
after_model ¶
Increment tool call counts after a model call (when tool calls are made).
| PARAMETER | DESCRIPTION |
|---|---|
| state | The current agent state. TYPE: ToolCallLimitState |
| runtime | The langgraph runtime. TYPE: Runtime |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] \| None | State updates with incremented tool call counts if tool calls were made. |
AgentMiddleware ¶
Bases: Generic[StateT, ContextT]
Base middleware class for an agent.
Subclass this and implement any of the defined methods to customize agent behavior between steps in the main agent loop.
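A minimal sketch of a custom subclass using the before_model hook documented below; the logging behavior is illustrative:

```python
from typing import Any

from langchain.agents.middleware import AgentMiddleware


class MessageCountMiddleware(AgentMiddleware):
    """Illustrative middleware that logs the message count before each model call."""

    def before_model(self, state, runtime) -> dict[str, Any] | None:
        print(f"Calling model with {len(state['messages'])} messages")
        return None  # no state update
```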
state_schema class-attribute instance-attribute ¶

```python
state_schema: type[StateT] = cast('type[StateT]', AgentState)
```

The schema for state passed to the middleware nodes.

name property ¶

```python
name: str
```

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.
before_agent ¶
Logic to run before the agent execution starts.
abefore_agent async ¶
Async logic to run before the agent execution starts.
before_model ¶
Logic to run before the model is called.
abefore_model async ¶
Async logic to run before the model is called.
after_model ¶
Logic to run after the model is called.
aafter_model async ¶
Async logic to run after the model is called.
wrap_model_call ¶
```python
wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
```
Intercept and control model execution via handler callback.
The handler callback executes the model request and returns a ModelResponse. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose, with the first in the list as the outermost layer.
| PARAMETER | DESCRIPTION |
|---|---|
request | Model request to execute (includes state and runtime). TYPE: ModelRequest |
handler | Callback that executes the model request and returns a ModelResponse. TYPE: Callable[[ModelRequest], ModelResponse] |
| RETURNS | DESCRIPTION |
|---|---|
ModelCallResult | The final result of the model call. |
Examples:
Retry on error:
```python
def wrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return handler(request)
        except Exception:
            if attempt == 2:
                raise
```
Rewrite response:
```python
def wrap_model_call(self, request, handler):
    response = handler(request)
    ai_msg = response.result[0]
    return ModelResponse(
        result=[AIMessage(content=f"[{ai_msg.content}]")],
        structured_response=response.structured_response,
    )
```
Error to fallback:
```python
def wrap_model_call(self, request, handler):
    try:
        return handler(request)
    except Exception:
        return ModelResponse(result=[AIMessage(content="Service unavailable")])
```
Cache/short-circuit:
```python
def wrap_model_call(self, request, handler):
    if cached := get_cache(request):
        return cached  # Short-circuit with cached result
    response = handler(request)
    save_cache(request, response)
    return response
```
Simple AIMessage return (converted automatically):
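A minimal sketch of the conversion noted above; returning a bare AIMessage instead of a full ModelResponse is wrapped for you:

```python
def wrap_model_call(self, request, handler):
    response = handler(request)
    # A bare AIMessage (rather than a ModelResponse) is converted automatically
    return AIMessage(content=response.result[0].content.upper())
```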
awrap_model_call async ¶
```python
awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
```
Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a ModelResponse. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose, with the first in the list as the outermost layer.
| PARAMETER | DESCRIPTION |
|---|---|
request | Model request to execute (includes state and runtime). TYPE: ModelRequest |
handler | Async callback that executes the model request and returns a ModelResponse. TYPE: Callable[[ModelRequest], Awaitable[ModelResponse]] |
| RETURNS | DESCRIPTION |
|---|---|
ModelCallResult | The final result of the model call. |
Examples:
Retry on error:
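A minimal async sketch, mirroring the synchronous retry example for wrap_model_call:

```python
async def awrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:
                raise
```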
after_agent ¶
Logic to run after the agent execution completes.
aafter_agent async ¶
Async logic to run after the agent execution completes.
wrap_tool_call ¶
```python
wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
```
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost). Exceptions propagate unless handle_tool_errors is configured on ToolNode.
| PARAMETER | DESCRIPTION |
|---|---|
request | Tool call request with the call, state, and runtime. TYPE: ToolCallRequest |
handler | Callable to execute the tool (can be called multiple times). TYPE: Callable[[ToolCallRequest], ToolMessage | Command] |
| RETURNS | DESCRIPTION |
|---|---|
ToolMessage | Command | The final ToolMessage or Command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
```python
def wrap_tool_call(self, request, handler):
    request.tool_call["args"]["value"] *= 2
    return handler(request)
```
Retry on error (call handler multiple times):
```python
def wrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            result = handler(request)
            if is_valid(result):
                return result
        except Exception:
            if attempt == 2:
                raise
    return result
```
Conditional retry based on response:
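A minimal sketch of a response-conditioned retry, assuming the tool result is a ToolMessage whose status field reports errors (the check is illustrative):

```python
def wrap_tool_call(self, request, handler):
    result = handler(request)
    # Retry once when the tool reports a failure (status check is illustrative)
    if isinstance(result, ToolMessage) and result.status == "error":
        result = handler(request)
    return result
```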
awrap_tool_call async ¶
```python
awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
```
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a ToolMessage or Command. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose, with the first in the list as the outermost layer.
| PARAMETER | DESCRIPTION |
|---|---|
request | Tool call request with the call, state, and runtime. TYPE: ToolCallRequest |
handler | Async callable to execute the tool (can be called multiple times). TYPE: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]] |
| RETURNS | DESCRIPTION |
|---|---|
ToolMessage | Command | The final ToolMessage or Command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
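A minimal async sketch, mirroring the synchronous retry example above:

```python
async def awrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:
                raise
```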
ClearToolUsesEdit dataclass ¶
Bases: ContextEdit
Configuration for clearing tool outputs when token limits are exceeded.
trigger class-attribute instance-attribute ¶
trigger: int = 100000 Token count that triggers the edit.
clear_at_least class-attribute instance-attribute ¶
clear_at_least: int = 0 Minimum number of tokens to reclaim when the edit runs.
keep class-attribute instance-attribute ¶
keep: int = 3 Number of most recent tool results that must be preserved.
clear_tool_inputs class-attribute instance-attribute ¶
clear_tool_inputs: bool = False Whether to clear the originating tool call parameters on the AI message.
exclude_tools class-attribute instance-attribute ¶
List of tool names to exclude from clearing.
placeholder class-attribute instance-attribute ¶
placeholder: str = DEFAULT_TOOL_PLACEHOLDER Placeholder text inserted for cleared tool outputs.
apply ¶
```python
apply(messages: list[AnyMessage], *, count_tokens: TokenCounter) -> None
```
Apply the clear-tool-uses strategy.
InterruptOnConfig ¶
Bases: TypedDict
Configuration for an action requiring human in the loop.
This is the configuration format used in the HumanInTheLoopMiddleware.__init__ method.
allowed_decisions instance-attribute ¶
allowed_decisions: list[DecisionType] The decisions that are allowed for this action.
description instance-attribute ¶
description: NotRequired[str | _DescriptionFactory] The description attached to the request for human input.
Can be either:
- A static string describing the approval request
- A callable that dynamically generates the description based on agent state, runtime, and tool call information
Example
```python
# Static string description
config = InterruptOnConfig(
    allowed_decisions=["approve", "reject"],
    description="Please review this tool execution",
)


# Dynamic callable description
def format_tool_description(
    tool_call: ToolCall, state: AgentState, runtime: Runtime
) -> str:
    import json

    return (
        f"Tool: {tool_call['name']}\n"
        f"Arguments:\n{json.dumps(tool_call['args'], indent=2)}"
    )


config = InterruptOnConfig(
    allowed_decisions=["approve", "edit", "reject"],
    description=format_tool_description,
)
```
args_schema instance-attribute ¶
args_schema: NotRequired[dict[str, Any]] JSON schema for the args associated with the action, if edits are allowed.
ModelRequest dataclass ¶
Model request information for the agent.
override ¶
```python
override(**overrides: Unpack[_ModelRequestOverrides]) -> ModelRequest
```
Create a new request with the given overrides applied.
Returns a new ModelRequest instance with the specified attributes replaced. This follows an immutable pattern, leaving the original request unchanged.
| PARAMETER | DESCRIPTION |
|---|---|
**overrides | Keyword arguments for attributes to override. Supported keys: model (BaseChatModel instance), system_prompt (optional system prompt string), messages (list of messages), tool_choice (tool choice configuration), tools (list of available tools), response_format (response format specification), model_settings (additional model settings). TYPE: Unpack[_ModelRequestOverrides] |
| RETURNS | DESCRIPTION |
|---|---|
ModelRequest | New ModelRequest instance with specified overrides applied. |
Examples:
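A short usage sketch (the override values are illustrative):

```python
new_request = request.override(
    system_prompt="You are a terse assistant.",
    tool_choice="auto",
)
# The original request is left unchanged (immutable pattern)
assert new_request is not request
```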
ModelResponse dataclass ¶
Response from model execution including messages and optional structured output.
The result will usually contain a single AIMessage, but may include an additional ToolMessage if the model used a tool for structured output.
before_model ¶
```python
before_model(
    func: _CallableWithStateAndRuntime[StateT, ContextT] | None = None,
    *,
    state_schema: type[StateT] | None = None,
    tools: list[BaseTool] | None = None,
    can_jump_to: list[JumpTo] | None = None,
    name: str | None = None,
) -> (
    Callable[
        [_CallableWithStateAndRuntime[StateT, ContextT]],
        AgentMiddleware[StateT, ContextT],
    ]
    | AgentMiddleware[StateT, ContextT]
)
```
Decorator used to dynamically create a middleware with the before_model hook.
| PARAMETER | DESCRIPTION |
|---|---|
func | The function to be decorated. Must accept state (the current agent state) and runtime (the langgraph runtime). TYPE: _CallableWithStateAndRuntime[StateT, ContextT] | None |
state_schema | Optional custom state schema type. If not provided, uses the default AgentState schema. TYPE: type[StateT] | None |
tools | Optional list of additional tools to register with this middleware. TYPE: list[BaseTool] | None |
can_jump_to | Optional list of valid jump destinations for conditional edges. Valid values are "tools", "model", and "end". TYPE: list[JumpTo] | None |
name | Optional name for the generated middleware class. If not provided, uses the decorated function's name. TYPE: str | None |
| RETURNS | DESCRIPTION |
|---|---|
Callable[[_CallableWithStateAndRuntime[StateT, ContextT]], AgentMiddleware[StateT, ContextT]] | AgentMiddleware[StateT, ContextT] | Either an AgentMiddleware instance (when used directly as a decorator) or a decorator function that can be applied to the function it wraps. |
The decorated function should return one of:

- dict[str, Any] - State updates to merge into the agent state
- Command - A command to control flow (e.g., jump to a different node)
- None - No state updates or flow control
Examples:
Basic usage:
```python
@before_model
def log_before_model(state: AgentState, runtime: Runtime) -> None:
    print(f"About to call model with {len(state['messages'])} messages")
```
With conditional jumping:
@before_model(can_jump_to=["end"]) def conditional_before_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None: if some_condition(state): return {"jump_to": "end"} return None With custom state schema:
after_model ¶
```python
after_model(
    func: _CallableWithStateAndRuntime[StateT, ContextT] | None = None,
    *,
    state_schema: type[StateT] | None = None,
    tools: list[BaseTool] | None = None,
    can_jump_to: list[JumpTo] | None = None,
    name: str | None = None,
) -> (
    Callable[
        [_CallableWithStateAndRuntime[StateT, ContextT]],
        AgentMiddleware[StateT, ContextT],
    ]
    | AgentMiddleware[StateT, ContextT]
)
```
Decorator used to dynamically create a middleware with the after_model hook.
| PARAMETER | DESCRIPTION |
|---|---|
func | The function to be decorated. Must accept state (the current agent state) and runtime (the langgraph runtime). TYPE: _CallableWithStateAndRuntime[StateT, ContextT] | None |
state_schema | Optional custom state schema type. If not provided, uses the default AgentState schema. TYPE: type[StateT] | None |
tools | Optional list of additional tools to register with this middleware. TYPE: list[BaseTool] | None |
can_jump_to | Optional list of valid jump destinations for conditional edges. Valid values are "tools", "model", and "end". TYPE: list[JumpTo] | None |
name | Optional name for the generated middleware class. If not provided, uses the decorated function's name. TYPE: str | None |
| RETURNS | DESCRIPTION |
|---|---|
Callable[[_CallableWithStateAndRuntime[StateT, ContextT]], AgentMiddleware[StateT, ContextT]] | AgentMiddleware[StateT, ContextT] | Either an AgentMiddleware instance (when used directly as a decorator) or a decorator function that can be applied to the function it wraps. |
The decorated function should return one of:

- dict[str, Any] - State updates to merge into the agent state
- Command - A command to control flow (e.g., jump to a different node)
- None - No state updates or flow control
Examples:
Basic usage for logging model responses:
```python
@after_model
def log_latest_message(state: AgentState, runtime: Runtime) -> None:
    print(state["messages"][-1].content)
```
With custom state schema:
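A minimal sketch, again with an illustrative MyState schema extending AgentState:

```python
class MyState(AgentState):
    response_count: int


@after_model(state_schema=MyState)
def count_responses(state: MyState, runtime: Runtime) -> dict[str, Any]:
    # Merge an incremented counter into the agent state after each model call
    return {"response_count": state.get("response_count", 0) + 1}
```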
wrap_model_call ¶
```python
wrap_model_call(
    func: _CallableReturningModelResponse[StateT, ContextT] | None = None,
    *,
    state_schema: type[StateT] | None = None,
    tools: list[BaseTool] | None = None,
    name: str | None = None,
) -> (
    Callable[
        [_CallableReturningModelResponse[StateT, ContextT]],
        AgentMiddleware[StateT, ContextT],
    ]
    | AgentMiddleware[StateT, ContextT]
)
```
Create middleware with wrap_model_call hook from a function.
Converts a function with handler callback into middleware that can intercept model calls, implement retry logic, handle errors, and rewrite responses.
| PARAMETER | DESCRIPTION |
|---|---|
func | Function accepting (request, handler) that calls handler(request) to execute the model and returns the final result. TYPE: _CallableReturningModelResponse[StateT, ContextT] | None |
state_schema | Custom state schema. Defaults to AgentState. TYPE: type[StateT] | None |
tools | Additional tools to register with this middleware. TYPE: list[BaseTool] | None |
name | Middleware class name. Defaults to function name. TYPE: str | None |
| RETURNS | DESCRIPTION |
|---|---|
Callable[[_CallableReturningModelResponse[StateT, ContextT]], AgentMiddleware[StateT, ContextT]] | AgentMiddleware[StateT, ContextT] | Either an AgentMiddleware instance (when used directly as a decorator) or a decorator function that can be applied to the function it wraps. |
Examples:
Basic retry logic:
```python
@wrap_model_call
def retry_on_error(request, handler):
    max_retries = 3
    for attempt in range(max_retries):
        try:
            return handler(request)
        except Exception:
            if attempt == max_retries - 1:
                raise
```
Model fallback:
```python
@wrap_model_call
def fallback_model(request, handler):
    # Try primary model
    try:
        return handler(request)
    except Exception:
        pass

    # Try fallback model
    request.model = fallback_model_instance
    return handler(request)
```
Rewrite response content (full ModelResponse):
```python
@wrap_model_call
def uppercase_responses(request, handler):
    response = handler(request)
    ai_msg = response.result[0]
    return ModelResponse(
        result=[AIMessage(content=ai_msg.content.upper())],
        structured_response=response.structured_response,
    )
```
Simple AIMessage return (converted automatically):
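A minimal sketch of the conversion noted above; a bare AIMessage is wrapped into the final result for you:

```python
@wrap_model_call
def exclaim(request, handler):
    response = handler(request)
    # Returning an AIMessage directly, rather than a full ModelResponse
    return AIMessage(content=f"{response.result[0].content}!")
```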
wrap_tool_call ¶
```python
wrap_tool_call(
    func: _CallableReturningToolResponse | None = None,
    *,
    tools: list[BaseTool] | None = None,
    name: str | None = None,
) -> Callable[[_CallableReturningToolResponse], AgentMiddleware] | AgentMiddleware
```
Create middleware with wrap_tool_call hook from a function.
Converts a function with handler callback into middleware that can intercept tool calls, implement retry logic, monitor execution, and modify responses.
| PARAMETER | DESCRIPTION |
|---|---|
func | Function accepting (request, handler) that calls handler(request) to execute the tool and returns the final ToolMessage or Command. TYPE: _CallableReturningToolResponse | None |
tools | Additional tools to register with this middleware. TYPE: list[BaseTool] | None |
name | Middleware class name. Defaults to function name. TYPE: str | None |
| RETURNS | DESCRIPTION |
|---|---|
Callable[[_CallableReturningToolResponse], AgentMiddleware] | AgentMiddleware | Either an AgentMiddleware instance (when used directly as a decorator) or a decorator function that can be applied to the function it wraps. |
Examples:
Retry logic:
```python
@wrap_tool_call
def retry_on_error(request, handler):
    max_retries = 3
    for attempt in range(max_retries):
        try:
            return handler(request)
        except Exception:
            if attempt == max_retries - 1:
                raise
```
Async retry logic:
```python
@wrap_tool_call
async def async_retry(request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:
                raise
```
Modify request:
```python
@wrap_tool_call
def modify_args(request, handler):
    request.tool_call["args"]["value"] *= 2
    return handler(request)
```
Short-circuit with cached result:
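A minimal sketch, mirroring the model-level cache example above; get_cache and save_cache are illustrative helpers:

```python
@wrap_tool_call
def cached_tool_call(request, handler):
    if cached := get_cache(request):
        return cached  # Short-circuit: skip tool execution entirely
    result = handler(request)
    save_cache(request, result)
    return result
```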