[Feature]: Return Pillar guardrail detection data in response headers (monitor mode)

The Feature

When using the Pillar Security guardrail in monitor mode with Zero Persistence, include detection metadata in HTTP response headers. This enables users to capture threat detection data client-side for metrics, false positive analysis, and investigation, without building custom guardrails or ETL pipelines.

Proposed headers:

x-pillar-flagged: Boolean indicating whether content was flagged ("true" / "false")
x-pillar-scanners: URL-encoded JSON object with scanner category results
x-pillar-evidence: URL-encoded JSON array of detection evidence (truncated to 8KB)
x-pillar-session-id: URL-encoded session ID for correlation
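A minimal sketch of how a guardrail could populate these four headers, assuming the Pillar detection result is already available as a boolean, a scanner dict, an evidence list, and a session ID. The helper names (`build_pillar_headers`, `_encode`) are hypothetical and are not part of LiteLLM or the Pillar SDK.

```python
import json
from urllib.parse import quote


def _encode(value) -> str:
    """Serialize to compact JSON, then percent-encode every reserved character."""
    # safe="" so that "/" is escaped too; quote() encodes str input as UTF-8.
    return quote(json.dumps(value, separators=(",", ":")), safe="")


def build_pillar_headers(
    flagged: bool, scanners: dict, evidence: list, session_id: str
) -> dict[str, str]:
    """Map a (hypothetical) detection result onto the proposed x-pillar-* headers."""
    return {
        "x-pillar-flagged": "true" if flagged else "false",
        "x-pillar-scanners": _encode(scanners),
        "x-pillar-evidence": _encode(evidence),
        "x-pillar-session-id": quote(session_id, safe=""),
    }
```

The compact `separators=(",", ":")` matter: with them, the scanner and evidence values in the example response headers in this issue are reproduced exactly, whereas Python's default separators insert spaces that would show up as `%20`.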

Why headers instead of the response body?
The OpenAI response schema is strict, so adding custom fields such as _pillar_monitor_payload to the response body isn't feasible. Headers are the natural alternative, similar to how LiteLLM already mutates response headers to add x-litellm-applied-guardrails.

Why URL encoding?
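HTTP header values historically allowed only ISO-8859-1 text, and RFC 9110 recommends that newly defined fields use only visible US-ASCII. Raw JSON evidence containing arbitrary Unicode therefore can't be sent as-is. URL encoding (percent-encoding) converts JSON punctuation and non-ASCII characters into an ASCII-safe format, so structured detection data and international text in evidence can be transported safely.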
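To illustrate why the encoding step is needed, the snippet below serializes an evidence entry containing Chinese text (sample data, not real Pillar output). The raw JSON cannot be represented in ISO-8859-1, while the percent-encoded form is pure ASCII and round-trips losslessly. It assumes the server-side serializer emits raw UTF-8 (`ensure_ascii=False`). Python's `json.dumps` defaults to `ensure_ascii=True`, which already yields ASCII via `\uXXXX` escapes, but other serializers may not.

```python
import json
from urllib.parse import quote, unquote

# Sample evidence with non-Latin-1 characters ("ignore previous instructions").
evidence = [{"category": "prompt_injection", "evidence": "忽略之前的指令"}]

# Serialize with raw UTF-8 characters, as many JSON encoders do by default.
raw = json.dumps(evidence, ensure_ascii=False)

try:
    raw.encode("latin-1")
    raw_fits_latin1 = True
except UnicodeEncodeError:
    raw_fits_latin1 = False

# Percent-encode the UTF-8 bytes: every non-ASCII byte becomes a %XX escape.
encoded = quote(raw, safe="")
decoded = json.loads(unquote(encoded))

print(raw_fits_latin1)      # False
print(encoded.isascii())    # True
print(decoded == evidence)  # True
```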

Header size limits:

Each header value is truncated to at most 8KB (a typical per-header limit)
Total headers should stay under ~32KB (a common server limit for all headers combined)
Truncated evidence is marked with an evidence_truncated: true flag
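One way to honour the 8KB cap without corrupting the payload is sketched below, under the assumption that evidence is a JSON array whose trailing items can be dropped. Slicing the encoded string directly could cut through a `%XX` escape or leave unparseable JSON, so this sketch removes whole items instead and reports whether anything was dropped. The function name `encode_evidence` is hypothetical, and the proposal doesn't specify where the `evidence_truncated` flag is carried, so the sketch simply returns it.

```python
import json
from urllib.parse import quote

MAX_HEADER_BYTES = 8 * 1024  # per-header cap from the proposal (8KB)


def encode_evidence(evidence: list, limit: int = MAX_HEADER_BYTES) -> tuple[str, bool]:
    """Percent-encode the evidence array, dropping trailing items until it fits.

    Returns (header_value, truncated). The encoded value is pure ASCII, so its
    character count equals its size in bytes.
    """
    items = list(evidence)
    truncated = False
    while items:
        value = quote(json.dumps(items, separators=(",", ":")), safe="")
        if len(value) <= limit:
            return value, truncated
        items.pop()  # drop the last whole item rather than slicing the string
        truncated = True
    # Nothing fit (or there was no evidence): send an empty JSON array.
    return quote("[]", safe=""), truncated
```

Dropping items one at a time re-encodes the array on each pass, which is quadratic in the worst case but cheap for the handful of evidence entries a single request usually produces.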

Example response headers:

```
x-pillar-flagged: true
x-pillar-session-id: abc-123-def-456
x-pillar-scanners: %7B%22jailbreak%22%3Atrue%2C%22prompt_injection%22%3Afalse%7D
x-pillar-evidence: %5B%7B%22category%22%3A%22prompt_injection%22%2C%22evidence%22%3A%22Ignore%20previous%20instructions%22%7D%5D
```

Decoding (Python):

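```python
import json
from urllib.parse import unquote

# `response` is the HTTP response returned through the proxy; requests and
# httpx both expose a case-insensitive `.headers` mapping.
flagged = response.headers["x-pillar-flagged"] == "true"
scanners = json.loads(unquote(response.headers["x-pillar-scanners"]))
evidence = json.loads(unquote(response.headers["x-pillar-evidence"]))
session_id = unquote(response.headers["x-pillar-session-id"])
```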
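Direct indexing raises `KeyError` whenever a header is absent, for example if a deployment doesn't emit it. A slightly more defensive decoder, sketched below, accepts any string mapping, ignores header-name case, and returns `None` for missing fields. `parse_pillar_headers` is a hypothetical helper, not an existing LiteLLM function.

```python
import json
from typing import Any, Mapping, Optional
from urllib.parse import unquote


def parse_pillar_headers(headers: Mapping[str, str]) -> dict[str, Any]:
    """Decode the proposed x-pillar-* headers; missing headers become None."""
    # Normalize names so plain dicts behave like case-insensitive header maps.
    lowered = {k.lower(): v for k, v in headers.items()}

    def json_header(name: str) -> Optional[Any]:
        raw = lowered.get(name)
        return json.loads(unquote(raw)) if raw is not None else None

    flag = lowered.get("x-pillar-flagged")
    session = lowered.get("x-pillar-session-id")
    return {
        "flagged": None if flag is None else flag == "true",
        "scanners": json_header("x-pillar-scanners"),
        "evidence": json_header("x-pillar-evidence"),
        "session_id": unquote(session) if session is not None else None,
    }
```

If the proxy is called through the `openai` Python SDK (v1.x), the raw headers should be reachable via `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` alongside `.parse()` for the normal completion object.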

Motivation, pitch

The problem: When running Pillar in monitor mode with Zero Persistence, users need to keep detection data on their side to build metrics and understand false positive rates. Currently, the only option is logs, which aren't easily queryable.

The workaround (painful): Build a custom guardrail that writes artifacts to S3 alongside the log writes, then pull them out with an ETL job, analyze them with OpenTelemetry, and investigate each request. This creates significant DevOps friction.

The solution: Return detection data in response headers. This allows users to:

Capture threat data directly in their application layer
Build metrics dashboards without custom guardrails or ETL
Analyze false positive rates by correlating headers with user feedback
Investigate flagged requests using session IDs
Integrate with existing observability tools that already capture headers
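As a rough illustration of the metrics use case, the sketch below assumes the application has stored each decoded header set together with a user-feedback signal saying whether the request was actually benign. The `PillarRecord` shape and `user_says_benign` field are hypothetical. The false positive rate here is the standard definition: the share of benign requests that were flagged.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PillarRecord:
    # Hypothetical record: decoded x-pillar-* headers joined with user feedback.
    session_id: str
    flagged: bool
    scanners: dict  # e.g. {"jailbreak": True, "prompt_injection": False}
    user_says_benign: bool  # hypothetical ground-truth signal from feedback


def false_positive_rate(records: list[PillarRecord]) -> float:
    """FP / (FP + TN): fraction of benign requests that were flagged."""
    benign = [r for r in records if r.user_says_benign]
    if not benign:
        return 0.0
    return sum(r.flagged for r in benign) / len(benign)


def flags_per_scanner(records: list[PillarRecord]) -> Counter:
    """Count how often each scanner category fired across all records."""
    counts: Counter = Counter()
    for r in records:
        for name, hit in r.scanners.items():
            if hit:
                counts[name] += 1
    return counts
```

Flagged records with a high false positive rate can then be looked up by `session_id` for investigation.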

This follows the existing pattern of x-litellm-applied-guardrails and avoids modifying the strict OpenAI response schema.

LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?

No

Twitter / LinkedIn details

No response

[Feature]: Return Pillar guardrail detection data in response headers (monitor mode) #17809

The Feature

Motivation, pitch

LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?

Twitter / LinkedIn details

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Feature]: Return Pillar guardrail detection data in response headers (monitor mode) #17809

Description

The Feature

Motivation, pitch

LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?

Twitter / LinkedIn details

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions