Description
The Feature
When using the Pillar Security guardrail in monitor mode with Zero Persistence, include detection metadata in HTTP response headers. This
enables users to capture threat detection data client-side for metrics, false positive analysis, and investigation—without building custom
guardrails or ETL pipelines.
Proposed headers:
- x-pillar-flagged: Boolean indicating if content was flagged ("true" / "false")
- x-pillar-scanners: URL-encoded JSON object with scanner category results
- x-pillar-evidence: URL-encoded JSON array of detection evidence (truncated to 8KB)
- x-pillar-session-id: URL-encoded session ID for correlation
Why headers instead of response body?
The OpenAI response schema is strict, so adding custom fields like _pillar_monitor_payload to the response body isn't feasible. Headers are
the natural alternative—similar to how LiteLLM already mutates headers for x-litellm-applied-guardrails.
Why URL encoding?
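HTTP header values are only reliably transported as ASCII: the HTTP specification historically allowed ISO-8859-1, and anything beyond that is treated as opaque bytes. URL encoding (percent-encoding) converts JSON and Unicode characters into an ASCII-safe
format, enabling safe transport of structured detection data and international text in evidence.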
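As a rough illustration of the encoding step, the sketch below serializes a payload to compact JSON and percent-encodes it with the standard library. The helper name `encode_header_value` is hypothetical and not part of LiteLLM or Pillar; the compact separators are an assumption chosen to match the example header values in this issue.

```python
import json
from urllib.parse import quote, unquote


def encode_header_value(payload) -> str:
    """Hypothetical helper: compact JSON, then percent-encode everything unsafe."""
    raw = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    # safe="" also encodes "/" so the result contains only unreserved ASCII and "%XX"
    return quote(raw, safe="")


scanners_header = encode_header_value({"jailbreak": True, "prompt_injection": False})

# Non-ASCII evidence text survives the round trip as plain ASCII on the wire
evidence = [{"category": "prompt_injection", "evidence": "Ignorez les instructions précédentes"}]
evidence_header = encode_header_value(evidence)
roundtrip = json.loads(unquote(evidence_header))
```

`quote` encodes the string as UTF-8 before escaping, so `unquote` followed by `json.loads` on the client recovers the original text.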
Header size limits:
- Each header is truncated to 8KB max (typical per-header limit)
- Total headers should stay under ~32KB (common server limit for all headers combined)
- Truncated evidence is marked with an evidence_truncated: true flag
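One possible way to enforce the per-header budget is sketched below. It drops whole evidence items from the end until the encoded value fits, since cutting the encoded string at a byte offset could split a percent escape or leave invalid JSON. The function name `encode_evidence` and the choice to append the `evidence_truncated` marker as a final array element are assumptions; the proposal does not specify where the flag lives.

```python
import json
from urllib.parse import quote, unquote

MAX_HEADER_BYTES = 8 * 1024  # per-header budget named in the proposal


def _encode(payload) -> str:
    return quote(json.dumps(payload, separators=(",", ":")), safe="")


def encode_evidence(items: list, limit: int = MAX_HEADER_BYTES) -> str:
    """Hypothetical helper: keep as many leading evidence items as fit in `limit`."""
    encoded = _encode(items)
    if len(encoded) <= limit:
        return encoded
    marker = {"evidence_truncated": True}  # assumed placement of the flag
    kept = list(items)
    while kept:
        kept.pop()  # drop the last item and retry
        encoded = _encode(kept + [marker])
        if len(encoded) <= limit:
            return encoded
    return _encode([marker])


# 100 large items cannot fit in 8KB, so the tail is dropped and the marker appended
items = [{"category": "prompt_injection", "evidence": "x" * 500} for _ in range(100)]
header_value = encode_evidence(items)
decoded = json.loads(unquote(header_value))
```

The encoded value is pure ASCII, so its character length equals its size in bytes.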
Example response headers:
x-pillar-flagged: true
x-pillar-session-id: abc-123-def-456
x-pillar-scanners: %7B%22jailbreak%22%3Atrue%2C%22prompt_injection%22%3Afalse%7D
x-pillar-evidence: %5B%7B%22category%22%3A%22prompt_injection%22%2C%22evidence%22%3A%22Ignore%20previous%20instructions%22%7D%5D
Decoding (Python):
from urllib.parse import unquote
import json
scanners = json.loads(unquote(response.headers["x-pillar-scanners"]))
evidence = json.loads(unquote(response.headers["x-pillar-evidence"]))
session_id = unquote(response.headers["x-pillar-session-id"])
Motivation, pitch
The problem: When running Pillar in monitor mode with Zero Persistence, users need to keep detection data on their side to build metrics and
understand false positive rates. Currently, the only option is logs—which aren't easily queryable.
The workaround (painful): Build a custom guardrail that saves artifacts to S3 where the log writes happen, then pull with ETL, analyze with
OpenTelemetry, and investigate each request. This creates significant DevOps friction.
The solution: Return detection data in response headers. This allows users to:
- Capture threat data directly in their application layer
- Build metrics dashboards without custom guardrails or ETL
- Analyze false positive rates by correlating headers with user feedback
- Investigate flagged requests using session IDs
- Integrate with existing observability tools that already capture headers
This follows the existing pattern of x-litellm-applied-guardrails and avoids modifying the strict OpenAI response schema.
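To make the application-layer use case concrete, here is a minimal sketch of a client-side parser that tolerates missing headers and feeds a per-scanner counter. `parse_pillar_headers` and `category_hits` are hypothetical names, and the sketch assumes lower-case header keys; HTTP client libraries such as requests and httpx expose case-insensitive header mappings, where this would not matter.

```python
import json
from collections import Counter
from typing import Mapping, Optional
from urllib.parse import unquote


def parse_pillar_headers(headers: Mapping[str, str]) -> Optional[dict]:
    """Return decoded detection metadata, or None when the headers are absent."""
    flagged = headers.get("x-pillar-flagged")
    if flagged is None:
        return None

    def _json(name: str, default):
        raw = headers.get(name)
        return json.loads(unquote(raw)) if raw else default

    return {
        "flagged": flagged.lower() == "true",
        "session_id": unquote(headers.get("x-pillar-session-id", "")),
        "scanners": _json("x-pillar-scanners", {}),
        "evidence": _json("x-pillar-evidence", []),
    }


# Header values copied from the example response in this issue
headers = {
    "x-pillar-flagged": "true",
    "x-pillar-session-id": "abc-123-def-456",
    "x-pillar-scanners": "%7B%22jailbreak%22%3Atrue%2C%22prompt_injection%22%3Afalse%7D",
    "x-pillar-evidence": "%5B%7B%22category%22%3A%22prompt_injection%22%2C%22evidence%22%3A%22Ignore%20previous%20instructions%22%7D%5D",
}

category_hits = Counter()  # running tally of scanners that fired
detection = parse_pillar_headers(headers)
if detection and detection["flagged"]:
    category_hits.update(name for name, hit in detection["scanners"].items() if hit)
```

Storing `session_id` alongside each tally entry would let flagged requests be joined with user feedback later when estimating false positive rates.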