
[Feature]: Return Pillar guardrail detection data in response headers (monitor mode) #17809

@afogel

Description


The Feature

When using the Pillar Security guardrail in monitor mode with Zero Persistence, include detection metadata in HTTP response headers. This
enables users to capture threat detection data client-side for metrics, false positive analysis, and investigation—without building custom
guardrails or ETL pipelines.

Proposed headers:

  • x-pillar-flagged: Boolean indicating if content was flagged ("true" / "false")
  • x-pillar-scanners: URL-encoded JSON object with scanner category results
  • x-pillar-evidence: URL-encoded JSON array of detection evidence (truncated to 8KB)
  • x-pillar-session-id: URL-encoded session ID for correlation
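On the server side, the four headers could be produced roughly as follows. This is a minimal sketch, assuming the Pillar detection result is already available as Python values. The function name `build_pillar_headers` and its parameters are hypothetical and are not LiteLLM's or Pillar's actual API. Compact JSON separators are used so the output matches the example headers in this issue.

```python
import json
from urllib.parse import quote


def build_pillar_headers(
    flagged: bool, scanners: dict, evidence: list, session_id: str
) -> dict[str, str]:
    """Encode detection results as the proposed x-pillar-* headers (hypothetical helper)."""

    def encode_json(value) -> str:
        # Compact JSON, then percent-encode every character outside
        # letters, digits and "_.-~" (safe="" also escapes "/")
        return quote(json.dumps(value, separators=(",", ":")), safe="")

    return {
        "x-pillar-flagged": "true" if flagged else "false",
        "x-pillar-scanners": encode_json(scanners),
        "x-pillar-evidence": encode_json(evidence),
        "x-pillar-session-id": quote(session_id, safe=""),
    }
```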

Why headers instead of response body?
The OpenAI response schema is strict, so adding custom fields such as _pillar_monitor_payload to the response body isn't feasible. Headers are
the natural alternative, following the same approach LiteLLM already uses when it adds the x-litellm-applied-guardrails response header.

Why URL encoding?
HTTP header values are limited to ISO-8859-1, and in practice only US-ASCII is reliably safe. URL encoding (percent-encoding) converts JSON
punctuation and Unicode characters into an ASCII-only form, so structured detection data and non-English text in evidence can be transported safely.
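To make the encoding argument concrete, the sketch below takes evidence text that cannot be represented in ISO-8859-1 (a Japanese phrase chosen purely for illustration), percent-encodes its JSON, and decodes it again. It uses only the standard library `urllib.parse.quote`/`unquote`; the helper `fits_latin1` is a hypothetical name for this example.

```python
import json
from urllib.parse import quote, unquote

# Evidence text outside ISO-8859-1 (Japanese for "ignore the previous instructions")
evidence = [{"category": "prompt_injection", "evidence": "前の指示を無視して"}]
raw_json = json.dumps(evidence, ensure_ascii=False)


def fits_latin1(text: str) -> bool:
    """True if `text` could be sent as an ISO-8859-1 header value."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# quote() UTF-8-encodes the string, then %XX-escapes every byte outside
# letters, digits and "_.-~" (safe="" also escapes "/")
header_value = quote(raw_json, safe="")

# Receiving side: percent-decode (UTF-8 by default), then parse the JSON
round_tripped = json.loads(unquote(header_value))
```

`raw_json` fails the Latin-1 check while `header_value` is plain ASCII and decodes back to the original list. Note that `json.dumps` with its default `ensure_ascii=True` would already yield ASCII via `\uXXXX` escapes; percent-encoding additionally escapes quotes, commas and spaces, which some proxies and header parsers treat specially.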

Header size limits:

  • Each header is truncated to at most 8KB (a typical per-header limit)
  • The combined size of all headers should stay under ~32KB (a common server limit for all headers together)
  • Truncated evidence is marked with an evidence_truncated: true flag
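One way to honor the 8KB limit without corrupting the payload is to drop whole evidence items rather than cut the encoded string, since slicing could split a `%XX` escape or leave invalid JSON. The sketch below assumes the limit applies to the encoded header value; the function name `truncate_evidence` is hypothetical, and where the `evidence_truncated` flag is ultimately placed is not specified by the proposal, so the sketch simply returns it alongside the header value.

```python
import json
from urllib.parse import quote

MAX_HEADER_BYTES = 8 * 1024  # per-header budget from the proposal


def encode_json(value) -> str:
    # Percent-encoded output is pure ASCII, so len() equals the byte count
    return quote(json.dumps(value, separators=(",", ":")), safe="")


def truncate_evidence(evidence: list, limit: int = MAX_HEADER_BYTES) -> tuple[str, bool]:
    """Drop trailing evidence items until the encoded header fits in `limit` bytes.

    Returns (header_value, evidence_truncated). Whole items are removed so the
    result is always valid, fully escaped JSON.
    """
    kept = list(evidence)
    while kept and len(encode_json(kept)) > limit:
        kept.pop()  # remove the last item and re-measure
    return encode_json(kept), len(kept) < len(evidence)
```

Re-encoding on every iteration is quadratic in the worst case, which is acceptable for a handful of evidence items; a production version might measure items incrementally instead.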

Example response headers:

 x-pillar-flagged: true
 x-pillar-session-id: abc-123-def-456
 x-pillar-scanners: %7B%22jailbreak%22%3Atrue%2C%22prompt_injection%22%3Atrue%7D
 x-pillar-evidence: %5B%7B%22category%22%3A%22prompt_injection%22%2C%22evidence%22%3A%22Ignore%20previous%20instructions%22%7D%5D

Decoding (Python):

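 import json
 from urllib.parse import unquote

 # `response` is any HTTP response whose `headers` mapping is case-insensitive
 # (for example, a requests or httpx response object)
 scanners = json.loads(unquote(response.headers["x-pillar-scanners"]))
 evidence = json.loads(unquote(response.headers["x-pillar-evidence"]))
 session_id = unquote(response.headers["x-pillar-session-id"])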
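In client code it may help to wrap the decoding above in a helper that also tolerates responses where the guardrail added no headers. This is a hedged sketch; `parse_pillar_headers` and the shape of its return value are hypothetical choices, not part of LiteLLM or the Pillar SDK.

```python
import json
from collections.abc import Mapping
from typing import Any, Optional
from urllib.parse import unquote


def parse_pillar_headers(headers: Mapping[str, str]) -> Optional[dict[str, Any]]:
    """Decode the proposed x-pillar-* headers into Python values.

    Returns None when the response carries no Pillar headers (for example,
    when the guardrail did not run). Lookups use lowercase names, so pass a
    case-insensitive mapping (requests/httpx response.headers) or a dict
    with lowercase keys.
    """
    flagged_raw = headers.get("x-pillar-flagged")
    if flagged_raw is None:
        return None

    def load_json(name: str, default: Any) -> Any:
        raw = headers.get(name)
        return json.loads(unquote(raw)) if raw else default

    session_raw = headers.get("x-pillar-session-id")
    return {
        "flagged": flagged_raw.strip().lower() == "true",
        "scanners": load_json("x-pillar-scanners", {}),
        "evidence": load_json("x-pillar-evidence", []),
        "session_id": unquote(session_raw) if session_raw else None,
    }
```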

Motivation, pitch

The problem: When running Pillar in monitor mode with Zero Persistence, users need to keep detection data on their side to build metrics and
understand false positive rates. Currently, the only option is logs—which aren't easily queryable.

The workaround (painful): Build a custom guardrail that writes detection artifacts to S3 alongside the logs, pull them back with an ETL job,
analyze them with OpenTelemetry, and investigate each request individually. This creates significant DevOps friction.

The solution: Return detection data in response headers. This allows users to:

  1. Capture threat data directly in their application layer
  2. Build metrics dashboards without custom guardrails or ETL
  3. Analyze false positive rates by correlating headers with user feedback
  4. Investigate flagged requests using session IDs
  5. Integrate with existing observability tools that already capture headers
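Once headers are captured, points 2 to 4 reduce to simple aggregations. The sketch below assumes each captured response has already been decoded into a dict with `flagged`, `scanners` and `session_id` keys, and that user feedback is stored as a mapping from session ID to a human verdict (True if the request really was malicious). Both shapes and the function name `summarize` are hypothetical.

```python
from collections import Counter


def summarize(records: list[dict], feedback: dict[str, bool]) -> dict:
    """Aggregate decoded Pillar header records into simple monitoring metrics."""
    flagged = [r for r in records if r["flagged"]]
    # Count how often each scanner category fired on flagged requests
    per_scanner = Counter(
        name for r in flagged for name, hit in r["scanners"].items() if hit
    )
    # Only flagged requests that received human feedback can be scored
    reviewed = [r for r in flagged if r["session_id"] in feedback]
    benign = sum(1 for r in reviewed if not feedback[r["session_id"]])
    return {
        "flag_rate": len(flagged) / len(records) if records else 0.0,
        "per_scanner": dict(per_scanner),
        # Share of reviewed flags that were benign: FP / (TP + FP)
        "false_discovery_rate": benign / len(reviewed) if reviewed else None,
    }
```

The metric computed here is the false discovery rate among reviewed flags, since unflagged requests usually get no review; a true false positive rate, FP / (FP + TN), would also require labels for unflagged traffic.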

This follows the existing pattern of x-litellm-applied-guardrails and avoids modifying the strict OpenAI response schema.
