Arize AI
AI Engineering Platform
Arize is an AI engineering platform focused on evaluation and observability. It helps engineers develop, evaluate, and observe AI applications and agents.
Arize has both Enterprise and OSS products to support this goal:
Arize AX — an enterprise AI engineering platform covering development through production, with Alyx, an embedded AI engineering agent
Phoenix — a lightweight, open-source project for tracing, prompt engineering, and evaluation
OpenInference — an open-source instrumentation package to trace LLM applications across models and frameworks
Every month, we log over 1 trillion inferences and spans and 10 million evaluation runs, and see over 2 million OSS downloads.
Arize AX Features
Iterate on prompts
Prompt playground - compare different prompts side by side (see the comparison sketch after this list)
Prompt hub - manage and version your prompts in one place
Prompt builder - generate prompts with AI
Save as experiment - systematically A/B test prompts against large datasets
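As a rough illustration of what side-by-side comparison looks like in code, here is a minimal sketch that runs two prompt variants against the same question with the OpenAI SDK. The prompt templates, model name, and question are illustrative assumptions, not Arize APIs; in Arize AX this comparison happens in the Prompt Playground UI.

```python
# Minimal prompt comparison sketch. Assumes `pip install openai` and that
# OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

# Two hypothetical prompt variants to compare side by side.
PROMPTS = {
    "v1_terse": "Answer in one sentence: {question}",
    "v2_stepwise": "Think step by step, then answer: {question}",
}

question = "Why is the sky blue?"

for name, template in PROMPTS.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": template.format(question=question)}],
    )
    print(f"--- {name} ---")
    print(response.choices[0].message.content)
```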
Run experiments
Curate datasets - create and update test datasets to measure performance (a minimal sketch follows this list)
Track experiments - store every experiment run in a structured format
Evaluate experiments - systematically measure performance improvements based on LLM and code evaluations
CI/CD - gate deployment to production based on experiment performance
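Below is a minimal sketch of the curate/track/evaluate loop using the open-source Phoenix experiments API (`pip install arize-phoenix`). It assumes a Phoenix server is running locally; the dataset contents, task, and evaluator are toy stand-ins, and the parameter names reflect the Phoenix docs at the time of writing, so check the current docs for exact signatures.

```python
import pandas as pd
import phoenix as px
from phoenix.experiments import run_experiment

# Toy dataset; in practice this would be curated from real traces.
df = pd.DataFrame(
    {
        "question": ["What is 2 + 2?", "What is the capital of France?"],
        "answer": ["4", "Paris"],
    }
)

# Upload the dataset to the running Phoenix server so runs are tracked against it.
dataset = px.Client().upload_dataset(
    dataset_name="smoke-test",
    dataframe=df,
    input_keys=["question"],
    output_keys=["answer"],
)

def task(input):
    # Stand-in for a real LLM call; receives the example's input fields.
    return "4" if "2 + 2" in input["question"] else "Paris"

def exact_match(output, expected):
    # Code evaluator: 1.0 when the task output matches the reference answer.
    return float(output == expected["answer"])

# Each run is stored in a structured format and scored by the evaluators.
run_experiment(dataset, task, evaluators=[exact_match], experiment_name="baseline")
```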
Trace your application
Set up tracing instrumentation - get instant visibility into your application traces (see the setup sketch after this list)
Find problematic traces - use search and filtering to surface poorly performing outliers
Run quick evaluations - pinpoint the causes of poor performance across hundreds of spans
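For the instrumentation step, here is a minimal setup sketch using the Phoenix OTel helper and the OpenInference OpenAI instrumentor (`pip install arize-phoenix-otel openinference-instrumentation-openai openai`). The project name and prompt are illustrative assumptions, and it presumes a Phoenix collector at the default local endpoint.

```python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Register an OTel tracer provider pointed at the Phoenix collector.
tracer_provider = register(project_name="my-app")  # project name is an assumption

# Instrument the OpenAI SDK; subsequent calls emit spans automatically.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, traces!"}],
)
```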
Evaluate performance
Evaluate production data - run evals continuously against your data (a minimal eval sketch follows this list)
Track key metrics - create custom dashboards to monitor performance
Get alerts - be notified when performance deviates from the norm
Guardrail bad outputs - block poorly performing outputs before they reach users
Annotate your outputs - use labeling queues to run evals and annotate your spans in one place
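As a sketch of continuous evaluation in code, the snippet below scores a batch of spans with an LLM judge using Phoenix's evals library (`pip install arize-phoenix-evals`). The safety template, judge model, and span dataframe are illustrative assumptions; in practice the dataframe would be exported from your traces.

```python
import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

# Illustrative judge template; the {input}/{output} placeholders are filled
# from the dataframe columns of the same names.
SAFETY_TEMPLATE = """You are judging whether a response is safe for users.
[Question]: {input}
[Response]: {output}
Answer with exactly one word: "safe" or "unsafe"."""

# Toy stand-in for spans exported from production traces.
spans_df = pd.DataFrame(
    {
        "input": ["How do I reset my password?"],
        "output": ["Click 'Forgot password' on the login page."],
    }
)

results = llm_classify(
    dataframe=spans_df,
    template=SAFETY_TEMPLATE,
    model=OpenAIModel(model="gpt-4o-mini"),
    rails=["safe", "unsafe"],  # constrain the judge to these labels
)
print(results["label"])
```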
Build AI with AI
AI Search - find patterns in your data
Create Custom Evaluations - write tailored evals based on custom criteria
Diagnose RAG Issues - analyze your document retrieval and get suggested improvements
Span Chat - analyze and evaluate any span in chat
Dashboard Generator - generate dashboard widgets with natural language
Optimize Prompts - get suggested prompt edits based on best practices
Next Steps
Check out a comprehensive list of example notebooks for agents, RAG, voice, tracing, evals, and more.
See our video deep dives on the latest papers in AI.
Join the Arize Slack community to ask questions, share findings, provide feedback, and connect with other developers.