Python SDK for Agent AI Observability, Monitoring, and Evaluation Framework. Features include agent, LLM, and tool tracing, debugging for multi-agent systems, a self-hosted dashboard, and advanced analytics with timeline and execution-graph views.
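The core idea behind tool and LLM tracing is recording a named span with timing and status around each call. A minimal sketch of that pattern, assuming a hypothetical in-memory span store (`TRACE_LOG`) rather than this SDK's actual API:

```python
import functools
import time

TRACE_LOG = []  # hypothetical in-memory span store; a real SDK would export spans

def trace(span_name):
    """Record name, duration, and status for each call of the wrapped function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                TRACE_LOG.append({
                    "span": span_name,
                    "duration_s": time.perf_counter() - start,
                    "status": status,
                })
        return wrapper
    return decorator

@trace("tool:search")
def search(query):
    # Stand-in for a real tool call (web search, DB lookup, etc.).
    return f"results for {query}"

search("latency")
```

Nesting such spans (agent → LLM → tool) is what produces the timeline and execution-graph views the dashboard renders.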
g4f-working is a daily-updated list of working no-auth AI providers and models from @xtekky/gpt4free. It helps developers, testers, and AI enthusiasts instantly find which models are currently online and accessible without any API keys, tokens, or cookies.
Open-source framework for stress-testing LLMs and conversational AI. Identify hallucinations, policy violations, and edge cases with scalable, realistic simulations. Join the discord: https://discord.gg/ssd4S37WNW
Multi-agent simulation using LLMs. Agents autonomously decide actions for survival, reproduction, and social behavior in a grid world. This project aims to replicate a paper published in 2025 (arXiv:2508.12920).
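The survival/reproduction loop in such a grid world can be sketched in a few lines. This is an illustrative toy, not the project's code: random moves stand in for the LLM-driven decisions, and all thresholds are made up:

```python
import random

GRID_SIZE = 10

class Agent:
    """Toy grid-world agent: moves, eats, and reproduces above an energy threshold."""
    def __init__(self, x, y, energy=10):
        self.x, self.y = x, y
        self.energy = energy

    def step(self, food):
        # In the real project an LLM picks the action; here we move randomly
        # on a toroidal grid and eat if food is present at the new cell.
        dx, dy = random.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        self.x = (self.x + dx) % GRID_SIZE
        self.y = (self.y + dy) % GRID_SIZE
        self.energy -= 1
        if (self.x, self.y) in food:
            food.discard((self.x, self.y))
            self.energy += 5

    def maybe_reproduce(self):
        # Split energy with an offspring once a (made-up) threshold is reached.
        if self.energy >= 20:
            self.energy //= 2
            return Agent(self.x, self.y, self.energy)
        return None

def simulate(steps=50, n_agents=5, n_food=30, seed=0):
    random.seed(seed)
    agents = [Agent(random.randrange(GRID_SIZE), random.randrange(GRID_SIZE))
              for _ in range(n_agents)]
    food = {(random.randrange(GRID_SIZE), random.randrange(GRID_SIZE))
            for _ in range(n_food)}
    for _ in range(steps):
        newborns = []
        for a in agents:
            a.step(food)
            child = a.maybe_reproduce()
            if child:
                newborns.append(child)
        # Agents with no energy left die off.
        agents = [a for a in agents + newborns if a.energy > 0]
    return agents

survivors = simulate()
```

Replacing the random move with an LLM call that sees the agent's local neighborhood is the step that turns this skeleton into the paper's setup.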
AI RAG evaluation project using Ragas. Includes RAG metrics (context precision, context recall, faithfulness), retrieval diagnostics, and prompt-testing examples for fintech/banking LLM systems. Designed as an AI QA Specialist portfolio project.
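Ragas computes these metrics with LLM judges, but the underlying definitions of context precision and recall are simple set ratios. A hedged, set-based illustration (the chunk strings are invented examples):

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are actually relevant to the question."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of the relevant chunks that the retriever actually found."""
    if not relevant:
        return 0.0
    return sum(1 for c in relevant if c in retrieved) / len(relevant)

# Hypothetical banking-domain example: 3 chunks retrieved, 2 of them relevant.
retrieved = ["overdraft policy", "fee schedule", "branch hours"]
relevant = {"overdraft policy", "fee schedule"}

precision = context_precision(retrieved, relevant)  # 2 of 3 retrieved are relevant
recall = context_recall(retrieved, relevant)        # both relevant chunks were found
```

In Ragas itself, relevance is judged by a model against the question and ground truth rather than by exact set membership, which is what makes the metrics usable on free-text chunks.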
A lightweight, console-based Python tool for AI-driven code generation and project management. Optimized for resource-constrained systems, it supports multiple AI providers and self-regeneration.
🚀 Comprehensive testing framework for LLM applications with semantic assertions, multi-provider support, RAG testing, and prompt optimization. Test AI the right way!
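A "semantic assertion" passes when the model output is close enough in meaning to an expected answer, rather than byte-equal. Real frameworks use embedding models for the similarity score; as a self-contained sketch, a bag-of-words cosine similarity shows the shape of the check (all names here are illustrative, not this framework's API):

```python
import math
import re
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words token counts (embedding-free stand-in)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def assert_semantically_similar(actual: str, expected: str, threshold: float = 0.5):
    """Fail only when the similarity score drops below the threshold."""
    score = cosine_similarity(actual, expected)
    assert score >= threshold, f"similarity {score:.2f} below threshold {threshold}"

# Word order differs, but the assertion passes because the meaning overlaps.
assert_semantically_similar(
    "The capital of France is Paris.",
    "Paris is the capital of France.",
)
```

Swapping `cosine_similarity` for an embedding-based scorer is the usual production upgrade; the assertion interface stays the same.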
A community-driven archive of AI prompts and responses. Log, compare, and contribute structured examples to build a searchable public prompt-response database.