The LLM Evaluation Framework
[NeurIPS D&B '25] The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods with easy feature extensibility.
LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
A measure of estimated confidence that outputs generated by Transformer-based language models are not hallucinated; a common baseline for such a measure is sketched below.
Evaluates LLM responses and computes their accuracy; a minimal accuracy-scoring sketch also appears below.
Tools for systematic large language model evaluations.
VerifyAI is a simple UI application for testing GenAI outputs.
A Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.
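None of the entries above spell out how such a confidence measure is computed, so the following is only a minimal sketch, assuming one common baseline: the mean token log-probability a causal language model assigns to its own output. The model choice (gpt2) and the function name mean_token_logprob are illustrative assumptions, not the API of any repository listed here.

```python
# Minimal sketch, assuming mean token log-probability as a rough confidence proxy.
# `mean_token_logprob` and the choice of "gpt2" are illustrative, not taken from
# any repository listed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_token_logprob(text: str) -> float:
    """Average log-probability the model assigns to the tokens of `text`.

    Higher values mean the model finds the text more predictable, which is
    often used as a heuristic signal that the output is less likely hallucinated.
    """
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    # The Hugging Face causal-LM loss is the mean negative log-likelihood per
    # predicted token, so its negation is the mean token log-probability.
    return -outputs.loss.item()

print(mean_token_logprob("The capital of France is Paris."))
```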
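Likewise, for the accuracy-style scoring mentioned above, here is a minimal, self-contained sketch of normalized exact-match accuracy. The helper names normalize and exact_match_accuracy are hypothetical and not drawn from any listed repository.

```python
# Minimal sketch of exact-match accuracy scoring for LLM responses.
# The function names and normalization choices are illustrative assumptions.

def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace and trailing punctuation."""
    return text.strip().strip(".!?").lower()

def exact_match_accuracy(responses: list[str], references: list[str]) -> float:
    """Fraction of responses that exactly match their reference after normalization."""
    if len(responses) != len(references):
        raise ValueError("responses and references must be the same length")
    if not responses:
        return 0.0
    hits = sum(
        normalize(resp) == normalize(ref)
        for resp, ref in zip(responses, references)
    )
    return hits / len(responses)

if __name__ == "__main__":
    responses = ["Paris.", "42", "blue whale"]
    references = ["paris", "42", "Blue Whale"]
    print(f"Accuracy: {exact_match_accuracy(responses, references):.2f}")  # Accuracy: 1.00
```

Exact match is only the simplest baseline; practical evaluation suites often layer semantic-similarity or LLM-as-judge scoring on top of it.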