Persist eval scores #58

jordanhunt22 · 2025-03-12T22:17:14Z

Adds Convex code for persisting results for a test run in a deployment. This includes authentication so that only we can update the scores. I tested this locally to make sure it works correctly.

For this to work properly, I need to set CONVEX_AUTH_TOKEN and CONVEX_EVAL_ENDPOINT in CI.
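As a rough illustration of how a CI reporter might use those two variables, here is a minimal sketch. The `EvalScore` payload shape, the `buildScoreRequest` helper, and the `Bearer` authorization scheme are all assumptions made for this example; the PR description does not show the actual request format.

```typescript
// Hypothetical payload shape; the real schema is not shown in this PR.
interface EvalScore {
  model: string;
  scores: Record<string, number>;
}

interface ScoreRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the request from the environment so that a missing secret fails
// loudly in CI instead of silently skipping the upload.
function buildScoreRequest(
  env: Record<string, string | undefined>,
  score: EvalScore,
): ScoreRequest {
  const endpoint = env.CONVEX_EVAL_ENDPOINT;
  const token = env.CONVEX_AUTH_TOKEN;
  if (!endpoint || !token) {
    throw new Error("CONVEX_EVAL_ENDPOINT and CONVEX_AUTH_TOKEN must be set");
  }
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Assumed scheme: the token is sent as a bearer token.
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(score),
    },
  };
}
```

In a Node-based reporter, the result could be passed straight to `fetch(url, init)` after calling the helper with `process.env`.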

This lets external consumers (e.g. the website) read the most recent eval scores.

Currently, this only stores the most recent scores, but maybe we want to store the scores for every run.
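To make the trade-off concrete, here is a small in-memory sketch of the two options. These classes are hypothetical stand-ins, not the Convex tables from this PR, and the `RunScore` fields are assumed for illustration.

```typescript
// Hypothetical record for one model's result in one eval run.
interface RunScore {
  model: string;
  score: number;
  runAt: number; // timestamp of the run, in milliseconds
}

// Option 1: keep only the most recent score per model (overwrite on write).
class LatestOnlyStore {
  private byModel = new Map<string, RunScore>();

  record(run: RunScore): void {
    this.byModel.set(run.model, run);
  }

  latest(model: string): RunScore | undefined {
    return this.byModel.get(model);
  }
}

// Option 2: append every run and derive the latest score at read time.
class HistoryStore {
  private runs: RunScore[] = [];

  record(run: RunScore): void {
    this.runs.push(run);
  }

  latest(model: string): RunScore | undefined {
    let best: RunScore | undefined;
    for (const run of this.runs) {
      if (run.model === model && (best === undefined || run.runAt >= best.runAt)) {
        best = run;
      }
    }
    return best;
  }

  history(model: string): RunScore[] {
    return this.runs.filter((run) => run.model === model);
  }
}
```

Both stores answer "what is the latest score?" identically; only the second can also show how a model's score changed across runs, at the cost of unbounded growth.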

jordanhunt22 and others added 6 commits March 12, 2025 14:18

add reporter

eab7b4f

add persistence

f1b9a31

update methods

e39a3a0

convert to internal mutations

8370543

nits

d583d50

Delete evals/002-queries/021-intersection/answer/convex/README.md

2da8211

jordanhunt22 requested a review from sujayakar March 12, 2025 22:39

sujayakar approved these changes Mar 12, 2025

jordanhunt22 merged commit 2de26e0 into main Mar 13, 2025
