Conversation

@jordanhunt22
Collaborator

This means that all of our evals will use the same format! I tested that this works, and Claude's performance is very similar.

Comment on lines 36 to 40
{"role": "assistant", "content": [{"type": "text", "text": "<analysis>"}]},
# {"role": "assistant", "content": [{"type": "text", "text": "<analysis>"}]},
Contributor

intentional?

Collaborator Author

i meant to delete this file

Comment on lines 20 to 21
with open("dist/anthropic_convex_rules.txt", "w") as f:
    f.write(build_anthropic_rules())

with open("dist/openai_convex_rules.txt", "w") as f:
    f.write(build_openai_rules())

# Generate MDC files with frontmatter
with open("dist/anthropic_convex_rules.mdc", "w") as f:
    f.write(MDC_FRONTMATTER)
    f.write(build_anthropic_rules())

with open("dist/openai_convex_rules.mdc", "w") as f:

# Generate rules using a very specific filename here to make it clear for AI usage what this is.
with open("dist/convex_rules.txt", "w") as f:
    f.write(build_release_rules())

with open("dist/convex_rules.mdc", "w") as f:
Contributor

does this mean deep links to the existing release will break? maybe we should keep pushing to both in case we decide to have different rules for each going forward
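
A minimal sketch of what "pushing to both" could look like, assuming the unified rules are simply written under the legacy filenames as well as the new ones. `MDC_FRONTMATTER` and `build_release_rules` are named in the diff, but their bodies here are placeholders, and `write_release_files` plus the two name lists are hypothetical helpers, not code from this PR:

```python
from pathlib import Path

# Placeholder stand-ins: the real MDC_FRONTMATTER and build_release_rules()
# live in the build script; their contents are assumed here.
MDC_FRONTMATTER = "---\ndescription: placeholder frontmatter\n---\n"


def build_release_rules() -> str:
    return "placeholder rules\n"


# New unified name first, then the legacy per-provider names (hypothetical lists).
TXT_NAMES = ["convex_rules.txt", "anthropic_convex_rules.txt", "openai_convex_rules.txt"]
MDC_NAMES = ["convex_rules.mdc", "anthropic_convex_rules.mdc", "openai_convex_rules.mdc"]


def write_release_files(dist: Path) -> list[Path]:
    """Write the unified rules under both the new and the legacy filenames."""
    dist.mkdir(parents=True, exist_ok=True)
    rules = build_release_rules()
    written = []
    for name in TXT_NAMES:
        path = dist / name
        path.write_text(rules)
        written.append(path)
    for name in MDC_NAMES:
        path = dist / name
        # MDC files get the frontmatter prepended, as in the diff above.
        path.write_text(MDC_FRONTMATTER + rules)
        written.append(path)
    return written


if __name__ == "__main__":
    for p in write_release_files(Path("dist")):
        print(p)
```

With this shape, the legacy files carry the same content as the new ones, so old links would keep resolving; diverging the rules per provider later would only mean swapping the builder used for those names.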

Collaborator Author

do we have existing deeplinks to these somewhere?

Collaborator Author

i'm happy to just continue with the old format if there are some existing links we don't wanna break

Contributor

@ianmacartney ianmacartney left a comment

I'm ok with this, modulo changing the release names.
I do wonder if we'll be splitting the logic out again in the future, and if so what of this we'll need to revive from the dead, but that'd be easier to reintroduce than keep around

@jordanhunt22
Collaborator Author

> I'm ok with this, modulo changing the release names. I do wonder if we'll be splitting the logic out again in the future, and if so what of this we'll need to revive from the dead, but that'd be easier to reintroduce than keep around

The hope is that we don't have to split the logic out again. All the models besides Claude use the same testing harness and rules, so this way the maintenance burden is a bit lower.

@ianmacartney
Contributor

ianmacartney commented Feb 15, 2025 via email

@jordanhunt22 jordanhunt22 merged commit 1287f68 into main Feb 18, 2025