
Conversation

@BiEchi

@BiEchi BiEchi commented Nov 29, 2023

What does this PR do?

Implements the new model CRATE introduced in the paper White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker and @younesbelkada

@BiEchi
Author

BiEchi commented Nov 29, 2023

Thanks a lot for reviewing this PR. As a note, the causal model is not yet supported, but I left a placeholder for it.

@BiEchi BiEchi changed the title Add crate Add the CRATE (Coding RATE) model Nov 29, 2023
@BiEchi BiEchi changed the title Add the CRATE (Coding RATE) model Add the CRATE (Coding RATE) backbone model Nov 29, 2023
Collaborator

@ArthurZucker ArthurZucker left a comment


Hey! Thanks a lot for contributing! I'd recommend starting by sharing the model on the Hub following this! It will be easier to integrate, and we can see whether it gets traction in the community!

@BiEchi
Author

BiEchi commented Nov 29, 2023

Thanks for the pointer @ArthurZucker. As the code is very similar to RoBERTa, I used transformers-cli add-new-model-like to initialize the code, which pulls in these dependencies that are not directly convertible to a custom model:
[screenshot: list of generated transformers-internal dependencies]
Could you suggest how to resolve this?

@ArthurZucker
Collaborator

The add-new-model-like command is designed for additions to transformers itself. It would be nice to have an add-new-custom-model command that creates just what's needed! Adding this to my todo list 😉
Otherwise, the tutorial should help with adding the model on the Hub rather than in the transformers library for now! 🤗

@BiEchi
Author

BiEchi commented Nov 30, 2023

Thanks a lot for offering to help with this @ArthurZucker! Just to keep us on the same page: our code already runs if we install our fork directly with pip install -e ., so there shouldn't be any difficulty merging into the transformers library. We chose to develop the model by modifying the library directly so that we don't have to write any scripts beyond the examples like run_glue.py.
By the way, we've already released the weights on the Hub at Jackbai/crate-base, and they can be loaded directly with from_pretrained(). That said, if we need to develop a custom model first, we're happy to proceed that way (if you think it's necessary).

@ArthurZucker
Collaborator

ArthurZucker commented Dec 1, 2023

Okay, thanks for explaining! It looks like a good addition! Feel free to ping me if you need help integrating the model! 🤗 (on this PR, not as a custom model!) 🔥
fyi @amyeroberts

@BiEchi
Author

BiEchi commented Jan 27, 2024

Hi @ArthurZucker, we're currently developing a sibling model (CRATE-GPT); the model proposed above is CRATE-BERT. Should we upload it here, or open a separate PR?

@ArthurZucker
Collaborator

A separate PR is better, but as I said before, the best way to share it at first is as a custom model! 🤗 I'd recommend making sure you upload safetensors checkpoints and fill out the model card, so that people who discover it understand what it's about!
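For context, the custom-model route mentioned here works by shipping the modeling code next to the checkpoint and pointing to it from an `auto_map` entry in `config.json`. A minimal sketch; the file and class names below are illustrative, not the actual CRATE code:

```json
{
  "model_type": "crate",
  "auto_map": {
    "AutoConfig": "configuration_crate.CrateConfig",
    "AutoModel": "modeling_crate.CrateModel"
  }
}
```

With `configuration_crate.py` and `modeling_crate.py` uploaded to the same Hub repo, users can then load the model via `AutoModel.from_pretrained(..., trust_remote_code=True)` without any change to the transformers library itself.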

@ArthurZucker
Collaborator

Also, you can easily push to the Hub if you use the PyTorchModelHubMixin class! 🤗 I can open a PR in your repo if you want?

