
Conversation

@BiEchi

@BiEchi BiEchi commented Nov 29, 2023

What does this PR do?

Implements the new model CRATE introduced in the paper White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker and @younesbelkada

@BiEchi
Author

BiEchi commented Nov 29, 2023

Thanks a lot for reviewing this PR. As a note, the causal model is not yet supported, but I left a placeholder for it.

@BiEchi BiEchi changed the title Add crate Add the CRATE (Coding RATE) model Nov 29, 2023
@BiEchi BiEchi changed the title Add the CRATE (Coding RATE) model Add the CRATE (Coding RATE) backbone model Nov 29, 2023
Collaborator

@ArthurZucker ArthurZucker left a comment


Hey! Thanks a lot for contributing! I'd recommend starting by sharing the model on the Hub following this! It will be easier to integrate, and we can see whether it gets traction in the community!

@BiEchi
Author

BiEchi commented Nov 29, 2023

Thanks for the pointer @ArthurZucker. As the code is very similar to RoBERTa, I used transformers-cli add-new-model-like to initialize the code, which pulls in these dependencies that are not directly convertible to a custom model:
[screenshot: list of generated transformers-internal dependencies]
Could you suggest how to resolve this?

@ArthurZucker
Collaborator

The add-new-model-like command is designed for additions to transformers itself. It would be nice to have an add-new-custom-model command that creates just what's needed! Adding this to my todo list 😉
Otherwise, the tutorial should help with adding the model on the Hub rather than in the transformers library for now! 🤗

@BiEchi
Author

BiEchi commented Nov 30, 2023

Thanks a lot for offering to help with this @ArthurZucker! Just to keep us on the same page: our code already runs if we install our fork directly with pip install -e ., so there shouldn't be any difficulty merging into the transformers library. We chose to develop the model by modifying the library directly so that we don't have to write any scripts beyond the examples like run_glue.py.
By the way, we've already released the weights on the Hub at Jackbai/crate-base, and they can be loaded directly with from_pretrained(). That said, if we need to develop a custom model first, we're happy to proceed that way (if you think it's necessary).

@ArthurZucker
Collaborator

ArthurZucker commented Dec 1, 2023

Okay, thanks for explaining! It looks like a good addition! Feel free to ping me if you need help integrating the model! 🤗 (on this PR, not as a custom model!) 🔥
fyi @amyeroberts

@BiEchi
Author

BiEchi commented Jan 27, 2024

Hi @ArthurZucker, we're currently developing a sibling model (CRATE-GPT); the model proposed above is CRATE-BERT. Should we upload it here, or open a separate PR?

@ArthurZucker
Collaborator

A separate PR is better, but as I said before, the best way to share it at first is as a custom model! 🤗 I'd recommend making sure you upload safetensors checkpoints and fill out the model card, so that people who discover it understand what it's about!
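For context, the custom-model route mentioned here works by shipping the modeling code next to the checkpoint and pointing to it from an `auto_map` entry in `config.json`. A minimal sketch; the file and class names below are illustrative, not the actual CRATE code:

```json
{
  "model_type": "crate",
  "auto_map": {
    "AutoConfig": "configuration_crate.CrateConfig",
    "AutoModel": "modeling_crate.CrateModel"
  }
}
```

With `configuration_crate.py` and `modeling_crate.py` uploaded to the same Hub repo, users can then load the model via `AutoModel.from_pretrained(..., trust_remote_code=True)` without any change to the transformers library itself.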

@ArthurZucker
Collaborator

Also, you can easily push to the Hub if you use the PyTorchModelHubMixin class! 🤗 I can open a PR in your repo if you want?

