
Conversation

@Aravind-11 (Contributor)

What does this PR do?

This PR adds a skip_post_init parameter to PreTrainedModel.from_pretrained() to allow users to skip the post-initialization step that reinitializes model weights. This is essential for users who subclass models with custom parameters and want to preserve their initialization.

When users subclass Hugging Face models (e.g., Qwen2VLForConditionalGeneration) and add custom parameters initialized in their __init__(), that initialization gets silently overwritten by post_init(), which calls _init_weights().
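For illustration, here is a minimal sketch of the failure mode (the GatedQwen2VL subclass and its gate module are hypothetical; only Qwen2VLForConditionalGeneration and the checkpoint name are real):

```python
import torch.nn as nn
from transformers import Qwen2VLForConditionalGeneration

class GatedQwen2VL(Qwen2VLForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        # Hypothetical custom parameter with deliberate initialization
        self.gate = nn.Linear(config.hidden_size, 1)
        nn.init.zeros_(self.gate.weight)
        nn.init.ones_(self.gate.bias)

model = GatedQwen2VL.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
# `gate` is missing from the checkpoint, so from_pretrained() initializes it
# via _init_weights(), silently discarding the zeros/ones set in __init__().
```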

Fixes #42418

Before submitting

Who can review?

@ArthurZucker @Cyrilvallez

@ArthurZucker (Collaborator) left a comment


Ty tho I think we want to be a bit more careful in designing this!
We need tests, that's for sure, but we also need a good API to make sure users are not surprised.

For example -> base it on the classes that are in the public init only.
cc @Cyrilvallez wdyt?

@Cyrilvallez (Member)

Hey @Aravind-11! The thing is that post_init is responsible for more than weight initialization, so we cannot simply skip it. Also, adding yet another flag is not a good idea IMO. Custom models should instead do

def _init_weights(self, module): ...

to skip weight initialization, instead of

self.skip_post_init = True

It's not more verbose, and it avoids both adding a flag and missing the rest of the post_init operations.
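Concretely, something like this (a minimal sketch; the gate module is just for illustration):

```python
import torch.nn as nn
from transformers import Qwen2VLForConditionalGeneration

class GatedQwen2VL(Qwen2VLForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        self.gate = nn.Linear(config.hidden_size, 1)
        nn.init.zeros_(self.gate.weight)

    def _init_weights(self, module):
        # No-op: parameters missing from the checkpoint keep whatever
        # __init__ set, and the rest of post_init() still runs as usual.
        pass
```

A more selective variant could check the module and fall back to super()._init_weights(module) for everything except the custom parameters.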

However, we could indeed think of something so that it's the default for remote code models. But in general we cannot be sure whether people rely on it or not, so it's a bit hard to decide. What's sure is that this used to be the default, so it could make sense to go back to it for remote code.

@Aravind-11 (Contributor, Author)

That makes a lot of sense @Cyrilvallez! Thank you for clarifying!

