dalle mega #18152
Conversation
        heads.
    """

    last_hidden_state: torch.FloatTensor = None
    last_hidden_state_unconditional: Optional[Tuple[torch.FloatTensor]] = None
last_hidden_state_unconditional is the unconditional encoder output required for superconditioning (guidance).
Added it here so that it can easily be passed from the encoder to the decoder.
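A minimal sketch of such an output class (the class name and docstring below are illustrative, not the PR's exact definitions): it simply adds one optional field next to the usual last_hidden_state.

```python
from dataclasses import dataclass
from typing import Optional

import torch

from transformers.utils import ModelOutput


@dataclass
class EncoderOutputWithUnconditional(ModelOutput):  # hypothetical name, for illustration only
    """
    Encoder output that also carries the hidden states obtained from an all-pad
    ("unconditional") copy of the input, needed later for superconditioning.
    """

    last_hidden_state: torch.FloatTensor = None
    last_hidden_state_unconditional: Optional[torch.FloatTensor] = None
```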
if do_superconditioning:
    input_ids_uncond = torch.ones(input_shape, dtype=torch.long, device=inputs_embeds.device) * self.config.pad_token_id
    attention_mask_uncond = torch.zeros(input_shape, dtype=torch.long, device=inputs_embeds.device)

    inputs_embeds_unconditional = self.embed_tokens(input_ids_uncond)

    # concatenate the embeddings of the conditioned and unconditioned inputs
    inputs_embeds = torch.cat([inputs_embeds, inputs_embeds_unconditional], dim=0)

    # concatenate the attention masks of the conditioned and unconditioned inputs
    # if attention_mask is None, create an all-ones mask
    if attention_mask is None:
        attention_mask = torch.ones(input_shape, dtype=torch.long, device=inputs_embeds.device)
    attention_mask = torch.cat([attention_mask, attention_mask_uncond], dim=0)
Here we extend inputs_embeds with inputs_embeds_unconditional so that the same encoder pass also produces the unconditional hidden states.
Good for me!
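As a toy illustration of the batch-doubling trick (all shapes and values below are made up), the conditional and unconditional inputs are stacked along the batch dimension so a single encoder pass produces both sets of hidden states:

```python
import torch
from torch import nn

batch, seq_len, hidden, vocab = 2, 5, 8, 16
pad_token_id = 1  # illustrative value

embed_tokens = nn.Embedding(vocab, hidden)
input_ids = torch.randint(2, vocab, (batch, seq_len))          # conditional prompt tokens
input_ids_uncond = torch.full((batch, seq_len), pad_token_id)  # "empty" prompt made of pad tokens

inputs_embeds = embed_tokens(input_ids)
inputs_embeds_uncond = embed_tokens(input_ids_uncond)
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
attention_mask_uncond = torch.zeros(batch, seq_len, dtype=torch.long)

# stack conditional and unconditional along the batch dimension: one encoder pass handles both
inputs_embeds = torch.cat([inputs_embeds, inputs_embeds_uncond], dim=0)
attention_mask = torch.cat([attention_mask, attention_mask_uncond], dim=0)

print(inputs_embeds.shape)   # torch.Size([4, 5, 8])
print(attention_mask.shape)  # torch.Size([4, 5])
```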
# filter out the unconditional hidden states from encoder_states
# (materialize as tuples so the returned outputs stay indexable)
if do_superconditioning and output_hidden_states:
    encoder_states = tuple(state.chunk(2)[0] for state in encoder_states)

# filter out the unconditional attentions from all_attentions
if do_superconditioning and output_attentions:
    all_attentions = tuple(attn.chunk(2)[0] for attn in all_attentions)
filter out the unconditional hidden_states and attentions since we don't want to return those.
hidden_states_uncond = None
if do_superconditioning:
    hidden_states, hidden_states_uncond = hidden_states.chunk(2)
separate the last_hidden_states for conditional and unconditional inputs.
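Continuing the toy shapes above, chunk(2) splits along the batch dimension in concatenation order, so the first half is the conditional output and the second half the unconditional one:

```python
import torch

# pretend this is the encoder's last_hidden_state for the doubled batch from the toy example above
hidden_states = torch.randn(4, 5, 8)

# chunk(2) splits dim 0: first half = conditional, second half = unconditional
hidden_states, hidden_states_uncond = hidden_states.chunk(2)
print(hidden_states.shape, hidden_states_uncond.shape)  # torch.Size([2, 5, 8]) torch.Size([2, 5, 8])
```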
if do_superconditioning:
    encoder_hidden_states = torch.cat([encoder_hidden_states, encoder_hidden_states_uncond], dim=0)
    input_ids = input_ids.repeat(2, 1)
    attention_mask = attention_mask.repeat(2, 1)
The encoder_hidden_states_uncond will be passed from DalleMegaModel. We concatenate the conditional and unconditional encoder hidden states here and repeat the decoder inputs to match the doubled batch.
Cool! Maybe we can also raise a nice error message if the encoder_hidden_states_uncond are in the wrong format or None but do_superconditioning is True?
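One possible shape of that check, written as a hypothetical helper (the function name is made up; the exact wording and placement are up to the PR author):

```python
from typing import Optional

import torch


def _check_superconditioning_inputs(
    encoder_hidden_states: torch.Tensor,
    encoder_hidden_states_uncond: Optional[torch.Tensor],
    do_superconditioning: bool,
) -> None:
    """Validate the unconditional encoder states before concatenating them (illustrative helper)."""
    if not do_superconditioning:
        return
    if encoder_hidden_states_uncond is None:
        raise ValueError(
            "`do_superconditioning` is True but `encoder_hidden_states_uncond` is None. "
            "Run the encoder with superconditioning enabled so the unconditional hidden states are returned."
        )
    if encoder_hidden_states_uncond.shape != encoder_hidden_states.shape:
        raise ValueError(
            f"`encoder_hidden_states_uncond` has shape {tuple(encoder_hidden_states_uncond.shape)}, "
            f"expected the same shape as `encoder_hidden_states` {tuple(encoder_hidden_states.shape)}."
        )
```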
input_ids=decoder_input_ids,
attention_mask=decoder_attention_mask,
encoder_hidden_states=encoder_outputs[0],
encoder_hidden_states_unconditional=encoder_outputs[1] if do_superconditioning else None,
pass the unconditional encoder hidden states to decoder.
do_superconditioning = superconditioning_scale > 1 and (encoder_outputs.last_hidden_state_unconditional is not None) and (not self.training)
if do_superconditioning:
    lm_logits, lm_logits_uncond = lm_logits.chunk(2)
    lm_logits = lm_logits + superconditioning_scale * (lm_logits - lm_logits_uncond)
do the actual superconditioning.
Looks good to me! Just out of curiosity, can the scale not also be between 0 and 1?
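For reference, the update above is the usual classifier-free guidance formula; a tiny numeric sketch (made-up values) shows the form used in the PR and its equivalent (1 + s) * cond - s * uncond rearrangement:

```python
import torch

superconditioning_scale = 10.0  # example value
lm_logits_cond = torch.tensor([2.0, 0.5, -1.0])
lm_logits_uncond = torch.tensor([1.0, 0.5, 0.0])

# push the logits away from the unconditional prediction, in the direction of the conditional one
guided = lm_logits_cond + superconditioning_scale * (lm_logits_cond - lm_logits_uncond)
# equivalent rearrangement: (1 + s) * cond - s * uncond
guided_alt = (1 + superconditioning_scale) * lm_logits_cond - superconditioning_scale * lm_logits_uncond

print(guided)  # -> 12.0, 0.5, -11.0  (e.g. 2 + 10 * (2 - 1) = 12)
assert torch.allclose(guided, guided_alt)
```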
| "decoder_head_mask": decoder_head_mask, | ||
| "cross_attn_head_mask": cross_attn_head_mask, | ||
| "use_cache": use_cache, # change this to avoid caching (presumably for debugging) | ||
| "superconditioning_scale": superconditioning_scale |
this makes sure that we always pass superconditioning_scale to the model's forward.
    output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
superconditioning_scale = superconditioning_scale if superconditioning_scale is not None else self.config.superconditioning_scale
I would be in favor of not adding the superconditioning_scale parameter to the config, as it's something the user would want to change per forward call, and a specific value is not necessarily attached to a trained model.
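If that suggestion is adopted, one way it could look is a plain keyword default with no config fallback. The toy module and argument list below are only a sketch of the reviewer's suggestion, not the PR's actual code:

```python
from typing import Optional

import torch
from torch import nn


class DecoderSketch(nn.Module):  # toy stand-in, only to illustrate the suggested signature
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        superconditioning_scale: float = 1.0,  # plain default chosen per call, no `self.config` fallback
    ):
        # values <= 1 simply disable superconditioning, mirroring the `> 1` check in the PR
        do_superconditioning = superconditioning_scale > 1 and not self.training
        return do_superconditioning
```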
)
use_cache = use_cache if use_cache is not None else self.config.use_cache
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
superconditioning_scale = superconditioning_scale if superconditioning_scale is not None else self.config.superconditioning_scale
Same comment here - let's not fall back to the config.
def set_output_embeddings(self, new_embeddings):
    self.lm_head = new_embeddings

def tie_weights(self):
Why is this here? Can't this just use the standard tie_weights function in modeling_utils.py?
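For context, the inherited machinery usually suffices once the embedding getters/setters are defined: PreTrainedModel.tie_weights ties the input and output embeddings whenever config.tie_word_embeddings is set. A self-contained toy example (all names below are made up, not from this PR) showing that no custom tie_weights is needed:

```python
from torch import nn

from transformers import PretrainedConfig, PreTrainedModel


class TinyConfig(PretrainedConfig):
    model_type = "tiny-sketch"  # illustrative config, not a real model type

    def __init__(self, vocab_size=10, hidden_size=4, tie_word_embeddings=True, **kwargs):
        super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size


class TinyModel(PreTrainedModel):
    config_class = TinyConfig

    def __init__(self, config):
        super().__init__(config)
        self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()  # runs the inherited tie_weights()

    def get_input_embeddings(self):
        return self.embed_tokens

    def set_input_embeddings(self, value):
        self.embed_tokens = value

    def get_output_embeddings(self):
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        self.lm_head = new_embeddings


model = TinyModel(TinyConfig())
assert model.lm_head.weight is model.embed_tokens.weight  # tied by the standard machinery
```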
def prepare_inputs_for_generation(
    self,
    decoder_input_ids,
    superconditioning_scale=None,
Let's maybe not put it in the 2nd position but a bit later, after the tensor arguments? Also, a type hint (: int) would be nice here.
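The suggested reordering could look roughly like this; the surrounding argument names are assumed from the usual Bart-style prepare_inputs_for_generation signature (they may differ in this PR), and the body is elided:

```python
from typing import Optional

import torch


def prepare_inputs_for_generation(  # illustrative signature fragment only
    self,
    decoder_input_ids,
    past=None,
    attention_mask: Optional[torch.Tensor] = None,
    head_mask: Optional[torch.Tensor] = None,
    decoder_head_mask: Optional[torch.Tensor] = None,
    cross_attn_head_mask: Optional[torch.Tensor] = None,
    use_cache: Optional[bool] = None,
    encoder_outputs=None,
    superconditioning_scale: Optional[int] = None,  # moved after the tensor arguments, with a type hint
    **kwargs,
):
    ...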
@patil-suraj - I can take over the PR if you want :-)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR adds the DalleMega model from dalle-mini for text-to-image generation.
The VQGAN model required for converting the generated tokens to an image is added in PR #18150.
It also adds superconditioning (classifier-free guidance) support to the sample method.
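For readers wondering how this would be used once merged, here is a hypothetical end-to-end sketch; the class name, checkpoint id, and the wiring of superconditioning_scale through generate are assumptions based on this PR rather than a documented API:

```python
import torch
from transformers import AutoTokenizer

# placeholder names -- the real class and checkpoint identifiers come from this PR / dalle-mini
from transformers import DalleMegaForConditionalGeneration  # hypothetical import

tokenizer = AutoTokenizer.from_pretrained("dalle-mega")  # placeholder checkpoint id
model = DalleMegaForConditionalGeneration.from_pretrained("dalle-mega")  # placeholder class name
model.eval()

inputs = tokenizer("an armchair in the shape of an avocado", return_tensors="pt")

with torch.no_grad():
    # superconditioning_scale > 1 enables classifier-free guidance during sampling;
    # the resulting image tokens would then be decoded to pixels by the VQGAN from PR #18150
    image_tokens = model.generate(**inputs, do_sample=True, superconditioning_scale=10)
```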