🐛 Bug
When I try to run T5 on the GPU with the latest transformers release (and also with the most recent git version), I get the following error:
```
Traceback (most recent call last):
  File "T5_example.py", line 32, in <module>
    outputs = model(input_ids=input_ids)
  File "/home/reimers/anaconda3/envs/sbert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/reimers/sbert/transformers/src/transformers/modeling_t5.py", line 780, in forward
  File "/home/reimers/anaconda3/envs/sbert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/reimers/sbert/transformers/src/transformers/modeling_t5.py", line 616, in forward
    encoder_decoder_position_bias=encoder_decoder_position_bias,
  File "/home/reimers/anaconda3/envs/sbert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/reimers/sbert/transformers/src/transformers/modeling_t5.py", line 422, in forward
    self_attention_outputs = self.layer[0](
  File "/home/reimers/anaconda3/envs/sbert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/reimers/sbert/transformers/src/transformers/modeling_t5.py", line 373, in forward
    attention_output = self.SelfAttention(
  File "/home/reimers/anaconda3/envs/sbert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/reimers/sbert/transformers/src/transformers/modeling_t5.py", line 338, in forward
    raise ValueError("No position_bias provided and no weights to compute position_bias")
  File "/home/reimers/sbert/transformers/src/transformers/modeling_t5.py", line 289, in compute_bias
    values = self.relative_attention_bias(rp_bucket)
  File "/home/reimers/anaconda3/envs/sbert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/reimers/anaconda3/envs/sbert/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/reimers/anaconda3/envs/sbert/lib/python3.7/site-packages/torch/nn/functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select
```

This is the example code to reproduce the problem:
```python
from transformers import T5Model, T5Tokenizer
import torch

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5Model.from_pretrained('t5-small')
model = model.to('cuda')

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute"), device='cuda').unsqueeze(0)
outputs = model(input_ids=input_ids)
last_hidden_states = outputs[0]
```

The error is in `modeling_t5.py`, lines 284-289:
```python
rp_bucket = self._relative_position_bucket(
    relative_position,  # shape (qlen, klen)
    bidirectional=not self.is_decoder,
    num_buckets=self.relative_attention_num_buckets,
)
values = self.relative_attention_bias(rp_bucket)  # shape (qlen, klen, num_heads)
```

Here `rp_bucket` is a tensor on the CPU, which causes the error above.
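(My guess, not a verified trace: if `compute_bias` builds `relative_position` with plain `torch.arange` calls, those allocate on the CPU by default regardless of where the model lives, so the bucket indices never reach the GPU. A minimal illustration of that default:)

```python
import torch

# torch.arange defaults to the CPU unless a device is passed explicitly,
# even when the surrounding model has been moved to the GPU.
context_position = torch.arange(5, dtype=torch.long)[:, None]  # shape (qlen, 1)
memory_position = torch.arange(5, dtype=torch.long)[None, :]   # shape (1, klen)
relative_position = memory_position - context_position         # shape (qlen, klen), on the CPU

print(relative_position.device)  # cpu
```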
If I move `rp_bucket` to the GPU, the code runs correctly:
```python
rp_bucket = self._relative_position_bucket(
    relative_position,  # shape (qlen, klen)
    bidirectional=not self.is_decoder,
    num_buckets=self.relative_attention_num_buckets,
)
rp_bucket = rp_bucket.to('cuda')  # dirty quick fix
values = self.relative_attention_bias(rp_bucket)  # shape (qlen, klen, num_heads)
```

I'm not sure why `rp_bucket` is created on the CPU in the first place.
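A cleaner fix than hard-coding `'cuda'` (an untested sketch, not a proposed patch) would be to move `rp_bucket` to whatever device the embedding weights live on, so the same code path still works on CPU-only machines:

```python
rp_bucket = self._relative_position_bucket(
    relative_position,  # shape (qlen, klen)
    bidirectional=not self.is_decoder,
    num_buckets=self.relative_attention_num_buckets,
)
# Move the bucket indices to wherever the embedding weights live, instead of
# assuming CUDA, so the lookup works on both CPU and GPU.
rp_bucket = rp_bucket.to(self.relative_attention_bias.weight.device)
values = self.relative_attention_bias(rp_bucket)  # shape (qlen, klen, num_heads)
```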