
LAB3: CUDA out of memory on the Google Colab free plan #201

@NakedRaccoon

Description


I first tried running the lab on the free tier of Google Colab and hit a CUDA out-of-memory error while training the first model; the output is below. I eventually resolved it by purchasing compute units and switching to a better GPU, but for next year it might be worth choosing a smaller, less hardware-demanding model so that students don't have to spend money on this.

I enjoyed all 3 labs though. Props to the lecturers and the TAs!

```
The capital of France is **Paris**. 🇫🇷
step 0 loss: 2.3113996982574463
/usr/local/lib/python3.12/dist-packages/jupyter_client/session.py:151: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
  return datetime.utcnow().replace(tzinfo=utc)
The capital of France is **Paris**. 🇫🇷
step 10 loss: 2.1250462532043457
The capital of France is **Paris**. 🇫🇷
step 20 loss: 1.6766815185546875
Top o' the mornin' now, me hearty! Ye want to know about the capital o' the grand old nation o' France, do ye?
step 30 loss: 1.399606466293335
Top o' the mornin' to ye now! Ye want to know about the capital o' France, do ye? Well, listen up, me hearty
step 40 loss: 1.5028210878372192
Top o' the mornin' to ye! Now, why, the capital o' France, ye ask? Why, it's Paris, that'
step 50 loss: 1.5027029514312744
Top o' the mornin' to ye! Now, if ye're askin' about the capital o' France, well, that's Paris
step 60 loss: 1.7211472988128662
Ah, me hearty! Ye want to know about the capital of France, do ye? Well, listen up, me lad! The capital of France is Paris
step 70 loss: 1.5601969957351685
Ah, ye want to know about the capital of France, do ye? Well, listen here, the capital of France is Paris, you hear? So there
step 80 loss: 1.6766023635864258
Top o' the mornin' to ye! Now, the capital o' France, ye ask? Well, listen up, me hearty. It's
step 90 loss: 1.5294233560562134
Top o' the mornin' to ye now, me hearty! Ye want to know about the capital of ol' France, do ye? Why, it
step 100 loss: 1.4099100828170776
Top o' the mornin' to ye now! The capital o' France, ye ask? Why, it be Paris, me hearty! Isn't
step 110 loss: 1.4719858169555664
Top o' the mornin' to ye, me hearty! The capital o' France, ye ask? Why, it's Paris, sure as the
step 120 loss: 1.362978219985962
Top o' the mornin' to ye, me hearty! Ye askin' about the capital o' France, well, let me tell ye, it
step 130 loss: 1.4489622116088867
Top o' the mornin' to ye, me hearty! The capital o' France, ye ask? Why, it's Paris, that's
step 140 loss: 1.4968031644821167
Top o' the mornin' to ye, me hearty! Now, the capital o' the fine Republic o' France as ye asked, why it'
step 150 loss: 1.456976294517517
Top o' the mornin' to ye, me hearty! The grand ol' capital of France is Paris, now where's she be now? Ah
step 160 loss: 1.6469882726669312
Top o' the mornin' to ye, me hearty! Ye want to know what the capital of France is? Why, why then, I'll
step 170 loss: 1.4203708171844482
Top o' the mornin' to ye, me hearty! Ye want to know about the capital of France, do ye? Well, listen up, me
step 180 loss: 1.6720683574676514
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
/tmp/ipython-input-3093342917.py in <cell line: 0>()
      1 # Call the train function to fine-tune the model! Hint: you'll start to see results after a few dozen steps.
----> 2 model = train(model, train_loader, tokenizer) # TODO

17 frames
/usr/local/lib/python3.12/dist-packages/transformers/models/gemma2/modeling_gemma2.py in forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, cache_position, logits_to_keep, **kwargs)
    564         logits = self.lm_head(hidden_states[:, slice_indices, :])
    565         if self.config.final_logit_softcapping is not None:
--> 566             logits = logits / self.config.final_logit_softcapping
    567             logits = torch.tanh(logits)
    568             logits = logits * self.config.final_logit_softcapping

OutOfMemoryError: CUDA out of memory. Tried to allocate 480.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 438.12 MiB is free. Process 9264 has 14.31 GiB memory in use. Of the allocated memory 13.62 GiB is allocated by PyTorch, with 28.00 MiB allocated in private pools (e.g., CUDA Graphs), and 514.04 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
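
For anyone else who hits this on the free-tier T4 (about 15 GiB), a few standard memory-saving tweaks may be enough to get training through without paid compute. This is only a rough sketch against the Hugging Face `transformers` stack that the traceback points at; the checkpoint name and loading code below are my assumptions rather than the lab's actual notebook, so adapt them to whatever LAB3 loads.

```python
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1) Reduce allocator fragmentation, as the OOM message itself suggests.
#    Must be set before the first CUDA allocation, i.e. before the model is loaded.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

model_name = "google/gemma-2-2b-it"  # assumption -- substitute the lab's actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)

# 2) Load the weights in bfloat16 instead of float32 to halve the static footprint.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 3) Trade compute for memory during the backward pass.
model.gradient_checkpointing_enable()
model.config.use_cache = False  # the KV cache is not needed while training
```

On top of that, dropping the batch size to 1 (with gradient accumulation if needed) and shortening the maximum sequence length in the lab's own `train` / data-loading code cuts activation memory further.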
