Sometimes I get Dataset Errors when using the lightning module in a distributed manner #20088

@asusdisciple

Bug description

I use a Lightning DataModule. In this module I initialize a torch dataset (according to your tutorials):

    class CustomImageDataset(Dataset):
        # Torch dataset to handle basic file operations

    class DataModule(L.LightningDataModule):
        # Lightning DataModule to handle dataloaders and train/test split
        dset = CustomImageDataset()

In most cases it works perfectly fine, but sometimes I get an error when initializing my training, which forces me to restart it until the error no longer appears. This only happens in distributed training.

It happens when I read in my dataset in the CustomImageDataset() by using a csv reader. The error is:

    train.py 74 <module>
        mydata.setup(stage="fit")

    dataset.py 206 setup
        self.train_set = self.create_dataset("train")

    dataset.py 190 create_dataset
        dset = CustomImageDataset(self.data_dir,

    dataset.py 50 __init__
        self.data_paths, self.targets = self._load_data()

    dataset.py 59 _load_data
        paths, targets = get_paths(self.data_dir, "train", self.seed)

    dataset.py 22 get_paths
        r = list(reader)

    _csv.Error: line contains NUL

Since the list conversion seems to trigger the bug, I am a bit lost on how to solve it, but maybe you have already stumbled upon it.
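One way to narrow this down might be to inspect the raw bytes of the CSV file before passing them to the csv reader, so that a failing run reports how many NUL bytes were seen and where. The sketch below is only an illustration and is not taken from my code: `read_csv_checked` is a hypothetical helper, it assumes the file is UTF-8 encoded, and it assumes the NUL bytes are actually present in the file on disk at read time (for example because another process is still writing it), which is unconfirmed.

```python
import csv
import io
from pathlib import Path


def read_csv_checked(path):
    """Read a CSV file into a list of rows, failing loudly on NUL bytes.

    Hypothetical helper for diagnosis only; not part of Lightning or torch.
    """
    raw = Path(path).read_bytes()

    # Count NUL bytes up front so the error carries useful context.
    nul_count = raw.count(b"\x00")
    if nul_count:
        first = raw.index(b"\x00")
        raise ValueError(
            f"{path}: found {nul_count} NUL byte(s), first at offset {first}, "
            f"file size {len(raw)} bytes"
        )

    # Assumption: the file is UTF-8 encoded.
    text = raw.decode("utf-8")

    # newline="" leaves line-ending handling to the csv module.
    return list(csv.reader(io.StringIO(text, newline="")))
```

Calling this in place of `list(reader)` in `get_paths` would not fix anything by itself, but logging the offset and file size on each rank could show whether the file was incomplete or partially written when it was read.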

What version are you seeing the problem on?

v2.2

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please 

Environment

Current environment
#- PyTorch Lightning Version (e.g., 1.5.0):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):

More info

No response

cc @justusschock

Labels: bug, data, distributed