Open
Labels: bug, data, distributed
Description
Bug description
I use a LightningDataModule. In this module I initialize a torch dataset (following your tutorials):
```python
import lightning as L
from torch.utils.data import Dataset

class CustomImageDataset(Dataset):  # Torch dataset to handle basic file operations
    ...

class DataModule(L.LightningDataModule):  # Lightning DataModule to handle dataloaders and train/test split
    ...

dset = CustomImageDataset()
```

In most cases it works perfectly fine, but sometimes I get an error when initializing training, which forces me to restart it until the error no longer appears. This only happens in distributed training.
It happens when `CustomImageDataset` reads the dataset with a csv reader. The error is:

```
train.py 74 <module>
    mydata.setup(stage="fit")
dataset.py 206 setup
    self.train_set = self.create_dataset("train")
dataset.py 190 create_dataset
    dset = CustomImageDataset(self.data_dir,
dataset.py 50 __init__
    self.data_paths, self.targets = self._load_data()
dataset.py 59 _load_data
    paths, targets = get_paths(self.data_dir, "train", self.seed)
dataset.py 22 get_paths
    r = list(reader)
_csv.Error: line contains NUL
```

Since the `list(reader)` conversion seems to trigger the error, I am a bit lost on how to solve it, but maybe you have already run into this.
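`csv.reader` raises `_csv.Error: line contains NUL` when it meets a NUL (`\0`) character. One plausible explanation for NUL bytes that appear only sometimes, and only with several processes, is that one process reads the CSV while another is still writing or truncating it; the report does not confirm this, and it also does not say whether the CSV is generated during the run (for example inside `get_paths`). The minimal sketch below assumes that case. The helper names `count_nul_bytes`, `read_rows_skipping_nul` and `write_csv_atomically` are hypothetical, not Lightning or `csv` APIs: the first checks whether the file really contains NUL bytes, the second is a stopgap that strips them, and the third writes to a temporary file and renames it into place so readers never see a half-written file.

```python
import csv
import os
import tempfile


def count_nul_bytes(path):
    """Return how many NUL bytes the file at `path` contains."""
    with open(path, "rb") as f:
        return f.read().count(b"\x00")


def read_rows_skipping_nul(path):
    """Read CSV rows after stripping NUL characters (hides the symptom, not the cause)."""
    with open(path, newline="") as f:
        # Remove NUL from each line before csv.reader sees it.
        return list(csv.reader(line.replace("\0", "") for line in f))


def write_csv_atomically(path, rows):
    """Write rows to a temp file in the same directory, then rename it over `path`.

    Readers see either the previous file or the complete new one, never a partial write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            csv.writer(f).writerows(rows)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
```

If `count_nul_bytes` reports NUL bytes in a file that should be plain text, that points to the file being damaged or written concurrently rather than to a parsing problem. In a LightningDataModule, a common arrangement is to create files once in `prepare_data` (which Lightning calls from a single process per node by default) and only read them in `setup`, which runs on every process.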
What version are you seeing the problem on?
v2.2
How to reproduce the bug
No response
Error messages and logs
```
# Error messages and logs here please
```

Environment
Current environment
```
#- PyTorch Lightning Version (e.g., 1.5.0):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
```

More info
No response