A PyTorch-based implementation of various deep learning architectures for semantic segmentation of unstructured road scenes using the Indian Driving Dataset (IDD).
- Overview
- Dataset
- Model Architectures
- Architecture Comparison
- Results
- Installation
- Usage
- Project Structure
This project implements and compares five popular deep learning architectures for semantic segmentation:
- FCN (Fully Convolutional Network)
- U-Net
- PSPNet
- LinkNet
- DeepLabV3+
The models are trained on the IDD-Lite dataset, which contains road scene images from Indian cities, annotated with 8 classes:
- Drivable area
- Non-drivable area
- Living things
- Vehicles
- Roadside objects
- Far objects
- Sky
- Miscellaneous
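
When working with label masks, the eight classes can be kept in a simple ID-to-name table. The sketch below assumes that class IDs 0 to 7 follow the order of the list above. That ordering and the helper name `class_pixel_counts` are illustrative assumptions rather than part of the dataset tooling, so verify them against the actual IDD-Lite label files.

```python
from collections import Counter

# Assumed mapping: IDs 0-7 follow the order of the class list above (hypothetical).
IDD_LITE_CLASSES = [
    "drivable area",
    "non-drivable area",
    "living things",
    "vehicles",
    "roadside objects",
    "far objects",
    "sky",
    "miscellaneous",
]


def class_pixel_counts(mask):
    """Count pixels per class name in a 2D mask given as nested lists of class IDs."""
    counts = Counter(class_id for row in mask for class_id in row)
    return {IDD_LITE_CLASSES[class_id]: n for class_id, n in sorted(counts.items())}


if __name__ == "__main__":
    # Tiny hand-made mask, only for illustration.
    toy_mask = [
        [6, 6, 6, 6],
        [5, 5, 3, 4],
        [0, 0, 3, 1],
        [0, 0, 0, 1],
    ]
    for name, n in class_pixel_counts(toy_mask).items():
        print(f"{name}: {n}")
```

The helper does not handle "ignore" labels outside the 0 to 7 range; if the masks contain such values, filter them out before counting.
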
The architectures are described below, each with its own strengths for semantic segmentation:
```mermaid
graph LR
    subgraph VGG16_Backbone
        I[Input] --> C1[Conv Block 1]
        C1 --> C2[Conv Block 2]
        C2 --> C3[Conv Block 3]
        C3 --> C4[Conv Block 4]
        C4 --> C5[Conv Block 5]
    end
    subgraph FCN_Head
        C5 --> FC6[Conv 7x7]
        FC6 --> FC7[Conv 1x1]
        FC7 --> S1[Score]
    end
    subgraph Skip_Connections
        C4 --> S2[Score Pool4]
        C3 --> S3[Score Pool3]
        S1 --> U1[Upsample 2x]
        U1 --> F1[Fuse]
        S2 --> F1
        F1 --> U2[Upsample 2x]
        U2 --> F2[Fuse]
        S3 --> F2
        F2 --> U3[Upsample 8x]
        U3 --> O[Output]
    end
    style I fill:#f9f,stroke:#333
    style O fill:#9ff,stroke:#333
```

The FCN architecture transforms traditional classification networks into fully convolutional networks for semantic segmentation. Key features:
- Based on VGG16 backbone
- Replaces the fully connected layers with convolutions (7x7 followed by 1x1)
- Uses skip connections from earlier layers for fine-grained prediction
- Multi-scale prediction fusion for better segmentation details
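
The upsample-and-fuse chain in the diagram can be followed as plain stride arithmetic. This is a minimal sketch, assuming a VGG16-like backbone where pool3, pool4 and pool5 sit at strides 8, 16 and 32, and an input size divisible by 32. The function name `fcn8s_resolutions` and the 256x320 input are hypothetical and not taken from this repository.

```python
def fcn8s_resolutions(height, width):
    """Trace feature-map sizes through an FCN-8s style head.

    Assumes pool3/pool4/pool5 outputs at strides 8/16/32 (VGG16-like)
    and an input whose height and width are multiples of 32.
    """
    if height % 32 or width % 32:
        raise ValueError("this sketch assumes height and width are multiples of 32")

    def at_stride(stride):
        return (height // stride, width // stride)

    return [
        ("score from conv block 5", at_stride(32)),
        ("upsample 2x, fuse with score pool4", at_stride(16)),
        ("upsample 2x, fuse with score pool3", at_stride(8)),
        ("upsample 8x to output", at_stride(1)),
    ]


if __name__ == "__main__":
    for name, (h, w) in fcn8s_resolutions(256, 320):
        print(f"{name}: {h}x{w}")
```

Each fusion step only works because the upsampled score map and the skip score map share the same spatial size, which is why the two 2x upsamples precede the final 8x upsample.
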
```mermaid
graph TD
    subgraph Encoder
        I[Input Image] --> C1[Conv Block 1]
        C1 --> P1[MaxPool]
        P1 --> C2[Conv Block 2]
        C2 --> P2[MaxPool]
        P2 --> C3[Conv Block 3]
        C3 --> P3[MaxPool]
        P3 --> C4[Conv Block 4]
    end
    subgraph Bottleneck
        C4 --> B[Bottleneck]
    end
    subgraph Decoder
        B --> U1[UpConv 1]
        U1 --> D1[Conv Block 5]
        D1 --> U2[UpConv 2]
        U2 --> D2[Conv Block 6]
        D2 --> U3[UpConv 3]
        U3 --> D3[Conv Block 7]
        D3 --> O[Output]
    end
    %% Skip Connections
    C1 -.-> D3
    C2 -.-> D2
    C3 -.-> D1
    style I fill:#f9f,stroke:#333
    style O fill:#9ff,stroke:#333
    style B fill:#ff9,stroke:#333
```

The U-Net architecture features a symmetric encoder-decoder structure that's particularly effective for detailed segmentation:
- Contracting path (encoder) captures context
- Expanding path (decoder) enables precise localization
- Skip connections transfer detailed features from encoder to decoder
- Particularly effective at preserving fine structural details
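
To see what the skip connections do to tensor shapes, the sketch below traces a U-Net style network with four encoder levels and three up-convolutions, as in the diagram. The channel widths (64 doubling per level), the 256x256 input and the function name `unet_skip_shapes` are assumptions chosen for illustration; the repository's implementation may use different widths.

```python
def unet_skip_shapes(height, width, base_channels=64, depth=4):
    """Return (encoder_shapes, decoder_inputs) as lists of (channels, H, W).

    Assumes max-pooling halves H and W between encoder levels and that
    channel width doubles per level (a common convention, not a given).
    """
    encoder = []
    h, w, c = height, width, base_channels
    for level in range(depth):
        encoder.append((c, h, w))
        if level < depth - 1:
            h, w, c = h // 2, w // 2, c * 2

    decoder_inputs = []
    deep_channels = encoder[-1][0]
    for skip_c, skip_h, skip_w in reversed(encoder[:-1]):
        up_channels = deep_channels // 2  # up-convolution halves the channels
        # Skip features are concatenated along the channel axis.
        decoder_inputs.append((up_channels + skip_c, skip_h, skip_w))
        deep_channels = skip_c  # the conv block reduces back to the skip width
    return encoder, decoder_inputs


if __name__ == "__main__":
    enc, dec = unet_skip_shapes(256, 256)
    print("encoder outputs:", enc)
    print("decoder inputs after concat:", dec)
```

The concatenation doubles the channel count entering each decoder block, which is one reason U-Net tends to use more memory than additive designs.
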
```mermaid
graph TD
    subgraph Encoder
        I[Input] --> E1[Encoder Block 1]
        E1 --> E2[Encoder Block 2]
        E2 --> E3[Encoder Block 3]
        E3 --> E4[Encoder Block 4]
    end
    subgraph Decoder
        E4 --> D4[Decoder Block 4]
        D4 --> D3[Decoder Block 3]
        D3 --> D2[Decoder Block 2]
        D2 --> D1[Decoder Block 1]
    end
    %% Skip Connections
    E1 -.-> D1
    E2 -.-> D2
    E3 -.-> D3
    E4 -.-> D4
    D1 --> F[Final Conv]
    F --> O[Output]
    style I fill:#f9f,stroke:#333
    style O fill:#9ff,stroke:#333
```

LinkNet is designed for efficient semantic segmentation:
- Memory-efficient architecture with strong performance
- Direct connections between encoder and decoder blocks
- Residual connections for better gradient flow
- Lighter computational footprint compared to U-Net
- Well suited to real-time applications
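
One way to see where the lighter footprint comes from: in the original LinkNet design, encoder features are added to the decoder features, whereas U-Net concatenates them. The sketch below compares the parameter count of the first 3x3 convolution after each kind of merge. It is a back-of-the-envelope illustration with hypothetical helper names and an arbitrary channel width, and it assumes this repository's LinkNet follows the additive scheme.

```python
def conv_params(in_channels, out_channels, kernel_size=3, bias=True):
    """Number of learnable parameters in a standard 2D convolution."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def skip_merge_cost(channels):
    """Parameters of the first conv after an additive vs. a concatenating skip."""
    add_style = conv_params(channels, channels)         # summed skip: C input channels
    concat_style = conv_params(2 * channels, channels)  # concatenated skip: 2C input channels
    return add_style, concat_style


if __name__ == "__main__":
    add_style, concat_style = skip_merge_cost(128)
    print(f"additive skip:      {add_style} parameters")
    print(f"concatenating skip: {concat_style} parameters")
```

The concatenating variant needs roughly twice the weights in that layer, and its input activation is twice as large as well.
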
```mermaid
graph TD
    subgraph Encoder
        I[Input] --> B[Backbone]
        B --> ASPP{ASPP Module}
    end
    subgraph ASPP_Module
        ASPP --> A1[1x1 Conv]
        ASPP --> A2[3x3 Rate 6]
        ASPP --> A3[3x3 Rate 12]
        ASPP --> A4[3x3 Rate 18]
        ASPP --> A5[Global Pool]
    end
    subgraph Decoder
        A1 & A2 & A3 & A4 & A5 --> C[Concat]
        C --> C1[Conv 1x1]
        B --> LF[Low-level Features]
        LF --> C2[Conv 1x1]
        C1 --> U1[Upsample 4x]
        U1 --> M[Merge]
        C2 --> M
        M --> U2[Upsample 4x]
        U2 --> O[Output]
    end
    style I fill:#f9f,stroke:#333
    style O fill:#9ff,stroke:#333
```

DeepLabV3+ is a high-accuracy architecture that combines atrous (dilated) convolutions with an encoder-decoder design:
- Atrous Spatial Pyramid Pooling (ASPP) for multi-scale processing
- Multiple dilation rates (6, 12, 18) for broader receptive fields
- Encoder-decoder structure with ASPP module
- Fusion of low-level and high-level features
- Decoder refines predictions along object boundaries
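
The effect of the dilation rates listed above can be worked out directly. A dilated kernel of size k with rate r covers k + (k - 1)(r - 1) pixels per side, and keeping the output the same size at stride 1 requires padding of r(k - 1)/2 for odd k. The helper names below are illustrative only.

```python
def effective_kernel_size(kernel_size, dilation):
    """Side length of the area covered by a dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size, dilation):
    """Padding that preserves spatial size at stride 1 (odd kernel sizes)."""
    return dilation * (kernel_size - 1) // 2


if __name__ == "__main__":
    for rate in (6, 12, 18):
        size = effective_kernel_size(3, rate)
        pad = same_padding(3, rate)
        print(f"rate {rate}: covers {size}x{size}, padding {pad}")
```

With 3x3 kernels, rates 6, 12 and 18 cover 13x13, 25x25 and 37x37 windows while still using only nine weights per channel pair, which is how ASPP gathers multi-scale context cheaply.
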
| Architecture | Strengths | Best Use Cases | Memory Usage | Inference Speed |
|---|---|---|---|---|
| FCN | Simple, effective baseline | General segmentation | Medium | Fast |
| U-Net | Fine detail preservation | Medical imaging, detailed segmentation | High | Medium |
| LinkNet | Efficiency, good performance | Real-time applications | Low | Fast |
| DeepLabV3+ | State-of-the-art accuracy | High-accuracy requirements | High | Slow |
Model performance comparison on the IDD-Lite dataset:
| Architecture | Training Set | Testing Set | Mean F1 Score |
|---|---|---|---|
| FCN | 0.9032 | 0.9034 | 0.687 |
| U-Net | 0.8784 | 0.7406 | 0.586 |
| LinkNet | 0.9231 | 0.7579 | 0.750 |
| DeepLabV3+ | 0.8040 | 0.7712 | 0.787 |
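
The README does not say how the Mean F1 Score column is computed. A common choice is the macro average of per-class F1 scores derived from a pixel-level confusion matrix, and the sketch below shows that calculation under this assumption. The function names and the 2-class example matrix are hypothetical.

```python
def per_class_f1(confusion):
    """Per-class F1 from a square confusion matrix.

    confusion[t][p] is the number of pixels of true class t predicted as p.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[t][k] for t in range(n)) - tp  # predicted k, but wrong
        fn = sum(confusion[k]) - tp                       # true k, but missed
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def mean_f1(confusion):
    """Macro-averaged F1: every class counts equally, regardless of pixel share."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    toy_confusion = [
        [8, 2],
        [1, 9],
    ]
    print("per-class F1:", per_class_f1(toy_confusion))
    print("mean F1:", mean_f1(toy_confusion))
```

Because macro averaging weights rare classes as heavily as common ones, a model can score well in the Training Set and Testing Set columns and still have a low mean F1 if it struggles on small classes.
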
```bash
# Clone the repository
git clone https://github.com/your-username/road-scene-segmentation.git
cd road-scene-segmentation

# Install dependencies
pip install -e .
```

Requirements:
- Python 3.7+
- PyTorch >= 1.9.0
- torchvision >= 0.10.0
- albumentations >= 1.0.3
- OpenCV
- NumPy
- Matplotlib
- tqdm
The project uses the IDD-Lite dataset (~50MB). To set up the dataset:

```bash
python setup_data.py
```

This downloads the IDD-Lite dataset and organizes it into the directory structure the project expects.
To train a model:
```bash
python train.py --config config.yaml
```

Configure training parameters in `config.yaml`:
```yaml
MODEL_TYPE: 'unet'  # Options: 'fcn', 'unet', 'linknet', 'deeplabv3'
BACKBONE: 'resnet34'
NUM_CLASSES: 8
BATCH_SIZE: 16
EPOCHS: 100
LEARNING_RATE: 0.001
```

To evaluate a trained model:
```bash
python evaluate.py --config config.yaml --model-path checkpoints/final_model.pth
```

For inference on a single image:
```python
from segmentation import SegmentationConfig, UNet, Visualizer
import cv2

# Initialize model and load weights
config = SegmentationConfig(MODEL_TYPE='unet')
model = UNet(config)
model.load_checkpoint('checkpoints/final_model.pth')

# Run inference
image = cv2.imread('path/to/image.jpg')
prediction = model.predict(image)
```

Project structure:

```
├── segmentation/
│   ├── models/
│   │   ├── fcn.py
│   │   ├── unet.py
│   │   ├── linknet.py
│   │   └── deeplabv3.py
│   ├── config.py
│   ├── dataset.py
│   └── utils/
├── train.py
├── evaluate.py
├── setup_data.py
└── config.yaml
```