
VLM_case_study

This repo contains the solution to the sensmore interview case study.

Installation

Prerequisites

Note that this repo has only been tested on macOS.

Ensure you have Python 3.10 installed. If not, you can install it using:

sudo apt update && sudo apt install python3.10 python3.10-venv python3.10-dev   # Ubuntu/Debian
brew install python@3.10                                                        # macOS
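Either way, you can confirm the interpreter is available before continuing:

python3.10 --version   # should print Python 3.10.x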

Install ffmpeg

This project requires ffmpeg for media processing. Install it using:

brew install ffmpeg       # macOS
sudo apt install ffmpeg   # Ubuntu/Debian
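To check that ffmpeg is on your PATH, print its version:

ffmpeg -version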

Setting Up the Virtual Environment

It's recommended to create a virtual environment:

python3.10 -m venv venv
source venv/bin/activate      # macOS/Linux
.\venv\Scripts\Activate.ps1   # Windows (PowerShell)
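Once the environment is active, it is worth confirming that it picked up the right interpreter:

python --version   # should print Python 3.10.x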

Install Dependencies

Run the following command to install all required packages:

pip install --upgrade pip
pip install -r requirements.txt
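Optionally, pip check will report any broken or conflicting dependencies after the install:

pip check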

Now you're ready to use the project! 🚀

VLM Training Pipeline

Overview

This script orchestrates a pipeline for generating data, processing it, and training a Visual Language Model (VLM). It consists of five main steps: dataset generation, VQA pair generation, YOLO bounding box creation, VLM training, and VLM testing; a sketch of how the steps fit together follows the step list below.

Steps

  1. Dataset Generation (Optional)

    • Downloads and processes video data.
    • Functions: download_all_videos(), process_videos().
  2. VQA Pair Generation (Optional)

    • Generates Visual Question Answering (VQA) pairs using a VLM.
    • Function: generate_vqa().
  3. YOLO Bounding Box and Action Generation (Optional)

    • Processes images to create YOLO bounding boxes and action labels.
    • Function: process_images().
  4. Train the VLM (Required)

    • Trains the Visual Language Model.
    • Function: train_vlm.train_model().
  5. Test the VLM (Optional)

    • Tests the trained VLM model.
    • Function: test_vlm.test_model().
    • Resulting images with their corresponding QA pairs are saved in the results folder.
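Below is a minimal sketch of how main.py could wire these steps together using the flags and functions listed above. The import paths are assumptions (this README names the functions but not their modules), so treat it as illustrative rather than the repo's exact code:

# Illustrative orchestration sketch. Only the flags and function names
# come from this README; the module paths are assumed for the example.
import train_vlm
import test_vlm
from dataset import download_all_videos, process_videos  # assumed module
from vqa import generate_vqa                              # assumed module
from yolo import process_images                           # assumed module

GEN_DATA = False
GEN_VQA_PAIRS_USING_VLM = False
GEN_YOLO_BBOXES_AND_ACTION = False
TRAIN_VLM = True
TEST_VLM = False

if GEN_DATA:                     # Step 1: dataset generation (optional)
    download_all_videos()
    process_videos()
if GEN_VQA_PAIRS_USING_VLM:      # Step 2: VQA pair generation (optional)
    generate_vqa()
if GEN_YOLO_BBOXES_AND_ACTION:   # Step 3: YOLO bboxes and action labels (optional)
    process_images()
if TRAIN_VLM:                    # Step 4: train the VLM (required)
    train_vlm.train_model()
if TEST_VLM:                     # Step 5: test the VLM (optional)
    test_vlm.test_model()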

Usage

Modify the boolean flags in the script to enable/disable specific steps:

GEN_DATA = False                    # Set to True to generate the dataset
GEN_VQA_PAIRS_USING_VLM = False     # Set to True to generate VQA pairs
GEN_YOLO_BBOXES_AND_ACTION = False  # Set to True to generate YOLO bounding boxes
TRAIN_VLM = True                    # Set to True to train the VLM
TEST_VLM = False                    # Set to True to test the VLM

Run the script:

python main.py

Ensure all dependencies are installed before running the script.

Notes

  • The script will only execute the steps where the corresponding flags are set to True.
  • Training the VLM is a required step in this pipeline.
  • Modify the script as needed to suit your data and model requirements.

About

Extract knowledge from internet data using LLMs with a vision encoder.
