This repo contains solutions for the sensmore interview case study.

Note that this repo has only been tested on macOS.
Ensure you have Python 3.10 installed. If not, you can install it using:

```bash
sudo apt update && sudo apt install python3.10 python3.10-venv python3.10-dev  # Ubuntu/Debian
brew install python@3.10                                                       # macOS
```

This project requires ffmpeg for media processing. Install it using:

```bash
brew install ffmpeg      # macOS
sudo apt install ffmpeg  # Ubuntu/Debian
```

It's recommended to create a virtual environment:

```bash
python3.10 -m venv venv
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate     # Windows (PowerShell)
```

Run the following commands to install all required packages:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Now you're ready to use the project! 🚀
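To confirm the environment is set up correctly, a quick check like the one below can be run first. This is a minimal sketch using only the standard library; it just verifies the interpreter version and that ffmpeg is discoverable on PATH.

```python
import shutil
import sys

# The project targets Python 3.10.
assert sys.version_info[:2] == (3, 10), f"Expected Python 3.10, got {sys.version.split()[0]}"

# ffmpeg must be on PATH for the media-processing steps.
assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"

print("Environment looks good.")
```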
This script (`main.py`) orchestrates a pipeline for generating and processing data and for training a Visual Language Model (VLM). It consists of five main steps: dataset generation, VQA pair generation, YOLO bounding box creation, VLM training, and VLM testing.
1. **Dataset Generation (Optional)**
   - Downloads and processes video data.
   - Functions: `download_all_videos()`, `process_videos()`.

2. **VQA Pair Generation (Optional)**
   - Generates Visual Question Answering (VQA) pairs using a VLM.
   - Function: `generate_vqa()`.

3. **YOLO Bounding Box and Action Generation (Optional)**
   - Processes images to create YOLO bounding boxes and action labels.
   - Function: `process_images()`.

4. **Train the VLM (Required)**
   - Trains the Visual Language Model.
   - Function: `train_vlm.train_model()`.

5. **Test the VLM (Optional)**
   - Tests the trained VLM.
   - Function: `test_vlm.test_model()`.
   - Resulting images with corresponding QA pairs are saved in the results folder.
Modify the boolean flags in the script to enable/disable specific steps:

```python
GEN_DATA = False                    # Set to True to generate dataset
GEN_VQA_PAIRS_USING_VLM = False     # Set to True to generate VQA pairs
GEN_YOLO_BBOXES_AND_ACTION = False  # Set to True to generate YOLO bounding boxes
TRAIN_VLM = True                    # Set to True to train the VLM
TEST_VLM = False                    # Set to True to test the VLM
```

Run the script:

```bash
python main.py
```

Ensure all dependencies are installed before running the script.
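For orientation, the control flow in `main.py` presumably follows the sketch below: each flag gates the step it is named after, calling the functions listed in the steps above. The module names for the first three imports are assumptions for illustration, not confirmed names from this repo.

```python
import train_vlm
import test_vlm
from data_generation import download_all_videos, process_videos  # assumed module name
from vqa_generation import generate_vqa                          # assumed module name
from yolo_labels import process_images                           # assumed module name

GEN_DATA = False                    # Step 1: dataset generation (optional)
GEN_VQA_PAIRS_USING_VLM = False     # Step 2: VQA pair generation (optional)
GEN_YOLO_BBOXES_AND_ACTION = False  # Step 3: YOLO bounding boxes + actions (optional)
TRAIN_VLM = True                    # Step 4: VLM training (required)
TEST_VLM = False                    # Step 5: VLM testing (optional)

if GEN_DATA:
    download_all_videos()
    process_videos()

if GEN_VQA_PAIRS_USING_VLM:
    generate_vqa()

if GEN_YOLO_BBOXES_AND_ACTION:
    process_images()

if TRAIN_VLM:
    train_vlm.train_model()

if TEST_VLM:
    test_vlm.test_model()
```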
- The script will only execute the steps whose flags are set to `True`.
- Training the VLM is a required step in this pipeline.
- Modify the script as needed to suit your data and model requirements.