Skip to content

CodeCutTech/dvc-demo

Repository files navigation

View the article

DVC Demo

A demonstration of Data Version Control (DVC) for managing ML pipelines and data versioning.

What is DVC?

DVC is an open-source version control system for machine learning projects. It helps you:

  • Version control large files, data sets, machine learning models, and metrics
  • Track ML experiments
  • Create reproducible ML pipelines
  • Collaborate with team members

Project Structure

. ├── data/ # Raw and processed data files │ └── raw.dvc # DVC file for raw data ├── src/ # Source code for data processing and model training ├── config/ # Configuration files ├── .dvc/ # DVC internal files ├── dvc.yaml # DVC pipeline definition ├── dvc.lock # DVC lock file for reproducible pipelines └── .dvcignore # Files/directories to be ignored by DVC 

Setup

  1. Install project dependencies using uv:
uv sync dvc
  1. Pull the data from remote storage:
dvc pull
  1. Run the pipeline to reproduce all stages:
dvc repro

Version Control

  • Track data files: dvc add <file>
  • Push data to remote storage: dvc push
  • Pull data from remote storage: dvc pull
  • Check status: dvc status

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages