Commit 6ad4bbb

Refactoring changes for flexibility (#1)

* unhashed download
* added notebook version
* added reference to original repo and updated ROG
* removed extra install
* added fix to installs
* add pretrained model downloads and inference

Co-authored-by: jameshskelton <jameshskelton@gmail.com>

1 parent b0257e3 commit 6ad4bbb

File tree

4 files changed: +128 additions, -7 deletions

* README.md
* data2vec.ipynb
* datasets/imagenet/fetch_imagenet.sh
* installations.sh


README.md

Lines changed: 6 additions & 4 deletions
@@ -1,11 +1,12 @@
 # Data2Vec 2.0
 
-Data2Vec is self-supervised highly-efficient general framework to generate representations for vision, speech and text. This repository contains ready-to train [data2vec](https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec) ([arXiv](https://arxiv.org/abs/2202.03555)) implementation containing helper scripts to load, process & train the data.
+[Check out the original repo!](https://github.com/ashutosh1919/data2vec-pytorch)
 
+Data2Vec is self-supervised highly-efficient general framework to generate representations for vision, speech and text. This repository contains ready-to train [data2vec](https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec) ([arXiv](https://arxiv.org/abs/2202.03555)) implementation containing helper scripts to load, process & train the data.
 
 ## Run in a Free GPU powered Gradient Notebook
-[![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/ashutosh1919/data2vec-pytorch?machine=Free-GPU)
 
+[![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/gradient-ai/data2vec-pytorch?machine=Free-GPU)
 
 ## Setup
 
@@ -40,14 +41,14 @@ bash scripts/train_data2vec_multi_speech.sh
 
 Note that you may want to change some of the arguments in these task scripts based on your system. Since we have single GPU, the arg `distributed_training.distributed_world_size=1` for us which you can change based on your requirement.
 
-
 ## Original Code
 
 `data2vec` directory contains the original code taken from [fairseq](https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec) repository. The code present in this directory is exactly same as the original code. We have only made changes in some of the config files corresponding to the tasks.
 
 ## Reference
 
 data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language -- https://arxiv.org/abs/2202.03555
+
 ```
 @article{DBLP:journals/corr/abs-2202-03555,
 author = {Alexei Baevski and
@@ -65,9 +66,10 @@ data2vec: A General Framework for Self-supervised Learning in Speech, Vision and
 ```
 
 Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language -- https://arxiv.org/abs/2212.07525
+
 ```
 @misc{baevski2022efficient,
-title={Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language},
+title={Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language},
 author={Alexei Baevski and Arun Babu and Wei-Ning Hsu and Michael Auli},
 year={2022},
 eprint={2212.07525},
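
A note on the world-size remark in the README above: fairseq's Hydra-based trainer accepts dotted `key=value` config overrides on the command line, so `distributed_training.distributed_world_size` can be set per machine instead of edited inside the script. Below is a hedged sketch of picking it up automatically; the assumption that the task script forwards extra overrides through to the trainer is ours, not something this commit guarantees:

```python
import subprocess
import torch

# Use however many GPUs are actually visible on this machine; fall back to 1.
world_size = max(torch.cuda.device_count(), 1)

# Hypothetical invocation: assumes scripts/train_data2vec_multi_image.sh
# forwards extra Hydra-style overrides (key=value) to fairseq-hydra-train.
subprocess.run(
    [
        "bash",
        "scripts/train_data2vec_multi_image.sh",
        f"distributed_training.distributed_world_size={world_size}",
    ],
    check=True,
)
```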

data2vec.ipynb

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "!bash installations.sh\n"
+   ],
+   "outputs": [],
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-28T21:17:09.717578Z",
+     "iopub.status.busy": "2023-03-28T21:17:09.716884Z"
+    }
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Get model checkpoints"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "!mkdir models\n",
+    "%cd models\n",
+    "!wget https://dl.fbaipublicfiles.com/fairseq/data2vec2/base_imagenet_ft.pt ### ViT-B Imagenet-1k finetuned\n",
+    "!wget https://dl.fbaipublicfiles.com/fairseq/data2vec2/base_libri_960h.pt ### Librispeech finetuned 960 hour split\n",
+    "!wget https://dl.fbaipublicfiles.com/fairseq/data2vec2/nlp_base.pt ### Base NLP\n",
+    "%cd ../"
+   ],
+   "outputs": [],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Get data & train\n",
+    "\n",
+    "If you want to train the models yourself, you can run the cell below.\n",
+    "\n",
+    "This will take a long time to run, and requires downloading large datasets."
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "# Downloads ImageNet and starts training data2vec_multi with it.\n",
+    "!bash scripts/train_data2vec_multi_image.sh\n",
+    "\n",
+    "# Downloads OpenWebText and starts training data2vec_multi with it.\n",
+    "!bash scripts/train_data2vec_multi_text.sh\n",
+    "\n",
+    "# Downloads LibriSpeech and starts training data2vec_multi with it.\n",
+    "!bash scripts/train_data2vec_multi_speech.sh"
+   ],
+   "outputs": [],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Checkpoints & Future usage"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "import torch\n",
+    "from data2vec.models.data2vec2 import D2vModalitiesConfig\n",
+    "from data2vec.models.data2vec2 import Data2VecMultiConfig\n",
+    "from data2vec.models.data2vec2 import Data2VecMultiModel\n",
+    "from PIL import Image\n",
+    "CHECKPOINT_PATH = 'models/base_imagenet_ft.pt'\n",
+    "# Load checkpoint\n",
+    "ckpt = torch.load(CHECKPOINT_PATH)\n",
+    "\n",
+    "# Create config and load model\n",
+    "cfg = Data2VecMultiConfig()\n",
+    "model = Data2VecMultiModel(cfg, modalities=D2vModalitiesConfig.image)\n",
+    "model.load_state_dict(ckpt)\n",
+    "model.eval()\n",
+    "BATCHED_DATA_OBJECT = Image.open('assets/n01440764_tench.JPEG')\n",
+    "# Generating prediction from data\n",
+    "pred = model(BATCHED_DATA_OBJECT)"
+   ],
+   "outputs": [],
+   "metadata": {}
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.16"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
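
One caveat on the inference cell in this notebook: fairseq-style checkpoints are usually a wrapper dict (weights nested under a `model` key, alongside config and optimizer state) rather than a bare state dict, and a vision model expects a normalized tensor batch rather than a raw PIL image. The following is a minimal, hedged sketch of both steps; the `"model"` unwrapping key and the exact preprocessing (standard ImageNet statistics, 224x224 crop) are assumptions, not something this commit pins down:

```python
import torch
from PIL import Image
from torchvision import transforms

CHECKPOINT_PATH = "models/base_imagenet_ft.pt"

# fairseq checkpoints typically nest the weights under a "model" key;
# fall back to the loaded object if this file is already a plain state dict.
ckpt = torch.load(CHECKPOINT_PATH, map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

# Assumed preprocessing for the ViT-B ImageNet fine-tune: resize,
# center-crop to 224x224, convert to tensor, normalize with ImageNet stats.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("assets/n01440764_tench.JPEG").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add batch dimension: (1, 3, 224, 224)
```

From there, `state_dict` is what `model.load_state_dict(...)` expects and `batch` is what the forward pass expects; if the checkpoint uses different key prefixes, `load_state_dict(state_dict, strict=False)` is the usual escape hatch.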

datasets/imagenet/fetch_imagenet.sh

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@
 
 # Run this script to fetch all the dataset
 
-# wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar
-# wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
+wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar
+wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
 
 train_tar="${1:-ILSVRC2012_img_train.tar}"
 val_tar="${2:-ILSVRC2012_img_val.tar}"

installations.sh

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1+cu116 torchtext==0.14.1 -f https://download.pytorch.org/whl/torch_stable.html
 
 # Installing lxml
-sudo apt-get install python-lxml
+sudo apt-get install python-lxml -y
 
 # Installing requirements.txt
 pip install -r requirements.txt
