Code Release for Learning Answer Embeddings for Visual Question Answering (CVPR 2018)
```
usage: train_v7w_embedding.py [-h] [--gpu_id GPU_ID] [--batch_size BATCH_SIZE]
                              [--max_negative_answer MAX_NEGATIVE_ANSWER]
                              [--answer_batch_size ANSWER_BATCH_SIZE]
                              [--loss_temperature LOSS_TEMPERATURE]
                              [--pretrained_model PRETRAINED_MODEL]
                              [--context_embedding {SAN,BoW}]
                              [--answer_embedding {BoW,RNN}]
                              [--name NAME]

optional arguments:
  -h, --help            show this help message and exit
  --gpu_id GPU_ID
  --batch_size BATCH_SIZE
  --max_negative_answer MAX_NEGATIVE_ANSWER
  --answer_batch_size ANSWER_BATCH_SIZE
  --loss_temperature LOSS_TEMPERATURE
  --pretrained_model PRETRAINED_MODEL
  --context_embedding {SAN,BoW}
  --answer_embedding {BoW,RNN}
  --name NAME
```
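For example, a Visual7W training run with the SAN context embedding and BoW answer embedding could be launched as sketched below; the GPU id, batch size, and run name are illustrative values rather than the settings used in the paper:

```
python train_v7w_embedding.py --gpu_id 0 --batch_size 128 \
    --context_embedding SAN --answer_embedding BoW --name san_bow_v7w
```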
Please cite the following BibTeX entry if you use any resource from this repo in your research.

```
@inproceedings{hu2018learning,
  title={Learning Answer Embeddings for Visual Question Answering},
  author={Hu, Hexiang and Chao, Wei-Lun and Sha, Fei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={5428--5436},
  year={2018}
}
```

Part of this code uses components from pytorch-vqa and torchtext. We thank the authors for releasing their code.
Related papers and datasets:

- Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets (qaVG website)
- Visual7W: Grounded Question Answering in Images (website)
- Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (website)