Objectives:
The day has come to finalise this project. Overall, I'm quite satisfied with the results I got. My main objectives for the project were:
- Get familiar and efficient with a Deep Learning library
- Get better at reading/understanding recent papers and at reproducing their results
- Understand GANs and successfully implement one
- Implement conditional GANs
Overall, I think I managed to achieve all of the above. I started this semester having no clue what Theano and Lasagne were, and now they are my go-to libraries for any DL work. After countless failed attempts, I managed to train a GAN, which is a victory in itself. I can't wait to try GANs on other cool stuff.
Comments:
- Using the captions: even though there have been really cool papers on generating images from captions (namely this one), I really don't think it's feasible to expect similar results with our dataset. MSCOCO is arguably the most complicated/diverse public image dataset, so there are only very few images of each object (e.g. banana, oven, surfer), and it's hard to imagine a model successfully capturing all the complexities of the dataset. This is why I decided not to go down that path and focus only on the images.
- GPUs: it's worth mentioning that having access to good GPUs is a key ingredient for training big and complex models like GANs. I made more progress in two weeks on TITAN X GPUs than with everything that was offered on the Hades cluster. I think the TAs should keep that in mind when going over the projects.
- Class blog: it was of great help 🙂
Further work:
I would have liked to try a WGAN for this project, especially the variant from Improved Training of Wasserstein GANs. The main issue I had with WGANs was how they enforced Lipschitz continuity by clipping weights. The paper mentioned above instead enforces the Lipschitz constraint by penalizing the norm of the critic's gradient on points interpolated between real and fake samples, as sketched below. I think WGANs may just be the way to go, and this improved approach is a step in the right direction.
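To make the gradient-penalty idea concrete, here is a minimal Theano sketch of what the WGAN-GP critic penalty could look like. This is just an illustration under assumptions, not code from this project: the `critic` function, the (batch, channels, height, width) image layout, and `lambda_gp = 10` (the paper's default) are all placeholders.

```python
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=1234)

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP penalty: push the norm of the critic's gradient towards 1
    on random interpolates between real and fake batches (assumed tensor4s)."""
    # One interpolation coefficient per example, broadcast over (c, h, w).
    eps = srng.uniform((real.shape[0],)).dimshuffle(0, 'x', 'x', 'x')
    interpolates = eps * real + (1.0 - eps) * fake
    # Summing the critic's outputs yields per-example input gradients,
    # since examples don't interact inside the critic.
    grads = theano.grad(critic(interpolates).sum(), wrt=interpolates)
    # Per-example gradient norm; small constant keeps sqrt stable at 0.
    slopes = T.sqrt(T.sum(T.sqr(grads), axis=[1, 2, 3]) + 1e-12)
    return lambda_gp * T.mean(T.sqr(slopes - 1.0))

# Hypothetical use in the critic's loss (maximize real score, minimize fake):
# critic_loss = (T.mean(critic(fake)) - T.mean(critic(real))
#                + gradient_penalty(critic, real, fake))
```

The nice part is that this replaces the clipping hyperparameter with a soft constraint that is differentiable and applied exactly where it matters, on the line segments between the real and generated distributions.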