Generative adversarial networks (GANs) are a class of deep learning algorithms introduced by Ian Goodfellow and his colleagues in 2014. Prominent machine learning researcher Yann LeCun described GANs as the coolest idea in machine learning in the last 20 years. Since their introduction, many variants of the same idea have been proposed, such as Deep Convolutional GAN, Bidirectional GAN, Conditional GAN, StackGAN, InfoGAN, DiscoGAN (Discover Cross-Domain Relations with GANs), StyleGAN, and others. In this article we'll discuss the core idea behind GANs and their applications.
Generative adversarial networks are a type of generative deep learning model trained through an adversarial process. A GAN consists of two neural networks, a generator and a discriminator, that compete with each other during training.
Both of these networks are trained simultaneously.
The generative network (the generator) is responsible for producing new data that follows the distribution of the training data, so that the discriminator cannot tell that it did not come from the training set. It is trained to capture the training distribution and to generate samples that look as if they were drawn from it, thereby fooling the discriminative model. The generator takes random noise as input and transforms it into a sample that resembles the training data. It is generally designed as a multi-layer perceptron or a deconvolutional (transposed convolution) neural network.
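The generator described above can be sketched as a small multi-layer perceptron that maps a noise vector to a sample. This is only an illustration in plain Python: the layer sizes, the `tanh` activations, the random weight initialization, and the helper names (`init_layer`, `dense`, `generator`) are assumptions made for the example, and the weights are untrained.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Create a dense layer: a weight matrix with n_out rows and a bias vector."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer, activation):
    """Apply activation(W x + b) to the input vector x."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def generator(z, layers):
    """Map a noise vector z to a generated sample through a two-layer MLP."""
    hidden = dense(z, layers[0], math.tanh)
    return dense(hidden, layers[1], math.tanh)  # outputs lie in [-1, 1]

rng = random.Random(0)
NOISE_DIM, HIDDEN_DIM, DATA_DIM = 4, 8, 2  # hypothetical sizes
gen_layers = [init_layer(NOISE_DIM, HIDDEN_DIM, rng),
              init_layer(HIDDEN_DIM, DATA_DIM, rng)]

z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]  # random input
sample = generator(z, gen_layers)
print(sample)
```

With untrained weights the output is meaningless; training is what moves these samples toward the training distribution.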
The discriminative network (the discriminator) is responsible for predicting whether a given sample came from the training data or was produced by the generator. It takes a sample as input and outputs the probability that the sample belongs to the same distribution as the training data. It is trained on original training samples as well as generated ones, so that it can learn the difference between the two. It is generally designed as a multi-layer perceptron or a convolutional neural network.
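As a rough sketch of the discriminator's role, the example below scores a sample with a probability and computes a binary cross-entropy loss over one real and one generated sample. To keep it short, a single linear layer with a sigmoid stands in for the multi-layer network; the weights, the two samples, and the function names are hypothetical values chosen for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def discriminator(x, weights, bias):
    """Return the estimated probability that sample x came from the training data."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)

def binary_cross_entropy(prob, label):
    """Loss for one prediction; label is 1 for training data, 0 for generated data."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1.0 - prob + eps))

# Hypothetical parameters and samples, chosen only for illustration.
weights, bias = [1.5, -0.5], 0.1
real_sample = [2.0, 1.0]    # stands in for a training example
fake_sample = [-1.0, 0.5]   # stands in for a generator output

p_real = discriminator(real_sample, weights, bias)
p_fake = discriminator(fake_sample, weights, bias)

# The discriminator is trained on both kinds of samples.
d_loss = binary_cross_entropy(p_real, 1) + binary_cross_entropy(p_fake, 0)
print(p_real, p_fake, d_loss)
```

The loss is small when real samples receive probabilities near 1 and generated samples receive probabilities near 0.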
The training process follows a framework that corresponds to a minimax two-player game. In this game the discriminator tries its best to distinguish images produced by the generator from actual training samples, while the generator tries its best to fool the discriminator into believing that its images came from the training set. In theory, training has converged when the discriminator outputs a probability of 0.5 for every sample, which means it can no longer tell generated samples from real ones. At that stage the generator has approximately captured the distribution of the training samples and can, on its own, convert random noise into data that looks like it came from the training set. Both the discriminator and the generator can be defined as multilayer perceptrons, and the whole system can be trained using backpropagation. The two networks are trained together, typically by alternating updates: the discriminator is updated to maximize its accuracy in labeling real and generated samples correctly, while the generator is updated to minimize its own loss by producing images realistic enough that the discriminator mislabels them.
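The alternating updates can be sketched on a one-dimensional toy problem. In the minimax formulation the discriminator maximizes V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] while the generator minimizes it. The example below assumes a linear generator G(z) = a * z + b, a logistic discriminator D(x) = sigmoid(w * x + c), hand-derived gradients, and hypothetical values for the data distribution, learning rate, batch size, and step count; it is an illustration rather than a practical training recipe.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

rng = random.Random(0)
REAL_MEAN, REAL_STD = 3.0, 0.5   # hypothetical "training data" distribution
LEARNING_RATE, STEPS, BATCH = 0.05, 200, 16

a, b = 1.0, 0.0   # generator G(z) = a * z + b
w, c = 0.1, 0.0   # discriminator D(x) = sigmoid(w * x + c)

for step in range(STEPS):
    # Discriminator step: gradient ascent on log D(x) + log(1 - D(G(z))).
    grad_w = grad_c = 0.0
    for _ in range(BATCH):
        x = rng.gauss(REAL_MEAN, REAL_STD)   # real sample
        g = a * rng.gauss(0.0, 1.0) + b      # generated sample
        d_real, d_fake = sigmoid(w * x + c), sigmoid(w * g + c)
        grad_w += (1.0 - d_real) * x - d_fake * g
        grad_c += (1.0 - d_real) - d_fake
    w += LEARNING_RATE * grad_w / BATCH
    c += LEARNING_RATE * grad_c / BATCH

    # Generator step: gradient descent on log(1 - D(G(z))).
    grad_a = grad_b = 0.0
    for _ in range(BATCH):
        z = rng.gauss(0.0, 1.0)
        d_fake = sigmoid(w * (a * z + b) + c)
        # d/dg log(1 - D(g)) = -D(g) * w, with dg/da = z and dg/db = 1
        grad_a += -d_fake * w * z
        grad_b += -d_fake * w
    a -= LEARNING_RATE * grad_a / BATCH
    b -= LEARNING_RATE * grad_b / BATCH

fake_probs = [sigmoid(w * (a * rng.gauss(0.0, 1.0) + b) + c) for _ in range(1000)]
mean_d_fake = sum(fake_probs) / len(fake_probs)
print(f"generator: a={a:.3f} b={b:.3f}; mean D(G(z))={mean_d_fake:.3f}")
```

Models this small may oscillate around the target rather than settle, so the printed values are best read as a snapshot of the game at one moment, not as a converged result.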
We'll now briefly discuss a few GAN variants that have become well known over time.
DCGANs are an improved version of GANs that generate better-looking images than a plain GAN built from multilayer perceptrons. A DCGAN uses deconvolution (transposed convolution) layers in the generator and convolution layers in the discriminator. DCGANs can also be used for style/theme transfer.
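To illustrate why deconvolution suits the generator and convolution suits the discriminator, here is a one-dimensional sketch: a transposed convolution with stride 2 enlarges its input, and a strided convolution shrinks it again. The function names and the fixed kernel are assumptions for the example; a real DCGAN works on 2-D images and learns its kernels.

```python
def conv_transpose_1d(signal, kernel, stride):
    """Transposed 1-D convolution: each input value stamps a scaled copy of the kernel."""
    out_len = (len(signal) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i * stride + j] += value * k
    return out

def conv_1d(signal, kernel, stride):
    """Strided 1-D convolution (cross-correlation, no padding), which shrinks the input."""
    out_len = (len(signal) - len(kernel)) // stride + 1
    return [sum(signal[i * stride + j] * k for j, k in enumerate(kernel))
            for i in range(out_len)]

low_res = [1.0, 2.0, 3.0]
kernel = [1.0, 1.0]   # hypothetical fixed kernel; a real DCGAN learns its kernels

upsampled = conv_transpose_1d(low_res, kernel, stride=2)   # generator direction
downsampled = conv_1d(upsampled, kernel, stride=2)         # discriminator direction
print(upsampled)     # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(downsampled)   # [2.0, 4.0, 6.0]
```

Stacking several such upsampling layers lets a generator grow a small feature map into a full-size image, while the discriminator runs the opposite way down to a single score.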
Conditional GANs use extra label information passed to the model to generate better images. Both the generator and the discriminator are conditioned on these labels.
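One common way to condition both networks is to append an encoding of the label to their inputs. The sketch below assumes one-hot labels and simple concatenation; the helper names and the sizes are hypothetical and only show how the inputs are assembled, not a full model.

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def generator_input(noise, label, num_classes):
    """Conditioned generator input: the noise vector followed by the label encoding."""
    return noise + one_hot(label, num_classes)

def discriminator_input(sample, label, num_classes):
    """Conditioned discriminator input: the sample followed by the same label encoding."""
    return sample + one_hot(label, num_classes)

rng = random.Random(0)
NUM_CLASSES, NOISE_DIM = 3, 4   # hypothetical sizes
noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]

g_in = generator_input(noise, label=2, num_classes=NUM_CLASSES)
d_in = discriminator_input([0.2, -0.7], label=2, num_classes=NUM_CLASSES)
print(len(g_in), g_in[-NUM_CLASSES:])   # 7 [0.0, 0.0, 1.0]
print(d_in)                             # [0.2, -0.7, 0.0, 0.0, 1.0]
```

Because the discriminator sees the label too, it can reject a realistic sample that does not match the requested class, which pushes the generator to respect the label.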
The StackGAN authors proposed a solution for generating high-quality images from text descriptions of those images. In StackGAN, the models are conditioned on the text description, which is supplied as extra information for generating the images.
It consists of a two-stage process: the first stage sketches primitive objects based on the text description, and the second stage takes that primitive low-resolution image together with the text description as input and generates high-resolution images with realistic details.
InfoGANs can learn disentangled representations of data. InfoGAN successfully disentangled writing styles from digit shapes on MNIST and pose from lighting in 3D rendered images. It can also discover the presence or absence of glasses, hairstyles, and emotions on the CelebA face dataset.
DiscoGANs learn relations between data from different domains and can then transfer style from one domain to another. In essence, the model learns the style of one object and can then apply the same style/theme to another object.
There are many more types of GANs; we have covered only a few of the best-known ones above.
Below are some references that were consulted while writing this article. All images were taken from the research papers themselves.
If you want to