Ian Goodfellow: Generative Adversarial Networks



The second talk I went to at AI WithTheBest 2016 was Ian Goodfellow’s talk on Generative Adversarial Networks (GANs), which he invented. Ian is a researcher at OpenAI.

GANs are generative models based on supervised learning and game theory: a generator network learns to produce realistic samples by trying to fool a discriminator network that is trained to tell real data from generated data. GANs have mostly been used to generate images. For example, you can feed a GAN images of cats as training data and it will generate new images of cats that it has constructed.

Ian told us how GANs worked at a high level, showed us some applications and described problems GANs faced.

I won’t describe how GANs work in detail because I don’t want to butcher Ian’s explanation (I might give an outline in a later post once I have a clearer understanding), but I will list the applications and problems Ian mentioned.
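That said, for a rough feel of the two-player game, here is a toy sketch in PyTorch. This is my own illustration, not code from the talk: a generator G turns noise into samples, a discriminator D is trained to tell real data from G’s output, and G is trained to fool D. All sizes and hyperparameters are made up for illustration.

```python
# Toy GAN on 1-D data (illustrative only, not from the talk).
import torch
import torch.nn as nn

latent_dim = 16
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = 3.0 + 0.5 * torch.randn(64, 1)    # 'real' data drawn from N(3, 0.5^2)
    fake = G(torch.randn(64, latent_dim))    # generator maps noise to samples

    # Discriminator step: push real towards 1 and fake towards 0.
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step: make the discriminator label fakes as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
```

After training, samples from G(torch.randn(1, latent_dim)) should cluster around 3, the mean of the ‘real’ data.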

For those who know what Convolutional Neural Networks (CNNs) are, note the difference between CNNs, which progressively shrink their input down to a smaller set of features, and GAN generators, which run in the opposite direction: they expand a small input into a full image using ‘deconvs’ (transposed convolutions):

[Slide: convolutional network (downsampling) vs. GAN generator with ‘deconvs’ (upsampling)]

Disclaimer: All slides used in this post are Ian Goodfellow’s.
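To make the conv/‘deconv’ contrast concrete, here is a quick shape check (my own example; the layer sizes are arbitrary). A strided convolution halves the spatial size of a feature map, the kind of downsampling a CNN classifier or a GAN discriminator does, while a transposed convolution (‘deconv’) doubles it, which is how a GAN generator grows a small input into an image.

```python
# Strided convolution downsamples; transposed convolution upsamples.
# Layer sizes are arbitrary, chosen only to show the shapes.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)  # one 64x64 RGB image
down = nn.Conv2d(3, 8, kernel_size=4, stride=2, padding=1)
up = nn.ConvTranspose2d(8, 3, kernel_size=4, stride=2, padding=1)

h = down(x)
print(h.shape)      # torch.Size([1, 8, 32, 32]) - spatial size halved
print(up(h).shape)  # torch.Size([1, 3, 64, 64]) - spatial size doubled
```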

Applications: What GANs can do

The general application is generating images. Ian talked about two ways of generating images: (1) generating new images from a training set of images, and

[Slide: new images generated from a training set]


(2) generating new images conditioned on text, based on a training set of labelled images.

[Slide: text-to-image generation: flowers generated from captions]
You may notice that the flowers generated in the bottom row are all very similar: the white and yellow flowers all have around the same number of petals. This is called mode collapse, and we will return to it in the Problems section.
There is an additional interesting capability: the ability to take averages of images’ latent codes and to add and subtract them.
[Slide: latent-space arithmetic: man with glasses - man + woman = woman with glasses]
In the slide above, we have the ‘average code’ for the images of a man with glasses, a man and a woman in the training set. By subtracting the code of the ‘average man image’ from the code of the ‘average man with glasses image’ and adding the code of the ‘average woman image’, the GAN generates pictures of a woman with glasses.
That’s freaking cool.
Ian pointed out this was similar to language models where word embeddings (mapping words to coordinates or vectors in some n-dimensional space) had interesting algebraic relationships, only more impressive. With word embeddings, if you take ‘Queen – Female + Male’, you get close to the word ‘King’.
It’s somewhat more impressive with images because you need to decode the vectors into an image and get something where the pixels mesh together into an image that makes sense, whereas with words you just pick the word that’s closest to where you end up.
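Here is roughly what that image arithmetic looks like in code. This is a hypothetical sketch of my own: G stands in for a trained generator, and the random vectors stand in for the latent codes behind each group of images.

```python
# Latent-space arithmetic: man with glasses - man + woman.
# Hypothetical sketch; G and the latent codes are stand-ins.
import torch

def average_code(codes):
    """Mean latent vector (the 'average code') for a group of images."""
    return torch.stack(codes).mean(dim=0)

# One latent vector per image in each group (stand-in random values).
men_with_glasses = [torch.randn(100) for _ in range(9)]
men = [torch.randn(100) for _ in range(9)]
women = [torch.randn(100) for _ in range(9)]

z = (average_code(men_with_glasses)
     - average_code(men)
     + average_code(women))

# Decoding z with a trained generator should give a woman with glasses:
# image = G(z.unsqueeze(0))
```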
Applications: Specific examples of GANs in practice
Ian mentioned four applications:
1. Assisting artists in drawing realistic images (e.g. automated painting in Photoshop).
In the slide below, the artist drew only the black dotted line, and the system suggested more realistic pictures of a mountain.
[Slide: interactive drawing assistance: a rough stroke turned into realistic mountain suggestions]
2. Turning low-resolution images into high-resolution ones (super-resolution).
This is a problem for generative models because it is under-constrained: there are many possible high-resolution images for each low-resolution image. GANs work well because they are specialised for producing realistic outputs. The leftmost image is the original, and the rightmost one is the image produced by a GAN. It is less blurry than bicubic interpolation (second from left), and although it has a lower peak signal-to-noise ratio (PSNR) than the SRResNet output (third from left), it looks sharper: PSNR is a pixel-wise score that does not fully capture perceptual quality.
[Slide: super-resolution comparison: original, bicubic interpolation, SRResNet, GAN]
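For reference, PSNR, the pixel-wise metric mentioned above, is just a rescaled mean squared error against the original, which is why a sharp GAN reconstruction can score worse than a blurrier one that is closer pixel-by-pixel. A quick sketch (my own; the values are illustrative):

```python
# PSNR: a pixel-wise metric that does not capture perceptual sharpness.
import torch

def psnr(img, ref, max_val=1.0):
    """PSNR in dB between an image and a reference, both in [0, max_val]."""
    mse = torch.mean((img - ref) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

ref = torch.rand(3, 64, 64)  # stand-in for the original high-res image
noisy = (ref + 0.05 * torch.randn_like(ref)).clamp(0, 1)
print(psnr(noisy, ref).item())  # roughly 26 dB for this noise level
```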
3. Speech synthesis.
DeepMind released WaveNet just last week. WaveNet generates decent samples but is slow because it has to generate the output one sample at a time (with 12 neural nets in sequence), taking two minutes to generate one second of audio. This prevents it from being able to speak in real time.
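The slowness comes from autoregression: each audio sample depends on the samples before it, so generation is an inherently sequential loop with one full forward pass per sample. The sketch below is a generic autoregressive loop of my own, not WaveNet’s actual architecture.

```python
# Generic autoregressive sampling loop (not WaveNet itself): one forward
# pass per output sample, so 16,000 passes for one second at 16 kHz.
import torch
import torch.nn as nn

context = 256  # how many past samples the model conditions on
model = nn.Sequential(nn.Linear(context, 128), nn.ReLU(), nn.Linear(128, 1))

audio = torch.zeros(context)  # start from silence
with torch.no_grad():
    for _ in range(16000):               # one second at 16 kHz
        nxt = model(audio[-context:])    # predict the next sample
        audio = torch.cat([audio, nxt])  # append and repeat
```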
4. Generating new Pokemon from existing ones.
This works well because Pokemon can be eccentric and still ‘valid’.
[Slide: new Pokemon generated from existing ones]
Problems

There are two classes of problems.

1. The images generated are realistic

The images generated are realistic, but this is because the generator is just outputting members of its training set over and over again, rather than covering the whole data distribution. This is the mode collapse mentioned earlier. So the model often isn’t generating genuinely new images.

2. The images generated are unrealistic.

[Slide: unrealistic GAN samples: mismatched textures and flat composition]

Ian showed us this slide with examples of some images a GAN generated. He pointed out two key problems. Firstly, the generator sometimes overgeneralises and does not know which textures go with which animal: in the bottom-left picture it put cow skin on what looks like a horse, an animal that appears to be standing on four legs and upright on its hind legs at the same time. It did not know not to combine two-legged and four-legged body plans.
Secondly, it’s common for images to lack 3D composition. In the bottom row, the picture third from the left contains a dog face and dog fur, but they are not composed in 3D: they look flat, as if orthographically projected.

An orthographic projection of the Earth. (Credits: Alamy stock)

When asked what the solution to the image texture problem was, Ian said they’d probably need to come up with a different algorithm or reformulate the game, and that decreasing the learning rate wasn’t sufficient. It’s a difficult research problem that people, himself included, were working on. Ian is most interested in improving training stability and also spends time working on new architectures.
