In the previous article, we discussed pix2pix, which performs similar image-to-image translations. In fact, the same researchers who worked on pix2pix developed CycleGAN to overcome pix2pix's limitations. So, let's first look at what kinds of problems exist in pix2pix, which will help us better understand CycleGAN.
The Inconvenient Truth about Pix2Pix
It Needs Pairs of Images for Training
Pix2pix can convert the contents of an image into a different style, a task called "image-to-image translation". For example, it can generate a photo-like image from a sketch. However, since pix2pix uses supervised learning, we must have many pairs of images for training.
In the example paired image sets above, x1 is paired with y1, x2 is paired with y2, and so on. For an input (condition) image xi, there must be a corresponding target (label) image yi. We need many paired images to train a model that can robustly handle unseen input images. However, not many image-to-image translation datasets exist, since preparing them takes significant time and effort. Although collecting a lot of labeled data is a common burden in any supervised learning, it is especially troublesome for image-to-image translation because the labels themselves must be paired images.
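To make the pairing requirement concrete, here is a minimal sketch of what a paired dataset looks like. This is plain Python with invented names for illustration, not actual pix2pix code; the "images" are stand-in strings.

```python
# Hypothetical sketch: a paired dataset for supervised image-to-image
# translation. Every input x_i must come with a matching target y_i.

class PairedImageDataset:
    """Holds (x_i, y_i) pairs: input (condition) image and target (label) image."""

    def __init__(self, inputs, targets):
        if len(inputs) != len(targets):
            # Supervised training breaks down without a y_i for every x_i.
            raise ValueError("every input x_i needs a matching target y_i")
        self.pairs = list(zip(inputs, targets))

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        # The generator receives x_i; the loss compares its output against y_i.
        return self.pairs[i]


# Toy usage with filename strings standing in for images.
sketches = ["sketch_0.png", "sketch_1.png"]
photos = ["photo_0.png", "photo_1.png"]
dataset = PairedImageDataset(sketches, photos)
x0, y0 = dataset[0]  # ("sketch_0.png", "photo_0.png")
```

The point of the sketch is the constructor check: the pairing is a hard constraint of the supervised setup, which is exactly what makes such datasets expensive to collect.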
One-way Image Generation Training
In pix2pix, we train one generator network for one-way image generation. For example, suppose a generator translates a black-and-white sketch into a colored image. If we want to perform the reverse image-to-image translation (from a colored image to a black-and-white image), we need to separately…