The three weeks of the Community Bonding Period have almost come to a close. It’s time to start writing some serious code and get some models running. My experience of this period was mostly interaction on Slack along with writing code.
Since the first two models in my proposal did not involve many intricacies, I decided to begin work on them after talking to my mentor. The models implemented so far are CycleGAN and a conditional GAN for paired image-to-image translation.
Let me explain the code flow and walk you through the implementation.
CycleGAN is an implementation of this paper.
The architecture is as follows :
Basically, it learns a mapping from images in one domain to images in another. There is a generator and a discriminator for each of the two domains. The problem is formulated as a normal GAN with constraints on the construction of images. We’ll go into more detail on this when we discuss the loss functions.
Here is the link to the code.
Loading The Dataset
We use the apples2oranges dataset. Download and extract it to the data directory. Let’s load our dataset.
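using Images, FileIO, Flux # channelview from Images, load from FileIO, gpu from Flux

function load_image(filename)
    img = load(filename)
    Float64.(channelview(img)) # 3 x H x W array of channel values
end

function load_dataset(path,imsize)
    imgs = []
    for r in readdir(path)
        img_path = string(path,r)
        push!(imgs,load_image(img_path))
    end
    # Stack along a fourth dimension so that each image stays contiguous in memory
    reshape(cat(imgs...,dims=4),imsize,imsize,3,length(imgs))
end

# Load the dataset
dataA = load_dataset("../data/trainA/",256) |> gpu
dataB = load_dataset("../data/trainB/",256) |> gpu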
Building The Architectures
The implementation uses a UNet architecture for the generators and a simple, sequentially downsampling discriminator. We avoid MaxPool layers to prevent sparse gradients during GAN training.
Here is the reference code to follow along.
UNet is basically an encoder-decoder with skip connections in between. We define two modules, one for downsampling and one for upsampling :
# Convolution and downsample
ConvDown(in_chs,out_chs)... # Arguments are the input and output number of channels

# Convolution and upsample
struct UNetUpBlock
    upsample
    conv_layer
end

UNetUpBlock(in_chs,out_chs)... # Arguments are the input and output number of channels

function (u::UNetUpBlock)(x,bridge)
    # Upsample -> concatenate [up(x),bridge] -> convolution
end

struct UNet
    conv_down_blocks # Convolve and downsample
    conv_blocks      # Convolve
    up_blocks        # Upsample, concatenate, convolve
end
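The "upsample, concatenate" step of a skip connection can be sketched on plain arrays. This is only an illustration and not the code from the repository: `upsample2x` and `skip_concat` are hypothetical names, nearest-neighbour upsampling is an assumption, and the arrays are assumed to be laid out as width x height x channels x batch.

```julia
# Nearest-neighbour 2x upsampling of a W x H x C x N array (assumed layout).
upsample2x(x) = repeat(x, inner=(2, 2, 1, 1))

# Skip connection: upsample the decoder features, then stack them with the
# encoder features ("bridge") along the channel dimension.
skip_concat(x, bridge) = cat(upsample2x(x), bridge; dims=3)

x = rand(Float32, 8, 8, 16, 1)        # decoder features
bridge = rand(Float32, 16, 16, 8, 1)  # encoder features at twice the resolution
y = skip_concat(x, bridge)
println(size(y))  # (16, 16, 24, 1)
```

The convolution that follows the concatenation therefore has to accept the sum of both channel counts as its input, 24 channels in this toy case.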
The discriminator is a sequence of strided convolutions, so that the image is spatially halved at each step. A sigmoid activation is appended at the end of the model so that the output represents the probability that the input image comes from the real distribution.
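To see how strided convolutions halve the spatial size, here is a small sketch of the usual output-size formula. The kernel size 4, stride 2 and padding 1 are assumptions chosen for illustration, and `conv_out` and `halving_sizes` are hypothetical helper names; the actual values are in the linked code.

```julia
# Output length of a convolution along one spatial dimension.
# k = kernel size, s = stride, p = padding (all assumed values).
conv_out(n; k=4, s=2, p=1) = (n + 2p - k) ÷ s + 1

# Spatial size after each of `layers` consecutive strided convolutions.
function halving_sizes(n, layers)
    sizes = Int[]
    for _ in 1:layers
        n = conv_out(n)
        push!(sizes, n)
    end
    sizes
end

println(halving_sizes(256, 4))  # [128, 64, 32, 16]
```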
Writing The Loss Functions
The discriminator loss is the standard adversarial loss as in a normal GAN. It tries to classify the real image as real and the generated image as fake.
function dA_loss(a,b)
    """
    a : Image in domain A
    b : Image in domain B
    """
    # LABELS #
    real_labels = ones(1,BATCH_SIZE) |> gpu
    fake_labels = zeros(1,BATCH_SIZE) |> gpu

    fake_A = gen_B(b) # Fake image generated in domain A
    # dis_A scores images in domain A
    fake_A_prob = drop_first_two(dis_A(fake_A.data)) # Probability that generated image in domain A is real
    real_A_prob = drop_first_two(dis_A(a)) # Probability that original image in domain A is real

    dis_A_real_loss = ((real_A_prob .- real_labels).^2)
    dis_A_fake_loss = ((fake_A_prob .- fake_labels).^2)

    convert(Float32,0.5) * mean(dis_A_real_loss + dis_A_fake_loss)
end
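The squared-error form of this loss can be checked on toy numbers. The following is a minimal sketch on plain arrays, with `lsq_dis_loss` as a hypothetical name and made-up discriminator outputs; it is not tied to the networks above.

```julia
using Statistics: mean

# Least-squares discriminator loss: push scores on real images towards 1
# and scores on generated images towards 0.
lsq_dis_loss(real_prob, fake_prob) =
    0.5f0 * mean((real_prob .- 1f0) .^ 2 .+ (fake_prob .- 0f0) .^ 2)

# A discriminator that separates real from fake perfectly.
perfect = lsq_dis_loss(ones(Float32, 1, 4), zeros(Float32, 1, 4))

# A discriminator that outputs 0.5 for everything.
unsure = lsq_dis_loss(fill(0.5f0, 1, 4), fill(0.5f0, 1, 4))

println((perfect, unsure))  # perfect is 0, unsure is 0.25
```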
The generator loss is a bit more interesting. Apart from the standard adversarial loss, a reconstruction loss and an identity loss are added to enforce constraints on the structure of the output.
The model is designed for unpaired image-to-image translation. If only adversarial losses are used, a one-to-many mapping is possible for each image in domain A. One desires that only the relevant part of the input is translated. For instance, the model must only convert an apple to an orange while leaving the background unchanged. This constraint is enforced using the reconstruction loss: the generated domain B image, when passed through the B->A generator, must reconstruct the input. The identity loss is also enforced to complement this process. It basically states that when an image of domain A is fed to the B->A generator, the generator must behave like an identity function.
function g_loss(a,b)
    """
    a : Image in domain A
    b : Image in domain B
    """
    # LABELS #
    real_labels = ones(1,BATCH_SIZE) |> gpu
    fake_labels = zeros(1,BATCH_SIZE) |> gpu

    # Forward Propagation #
    fake_B = gen_A(a) # Fake image generated in domain B
    fake_B_prob = drop_first_two(dis_B(fake_B)) # Probability that generated image in domain B is real
    real_B_prob = drop_first_two(dis_B(b)) # Probability that original image in domain B is real

    fake_A = gen_B(b) # Fake image generated in domain A
    fake_A_prob = drop_first_two(dis_A(fake_A)) # Probability that generated image in domain A is real
    real_A_prob = drop_first_two(dis_A(a)) # Probability that original image in domain A is real

    # Reconstructions #
    rec_A = gen_B(fake_B)
    rec_B = gen_A(fake_A)

    ### Generator Losses ###
    # For domain A->B #
    gen_B_loss = mean((fake_B_prob .- real_labels).^2) # Adversarial loss
    rec_B_loss = mean(abs.(b .- rec_B)) # Reconstruction loss for domain B

    # For domain B->A #
    gen_A_loss = mean((fake_A_prob .- real_labels).^2) # Adversarial loss
    rec_A_loss = mean(abs.(a .- rec_A)) # Reconstruction loss for domain A

    # Identity losses #
    # gen_A should be identity if b is fed : ||gen_A(b) - b||
    idt_A_loss = mean(abs.(gen_A(b) .- b))
    # gen_B should be identity if a is fed : ||gen_B(a) - a||
    idt_B_loss = mean(abs.(gen_B(a) .- a))

    gen_A_loss + gen_B_loss + λ₁*rec_A_loss + λ₂*rec_B_loss + λid*(λ₁*idt_A_loss + λ₂*idt_B_loss)
end
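The reconstruction and identity terms are easier to follow on a toy example. In the sketch below, `a2b` and `b2a` are hypothetical stand-ins for the two generators (simple scalings, not neural networks), and `l1` is a hypothetical helper for the mean absolute error.

```julia
using Statistics: mean

# Mean absolute error between two arrays.
l1(x, y) = mean(abs.(x .- y))

# Toy stand-ins for the two generators (illustration only).
a2b(x) = 2 .* x   # plays the role of the A->B generator
b2a(y) = y ./ 2   # plays the role of the B->A generator

a = Float32[1 2; 3 4]   # "image" in domain A

# Reconstruction: A -> B -> A should give back the input.
rec_A_loss = l1(b2a(a2b(a)), a)

# Identity: a domain A image fed to the B->A generator should stay unchanged.
idt_loss = l1(b2a(a), a)

println((rec_A_loss, idt_loss))
```

Here the reconstruction loss is zero because the two toy maps invert each other, while the identity loss is not, which shows that the two terms constrain the generators in different ways.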
The model training is currently not complete as I did not have access to a good GPU machine until now. Work on that should start soon.
Sampling from the generator
After training our model, we would want to go out and convert an apple to an orange, right? We put the generator network into testmode, which ensures that batchnorm and other layers use their inference-time behaviour.
function sampleA2B(X_A_test)
    """
    Samples new images in domain B
    X_A_test : H x W x C x N array - Test images in domain A
    """
    testmode!(gen_A)
    X_A_test = norm(X_A_test)
    X_B_generated = cpu(denorm(gen_A(X_A_test |> gpu)).data)
    testmode!(gen_A,false)

    imgs = []
    s = size(X_B_generated)
    for i in 1:s[end]
        push!(imgs,colorview(RGB,reshape(X_B_generated[:,:,:,i],3,s[1],s[2])))
    end
    imgs
end

function test()
    # load test data
    dataA = load_dataset("../data/trainA/",256)[:,:,:,1:2] |> gpu
    out = sampleA2B(dataA)
    for (i,img) in enumerate(out)
        save("../sample/A_$i.png",img)
    end
end
That completes all of the building blocks required to glue together a CycleGAN model. Let’s now move on to the second model.
Here is the link to the paper.
This model also solves the problem of image-to-image translation. However, the translations are paired here, so each input image corresponds to exactly one output image. The concept is precisely that of a conditional GAN, with the input to the discriminator conditioned on the input image.
The code for data loading and the architectures is almost the same here as in CycleGAN. The difference lies in the loss function implementation. The discriminator loss is the general adversarial GAN loss with the input conditioned on the primary domain image. The generator loss consists of an adversarial loss along with a term that rewards proper reconstruction: it aims to minimise the difference between the generated output and the image in the target domain.
function d_loss(a,b)
    """
    a : Image in domain A
    b : Image in domain B
    """
    fake_B = gen(a |> gpu)
    fake_AB = cat(fake_B,a,dims=3) |> gpu
    fake_prob = dis(fake_AB)
    loss_D_fake = bce(fake_prob,fake_labels)

    real_AB = cat(b,a,dims=3) |> gpu
    real_prob = dis(real_AB)
    loss_D_real = bce(real_prob,real_labels)

    0.5 * mean(loss_D_real .+ loss_D_fake)
end

function g_loss(a,b)
    """
    a : Image in domain A
    b : Image in domain B
    """
    fake_B = gen(a |> gpu)
    fake_AB = cat(fake_B,a,dims=3) |> gpu
    fake_prob = dis(fake_AB)

    loss_adv = mean(bce(fake_prob,real_labels))
    loss_L1 = mean(abs.(fake_B .- b))

    loss_adv + λ*loss_L1
end
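The conditioning and the two generator terms can be traced on plain arrays. This is a rough sketch under several assumptions: `bce` is written out as an element-wise binary cross-entropy, which may differ from the helper used in the repository; the discriminator output is a made-up constant; and the value of `λ` is chosen arbitrarily, since the post does not state it.

```julia
using Statistics: mean

# Element-wise binary cross-entropy (assumed definition of `bce`).
bce(p, y) = -(y .* log.(p) .+ (1 .- y) .* log.(1 .- p))

a = rand(Float32, 4, 4, 3, 2)        # input images, W x H x C x N
b = rand(Float32, 4, 4, 3, 2)        # paired target images
fake_B = rand(Float32, 4, 4, 3, 2)   # stand-in for the generator output

# The discriminator sees the output stacked with its input along channels.
fake_AB = cat(fake_B, a; dims=3)
println(size(fake_AB))  # (4, 4, 6, 2)

fake_prob = fill(0.5f0, 1, 2)        # made-up discriminator output
real_labels = ones(Float32, 1, 2)

λ = 100f0                            # assumed weight of the L1 term
loss_adv = mean(bce(fake_prob, real_labels))   # equals -log(0.5)
loss_L1 = mean(abs.(fake_B .- b))
loss = loss_adv + λ * loss_L1
```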
The training details here are similar to those of CycleGAN.
The three weeks of the Community Bonding Period turned out to be a great experience. I was able to read up on the GAN literature, go through the papers and write some code. Besides, I also set out to understand the math behind Policy Gradient Algorithms, which should turn out to be handy while debugging in the later stages of the coding period, when I shall be implementing some advanced algorithms of this type.