Community Bonding

Holidays drawing to a close, work nearing

The three weeks of the Community Bonding Period have almost come to a close. It’s time to start writing some serious code and get some models running. My experience of this period was mostly interaction on Slack along with writing code.

Since the first two models in my proposal did not involve many intricacies, I decided to begin work early after talking to my mentor. The models implemented so far are CycleGAN.jl and pix2pix.jl.

Let me explain the code flow and walk you through the implementation.

CycleGAN.jl

CycleGAN.jl is an implementation of this paper.

The architecture is as follows: [figure: cyclegan-architecture]

In essence, the model learns a mapping between images from one domain and another. There is one generator and one discriminator for each of the two domains. The problem is formulated as a standard GAN with additional constraints on the construction of the images. We’ll go into more detail on this when we discuss the loss functions.

Here is the link to the code.

Loading The Dataset

We use the apples2oranges dataset. Download and extract it to the data directory. Let’s load our dataset.

using Images, Flux

function load_image(filename)
    img = load(filename)
    Float64.(channelview(img)) # C x H x W array of pixel values
end

function load_dataset(path, imsize)
    imgs = []
    for r in readdir(path)
        img_path = joinpath(path, r)
        push!(imgs, load_image(img_path))
    end
    reshape(hcat(imgs...), imsize, imsize, 3, length(imgs))
end

# Load the dataset
dataA = load_dataset("../data/trainA/",256) |> gpu
dataB = load_dataset("../data/trainB/",256) |> gpu
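
If loading succeeded, each array has shape imsize x imsize x 3 x N. A quick sanity check (the assertion is mine, not part of the reference code):

# N is the number of images in the respective folder
@assert size(dataA)[1:3] == (256, 256, 3)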

Building The Architectures

The paper uses a UNet architecture for the generators and a simple sequentially downsampling discriminator. We avoid MaxPool layers so as to avoid sparse gradients during GAN training, using strided convolutions for downsampling instead.

Here is the reference code to follow along.

A UNet is essentially an encoder-decoder with skip connections in between. We define two modules, one for downsampling and one for upsampling:

# Convolution And Downsample
ConvDown(in_chs,out_chs)... # Arguments are the input and output number of channels

# Convolution And Upsample
struct UNetUpBlock
    upsample
    conv_layer
end

UNetUpBlock(in_chs,out_chs)... # Arguments are the input and output number of channels

function (u::UNetUpBlock)(x, bridge)
    # Upsample -> concatenate [up(x), bridge] along channels -> convolution
    x = u.upsample(x)
    u.conv_layer(cat(x, bridge, dims=3))
end

struct UNet
    conv_down_blocks # Convolve And Downsample
    conv_blocks # Convolve
    up_blocks # Upsample, concatenate, convolve
end
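
For concreteness, here is one way ConvDown could be written in Flux. This is a sketch under my own assumptions; the kernel size, normalisation and leaky ReLU slope are not taken from the reference code:

using Flux

# Strided convolution halves the spatial size; batch norm and a leaky
# ReLU follow. Kernel size 4, stride 2 and pad 1 are assumptions.
ConvDown(in_chs, out_chs) = Chain(
    Conv((4,4), in_chs=>out_chs, stride=2, pad=1),
    BatchNorm(out_chs),
    x -> leakyrelu.(x, 0.2))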

The discriminator is a sequence of convolutions with strides such that the image is spatially halved at each step. A sigmoid activation is appended at the end of the model to represent the probability that the input image is from the real distribution.
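As an illustration, such a discriminator could be assembled like this (the channel counts and depth are my assumptions, not the reference code):

using Flux

# One downsampling step: strided convolution + leaky ReLU
dblock(in_chs, out_chs) = Chain(
    Conv((4,4), in_chs=>out_chs, stride=2, pad=1),
    x -> leakyrelu.(x, 0.2))

# Eight stride-2 convolutions take a 256x256 image down to a 1x1 map;
# sigmoid turns the final value into a probability of being real.
dis = Chain(
    dblock(3,64), dblock(64,128), dblock(128,256), dblock(256,512),
    dblock(512,512), dblock(512,512), dblock(512,512),
    Conv((4,4), 512=>1, stride=2, pad=1),
    x -> sigmoid.(x))

# size(dis(rand(Float32, 256, 256, 3, 1))) == (1, 1, 1, 1)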

Writing The Loss Functions

The discriminator loss is the standard adversarial loss of a GAN, here in its least-squares form: the discriminator tries to classify the real image as real and the generated image as fake.

function dA_loss(a, b)
    """
    a : Image in domain A
    b : Image in domain B
    """
    # LABELS #
    real_labels = ones(1,BATCH_SIZE) |> gpu
    fake_labels = zeros(1,BATCH_SIZE) |> gpu

    fake_A = gen_B(b) # Fake image generated in domain A
    # .data detaches the generator output, so only the discriminator is updated
    fake_A_prob = drop_first_two(dis_A(fake_A.data)) # Probability that the generated image in domain A is real
    real_A_prob = drop_first_two(dis_A(a)) # Probability that the original image in domain A is real

    dis_A_real_loss = (real_A_prob .- real_labels).^2
    dis_A_fake_loss = (fake_A_prob .- fake_labels).^2
    0.5f0 * mean(dis_A_real_loss + dis_A_fake_loss)
end
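
The drop_first_two helper collapses the singleton spatial dimensions of the discriminator output so it can be compared with the 1 x BATCH_SIZE label arrays. A minimal version, assuming the discriminator emits a 1 x 1 x 1 x N array:

# Assumed helper: drop the leading singleton spatial dimensions,
# turning a 1 x 1 x 1 x N discriminator output into a 1 x N array.
drop_first_two(x) = dropdims(x, dims=(1,2))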

The generator loss is a bit more interesting. Apart from the standard adversarial loss, a reconstruction loss and an identity loss are added to enforce constraints on the structure of the output.

The model is designed for unpaired image-to-image translation. If only adversarial losses were used, each image in domain A could map to many plausible images in domain B. We want only the relevant part of the input to be translated: for instance, the generator should turn an apple into an orange while leaving the background unchanged. This constraint is enforced using the reconstruction loss: the generated domain-B image, when passed through the B->A generator, must reconstruct the input. The identity loss complements this process; it states that when the B->A generator is fed an image that is already in domain A, it should behave like an identity function.

function g_loss(a, b)
    """
    a : Image in domain A
    b : Image in domain B
    """
    # LABELS #
    real_labels = ones(1,BATCH_SIZE) |> gpu

    # Forward Propagation #
    fake_B = gen_A(a) # Fake image generated in domain B
    fake_B_prob = drop_first_two(dis_B(fake_B)) # Probability that the generated image in domain B is real

    fake_A = gen_B(b) # Fake image generated in domain A
    fake_A_prob = drop_first_two(dis_A(fake_A)) # Probability that the generated image in domain A is real

    # Reconstructions #
    rec_A = gen_B(fake_B)
    rec_B = gen_A(fake_A)

    ### Generator Losses ###
    # For domain A->B #
    gen_B_loss = mean((fake_B_prob .- real_labels).^2) # Adversarial loss
    rec_B_loss = mean(abs.(b .- rec_B)) # Reconstruction loss for domain B

    # For domain B->A #
    gen_A_loss = mean((fake_A_prob .- real_labels).^2) # Adversarial loss
    rec_A_loss = mean(abs.(a .- rec_A)) # Reconstruction loss for domain A

    # Identity losses
    # gen_A should be identity if b is fed : ||gen_A(b) - b||
    idt_A_loss = mean(abs.(gen_A(b) .- b))
    # gen_B should be identity if a is fed : ||gen_B(a) - a||
    idt_B_loss = mean(abs.(gen_B(a) .- a))

    # λ₁, λ₂ and λid are hyperparameters weighting the reconstruction
    # and identity terms (set elsewhere in the reference code)
    gen_A_loss + gen_B_loss + λ₁*rec_A_loss + λ₂*rec_B_loss + λid*(λ₁*idt_A_loss + λ₂*idt_B_loss)
end

The model training is currently not complete as I did not have access to a good GPU machine until now. Work on that should start soon.

Sampling from the generator

After training the model, we would want to go out and convert an apple to an orange, right? We switch the generator network to test mode, which ensures that BatchNorm and other such layers use their inference-time behaviour.

function sampleA2B(X_A_test)
    """
    Samples new images in domain B
    X_A_test : H x W x C x N array - Test images in domain A
    """
    testmode!(gen_A)
    X_A_test = norm(X_A_test) # norm scales the images to the generator's input range
    X_B_generated = cpu(denorm(gen_A(X_A_test |> gpu)).data) # denorm maps back to pixel range
    testmode!(gen_A, false)
    imgs = []
    s = size(X_B_generated)
    for i in 1:s[end] # iterate over every image in the batch
        push!(imgs, colorview(RGB, reshape(X_B_generated[:,:,:,i], 3, s[1], s[2])))
    end
    imgs
end
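
The norm and denorm helpers come from the reference code and are not reproduced above. A plausible minimal version, assuming the generator works on images scaled to [-1, 1]:

# Assumed helpers: map [0,1] pixel values to [-1,1] and back.
# The exact scaling in the reference code may differ.
norm(x) = 2 .* x .- 1
denorm(x) = (x .+ 1) ./ 2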

function test()
    # load test data
    dataA = load_dataset("../data/trainA/",256)[:,:,:,1:2] |> gpu
    out = sampleA2B(dataA)
    for (i,img) in enumerate(out)
        save("../sample/A_$i.png",img)
    end
end

That should complete all of the building blocks required to glue together a CycleGAN model. Let’s now move on to the pix2pix network.

pix2pix.jl

Here is the link to the paper.

This model also solves the problem of image-to-image translation, but here the translations are paired: each input image corresponds to exactly one output image. The setup is precisely a conditional GAN, with the discriminator’s input conditioned on the input image.

The code for data loading and the architectures is almost the same here as in CycleGAN.jl.

The difference lies in the loss functions. The discriminator loss is the usual adversarial GAN loss, with the discriminator input conditioned on the primary-domain image. The generator loss consists of an adversarial term along with an L1 term that rewards faithful reconstruction: it minimises the difference between the generated output and the image in the target domain.

function d_loss(a, b)
    """
    a : Image in domain A
    b : Image in domain B
    """
    # bce is elementwise binary cross-entropy; real_labels and
    # fake_labels are defined globally in the reference code
    fake_B = gen(a |> gpu)
    fake_AB = cat(fake_B, a, dims=3) |> gpu # Condition the discriminator on the input image

    fake_prob = dis(fake_AB)
    loss_D_fake = bce(fake_prob, fake_labels)

    real_AB = cat(b, a, dims=3) |> gpu
    real_prob = dis(real_AB)
    loss_D_real = bce(real_prob, real_labels)

    0.5 * mean(loss_D_real .+ loss_D_fake)
end

function g_loss(a, b)
    """
    a : Image in domain A
    b : Image in domain B
    """
    fake_B = gen(a |> gpu)
    fake_AB = cat(fake_B, a, dims=3) |> gpu # Condition the discriminator on the input image

    fake_prob = dis(fake_AB)

    loss_adv = mean(bce(fake_prob, real_labels)) # Adversarial loss
    loss_L1 = mean(abs.(fake_B .- b)) # Reconstruction loss
    loss_adv + λ*loss_L1 # λ weights the L1 term (set in the reference code)
end
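
The bce helper used above is elementwise binary cross-entropy; a minimal definition in terms of Flux could be:

using Flux: binarycrossentropy

# Assumed helper: broadcast Flux's scalar binary cross-entropy
bce(ŷ, y) = binarycrossentropy.(ŷ, y)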

The training intricacies here are similar to those of the CycleGAN model.
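
Although training is still pending, a minimal alternating update step using the d_loss and g_loss above might look like this. Everything here is an assumption on my part (the optimisers, learning rate, and use of Flux’s gradient and update! utilities), not the reference code:

using Flux

opt_gen = ADAM(0.0002) # Assumed learning rate
opt_dis = ADAM(0.0002)

function train_step!(a, b)
    # Update the discriminator, then the generator, on the same batch
    gs = Flux.gradient(() -> d_loss(a, b), params(dis))
    Flux.Optimise.update!(opt_dis, params(dis), gs)

    gs = Flux.gradient(() -> g_loss(a, b), params(gen))
    Flux.Optimise.update!(opt_gen, params(gen), gs)
end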

Conclusion

These weeks of the Community Bonding Period turned out to be a great experience. I was able to read up on the GAN literature, go through the papers and write some code. Besides that, I also set out to understand the math behind Policy Gradient algorithms, which should come in handy while debugging in the later stages of the coding period, when I shall be implementing some advanced algorithms of this type.