How To Implement DCGAN To Generate Synthetic Samples?

Introduction

Generative Adversarial Networks, commonly called GANs, are an architecture for training deep learning models to generate samples that match a given distribution. Most often, GANs are used for the generation of images.

GANs demonstrate an exceptional ability to learn the patterns in an input dataset, such that they can generate samples that could plausibly have been real. This makes them quite useful for generating synthetic training samples in some machine learning problems.

In this article, you will learn how to implement a powerful but approachable GAN using the DCGAN architecture. This includes learning how to build a DCGAN generator and discriminator, and how to set up the training loop.

Just want the code? View the complete DCGAN implementation on GitHub here.


Brief Introduction to GANs

As I mentioned, Generative Adversarial Networks are a type of architecture for generative deep learning. This means the models generate new samples that match the input distribution.

GAN architectures were first described in a 2014 paper by Ian Goodfellow, et al. titled “Generative Adversarial Networks.”

The proposed architecture involves two neural networks that compete against each other in a structured manner in order to improve the quality of generated synthetic samples.

The generator model is responsible for generating synthetic samples that could reasonably be from the input distribution. The discriminator model is responsible for discriminating real samples from synthetic samples.

Flow of Data in a GAN
This image describes the flow of data within a GAN. Its contents are described below.
This is the flow of data within a basic Generative Adversarial Network

The generator network takes in a random 1-D vector of any length, generally 100. It produces an image using this random latent vector as the input. The image that is produced is of the same dimensions as the real images we are trying to mimic. The images produced by the generator are mixed with images from the real input distribution. Real images are labeled 1 and fake images are labeled 0. The batch of images is fed into the discriminator. For the discriminator, it is just a binary classification problem between the labels real and fake.

Update Step in a GAN
How gradients are backpropagated for the update step in a GAN

These diagrams come directly from this CS230 Lecture at Stanford, which is a great resource to go more in-depth on GANs.

The image above demonstrates how we update the weights in the discriminator and the generator after each training step. The discriminator updates using binary crossentropy loss and backpropagation. The input to the discriminator, x, depends on the input to the generator (in this diagram, Z). This means we can backpropagate all the way back to the input of the generator, as shown by the red arrow.

This is just a quick overview of generative adversarial networks intended as a review. If you need more background, I’ve included some resources below to help you gain a deep understanding of the GAN architecture.

Now, let’s get into some more specifics and take a look at the architecture of DCGAN.

DCGAN Architecture Review

Deep Convolutional Generative Adversarial Networks, or DCGAN, is a standardized approach to building GANs that is able to produce more stable output. It was proposed by Alec Radford, et al. in the 2015 paper titled “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”.

DCGAN is not the most advanced generative method when compared to architectures like StyleGAN and BigGAN, which can produce images of much higher quality. However, DCGAN is the easiest practical GAN to understand end-to-end. By understanding how to implement DCGAN from scratch, you will gain strong foundational knowledge about GANs.

First, let’s look at the generator architecture for DCGAN.

The first layer takes in our latent vector and projects it to a small spatial extent convolutional representation. Then a series of fractionally strided convolutions (labeled as Deconv) occurs, which projects the convolutional representation into a 64 x 64 x 3 image. Additional fractionally strided convolutions can be added to upscale the output resolution.

Now, let’s take a look at the DCGAN architecture for the discriminator.

The discriminator has an input layer of the same size as the output of the generator (64 x 64 x 3 in our example). Each block that is shown represents a strided convolutional layer. At the end, we flatten the output of the convolution and use a dense layer with a sigmoid activation function to produce the prediction. The output of the discriminator is between 0 and 1.

Major Architectural Features of DCGAN

  • Replace pooling layers with strided convolutions in discriminator and fractional-strided convolutions in generator
  • Batchnorm is used in both the generator and the discriminator (see the sketch after this list).
  • Removing fully connected hidden layers for deeper architectures.
  • Use ReLU activation in generator for all layers except for the output, which uses tanh.
    • tanh is used because input images are scaled to the range [-1, 1] from the original range [0, 255]
  • Use LeakyReLU activation in the discriminator for all layers.
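
For reference, the DCGAN guidelines place batch normalization after most convolutional layers, while the implementation later in this article leaves it out to keep the code short. A single generator upscaling block that follows the guideline might look like this minimal sketch (the layer sizes are only illustrative):

import tensorflow as tf

block = tf.keras.Sequential([
    # Fractionally strided (transposed) convolution doubles width and height
    tf.keras.layers.Conv2DTranspose(128, (4, 4), strides=2, padding='same'),
    # Batch normalization, as recommended by the DCGAN paper
    tf.keras.layers.BatchNormalization(),
    # The paper recommends ReLU in the generator; this article uses LeakyReLU instead
    tf.keras.layers.ReLU(),
])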

If you are still feeling a bit confused by the architecture, don’t worry. It will become more clear in the implementation below. It is helpful to reference these diagrams when looking at the code to identify each block.

Implement DCGAN from Scratch in Python

With an overview of the DCGAN architecture, we can now implement our GAN from scratch in Python.

The first thing to implement is our DCGAN generator.

How to write a DCGAN Generator network?

Our generator takes in a latent vector of size 100 as the input. The latent vector is randomly sampled from a standard normal (Gaussian) distribution. At first the latent vector is meaningless, but as the generator learns the features of the input dataset, the latent vector and the latent space it is sampled from come to represent a compressed representation of the output space.

This means that the latent vector is a representation that the generator can take in to output a synthetic sample that fits into the input dataset distribution.

Let’s define a collection of fixed latent vectors that we can use to evaluate the quality of images produced by the generator. Since our latent vector will not change, we will be able to see how the generator learns the features of the dataset over many epochs of training.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

noise_dim = 100

# You will reuse this static noise over time (so it's easier
# to visualize progress in the animated GIF)
static_noise = np.random.normal(0, 1, size=(100, noise_dim))

DCGAN Generator Input Layer

Our static noise is a collection of 100 latent vectors, each with a size of 100. Now we can define the input to our generator to accept a latent vector.

generator = tf.keras.Sequential()
d = 4
generator.add(tf.keras.layers.Dense(d*d*256, kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02), input_dim=noise_dim))

We feed our latent vector into a dense layer with 4096 nodes. This is the only dense layer in our generator, and we use a large number of nodes so that the layer can be reshaped into many parallel interpretations of our input vector. We effectively create multiple parallel, low-resolution versions of the final output image, each with its own interpretation of the input dataset. At the end, these different learned feature maps will be collapsed into a single output image.

Now, we can add the activation layer and our reshape layer to convert our 4096 nodes into 256 feature maps of size 4×4.

generator.add(tf.keras.layers.LeakyReLU(0.2))
# 4x4x256
generator.add(tf.keras.layers.Reshape((d, d, 256)))

DCGAN Generator Upscaling Layer

With the input layers done, we can move on to the core blocks of the generator. These blocks utilize Conv2DTranspose in order to upsample our low resolution representations of the output image. Upsampling means we are increasing the resolution.

The Conv2DTranspose layer utilizes a stride of (2×2) which means that it will quadruple the area of the input feature maps (double their width and height dimensions).

# 8x8x128
generator.add(tf.keras.layers.Conv2DTranspose(128, (4, 4), strides=2, padding='same', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
generator.add(tf.keras.layers.LeakyReLU(0.2))

Here we can see a single block which upscales our 4x4x256 representation to an 8x8x128 representation. This process is simply repeated until we reach our target output resolution. We use a LeakyReLU activation once again, a common choice for stabilizing GAN training.

DCGAN Generator Output Layer

The output layer of our DCGAN generator model is a Conv2D layer with a kernel size of (3, 3) and 3 filters to represent the 3 color channels in an image. This final layer combines all our feature maps into a single output image. We utilize a tanh activation function so that the output pixel values belong to the range [-1, 1].

Remember, we scale our images from the real dataset to the range [-1, 1] from the original [0, 255] before they are used for training. So, we want the output to be in the same range and we can transform it back to the original [0, 255] to decode it.

generator.add(tf.keras.layers.Conv2D(3, (3, 3), padding='same', activation='tanh', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
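
As a small aside, here is a minimal sketch of that scaling in both directions, assuming your real images start out as a NumPy uint8 array called real_uint8 and that generated holds the generator's tanh output (both names are just for illustration):

# Scale real training images from [0, 255] to [-1, 1]
real_scaled = (real_uint8.astype('float32') - 127.5) / 127.5

# Map generator output from [-1, 1] back to [0, 255] to decode it
decoded = ((generated + 1.0) * 127.5).astype('uint8')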

With our review of each of the parts covered, let’s take a look at all the pieces together.

Here is the complete DCGAN generator implementation:

def create_generator():
    generator = tf.keras.Sequential()
    
    # Input Layer
    d = 4
    generator.add(tf.keras.layers.Dense(d*d*256, kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02), input_dim=noise_dim))
    generator.add(tf.keras.layers.LeakyReLU(0.2))
    # 4x4x256
    generator.add(tf.keras.layers.Reshape((d, d, 256)))
    
    # Upscaling Layer
    # 8x8x128
    generator.add(tf.keras.layers.Conv2DTranspose(128, (4, 4), strides=2, padding='same', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
    generator.add(tf.keras.layers.LeakyReLU(0.2))
    
    # Upscaling Layer
    # 16x16x128
    generator.add(tf.keras.layers.Conv2DTranspose(128, (4, 4), strides=2, padding='same', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
    generator.add(tf.keras.layers.LeakyReLU(0.2))
    
    # Upscaling Layer
    # 32x32x128
    generator.add(tf.keras.layers.Conv2DTranspose(128, (4, 4), strides=2, padding='same', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
    generator.add(tf.keras.layers.LeakyReLU(0.2))

    # Upscaling Layer
    # 64x64x128
    generator.add(tf.keras.layers.Conv2DTranspose(128, (4, 4), strides=2, padding='same', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
    generator.add(tf.keras.layers.LeakyReLU(0.2))
    
    # Output Layer
    # 64x64x3
    generator.add(tf.keras.layers.Conv2D(3, (3, 3), padding='same', activation='tanh', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
    
    generator.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(0.0002, 0.5))
    return generator

We can now instantiate our generator and get the summary for a better overview of the architecture.
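
For reference, the summary below was produced with a snippet along these lines, using the create_generator function defined above:

generator = create_generator()
generator.summary()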

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 4096)              413696    
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 4096)              0         
_________________________________________________________________
reshape (Reshape)            (None, 4, 4, 256)         0         
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 8, 8, 128)         524416    
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 8, 8, 128)         0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 16, 16, 128)       262272    
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 32, 32, 128)       262272    
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 32, 32, 128)       0         
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 64, 64, 128)       262272    
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 64, 64, 128)       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 64, 64, 3)         3459      
=================================================================
Total params: 1,728,387
Trainable params: 1,728,387
Non-trainable params: 0
_________________________________________________________________

Next, let’s take a look at how to generate a sample with our generator. Since the model has not been trained yet, we will simply see noise.

Generating an individual sample with DCGAN Generator

We start by making a single latent vector sampled from a standard normal (Gaussian) distribution. Then we get the output of the generator using the predict function.

temp_noise = np.random.normal(0, 1, size=(1, 100))
preds = generator.predict(temp_noise)

Now that we have the prediction, let’s turn it into an actual image that we can look at. First, we need to adjust the range of our pixel values from [-1, 1] to [0, 1]. This is done by adding 1.0 to each pixel and then dividing by 2.0.

Consider a pixel with the value -1.0. We add 1.0 to get 0.0 and then divide by 2.0, which leaves us with 0.0. This fits within the desired range.

Now consider a pixel with the value 1.0. Again by adding 1.0 and dividing by 2.0, we get 1.0, which is our desired upper bound.

After adjusting the range, we can display the image using matplotlib. The np.clip function is used to resolve any values that are still outside our desired range after we perform the range transformation.

image = preds[0]          # grab the single generated image from the batch
image = (image + 1) / 2   # rescale from [-1, 1] to [0, 1]
plt.imshow(np.clip(image.reshape((64, 64, 3)), 0.0, 1.0))
plt.show()

The untrained generator outputs all grey images or random noise.

That’s where we will leave our generator for the time being. Next, let’s take a look at how to write our discriminator network.

How to write a DCGAN Discriminator network?

Our discriminator is composed of convolutional layers with an increasing number of filters, a kernel size of 3, and (after the first layer) a stride of 2. After this series of convolutional layers, there is a single-node output layer that uses the sigmoid activation function. A dropout layer is used prior to this output node in order to reduce overfitting. Note that some implementations utilize a dropout layer between each convolutional layer.

We use LeakyReLU as the activation function for our convolutional layers, just as we did in the generator. The model is compiled using binary crossentropy loss since we only have two classes. The optimizer used here is Adam with a learning rate of 0.0002 and a beta_1 of 0.5.

Here is the complete DCGAN Discriminator:

def create_discriminator():
    discriminator = tf.keras.Sequential()
    
    discriminator.add(tf.keras.layers.Conv2D(64, (3, 3), padding='same', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02), input_shape=(64, 64, 3)))
    discriminator.add(tf.keras.layers.LeakyReLU(0.2))
    
    discriminator.add(tf.keras.layers.Conv2D(128, (3, 3), strides=2, padding='same', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
    discriminator.add(tf.keras.layers.LeakyReLU(0.2))
    
    discriminator.add(tf.keras.layers.Conv2D(128, (3, 3), strides=2, padding='same', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
    discriminator.add(tf.keras.layers.LeakyReLU(0.2))
    
    discriminator.add(tf.keras.layers.Conv2D(256, (3, 3), strides=2, padding='same', kernel_initializer=tf.keras.initializers.RandomNormal(0, 0.02)))
    discriminator.add(tf.keras.layers.LeakyReLU(0.2))
    
    discriminator.add(tf.keras.layers.Flatten())
    discriminator.add(tf.keras.layers.Dropout(0.4))
    discriminator.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    
    discriminator.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(0.0002, 0.5))
    return discriminator

Again, the summary gives us a good overview of the architecture for the discriminator.
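
As with the generator, you can print this summary with a one-liner along these lines, using the create_discriminator function defined above:

create_discriminator().summary()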

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_5 (Conv2D)            (None, 64, 64, 64)        1792      
_________________________________________________________________
leaky_re_lu_9 (LeakyReLU)    (None, 64, 64, 64)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 32, 32, 128)       73856     
_________________________________________________________________
leaky_re_lu_10 (LeakyReLU)   (None, 32, 32, 128)       0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 16, 16, 128)       147584    
_________________________________________________________________
leaky_re_lu_11 (LeakyReLU)   (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 8, 8, 256)         295168    
_________________________________________________________________
leaky_re_lu_12 (LeakyReLU)   (None, 8, 8, 256)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 16384)             0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 16384)             0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 16385     
=================================================================
Total params: 534,785
Trainable params: 534,785
Non-trainable params: 0
_________________________________________________________________

With our two component networks complete, we can combine the two models into our GAN architecture.

Combining the models into a GAN

To begin, instantiate both of the networks we just created.

discriminator = create_discriminator()
generator = create_generator()

The first thing to do is set trainable to False on our discriminator. Because this happens after the discriminator has already been compiled, the discriminator will still update its weights when we call train_on_batch on it directly. What it does change is the combined GAN model we are about to build: inside that model the discriminator's weights are frozen, so training the GAN only updates the generator. In other words, the discriminator learns from its own training step, the generator learns through the combined GAN, and we can still use the generator outside the context of the GAN to actually produce synthetic samples.

discriminator.trainable = False

Then, we can actually define our GAN. We start by creating an input layer with the appropriate size for our latent vector. Next, the generator takes in the input from this layer and outputs fake images. After this, we pass the fake generated images into the discriminator network, which acts as the output layer. Finally, we create our GAN with these input and output layers and compile it with binary crossentropy loss, the same loss used for the discriminator.

# Link the two models to create the GAN
gan_input = tf.keras.Input(shape=(noise_dim,))
fake_image = generator(gan_input)

gan_output = discriminator(fake_image)

gan = tf.keras.Model(gan_input, gan_output)
gan.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(0.0002, 0.5))

Wow! We have just created a GAN following the DCGAN architecture! There are just a few more things to do in order to get it training.

We need a utility function so that we can display a batch of synthetic samples.

Displaying the Outputted Synthetic Samples

First, we will define the output path for the images and specify the number of channels in our images. I am using color images, so I have 3 channels. This utility function will take in the static noise and output a 10 × 10 grid of 100 synthetic samples.

import os

channels = 3
save_path = 'dcgan-images'

# Make sure the output directory exists before saving figures into it
os.makedirs(save_path, exist_ok=True)

# Display images, and save them if the epoch number is specified
def show_images(noise, epoch=None):
    generated_images = generator.predict(noise)
    plt.figure(figsize=(10, 10))
    
    for i, image in enumerate(generated_images):
        plt.subplot(10, 10, i+1)
        # Rescale each image from [-1, 1] to [0, 1] before displaying
        image = (image + 1) / 2
        if channels == 1:
            plt.imshow(np.clip(image.reshape((64, 64)), 0.0, 1.0), cmap='gray')
        else:
            plt.imshow(np.clip(image.reshape((64, 64, channels)), 0.0, 1.0))
        plt.axis('off')
    
    plt.tight_layout()
    
    if epoch is not None:
        plt.savefig(f'{save_path}/gan-images_epoch-{epoch}.png')
    plt.show()

With this, we can now visualize the output of our generator and evaluate the quality of the synthetic samples through visual inspection. Here is an example of the output of the function above:

This function lets us visually inspect the output of the generator.

I have included the image from my very first epoch of training the GAN. This function is super helpful for understanding how your generator is finding patterns, and also for deciding when to stop training.

The final piece of the puzzle is writing the training loop.

Coding the Training Loop

The training loop involves generating a batch of synthetic samples from our generator and selecting a random batch of real samples from the input dataset. Then, we combine the real and fake samples into a minibatch and label them. Real samples are labeled 0.9 and fake samples are labeled 0. We use 0.9 instead of 1.0 (one-sided label smoothing) to keep the discriminator from becoming overconfident. The minibatch is passed into the discriminator and the discriminator loss is computed. Finally, our GAN is trained on a batch of random latent vectors, all labeled as real, which pushes the generator to produce samples the discriminator believes are real. Calling train_on_batch handles the update step (backpropagation) for us.

I also populate the batch_size and steps_per_epoch variables. The batch size can be whatever your hardware can handle, and steps_per_epoch is calculated by dividing the total number of real samples in your input dataset by the batch_size.

temp_epochs = 50
batch_size = 16
steps_per_epoch = 548  # number of real training images // batch_size

for epoch in range(temp_epochs):
    for batch in range(steps_per_epoch):
        noise = np.random.normal(0, 1, size=(batch_size, noise_dim))
        real_x = train_images_art[np.random.randint(0, train_images_art.shape[0], size=batch_size)]

        fake_x = generator.predict(noise)

        x = np.concatenate((real_x, fake_x))

        disc_y = np.zeros(2*batch_size)
        disc_y[:batch_size] = 0.9

        d_loss = discriminator.train_on_batch(x, disc_y)

        y_gen = np.ones(batch_size)
        g_loss = gan.train_on_batch(noise, y_gen)

    print(f'Epoch: {epoch} \t Discriminator Loss: {d_loss} \t\t Generator Loss: {g_loss}')
    if epoch % 2 == 0:
        show_images(static_noise, epoch)

Important Note: You will need to replace all references to train_images_art with your own NumPy array containing your images.

If we run this block of code, the GAN will begin to train! Once training has completed, you can save all your weights to use the generator model in the future or resume training from where you left off.

discriminator.save('dcdiscriminator.h5')
generator.save('dcgenerator.h5')
gan.save('dcgan.h5')
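
If you want to pick things back up later, you can load the saved models with tf.keras.models.load_model; here is a minimal sketch (the file names simply match the ones saved above). Note that loading each file produces independent copies, so to fully resume adversarial training you would want to re-link the loaded generator and discriminator into a fresh combined model.

discriminator = tf.keras.models.load_model('dcdiscriminator.h5')
generator = tf.keras.models.load_model('dcgenerator.h5')
gan = tf.keras.models.load_model('dcgan.h5')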

Example Output after Training

I trained my GAN on a dataset of fine art images. After 100 epochs, these were the results. Note that if you don’t have a powerful GPU, this training will likely take many hours. I left my project running overnight, and that was enough time for some decent results.

100 Fake Fine Art Images generated by the DCGAN generator trained for 100 epochs

As you can see, some samples look pretty visually interesting and there is decent variety in each batch. Zooming in, you can get a better look at individual samples.

My goal was just to generate some unique artwork that looks cool. Training for longer could yield better results, but for my purpose this was good enough.

Converting Images in a Folder to a Dataset

In order to use the GAN to generate synthetic samples, you will have to load in your dataset. If you are using a custom dataset of images like me, this involves a little leg work.

My dataset is of fine art images, organized into individual folders by artist name. All of these subfolders are in the ‘./images/images/*’ directory. The snippet below loops through all the subfolders, grabs every image, resizes it to the proper dimensions, and converts it to an array using OpenCV (cv2).

You can adapt this process for any group of images you would like to use in training your GAN.

import glob
import cv2

train_images_art = []
for directory in glob.glob('./images/images/*'):
    for filename in glob.glob(directory + '/*'):
        # Note: cv2.imread loads images in BGR channel order
        image = cv2.imread(filename)
        image = cv2.resize(image, (64, 64))
        image = tf.keras.preprocessing.image.img_to_array(image)
        train_images_art.append(image)

train_images_art = np.array(train_images_art, dtype="float")
train_images_art = (train_images_art - 127.5) / 127.5

The last two lines convert the dataset to the float datatype and then scale the pixel values to the range [-1, 1]. This step is essential for the GAN to work properly.

Conclusion

After reading this article, I hope you have a solid understanding of the following concepts:

  • What is a Deep Convolutional Generative Adversarial Network (DCGAN), and what are its major architectural features?
  • How to write a DCGAN generator and discriminator network from scratch in Python?
  • How to train DCGAN and display the synthetic samples it creates?

If you find that the quality of samples generated by DCGAN is not sufficient for your application, please consider looking into Progressive Growing GANs and how to implement them.

Avi Arora

Avi is a Computer Science student at the Georgia Institute of Technology pursuing a Masters in Machine Learning. He is a software engineer working at Capital One, and the Co Founder of the company Octtone. His company creates software products in the Health & Wellness space.