Types of convolutional layers

Before we look at the architecture of either of the adversaries, the generator or the discriminator, let's understand the different types of convolutional layers. There is the standard convolutional layer, where a learnable filter is slid over the input image to generate a feature representation of that image. Then we have the deconvolutional layer, which does the opposite of the standard convolutional layer. And finally, we have the transposed convolutional layer, used for upsampling the input image.

The standard convolutional layer is one that we are familiar with. A sliding kernel is applied to the input, and an elementwise multiplication is performed using the weights of the kernel; this allows the convolutional layer to extract hierarchical representations from the underlying image. We've seen the basics of a convolutional layer in a previous movie.

The deconvolution layer reverses the standard convolution layer's operation. It takes in a feature map as an input and produces the original input that was used to generate that feature map. If you have a feature representation of the original image, a deconvolution layer will give you back the original image.

And finally, we have the transposed convolutional layer, which is important because it is what we are going to use in the generator. The objective of the transposed convolutional layer is to perform upsampling of input data. You feed in a feature map, and the transposed convolutional layer generates another feature map whose spatial dimensions are greater than those of the input. Transposed convolutional layers also use learnable parameters, the filters, to determine how the input is transformed into the output.

Here is a representation of how the transposed convolutional layer works. We have the input to the layer and the kernel applied to the input; the kernel is of size k x k. Now observe the two parameters here, s and p. These refer to the stride and padding of the transposed convolutional layer, but stride and padding here mean something different from the stride and padding parameters that we saw in a regular convolutional layer. We use the stride and padding that we've chosen to compute new parameters: z = s - 1 and p' = k - p - 1, where k refers to the size of the kernel. A third parameter, s', is always set to 1; this is the stride of the kernel as it slides over the input feature map representation.

The parameter z, which we've computed as s - 1, is the number of zeros that you add to the input feature map between the individual rows and columns of that feature map. Here, z = 1, and you can see how I've added one row of zeros, represented in purple, between each row and column of the input feature map. In addition to this, you add p' zeros around the edges of the input feature map as additional padding. You then slide the kernel over the input feature map; the stride of this kernel, in both the horizontal and vertical directions, is always equal to one. What you get as the result is an output image or feature map representation that is upsampled from the input.

This upsampling was performed using the trainable parameters in the kernel, and the dimensions of the output upsampled map can be computed using the formula o = (i - 1) x s + k - 2p. Here, i is the size of the input feature map, s is the stride that we had chosen, k is the size of the kernel, and p is the original padding that we had chosen. A code sketch of this zero-insertion procedure follows below.
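To make the procedure concrete, here is a minimal NumPy sketch of the zero-insertion view of a transposed convolution. The function name transposed_conv2d and the toy input are my own illustrative choices, not from the course; the sketch also treats "convolution" as the elementwise multiply-and-sum described above, as deep learning frameworks do.

```python
import numpy as np

def transposed_conv2d(x, kernel, s, p):
    """Upsample x by emulating a transposed convolution:
    insert z = s - 1 zeros between elements, pad the border with
    p' = k - p - 1 zeros, then slide the kernel with stride s' = 1."""
    k = kernel.shape[0]                # kernel is k x k
    z = s - 1                          # zeros inserted between rows/columns
    p_prime = k - p - 1                # zero padding around the edges

    # Insert z zeros between each row and column of the input feature map.
    i = x.shape[0]
    expanded = np.zeros((i + (i - 1) * z, i + (i - 1) * z))
    expanded[::s, ::s] = x

    # Add p' zeros around the border.
    padded = np.pad(expanded, p_prime)

    # Standard convolution with stride s' = 1 in both directions.
    out_size = padded.shape[0] - k + 1
    out = np.zeros((out_size, out_size))
    for r in range(out_size):
        for c in range(out_size):
            out[r, c] = np.sum(padded[r:r + k, c:c + k] * kernel)
    return out

x = np.arange(9, dtype=float).reshape(3, 3)   # 3 x 3 input feature map
kernel = np.ones((3, 3))                       # k = 3
y = transposed_conv2d(x, kernel, s=2, p=0)
print(y.shape)  # (7, 7)
```

Note that the output size matches the formula: with i = 3, s = 2, k = 3, and p = 0, we get o = (3 - 1) x 2 + 3 - 0 = 7.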
Once again, s and p, the stride and padding that I refer to here, don't mean the same thing as the stride and padding parameters in a standard convolutional layer. The parameters that play that standard role are s' and p', which we computed from this original stride and padding.
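The course text doesn't name a framework at this point, but assuming PyTorch, a quick check with nn.ConvTranspose2d confirms that s and p passed to the layer behave exactly as described, producing the output size given by the formula above.

```python
import torch
import torch.nn as nn

i, k, s, p = 3, 3, 2, 0                       # input size, kernel, stride, padding
layer = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                           kernel_size=k, stride=s, padding=p)

x = torch.randn(1, 1, i, i)                   # a batch of one 3 x 3 feature map
y = layer(x)

expected = (i - 1) * s + k - 2 * p            # o = (i - 1) x s + k - 2p
print(tuple(y.shape), expected)               # (1, 1, 7, 7) 7
```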