Transposed Convolutions (Deep Learning)
Convolutional Neural Networks are used for computer vision projects and can be used to automatically extract features from inputs like photos and videos. These neural networks employ so-called convolutional layers that convolve (slide) over the input image, try to detect patterns, and adapt weights accordingly during the training process — allowing learning to occur.
Sometimes, however, we want the opposite to happen: invert the output of a convolutional layer and reconstruct the original input. This is for example the case with autoencoders, where we use normal convolutions to learn an encoded state and subsequently decode them into the original inputs. If done successfully, the encoded state can be used as a lower-dimensional representation of our input data, for dimensionality reduction.
Transposed convolutional layers can be used for this purpose. Rather than performing interpolation, they learn a set of weights that can be used to reconstruct original inputs. They can be trained jointly with convolutional layers during the training process.
The below figure depicts transposed convolution:
Applications of transposed convolutions
1. Firstly, it is of course possible to perform up sampling operations with transposed convolutions. That is, when we have a smaller image, we can make it larger by applying transposed convolutions.
2. Secondly, transposed convolutions can be used in Generative Adversarial Networks. Those networks randomly generate a small matrix and use fractionally-stride convolutions (another name to describe transposed convolutions, but then perhaps in the relatively inefficient implementation of regular convolutions with fractional strides) to up sample them to true images. The weights that have been learnt in the process allow for the up sampling of the random noise to the actual image.
3. Thirdly, they’re also used in semantic segmentation, where we wish to classify each pixel of an image into a certain category. They work by generating predictions for intermediate results achieved with convolutions, subsequently up sampling those to find predictions for the originally-shaped input image.
4. Finally, and perhaps more recently, they are used in what is called a convolutional autoencoder. In those, convolutional layers are used to find an encoding for some input, i.e., a representation of the original input in much lower dimensionality. A clear example would be a radar image with a landmine and one without a landmine. For the latter, one could train an autoencoder to find a particular encoding. However, autoencoders also contain a decoding side: given some encoded input, it attempts to find the original output (byunsurprisingly, up sampling layers such as transposed convolutions). By comparing the original input image with the output image, it’s possible to identify whether the input belongs to a certain class. In our case, that would be landmine yes/no, and ‘yes’ would only be the case if the decoded encoding really looks like the original image, since then the landmine-trained encoding worked well.