Complete Glossary of Keras Neural Network Layers (with Code)

Learn the purpose and instantiation of Core layers, Pooling layers, Preprocessing layers, and more.

Introduction

Deep learning isn't an easy field to get into. Since the subject is relatively new and still developing, beginners often find it hard to find all the information they need in one place, especially about all the different types of neural network layers that exist. I know this because I went through this phase myself.

So, to help out future aspirants, I decided to put together a glossary of neural network layers, along with how each one is instantiated in Keras.

If you need any information about what neural network layers are all about or how they work, feel free to go through this article. Let's start!

What Are Neural Network Layers?

Layers are the basic building blocks of an artificial neural network. Each layer consists of a specific number of neurons and has a specific purpose. From a broader perspective, there are three basic types of neural network layers:

  • Input layer
  • Output layer
  • Hidden layer

The first layer that takes in the inputs to the neural network is referred to as the input layer and the last layer that produces the results for a given input is called the output layer. Every layer in between is referred to as a hidden layer since the user cannot and does not have to interact directly with it.

Each layer has a specific set of parameters associated with it:

  • Weights
  • Activation function
  • Bias

All these parameters contribute towards the output of each node and, subsequently, the inputs to the next layer: each node computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. Here's a graphical representation of how this works:

*Note: the unit step function is used only for demonstration purposes. Any activation function of choice could be used here.
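To make this concrete, here's a minimal sketch (plain NumPy, not Keras) of how a single node combines these parameters; the unit step activation is used only to mirror the note above, and the numbers are made up for illustration.

import numpy as np

def unit_step(z):
    # The node "fires" (outputs 1) when the weighted sum plus bias is non-negative.
    return np.where(z >= 0, 1.0, 0.0)

inputs = np.array([0.5, -1.0, 2.0])   # outputs of the previous layer
weights = np.array([0.8, 0.2, -0.5])  # one weight per incoming connection
bias = 0.1

node_output = unit_step(np.dot(weights, inputs) + bias)
print(node_output)  # this value becomes one of the inputs to the next layer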

Now that we have a brief overview of what the layers are, let’s dive into the specifics of each layer and its functions.

Core Layers

These are the most important neural network layers that will be used in almost all the networks, regardless of what the application is. You can think of them as the basic building blocks of every deep learning project.

I. Input Layer

The input layer is responsible for taking in the input to the neural network. It consists of several nodes and passes its output on to the hidden layers that follow it. The architecture of an input layer is quite straightforward since it doesn't have any weights associated with it.

Here's how simple it is to create an input layer for a model:

import tensorflow as tf
from tensorflow.keras import Input, Model

x = Input(shape=(32,))
y = tf.square(x)  # This op will be treated like a layer
model = Model(x, y)

II. Dense Layer

The dense layer is the most common type of layer used in a neural network. Its basic characteristic is that all neurons in a dense layer receive inputs from all the neurons present in the previous layer, hence the name dense. Sometimes, it’s also referred to as the fully connected (FC) layer.

The layer can be used with a wide range of inputs and a variety of activation functions; the most common choices are ReLU and Leaky ReLU.

Here’s how it can be instantiated:

from keras.models import Sequential
from keras.layers import Activation, Dense

model = Sequential()
layer_1 = Dense(16, input_shape=(8,))  # a fully connected layer with 16 units
model.add(layer_1)
model.add(Activation('relu'))  # apply the ReLU activation discussed above

III. Embedding Layer

The embedding layer is a vital part of a neural network when you're dealing with text. To understand what the embedding layer does, you need to know about word embeddings first. Briefly, it's a technique where words with similar meanings end up with similar representations.

The embedding layer can be thought of as an improvement over the bag-of-words models often used in machine learning. Word embeddings are learned from text data and, once learned, can be reused in different neural networks. Let's see how this layer works.

The layer takes in categorical input as arrays of integers, with each word represented by a unique integer. The output of the layer adds an embedding vector for each of those integers, so every input document gets a corresponding sequence of embeddings. Note that if your text isn't already integer-encoded, you can use the Tokenizer API from Keras.

There are 3 arguments that you need to specify:

  • input_dim: the size of the input vocabulary, i.e., the number of distinct words.
  • output_dim: the size of the vector used to represent each word.
  • input_length: the length of the individual input documents.

It's not necessary to use the embedding layer as part of a larger network; you can also use it on its own to learn the embedding weights for a given vocabulary. And if you'd rather skip training embeddings altogether, a great option is transfer learning: load one of the pre-trained embedding models that are available (a minimal sketch of this is shown after the basic example below).

Let's see how the Keras embedding layer can be instantiated:

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential()
# Embed a vocabulary of 1000 words into 64-dimensional vectors,
# with input sequences of length 10.
model.add(tf.keras.layers.Embedding(1000, 64, input_length=10))
input_array = np.random.randint(1000, size=(32, 10))
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array.shape)

>>> (32, 10, 64) 
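If you go the transfer-learning route mentioned above, a common pattern is to load a pre-trained embedding matrix into the layer and freeze it. Here's a minimal sketch of the idea; the embedding_matrix here is just a random placeholder, whereas in practice you would build it from pre-trained vectors such as GloVe.

import numpy as np
import tensorflow as tf

vocab_size = 1000
embedding_dim = 64

# Placeholder for pre-trained weights; in practice, fill each row with the
# pre-trained vector for the corresponding word index.
embedding_matrix = np.random.random((vocab_size, embedding_dim))

embedding_layer = tf.keras.layers.Embedding(
    vocab_size,
    embedding_dim,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False)  # freeze the weights so training doesn't overwrite them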

IV. Masking Layer

When processing sequence data, it's common to have samples of varying lengths. To ensure consistency across samples, a mechanism known as padding is used: padding fills in dummy values where data is missing. We then need to make sure that these padded values don't affect our calculations, since they're just placeholders.

What masking essentially does is tell the model which specific values are padding (or otherwise missing), so that it can skip them during its calculations. A masking layer can also be used without padding: even then, it lets the model know which specific data points should be ignored.

Let’s take a look at how a masking layer can be used:

import numpy as np
import tensorflow as tf

samples, timesteps, features = 32, 10, 8
inputs = np.random.random([samples, timesteps, features]).astype(np.float32)

# Zero out timesteps 3 and 5 to simulate padded / missing data.
inputs[:, 3, :] = 0.
inputs[:, 5, :] = 0.

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Masking(mask_value=0.,
                                  input_shape=(timesteps, features)))
model.add(tf.keras.layers.LSTM(32))

For this specific case, timesteps 3 and 5 will be skipped in the LSTM calculation.

V. Lambda Layer

If you're familiar with Python, chances are you know what lambda functions are – small anonymous functions that transform an input value into an output value. It could be as simple as multiplying the input by two.

Lambda layers follow the same intuition: whatever input is passed to them, they simply apply a function to it and return the output. Here's an example of how they're instantiated:

from tensorflow.keras.layers import Lambda

model.add(Lambda(lambda x: x ** 2))  # squares every element of the input

That's how easy it is to introduce a lambda layer into your neural network! However, it's pretty unlikely that you'll want a function as short as the one above. Most of the time, functions are written separately to keep the code readable. Let's see another example where a slightly longer function is used to transform data in the lambda layer.

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Lambda

def antirectifier(x):
    # Center the input, L2-normalize it, then concatenate the
    # positive and negative parts along the feature axis.
    x -= K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    pos = K.relu(x)
    neg = K.relu(-x)
    return K.concatenate([pos, neg], axis=1)

model.add(Lambda(antirectifier))

There you go! Just like that, you can add any function you require into the lambda layer. It could be as complex as you require, or as simple as a mere line. As long as it fulfills your requirements, you’re good to go.

Pooling Layers

Pooling is a very important concept whenever we talk about Convolutional Neural Networks, more commonly known as CNNs. The pooling layer helps down-sample the feature maps by summarizing the features present in patches of the input. Not only does this reduce the sensitivity of CNNs to the exact location of features in the input map, it also drastically reduces the computational resources required to learn the parameters.

There are two major categories of Pooling used in practical neural networks:

  • Average Pooling: Down-sample using average per patch.
  • Max Pooling: Down-sample using the maximum value in each patch.

An important thing to note here is that there are no learnable parameters in the pooling layer, and it merely serves the purpose of reducing the dimensions of the tensor.

Keras provides a number of pooling classes, depending on the dimensions of the tensor. For the sake of example, we'll be considering max pooling in 2D (with a brief average-pooling sketch afterwards); feel free to dive further into the other variants in the official docs.

This is how the MaxPooling2D layer can be initialized:

import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), input_shape=(4, 4, 1)))
model.compile('adam', 'mean_squared_error')

# A dummy 4x4 single-channel image (batch size 1).
input_image = np.random.random((1, 4, 4, 1)).astype(np.float32)
model.predict(input_image, steps=1)

You can set the input_image variable according to your use case and change the parameters as you require, and the pooling layer is all set to roll.
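For comparison, here's a minimal sketch of the average-pooling counterpart mentioned earlier; it only differs in the layer class used, and the dummy image is just an assumption.

import numpy as np
import tensorflow as tf

# A dummy 4x4 single-channel image (batch size 1).
input_image = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)

# Each non-overlapping 2x2 patch is replaced by its average value.
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))
print(avg_pool(input_image).shape)

>>> (1, 2, 2, 1)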

Convolutional Layer

Convolutional layers are the basic building blocks of CNNs. The task of a convolutional layer is to apply filters (kernels) to the input passed to it. Once the output of the layer is calculated, an activation function is applied, and the results are passed on to the next layer.

Each filter applied to the input image produces a feature map, which highlights where a particular feature is present in the input; applying multiple filters produces a stack of feature maps.

The filters used in convolutional layers can be of various types – they could be as simple as line detectors. The key point, however, is that the filter weights are learned by the convolutional layer within the context of the problem we're dealing with.

Keras provides different classes for convolutional layers depending upon the requirements and the dimensions of the input tensors. The details can be found in the official docs.

Let’s initialize a two-dimensional convolutional layer as an example and see how things work out.

import tensorflow as tf

# The inputs are 28x28 RGB images with `channels_last` ordering and the
# batch size is 4.
input_shape = (4, 28, 28, 3)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv2D(2, 3, activation='relu', input_shape=input_shape[1:])(x)

print(y.shape)
>>> (4, 26, 26, 2)

That's a very basic example of how you can use convolutional layers. When you study them in depth, you'll find there are actually a lot of parameters that you can play around with and set according to your requirements.

Preprocessing Layers

Preprocessing is the first step after collecting the data and before training begins. From cleaning the data thoroughly to applying any transformations required, preprocessing takes care of everything. In a nutshell, it turns the raw, unstructured data into useful information that can be processed by the neural network.

Since preprocessing is done on raw data, it differs from application to application. So, there are different preprocessing layers for different purposes. Let's move on and see what layers Keras provides:

I. Text Preprocessing

This consists of the TextVectorization layer, which is very helpful when you have raw text data available. The layer takes samples in the form of strings and processes them in the following way to finally convert them into a form that the model can comprehend:

  1. Standardizing the sample (lowercase)
  2. Splitting into substrings (breaking down into words)
  3. Recombining into tokens (ngrams)
  4. Indexing
  5. Transform using the Index

That's all. Let's see a practical example of instantiating a TextVectorization layer with a small vocabulary:

from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

vocab_data = ["earth", "wind", "and", "fire"]
max_features = 5000  # Maximum vocabulary size.
max_len = 4  # Sequence length to pad the outputs to.

# Create the layer, passing the vocab directly. You can also pass the
# vocabulary arg a path to a file containing one vocabulary word per line.
vectorize_layer = TextVectorization(
  max_tokens=max_features,
  output_mode='int',
  output_sequence_length=max_len,
  vocabulary=vocab_data)

# Because we've passed the vocabulary directly, we don't need to adapt
# the layer - the vocabulary is already set. The vocabulary contains the
# padding token ('') and OOV token ('[UNK]') as well as the passed tokens.
vectorize_layer.get_vocabulary()
>>> ['', '[UNK]', 'earth', 'wind', 'and', 'fire']

II. Numerical Preprocessing

When you have numerical data at hand, the task isn't complex. Since the data is already in a form that can be fed to the network, all you have to do is normalize it so that no unusually high or low values skew the model training. This is achieved by the Normalization preprocessing layer.

You can simply feed the layer your raw data and it will normalize it towards a mean of 0 and a standard deviation of 1, using either statistics learned from the data or the mean and variance you pass in. Let's see an example of how to set the layer up in Keras.

import numpy as np
# In newer TensorFlow versions this layer is also available as tf.keras.layers.Normalization.
from tensorflow.keras.layers.experimental.preprocessing import Normalization

input_data = np.array([[1.], [2.], [3.]], np.float32)
layer = Normalization(mean=3., variance=2.)
layer(input_data)

>>> <tf.Tensor: shape=(3, 1), dtype=float32,
     numpy=array([[-1.4142135 ],
                  [-0.70710677],
                  [ 0.        ]], dtype=float32)>

As shown above, you can pass in your required mean and variance values when instantiating the normalization layer.

Keras provides another useful preprocessing layer for numerical data called the Discretization layer. If you have continuous data but your project requires a discrete set of values, the Discretization layer can be quite helpful. Feel free to dive into the official docs to learn more about it.
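As a quick illustration, here's a minimal sketch of the Discretization layer; the bin boundaries are just an assumption, and depending on your TensorFlow version the layer may live under tf.keras.layers.experimental.preprocessing instead.

import numpy as np
import tensorflow as tf

# Values below 0 map to bin 0, [0, 1) to bin 1, [1, 2) to bin 2, and 2 or above to bin 3.
layer = tf.keras.layers.Discretization(bin_boundaries=[0., 1., 2.])
print(layer(np.array([[-1.5, 1.0, 3.4, 0.5]])))  # bins: [[0 2 3 1]]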

III. Categorical Preprocessing

Just like the text data, categorical data also cannot be used by the neural network, and we need to preprocess it and encode it in some way before passing it on to the training phase. Keras provides a CategoryEncoding layer built-in for this very purpose.

The class structure is as follows:

 
tf.keras.layers.experimental.preprocessing.CategoryEncoding(
     num_tokens=None, output_mode="multi_hot", sparse=False, **kwargs
 ) 

The output_mode parameter lets you decide what type of encoding you want to use. There are three types of encodings that you can choose from, namely:

  • one-hot encoding
  • multi-hot encoding
  • count encoding

Here’s an example that uses one-hot encoding:

layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
          num_tokens=4, output_mode="one_hot")
layer([3, 2, 0, 1])

>>>  <tf.Tensor: shape=(4, 4), dtype=float32, numpy=
       array([[0., 0., 0., 1.],
              [0., 0., 1., 0.],
              [1., 0., 0., 0.],
              [0., 1., 0., 0.]], dtype=float32)> 

IV. Image Preprocessing

Last but not least, preprocessing images is another very important topic to cover, especially if you're interested in using CNNs for object detection. Keras offers three different image-preprocessing classes that you can use to transform your images as you like (a combined sketch follows the list below):

  1. Resizing layer: resizes the input images to a target height and width (which may change the aspect ratio).
  2. Rescaling layer: rescales the input values to a new range, e.g. from [0, 255] to [0, 1], by multiplying by a scale and adding an offset.
  3. CenterCrop layer: crops the central portion of the image to the dimensions of your requirement.

Note: Except for the Rescaling layer, the input and output tensors should be 4D.
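Here's a minimal sketch of all three layers chained together; the image sizes and scale factor are just assumptions, and the layers are taken from the same experimental preprocessing namespace used in the earlier examples.

import numpy as np
import tensorflow as tf

# A dummy batch of two 64x64 RGB images with pixel values in [0, 255].
images = np.random.randint(0, 256, size=(2, 64, 64, 3)).astype(np.float32)

preprocess = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.Resizing(32, 32),     # resize to 32x32
    tf.keras.layers.experimental.preprocessing.Rescaling(1. / 255),  # scale pixel values to [0, 1]
    tf.keras.layers.experimental.preprocessing.CenterCrop(24, 24),   # keep the central 24x24 patch
])

print(preprocess(images).shape)

>>> (2, 24, 24, 3)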

Normalization Layers

In the context of deep learning, normalization is the process that helps prepare the data before it's used for training a neural network. Most of the time, the data collected comes from real-life scenarios, and the values can be spread over a wide range. This causes unnecessary noise and inconsistencies in model performance, since a small set of extreme values can have a huge effect on the model.

To deal with this, normalization brings the numerical values onto a common scale, without losing information or distorting the relative differences between the values.

There are two normalization layers present in Keras that we’ll be taking a look at.

I. Batch Normalization Layer

Batch normalization is the most used normalization technique, especially in the case of CNNs. It's used when the neural network is trained in the form of mini-batches: instead of passing a single training example through the network and then backpropagating the error, we pass multiple examples in the form of batches.

Batch normalization is applied to the activations of the neurons across each mini-batch, to make their distribution close to Gaussian, i.e., mean close to 0 and standard deviation close to 1. There are some additional advantages of using batch normalization, such as:

  • Improves the training time of the network
  • Has a regularization effect overall
  • Reduces the effect of weight initialization

Here’s the class structure of batch normalization:

tf.keras.layers.BatchNormalization(
     axis=-1,
     momentum=0.99,
     epsilon=0.001,
     center=True,
     scale=True,
     beta_initializer="zeros",
     gamma_initializer="ones",
     moving_mean_initializer="zeros",
     moving_variance_initializer="ones",
     beta_regularizer=None,
     gamma_regularizer=None,
     beta_constraint=None,
     gamma_constraint=None,
     **kwargs
 ) 

You can add a batch normalization layer with only a single line of code:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization

model = Sequential()
model.add(BatchNormalization())
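In a typical network you would place the layer between other layers, for example right after a Dense layer, so that its activations are normalized before reaching the next layer. Here's a minimal sketch of that idea; the layer sizes are just an assumption.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.BatchNormalization(),  # normalize the activations over each mini-batch
    tf.keras.layers.Dense(10, activation='softmax'),
])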

II. Layer Normalization Layer

Layer normalization takes a complementary approach to batch normalization. In fact, it was designed to address issues experienced with batch normalization, such as its dependence on the batch size.

In layer normalization, the statistics are computed independently of the batch: each sample is normalized on its own, across all of its channels. This way, whatever the batch size is, the results stay the same. The idea, however, is the same: to bring the distribution as close to Gaussian as possible.

While this technique doesn't work as well with CNNs as batch normalization does, it has some benefits associated with it:

  • Much better with RNNs (Recurrent Neural Network)
  • No effect of batch size

The layer normalization class in Keras is as follows:

 
tf.keras.layers.LayerNormalization(
     axis=-1,
     epsilon=0.001,
     center=True,
     scale=True,
     beta_initializer="zeros",
     gamma_initializer="ones",
     beta_regularizer=None,
     gamma_regularizer=None,
     beta_constraint=None,
     gamma_constraint=None,
     **kwargs
 ) 

Just like batch normalization, it’s very convenient to add a layer normalization layer into your network:

import tensorflow as tf

model_lay = tf.keras.models.Sequential([
    tf.keras.layers.LayerNormalization(axis=3, center=True, scale=True)
])

Regularization Layers

Dropout Layer

Dropout is probably the most used regularization method in Deep Learning, mainly due to its impressive results and easy interpretation. If you’re not familiar with how dropout works, here’s a brief introduction to give you an overview.

The dropout layer randomly sets some input units to zero during the training phase, in order to keep the model from becoming too complex and hence to help it generalize. The parameter rate decides what fraction of the units is set to zero – essentially the probability of dropping any given unit.

Note that since rate is the probability of a unit being set to zero, it should be between 0 and 1.

Here’s an instance of making a dropout layer:

import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
layer = tf.keras.layers.Dropout(.2, input_shape=(2,))
data = np.arange(10).reshape(5, 2).astype(np.float32)
print(data)
>>>  [[0. 1.]
      [2. 3.]
      [4. 5.]
      [6. 7.]
      [8. 9.]]

outputs = layer(data, training=True)
print(outputs)
>>>  tf.Tensor(
      [[ 0.    1.25]
       [ 2.5   3.75]
       [ 5.    6.25]
       [ 7.5   8.75]
       [10.    0.  ]], shape=(5, 2), dtype=float32)

As you can see, with rate set to 0.2, roughly 20% of the inputs are zeroed out, and the remaining values are scaled up by 1/(1 - rate) (here 1.25) so that the expected sum over all inputs stays the same.

Reshaping Layers

Sometimes when you're developing a neural network, the tensors don't have the shape you wish for. In such a scenario, you can reshape whatever you have. Keras provides a comprehensive list of reshaping layers that you can use in different scenarios.

In this tutorial, we'll be covering the two reshaping layers that are used the most: Reshape and Flatten.

I. Reshape Layer

The reshape layer lets you change the shape of an arbitrary input into your desired shape. You provide the target shape as a parameter to the layer (and, when it's the first layer of a model, the input shape as well). The output shape is (batch_size,) + target_shape.

Here’s a brief example of how this layer can be used:

 
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Reshape((3, 4), input_shape=(12,)))
model.output_shape

>>>  (None, 3, 4)

As you can see, we have converted a shape of (12,) into our desired shape of (3, 4). The first value is None since the batch size is not fixed.

II. Flatten Layer

The purpose of this layer is fairly straightforward: whatever input it gets, it flattens it, leaving the batch dimension untouched. Let's try passing a multi-dimensional input through the flatten layer and see how it flattens it:

import tensorflow as tf
from tensorflow.keras.layers import Flatten

model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(64, 3, 3, input_shape=(3, 32, 32)))
model.add(Flatten())
model.output_shape

>>>  (None, 640) 

As you can see, the flatten layer has flattened the convolutional output into a shape of (None, 640). If you're wondering what None refers to, it's the batch size, which isn't fixed when the model is defined.

Summary

Deep neural networks are composed of layers, and the data gets processed all the way from the input layer to the output layer, passing through many learnable parameters on its way. A variety of layers is involved in this process, and each affects the data in a different way.

Throughout the article, we've taken a detailed look at the most important layers you need to know about if you're going to train your own neural networks. We have gone through the descriptions of each layer, how they modify the inputs, and how they pass on their outputs to the successive layers. Moreover, we have seen instantiation examples for each layer in Keras.

So, make sure you go through the article in detail and if you want to explore further, don’t hesitate to pay a visit to the official docs of Keras.

Avi Arora

Avi is a Computer Science student at the Georgia Institute of Technology pursuing a Masters in Machine Learning. He is a software engineer working at Capital One, and the co-founder of the company Octtone. His company creates software products in the Health & Wellness space.