With the rapid development of deep learning, a whole range of neural network architectures have been created to solve a wide variety of tasks. Although there are countless architectures, here are 11 that every deep learning engineer should know, divided into four major categories: standard networks, recurrent networks, convolutional networks, and autoencoders.
Standard Networks

1. Perceptron

The perceptron is the most basic of all neural networks and the basic building block of more complex architectures. It simply connects an input unit to an output unit.

2. Feedforward Network

A feedforward network is a collection of perceptrons organized into three basic types of layers: the input layer, hidden layers, and the output layer. At each connection, the signal from the previous layer is multiplied by a weight, added to a bias, and passed through an activation function. Feedforward networks use backpropagation to iteratively update the parameters until the desired performance is achieved.

3. Residual Network (ResNet)

One problem with deep feedforward networks is the vanishing gradient problem, which occurs when the network is too deep to backpropagate useful information through all of its layers. As the signal that updates the parameters propagates through the network, it gradually shrinks until the weights at the front of the network are barely changed at all. To address this, residual networks employ skip connections, which propagate signals across "skipped" layers. Because these connections are less susceptible to vanishing gradients, the problem is reduced. Over time, the network learns to recover the skipped layers as it learns the feature space, and it is more efficient to train because it suffers less from vanishing gradients and needs to explore less of the feature space.

Recurrent Networks

4. Recurrent Neural Network (RNN)

A recurrent neural network is a special type of network that contains loops and feeds back into itself, hence the name "recurrent". RNNs allow information to be stored in the network, using context from earlier in a sequence to make better, more informed decisions about what comes next. To do this, they use previous predictions as "contextual signals".
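The perceptron's "multiply by weights, add bias, apply activation" step can be sketched in a few lines of plain Python. This is a minimal illustration, not from the article: the step activation and the hand-picked AND weights are assumptions chosen for the example.

```python
def step(x):
    # Heaviside step activation used by the classic perceptron
    return 1 if x >= 0 else 0

def perceptron(inputs, weights, bias):
    # multiply each input by its weight, add the bias,
    # then pass the sum through the activation function
    total = sum(w * x for w, x in zip(weights, inputs))
    return step(total + bias)

# hand-picked weights that make the unit compute logical AND
print(perceptron([1, 1], [0.5, 0.5], -0.7))  # -> 1
print(perceptron([1, 0], [0.5, 0.5], -0.7))  # -> 0
```

A feedforward network is just many of these units stacked in layers, with the weights and biases learned by backpropagation instead of picked by hand.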
Due to this design, RNNs are often used for sequential tasks, such as generating text letter by letter or predicting time-series data (such as stock prices). They can also handle inputs of arbitrary length.

5. Long Short-Term Memory Network (LSTM)

RNNs are problematic because in practice their usable range of contextual information is very limited. The effect (the backpropagated error) of a given input on the hidden layers, and therefore on the network output, either blows up or decays exponentially as it cycles through the recurrent connections. The solution to this vanishing gradient problem is the Long Short-Term Memory network, or LSTM. This RNN architecture is specifically designed to address vanishing gradients by building memory blocks into the structure. These blocks can be thought of as memory chips in a computer: each contains several recurrently connected memory cells and three gates (input, output, and forget, equivalent to write, read, and reset). The network can interact with a cell only through its gates, so the gates learn to open and close intelligently, preventing the gradient from exploding or vanishing while propagating useful information through a "constant error carousel" and discarding irrelevant memory content.

6. Echo State Network (ESN)

The echo state network is a variant of the recurrent neural network with a very sparsely connected hidden layer (typically around one percent connectivity). The connectivity and weights of the hidden "reservoir" neurons are randomly assigned and left fixed; only the weights of the output neurons are learned, so that the network can produce and reproduce specific temporal patterns. The rationale behind this design is that, although the network is nonlinear, the only weights modified during training are those of the output layer, so training reduces to a simple linear problem.

Convolutional Networks
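The gate mechanism described above can be sketched as a toy one-dimensional LSTM cell. This is an illustrative sketch only: the scalar weight names (`wf`, `uf`, `bf`, etc.) are assumptions, and real LSTMs use weight matrices over whole vectors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    # w holds scalar weights and biases for each gate of a 1-D cell
    f = sigmoid(w['wf'] * x + w['uf'] * h_prev + w['bf'])   # forget gate ("reset")
    i = sigmoid(w['wi'] * x + w['ui'] * h_prev + w['bi'])   # input gate ("write")
    o = sigmoid(w['wo'] * x + w['uo'] * h_prev + w['bo'])   # output gate ("read")
    g = math.tanh(w['wg'] * x + w['ug'] * h_prev + w['bg']) # candidate content
    c = f * c_prev + i * g   # additive update: the "constant error carousel"
    h = o * math.tanh(c)     # hidden state exposed to the rest of the network
    return h, c

# With the forget gate saturated open and the input gate saturated shut,
# the stored cell state passes through the step unchanged.
w = {'wf': 0.0, 'uf': 0.0, 'bf': 100.0,   # forget gate ~1 (keep memory)
     'wi': 0.0, 'ui': 0.0, 'bi': -100.0,  # input gate ~0 (write nothing)
     'wo': 0.0, 'uo': 0.0, 'bo': 0.0,
     'wg': 0.0, 'ug': 0.0, 'bg': 0.0}
h, c = lstm_cell_step(0.3, 0.0, 2.0, w)
print(c)  # -> 2.0
```

Because the cell state is updated additively rather than by repeated multiplication, the gradient flowing through `c` does not shrink at every step, which is exactly what lets the gates protect long-range context.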
7. Convolutional Neural Network (CNN)

Images have high dimensionality, so training a standard feedforward network to recognize images would require thousands of input neurons, which, besides being blatantly computationally expensive, invites many problems associated with the curse of dimensionality. Convolutional neural networks (CNNs) provide a solution by using convolutional and pooling layers to reduce the dimensionality of images. Because a convolutional layer is trainable but has far fewer parameters than a standard hidden layer, it can highlight the important parts of an image and pass them forward. Traditionally, the last few layers of a CNN are hidden layers that process the "compressed image information". CNNs excel at image-based tasks, such as classifying an image as a dog or a cat.

8. Deconvolutional Neural Network (DNN)

As the name implies, a deconvolutional neural network does the opposite of a convolutional neural network. Instead of performing convolutions to reduce an image's dimensionality, a DNN uses deconvolutions to create an image, usually from noise. This is an inherently harder task: consider a CNN tasked with writing a three-sentence summary of Orwell's 1984, versus a DNN tasked with writing the entire book from a three-sentence summary.

9. Generative Adversarial Network (GAN)

Generative adversarial networks are a special type of network designed specifically for generating images, and they consist of two networks: a discriminator and a generator. The discriminator's task is to distinguish whether an image was drawn from the dataset or produced by the generator, while the generator's task is to generate images convincing enough that the discriminator cannot tell whether they are real.
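The core operation a convolutional layer performs can be sketched in plain Python: slide a small kernel over the image and take a weighted sum at each position. This is an illustrative sketch, assuming a "valid" convolution with no padding or stride (and, as in most deep learning libraries, it is technically cross-correlation, since the kernel is not flipped); the vertical-edge kernel is a hand-picked example, not a learned one.

```python
def conv2d(image, kernel):
    # "valid" 2-D convolution: the kernel stays entirely inside the image
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # weighted sum of the kh x kw window at position (i, j)
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 4x4 image with a dark/bright vertical split, and a vertical-edge kernel
img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
edge = [[-1, 0, 1],
        [-1, 0, 1],
        [-1, 0, 1]]
print(conv2d(img, edge))  # -> [[27, 27], [27, 27]]
```

Note that the 4x4 input became a 2x2 output: this shrinking, plus the fact that one small kernel is reused at every position, is how convolutional layers cut both dimensionality and parameter count while a uniform region (try an all-5s image) maps to zeros and the edge lights up.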
Over time, with careful training, the two opponents compete against each other, each pushing the other to improve. The end result is a well-trained generator that can produce realistic images. The discriminator is a convolutional neural network whose goal is to maximize its accuracy in identifying real and fake images, while the generator is a deconvolutional neural network whose goal is to minimize the discriminator's performance.

Autoencoders

10. Autoencoder (AE)

The basic idea of an autoencoder is to take high-dimensional data, "compress" it into a highly informative low-dimensional representation, and then decode that compressed form back into the original space. Autoencoders have many applications, including dimensionality reduction, image compression, data denoising, feature extraction, image generation, and recommender systems. They can be used in both unsupervised and supervised settings and can be very revealing about the nature of the data. The hidden units can be replaced with convolutional layers for processing images.

11. Variational Autoencoder (VAE)

While an autoencoder learns a compressed representation of its input (which may be an image or a text sequence) by first compressing and then decompressing it to match the original, a variational autoencoder (VAE) learns the parameters of a probability distribution that represents the data. Rather than just learning a function to represent the data, it gains a more detailed and nuanced view of it, and can sample from the distribution to generate new input samples. In this sense it is more of a purely "generative" model, like a GAN. A VAE uses probabilistic hidden units that apply a radial basis function to the difference between a test sample and the unit's mean.
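The compress-then-reconstruct idea behind autoencoders can be shown with a toy linear autoencoder trained by plain gradient descent. This is purely illustrative, and every detail here is an assumption for the sketch: real autoencoders use nonlinear layers and a framework, and the data below is deliberately laid on the line y = 2x so that a one-number code can reconstruct each two-number point.

```python
# Toy linear autoencoder: encode 2-D points to a 1-D code and decode back.
data = [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0), (0.5, 1.0)]

we = [0.1, 0.1]  # encoder weights (2 inputs -> 1 code)
wd = [0.1, 0.1]  # decoder weights (1 code -> 2 outputs)
lr = 0.01

for _ in range(2000):
    for x1, x2 in data:
        z = we[0] * x1 + we[1] * x2      # encode ("compress")
        r1, r2 = wd[0] * z, wd[1] * z    # decode ("reconstruct")
        e1, e2 = r1 - x1, r2 - x2        # reconstruction error
        # gradient descent on the squared reconstruction error
        g = e1 * wd[0] + e2 * wd[1]
        wd[0] -= lr * e1 * z
        wd[1] -= lr * e2 * z
        we[0] -= lr * g * x1
        we[1] -= lr * g * x2

# Encode and decode an unseen point on the same line
x1, x2 = 1.5, 3.0
z = we[0] * x1 + we[1] * x2
print(wd[0] * z, wd[1] * z)  # close to (1.5, 3.0)
```

The network is never told the rule y = 2x; it discovers it simply by being forced to squeeze each point through a one-dimensional bottleneck and reproduce it, which is the essence of what the higher-dimensional, nonlinear versions do.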