AI chip black technology inventory

As big data and deep learning come into wider use, they place new demands on the underlying hardware and chips. Unlike traditional processors, which emphasize general-purpose processing capability, big data and deep learning applications emphasize raw computing power and energy efficiency. Feature extraction and processing in these algorithms involve large volumes of numerical computation, so high-throughput chips are needed to complete the work in as little time as possible. Energy efficiency, meanwhile, refers to the energy required to complete a given computation: the better the energy efficiency, the less energy the same computation consumes.
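To make the metric concrete, energy efficiency is often quoted as operations per watt (for example TOPS/W), which is the same information as energy per operation. A quick sketch, using hypothetical chip numbers chosen only for illustration:

```python
# Energy efficiency compares chips by energy spent per operation.
# The throughput and power figures below are hypothetical.

def energy_per_op_pj(throughput_tops: float, power_w: float) -> float:
    """Energy per operation in picojoules: power / (ops per second)."""
    ops_per_second = throughput_tops * 1e12
    joules_per_op = power_w / ops_per_second
    return joules_per_op * 1e12  # convert J to pJ

# A 250 W GPU delivering 10 TOPS vs. a 1 W accelerator delivering 2 TOPS.
gpu_pj = energy_per_op_pj(10.0, 250.0)  # 25 pJ per operation
acc_pj = energy_per_op_pj(2.0, 1.0)     # 0.5 pJ per operation

print(f"GPU: {gpu_pj} pJ/op, accelerator: {acc_pj} pJ/op")
```

Under these assumed numbers, the accelerator finishes the same computation with fifty times less energy, even though its raw throughput is lower.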

For terminal chips, a growing share of data cannot be sent to cloud data centers for processing because of data privacy, network bandwidth, and latency constraints; the terminal chip must therefore complete the computation locally. At the same time, battery capacity on terminal devices is limited, so the chip cannot spend much energy on that computation, which again demands a good energy efficiency ratio. Chips in cloud data centers also need good energy efficiency, because cooling is a major data center expense and chip heat dissipation must be kept in check.

In big data and deep learning applications, data items are often independent and can be computed in parallel. Traditional CPUs have limited parallelism and struggle to meet the computing power requirements. GPUs offer high computing power (on the order of 10 TOPS) and are already used in data centers, but their power consumption is also very high (hundreds of watts), and their architecture cannot reach the low power budgets that terminals require (for example, under 100 mW). Even in the data center, because GPUs were originally designed for image rendering rather than big data computation, there is still considerable room for improvement.

Therefore, we have seen many academic and industrial projects in the AI chip field attempting to challenge CPUs and GPUs. These projects fall roughly into two categories: the first keeps the traditional digital processor model but improves the architecture to raise computing power and energy efficiency; the second takes a different path entirely, computing in ways fundamentally unlike traditional processors and thereby achieving far higher performance in certain domains. Today we survey the second category, some of which we expect to stand the test of time and eventually become mainstream.

Neuromorphic computing

Neuromorphic technology actually has a long history. It was first proposed in the 1980s and 1990s by Carver Mead, the circuit-design pioneer at Caltech. Mead noticed that charge flow in MOS devices resembles the discharge behavior of biological neurons, and proposed using MOS transistors to emulate neurons and form neural networks for computation, hence the term "neuromorphic".

Note that the neural networks in neuromorphic circuits differ somewhat from those in today's deep learning algorithms. A neuromorphic circuit closely models biological neurons and synapses, including the evolution of membrane potential and the firing of spikes; this behavior can be implemented with asynchronous digital circuits or mixed-signal circuits. The neural network in deep learning, by contrast, is an abstract mathematical model of biological neural tissue: it captures only the statistical character of potential changes without describing the charging and discharging process. Yet that charging and discharging process may be one key to the brain's remarkable efficiency: the human brain carries out extremely complex reasoning and cognition while consuming far less power than a GPU.
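The charge-and-fire dynamics described above can be illustrated with a leaky integrate-and-fire model, the simplest neuron abstraction used in neuromorphic work. This is a pure-Python sketch; the leak factor and threshold are arbitrary illustrative values, not taken from any particular chip:

```python
# Leaky integrate-and-fire (LIF) neuron: the membrane potential integrates
# input current, leaks back toward rest, and emits a spike when it crosses
# a threshold, after which it resets. All constants are illustrative.

def simulate_lif(inputs, threshold=1.0, leak=0.9, v_rest=0.0):
    """Return the list of time steps at which the neuron spikes."""
    v = v_rest
    spikes = []
    for t, current in enumerate(inputs):
        v = v_rest + leak * (v - v_rest) + current  # leaky integration
        if v >= threshold:
            spikes.append(t)  # fire a spike
            v = v_rest        # reset the membrane potential
    return spikes

# With no input the neuron stays silent (and burns almost no energy);
# a sustained input drives it over threshold periodically.
quiet = simulate_lif([0.0] * 10)  # -> []
busy = simulate_lif([0.3] * 10)   # -> [3, 7]
print(quiet, busy)
```

The key contrast with a deep learning layer is visible here: when the input is zero, nothing happens at all, whereas a conventional accelerator would still perform every multiply-accumulate.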

In May 2017, Oak Ridge National Laboratory published an important survey of neuromorphic research. Neuromorphic computing is still at an early stage, and the potential of many neuromorphic architectures remains unexplored; how to train neuromorphic circuits is also a major open challenge. Research so far has found that neuromorphic neurons consume very little power when not activated, so such chips can achieve much lower average power consumption, an important advantage.

For example, when we deploy a camera plus artificial intelligence system to identify whether someone enters the camera's field of view, there is often no one in the field of view for a long time. In this case, the traditional deep learning algorithm needs to complete the same calculation regardless of the situation in the camera, so the power consumption remains constant; if the neuromorphic chip is used, the neurons are only activated when someone enters the camera, and when no one enters the field of view, the neurons are in standby mode and the power consumption is very low, so its average power consumption can be much lower than that of traditional deep learning chips.
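The intuition in the camera example can be checked with a back-of-the-envelope duty-cycle calculation. All power figures and the duty cycle below are hypothetical, chosen only to show the shape of the argument:

```python
# Average power of an event-driven chip vs. an always-on chip.
# Hypothetical numbers: the always-on accelerator draws a constant 500 mW;
# the neuromorphic chip draws 300 mW when its neurons are activated but
# only 1 mW in standby, and someone is in view 2% of the time.

def average_power_mw(active_mw, standby_mw, duty_cycle):
    """Time-weighted average power for a given activity duty cycle."""
    return duty_cycle * active_mw + (1.0 - duty_cycle) * standby_mw

always_on = average_power_mw(500.0, 500.0, 0.02)     # 500 mW regardless
event_driven = average_power_mw(300.0, 1.0, 0.02)    # about 7 mW

print(f"always-on: {always_on} mW, event-driven: {event_driven:.2f} mW")
```

Even though the event-driven chip's peak power is of the same order as the always-on chip's, its average power under these assumptions is roughly seventy times lower, because it is the standby power that dominates.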

In other words, the energy efficiency of neuromorphic circuits can be much higher than that of traditional GPU/CPU chips. In addition, low-power neuromorphic chips used in terminals can also complete online learning, while traditional deep learning inference acceleration chips used in terminals often do not have the ability to do online learning. These are just some of the benefits of neuromorphic circuits, and other potential of neuromorphic circuits is still waiting to be explored.

This potential is also why some large companies have begun to invest in the area. IBM and Intel have each launched neuromorphic chips (IBM's TrueNorth and Intel's Loihi) that achieve very high energy efficiency. We expect more neuromorphic chips to be released in the future, further exploring the potential of this approach.

Photonic Computing

Silicon photonics technology is currently gaining more and more applications in high-speed data transmission in data centers and 5G. In addition, silicon photonics can also be used to directly accelerate deep learning calculations with ultra-low power consumption.

In 2017, Professor Marin Soljačić's group at MIT published a paper in Nature Photonics on using optical devices to accelerate deep learning. In deep learning, most computation reduces to matrix operations (this is also why GPUs suit deep learning), and a matrix used in practice can be factored by singular value decomposition (SVD) into a product of unitary matrices and a diagonal matrix. Once the SVD is taken, multiplication by the matrix can be realized with optical devices: phase shifters, beam splitters, attenuators, and Mach-Zehnder interferometers.

More importantly, the multiplication is carried out by the interference of beams of light, so the deep learning computation completes at the speed of light and, in theory, with zero power consumed in the matrix operation itself. The design proposed in the paper first modulates the deep learning inputs onto light, lets the light pass through the SVD-decomposed stages and interfere on the photonic chip's devices, and then converts the optical signal back into a digital signal to read out the result. All of these optical devices can be integrated on a single silicon photonic chip, yielding a high-performance optical computing module.
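The decomposition can be sketched numerically. For a real 2x2 matrix, the SVD's U Σ Vᵀ becomes rotation · diagonal · rotation, and each factor maps onto an optical element: the rotations onto interferometer meshes, the diagonal onto attenuators. The angles and attenuation values below are arbitrary; the point is only that passing a signal through the three stages in sequence equals multiplying by the composed matrix:

```python
import math

# Build a 2x2 matrix M = U @ S @ V from chosen factors (the reverse of
# taking an SVD), then check that applying the three "optical" stages in
# sequence matches multiplying by M directly.

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def diagonal(d0, d1):
    return [[d0, 0.0], [0.0, d1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(m, x):
    return [m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1]]

U = rotation(0.7)        # first interferometer mesh
S = diagonal(0.9, 0.4)   # attenuators (singular values <= 1)
V = rotation(-1.2)       # second interferometer mesh
M = matmul(matmul(U, S), V)

x = [0.5, -0.3]  # input signal modulated onto the light
staged = matvec(U, matvec(S, matvec(V, x)))  # light traverses V, then S, then U
direct = matvec(M, x)
assert all(abs(a - b) < 1e-12 for a, b in zip(staged, direct))
print(direct)
```

This is why the scheme needs the SVD at all: an arbitrary matrix cannot be realized by lossless interference alone, but each SVD factor has a natural optical implementation.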

Figure: MIT's optical module for deep learning computation

As mentioned earlier, once deep learning computation is implemented with optical interference, the computation proceeds at the speed of light and the matrix arithmetic itself consumes no power. The overall system's performance and energy efficiency therefore improve rapidly as the interface modules, such as optical modulation and optical signal readout, improve. The MIT optical computing team has spun out a startup, Lightelligence, which has completed its Series A financing. Let us wait and see what photonics can do for deep learning.

In-memory computing

Traditional AI accelerators are almost all built on the von Neumann architecture, in which memory and computation are separate. The problem with this architecture is memory access: its power consumption and latency are hard to reduce, so memory becomes the bottleneck for processor performance and power, the so-called "memory wall".

To break the memory wall, many researchers have proposed in-memory computing. The concept even had a dedicated session at ISSCC this year, showing that the circuit community takes this direction seriously. Among the most advanced work is that of Anantha Chandrakasan's group at MIT. Chandrakasan is well known in the chip field: he is a co-author of the classic textbook "Digital Integrated Circuits: A Design Perspective" and a pioneer in low-power digital circuit design, UWB systems, and other areas, and his group publishes at least one paper at ISSCC essentially every year.

The group's in-memory computing paper at ISSCC this year targets neural networks whose weights are compressed to 1 bit. With 1-bit weights, a convolution reduces to averaging multiple data values, and an average is easily computed with the charge-averaging technique common in classic DACs.

The paper's design therefore attaches a circuit similar to a DAC's charge-averaging network to the on-chip SRAM and performs the convolution as analog computation directly in the memory, eliminating the time- and energy-hungry movement of data between processor and memory. The result is then converted back into a digital signal by an ADC.
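The reduction from a 1-bit-weight convolution to averaging can be sketched behaviorally (this is a simplified model of the idea, not the paper's circuit): with weights restricted to ±1, a dot product is a signed sum, which is N times the average of the signed inputs, and that average is exactly what a charge-averaging network produces on the bitlines:

```python
# With 1-bit weights w_i in {+1, -1}, the dot product sum(w_i * x_i)
# equals N * mean(w_i * x_i). A charge-averaging circuit yields the mean
# directly in the analog domain; scaling by N is then trivial.
# Behavioral sketch only, not the actual SRAM macro.

def dot_product(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))

def charge_average_dot(weights, inputs):
    signed = [w * x for w, x in zip(weights, inputs)]
    mean = sum(signed) / len(signed)  # what the analog averaging yields
    return mean * len(signed)         # rescale back to the dot product

w = [1, -1, 1, 1, -1, 1, -1, -1]
x = [0.2, 0.5, -0.1, 0.7, 0.3, -0.4, 0.6, 0.1]

assert abs(dot_product(w, x) - charge_average_dot(w, x)) < 1e-12
print(dot_product(w, x))
```

Because the average is formed by charge sharing rather than by sequential multiply-accumulates, the data never leaves the memory array until the final ADC readout.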

Compared with a traditional digital AI accelerator, the in-memory analog design improves the energy efficiency ratio by more than 60 times, showing great potential. Of course, this particular circuit applies only to networks with 1-bit weights; it remains to be seen how in-memory computing can be extended to broader application scenarios.

Quantum computing

Quantum computing is a truly disruptive paradigm—of course, the prerequisite is that quantum computers can be built first!

The biggest difference between quantum and classical computing is that quantum computing works with quantum states. Quantum states can be linearly superposed, so before measurement a quantum bit (qubit) can be in a superposition of multiple states at once. A quantum computation operates on all components of the superposition simultaneously, which is equivalent to performing a large number of calculations in parallel.

Quantum computing is still at a very early research stage; so far only a handful of quantum algorithms are known to exploit quantum properties for exponential speedup. The term "quantum supremacy" refers to the point where, for some algorithm, a quantum computer runs faster than any classical computer. So how would quantum computing accelerate artificial intelligence? First, quantum versions of linear algebra algorithms are being actively studied and are expected to achieve exponential speedups; since linear algebra underlies much of AI computation, quantum linear algebra could greatly accelerate it. In addition, quantum annealing, as represented by D-Wave, is expected to accelerate optimization problems, and one of the central problems in training AI models is precisely solving an optimization problem. Quantum computing is therefore expected to accelerate artificial intelligence.
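Superposition can be illustrated by simulating a single qubit's state vector: a Hadamard gate takes |0⟩ to an equal superposition, so a measurement sees either outcome with probability 1/2. This is a minimal state-vector sketch, not a model of any physical device:

```python
import math

# A qubit's state is a 2-component vector of amplitudes for |0> and |1>.
# Gates are unitary matrices; measurement probabilities are |amplitude|^2.

def apply_gate(gate, state):
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def probabilities(state):
    return [abs(a) ** 2 for a in state]

H = [[1 / math.sqrt(2),  1 / math.sqrt(2)],   # Hadamard gate
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

ket0 = [1.0, 0.0]                  # the |0> basis state
superposed = apply_gate(H, ket0)   # equal superposition of |0> and |1>
print(probabilities(superposed))   # both outcomes near 0.5

# n qubits need 2**n amplitudes, and one gate application acts on all
# 2**n basis states at once -- the parallelism described above. It is
# also why classical simulation becomes infeasible as n grows.
```

The catch, of course, is that a measurement collapses the superposition to a single outcome, which is why only specially structured algorithms can extract a speedup from this parallelism.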

Figure: a 20-qubit chip jointly developed by Google and UCSB

There are many proposed implementations of quantum computing chips, including ion traps, superconducting circuits operating at ultra-low temperatures, and nonlinear optical devices operating at room temperature. All of these are still early: although some chips already integrate many qubits, decoherence time and quantum gate fidelity remain the performance bottlenecks. Quantum computing still has a long way to go before practical use, but once it succeeds it will be disruptive, which is why giants such as Google, IBM, and Microsoft are actively investing in it.

Conclusion

This article has introduced several new AI chip technologies: neuromorphic computing, photonic computing, in-memory computing, and quantum computing. Traditional AI accelerators built on the von Neumann architecture face limitations such as the memory wall; within a few years we expect to see some of these new technologies take the stage and find wide use. Let's wait and see!
