To summarize the differences up front: a CPU has multiple cores, but usually only a few. Each core has a fairly large cache and rich arithmetic and logic units, and it must be general-purpose enough to handle many different data types. Logical control also introduces a large number of branches and interrupts, so a great deal of the CPU's hardware is devoted to accelerating branch prediction and other complex control logic. A GPU has far more cores than a CPU, which is why it is called many-core (the NVIDIA Fermi generation has 512 cores). Each core has a relatively small cache and few, simple arithmetic units (early GPUs were weaker than CPUs at floating-point computation). The GPU faces highly uniform, mutually independent, large-scale data in a pure computing environment that rarely needs to be interrupted. The TPU is a chip custom-built for machine learning: it is specialized for deep learning and delivers higher performance per watt. Google has described it as roughly seven years ahead of contemporary processors; because it tolerates reduced computational precision, it can squeeze more operations per second out of the chip, run larger and more powerful machine learning models, and deploy them faster, so users get smarter results sooner. The so-called NPU, or neural network processing unit, uses circuits to simulate the structure of human neurons and synapses.

CPU

The central processing unit (CPU) is one of the main devices of an electronic computer and its core component. Its main function is to interpret computer instructions and process the data in computer software. All operations in a computer are carried out by the CPU, which is responsible for reading instructions, decoding them, and executing them.
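The read-decode-execute cycle just described can be sketched as a toy stored-program machine. This is a minimal illustration only; the opcodes and memory layout are invented for the example and do not correspond to any real instruction set:

```python
# A toy von Neumann machine: program and data share one memory, and
# instructions are fetched, decoded, and executed strictly one at a time.
def run(memory):
    pc = 0   # program counter
    acc = 0  # accumulator register
    while True:
        op, arg = memory[pc]   # fetch, then decode (opcode, operand)
        if op == "LOAD":       # execute
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory
        pc += 1                # sequential execution: move to the next instruction

# Program: compute memory[12] = memory[10] + memory[11]
mem = [("LOAD", 10), ("ADD", 11), ("STORE", 12), ("HALT", 0),
       None, None, None, None, None, None, 3, 4, 0]
print(run(mem)[12])  # 7
```

Every operation, however trivial, passes through this one sequential loop, which is exactly the "strict butler" behavior described next.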
The CPU's structure mainly comprises the arithmetic and logic unit (ALU), the control unit (CU), registers, caches, and the buses that carry data, control, and status signals between them. The CPU follows the von Neumann architecture, whose core idea is: store the program and execute it sequentially. Because of this (stored program, sequential execution), the CPU is like a strict butler that always does exactly what it is told, step by step. But as demands for larger scale and faster processing grew, this butler gradually became unable to cope. So people wondered: could we put many processing units on the same chip and let them work together, improving efficiency that way? That is how the GPU was born.

GPU

Before formally explaining the GPU, let's first cover a concept mentioned above: parallel computing. Parallel computing means using multiple computing resources at the same time to solve a computational problem, and it is an effective way to improve a computer system's speed and processing power. The basic idea is to have multiple processors jointly solve the same problem: the problem is decomposed into several parts, each computed in parallel by an independent processor. Parallel computing can be divided into temporal parallelism and spatial parallelism.

Temporal parallelism means pipelining. For example, suppose a food factory divides production into four steps: cleaning, disinfecting, cutting, packaging. Without a pipeline, one item must finish all four steps before the next one starts, which wastes time and hurts throughput. With pipelining, four items can be in process at the same time, one per stage. This is temporal parallelism: two or more operations proceed in overlapping time, greatly improving computing performance.
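The gain from the four-stage food pipeline above follows from a little arithmetic: without pipelining, n items take 4n stage-times; with a full pipeline, the first item takes 4 stage-times and every subsequent item finishes one stage-time later. A small sketch (assuming each stage takes one time unit):

```python
# Time to push n items through k production stages, one time unit per stage.
def sequential_time(n, k=4):
    # No pipelining: each item passes through all k stages
    # before the next item even starts.
    return n * k

def pipelined_time(n, k=4):
    # Pipelining: the first item takes k steps to emerge; after that,
    # one finished item comes out every step, because all k stages
    # are working on different items simultaneously.
    return k + (n - 1)

for n in (1, 4, 100):
    print(n, sequential_time(n), pipelined_time(n))
# 1 4 4
# 4 16 7
# 100 400 103
```

For large n the pipelined time approaches n, a nearly k-fold speedup, without adding a single extra processing unit.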
Spatial parallelism means multiple processors executing computations concurrently: two or more processors are connected over a network and simultaneously compute different parts of the same task, or tackle large problems that a single processor cannot solve. For example, suppose Xiao Li plans to plant three trees on Arbor Day. Working alone he would need 6 hours, so he calls his good friends Xiao Hong and Xiao Wang. The three of them start digging and planting at the same time, and after 2 hours each has planted one tree. This is spatial parallelism: a large task is divided into multiple identical subtasks to speed up the solution. If a CPU were used for the tree-planting task, it would plant the trees one by one and take 6 hours; using a GPU is like having several people plant trees at the same time.

GPU stands for Graphics Processing Unit. As the name suggests, the GPU began as a microprocessor for graphics computation on personal computers, workstations, game consoles, and mobile devices such as tablets and smartphones. Why are GPUs so good at processing image data? Because every pixel in an image needs to be processed, and the processing applied to each pixel is nearly identical and independent, which makes images a natural fit for the GPU's massively parallel design. However, the GPU cannot work alone; it must be controlled by the CPU. The CPU can work alone to handle complex logic and varied data types, but when a large volume of data of the same type needs processing, it can call on the GPU for parallel computation. Most of the work a GPU does is computationally intensive but not very sophisticated, and must be repeated many, many times.
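Spatial parallelism in the spirit of the tree-planting example can be sketched with Python's standard process pool: one big, uniform task is split across several workers that all run at once. This is a toy illustration only; the "pixels" are just integers and the per-pixel operation is trivial:

```python
# Spatial parallelism: divide one large, uniform task among several workers,
# just as the three friends each took one of the three trees.
from concurrent.futures import ProcessPoolExecutor

def brighten(pixel):
    # The same independent operation is applied to every pixel.
    return min(pixel + 50, 255)

def brighten_image(pixels, workers=3):
    # Each worker process handles its own share of the pixels concurrently.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels, chunksize=64))

if __name__ == "__main__":
    image = list(range(256))          # a fake one-row "image"
    out = brighten_image(image)
    print(out[0], out[-1])            # 50 255
```

Because each pixel's result depends on nothing but that pixel, the split is embarrassingly parallel, exactly the workload shape the GPU is built for.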
To borrow an analogy from a Zhihu user: imagine a job that requires hundreds of millions of additions, subtractions, multiplications and divisions within 100. The best approach is to hire dozens of primary school students and have each calculate a share; the arithmetic has no technical depth, it is pure manual labor. The CPU, by contrast, is like an old professor who can do integrals and differentials but commands a high salary; one old professor costs as much as twenty primary school students. If you were Foxconn, which would you hire? One thing should be emphasized, though: although the GPU was created for image processing, the discussion above shows that it contains no components specifically designed for images. It is simply an optimization and rebalancing of the CPU's structure. So today the GPU is used not only for graphics but also in scientific computing, password cracking, numerical analysis, massive data processing (sorting, MapReduce, etc.), financial analysis, and other fields that require large-scale parallel computation.

TPU

The Tensor Processing Unit (TPU) is a custom ASIC chip designed from scratch by Google specifically for machine learning workloads. TPUs provide computing power for major Google products, including Translate, Photos, Search, Assistant, and Gmail. Cloud TPU makes the TPU available as a scalable cloud computing resource for all developers and data scientists running cutting-edge ML models on Google Cloud. As mentioned above, the CPU and GPU are both relatively general-purpose chips, but as the old saying goes: a universal tool is never as efficient as a specialized one. As computing needs became more and more specialized, people wanted chips that better matched their particular workloads, and so the concept of the ASIC (application-specific integrated circuit) came into being.
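The "dozens of primary school students" analogy is precisely the data-parallel programming model. A sketch using NumPy as a stand-in (NumPy runs the operation in optimized bulk code on the CPU, not on a GPU, but the model is the same idea: one operation issued over a whole array instead of a one-at-a-time loop):

```python
import numpy as np

# A large batch of trivial arithmetic within 100, as in the analogy.
rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=1_000_000)
b = rng.integers(0, 100, size=1_000_000)

# "Old professor" style: strictly one element at a time, in a Python loop.
serial = [int(x) + int(y) for x, y in zip(a, b)]

# "Primary school students" style: the same operation issued once,
# applied element-wise across the entire array in bulk.
bulk = a + b

assert (bulk == np.asarray(serial)).all()
```

Both compute identical results; the difference is that the bulk form exposes all million additions at once, so parallel hardware (or vectorized CPU code) can chew through them together.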
An ASIC is an integrated circuit customized to a particular product requirement: it is designed and manufactured to meet the needs of a specific user and a specific electronic system. The TPU (Tensor Processing Unit) is a chip Google developed specifically to accelerate deep neural network computation, and it is in fact an ASIC. Google has reported that the TPU delivers 15 to 30 times the performance and 30 to 80 times the efficiency (performance per watt) of contemporary CPUs and GPUs. The first-generation TPU could only do inference, relying on Google Cloud to collect data and produce results in real time, while training required additional resources; the second-generation TPU can be used both for training neural networks and for inference.

NPU

The NPU (Neural network Processing Unit) is a neural network processor that uses circuits to simulate the structure of human neurons and synapses. In a neural network, storage and processing are integrated, both embodied in the synaptic weights. In the von Neumann architecture, storage and processing are separated, implemented by memory and arithmetic units respectively. The difference is fundamental: when classical von Neumann machines (such as x86 processors and NVIDIA GPUs) run neural network applications, they are inevitably constrained by the separation of storage and processing, which limits efficiency. This is one reason chips built specifically for artificial intelligence hold certain inherent advantages over traditional chips. Typical NPUs include China's Cambricon chips and IBM's TrueNorth. Taking Cambricon as an example, its DianNaoYu instruction set directly targets the processing of large-scale neurons and synapses.
A single instruction can complete the processing of a group of neurons, and the instruction set provides a series of specialized supports for moving neuron and synapse data across the chip. To put it in numbers, there is a gap of more than 100x in performance or energy efficiency between CPUs/GPUs and the NPU. Take the DianNao paper, published jointly by the Cambricon team and Inria, as an example: DianNao is a single-core processor with a clock frequency of 0.98 GHz and a peak performance of 452 billion basic neural network operations per second; it consumes 0.485 W in a 65 nm process and occupies 3.02 square millimeters.

BPU

The BPU (Brain Processing Unit) is an embedded artificial intelligence processor architecture proposed by Horizon Robotics. The first generation is the Gaussian architecture, the second generation is the Bernoulli architecture, and the third generation is the Bayesian architecture. Horizon Robotics designed the first-generation Gaussian architecture and, together with Intel, launched an ADAS (Advanced Driver Assistance System) at CES 2017.

DPU

The DPU (Deep learning Processing Unit) was first proposed by DeePhi Technology in China. Built on Xilinx's reconfigurable FPGA chips, it implements a dedicated deep learning processing unit (using the existing logic cells to build parallel, efficient multipliers and logic circuits, which belongs to the IP category), and abstracts a customized instruction set and compiler (rather than using OpenCL), enabling rapid development and product iteration. In effect, the DPU proposed by DeePhi is a semi-custom FPGA.
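As a quick sanity check on the DianNao figures quoted above, performance per watt follows directly from the stated numbers:

```python
# DianNao figures quoted above: 452 billion basic neural-network
# operations per second at 0.485 W (65 nm process, 3.02 mm^2).
peak_ops_per_second = 452e9
power_watts = 0.485

ops_per_joule = peak_ops_per_second / power_watts
print(f"{ops_per_joule / 1e9:.0f} billion operations per joule")  # 932
```

Roughly 930 billion operations per joule from a 3 mm² die illustrates why a specialized neural-network chip can open an efficiency gap of two orders of magnitude over general-purpose processors.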