Cisco ACI original core technology expert reveals the birth of ACI

The official development of ACI began in January 2012, but the thinking that eventually led to it can be traced back twelve years earlier. In 2000, a group of us started the first internal startup in Cisco's history, Andiamo Systems, to develop the MDS series of Fibre Channel switches. It was a fascinating project that introduced several new ideas. Previously, we had designed Cisco's traditional Ethernet switches such as the Catalyst 5000, 6000, and 6500. Unlike Ethernet, Fibre Channel networks are layer 2 networks with structured addressing, multipathing, and lossless transport. At the time, none of these concepts existed on Ethernet, but their value was obvious.

After Andiamo, a few of us went on to develop Cisco's Fabric Path technology and the IETF's TRILL from 2003 to 2011. Here we brought multipathing to layer 2 Ethernet using overlay networks: Fabric Path uses a proprietary encapsulation, while TRILL is standards-based. We also introduced what we called Converged Ethernet, which supports both lossless and lossy traffic classes alongside Fabric Path. TRILL introduced protocol learning, which we all liked, but unfortunately it was not very successful in the market. During that period we also defined the VxLAN packet format with VMware, work that grew out of LISP and DCI technology.

In January 2012, we founded Insieme Networks to integrate Fabric Path, Converged Ethernet, VxLAN, and the emerging Software Defined Networking (SDN) into a new generation of data center networks. While doing this, it struck us that the way networks are configured and managed also needed to be fundamentally transformed: to reduce data center operating costs, increase flexibility, improve security, and reduce human error. The end result is what everyone now knows as "Application Centric Infrastructure (ACI)" and "Intent-Based Networking (IBN)".

At the start of the integration, we deconstructed each of these technologies, trying to understand their underlying components, what each accomplishes, and how it supports the final solution. We asked hundreds of questions, such as: "What is layer 3 forwarding?", "Why does layer 2 need so much less configuration?", "Why flood ARP?". Each question gave us deep insight into whether a component should be kept or discarded. We wanted all the benefits of layer 3, all the benefits of layer 2, and all the benefits of overlay networks, while rejecting their disadvantages. We expected a system that could scale easily to the largest data centers yet still work for the smallest, one less likely to be brought down by its users' mistakes, with no single point of failure, and able to coexist with existing environments.

One of the first key issues we tackled was how to scale the overlay network. From the beginning, we recognized that implementing VxLAN on top-of-rack switches would have a huge advantage over the pure software solutions just emerging at the time: data centers have roughly 40 times fewer switches than servers, and this is a classic distributed-systems problem, because scaling the number of tightly coupled components is the hardest part. Our executive management pressed the design team hard on this problem. There had been many cases in the past where the original scalability goals were no longer sufficient by the time the product finally shipped. Management asked whether we could do better, and we also began to hear that some startups might not achieve their own ambitious goals here. Finally, after six months of development, we stopped everything we were doing and tore down all of the previous work: about eight of us locked ourselves in a room for a month and redesigned all the hardware. We wrestled with the state-handling mechanism that determined the scale of the system. To keep a system operating correctly, a great deal of network state must be synchronized, and in a distributed system the achievable scale is a function of how much state is synchronized, in how many places, how often, and how robust the mechanism is. The approach we developed separates global state into two categories: endpoint location and policy. Endpoint location can change relatively quickly to accommodate applications moving around, servers coming online and going offline, and network components failing. We minimized this state to the point where it needs to be synchronized in only two places to ensure correct behavior, regardless of the size of the network, and it is completely self-correcting when errors occur.
Policy state is much more complex than endpoint-location state, but it changes less often, changes more predictably when it does, and does not need to be propagated throughout the system. Combining these two approaches to state handling, we ended up with a far better, faster, and more robust answer to scalability. We are particularly proud of this solution.
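The endpoint-location idea above can be sketched in a few lines of code. This is a hypothetical illustration, not Cisco's implementation: the class and method names are invented, and the "two places" the state lives are modeled as the leaf switch that learned the endpoint locally plus a shared fabric-wide mapping database (a role played in ACI by the spine proxy).

```python
class MappingDB:
    """Fabric-wide endpoint-location table, e.g. hosted on the spines."""
    def __init__(self):
        self.location = {}          # endpoint id -> leaf id

    def register(self, endpoint, leaf):
        self.location[endpoint] = leaf

    def lookup(self, endpoint):
        return self.location.get(endpoint)


class Leaf:
    """Top-of-rack switch: learns local endpoints, consults the mapping DB."""
    def __init__(self, name, mapping_db):
        self.name = name
        self.db = mapping_db
        self.local = set()

    def learn(self, endpoint):
        self.local.add(endpoint)              # first place the state lives
        self.db.register(endpoint, self.name) # second (and last) place

    def forward(self, endpoint):
        if endpoint in self.local:
            return self.name                  # deliver locally
        dest = self.db.lookup(endpoint)       # one remote lookup, any fabric size
        return dest if dest else "flood-or-drop"  # miss is recoverable, not fatal


db = MappingDB()
leaf1, leaf2 = Leaf("leaf1", db), Leaf("leaf2", db)
leaf1.learn("vm-a")
leaf2.learn("vm-b")
print(leaf1.forward("vm-b"))   # resolved via the mapping DB: "leaf2"
```

Because location state is written in exactly two places, a move or failure only requires re-learning at the new leaf and one update to the mapping database, which is what makes the scheme self-correcting and independent of fabric size.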

Another important component of the solution is the APIC controller. At the time, enthusiasm for SDN was at its peak, and the rising stars advocated that centralized controllers could solve all network problems. Everyone asked Cisco what its SDN strategy was, and some predicted SDN would be the end of Cisco. We saw the advantages of a centralized controller: it provides a single point of contact for the network, ensures consistent configuration across the entire fabric, and treats the network as a single system rather than a pile of devices, all of which greatly reduces operating costs while improving flexibility and availability. However, a controller also introduces a potential single point of failure, so it must be made reliable enough that the network is at least as available as the controller-free networks that came before it. Our solution was to remove the controller from every critical path. That way, even if the controller goes offline for any reason, the network keeps working, while from a management perspective it still serves as the single point of contact. The controller therefore does not reduce but actually enhances the availability of the network. Maximum availability was our primary goal, and taking the controller off the critical path also made software upgrades easier and allowed the controller to scale further. Easy upgrades and better scaling have significantly accelerated, and will continue to accelerate, our development cycles.
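The "controller out of the critical path" principle can be shown with a minimal sketch. All names here are invented for illustration: the controller pushes desired policy down to switches, which then forward using only their locally installed copy, so a controller outage blocks new configuration but never packet forwarding.

```python
class Controller:
    """Management plane: holds desired state, pushes it to switches."""
    def __init__(self):
        self.online = True
        self.desired = {}          # flow -> action

    def push(self, switches):
        if not self.online:
            raise RuntimeError("controller unreachable")
        for sw in switches:
            sw.installed = dict(self.desired)   # copy policy down


class Switch:
    """Data plane: forwarding consults only local state, never the controller."""
    def __init__(self):
        self.installed = {}        # local copy of policy

    def handle(self, flow):
        return self.installed.get(flow, "drop")


ctrl = Controller()
sw = Switch()
ctrl.desired["web->db"] = "permit"
ctrl.push([sw])

ctrl.online = False             # controller fails...
print(sw.handle("web->db"))     # ...but traffic keeps flowing: "permit"
```

The design choice being illustrated: `handle()` never calls into `Controller`, so availability of forwarding is decoupled from availability of management, which is also what makes controller upgrades non-disruptive.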

The policy model we developed for ACI is a key component of the overall approach. It was our biggest challenge because it introduced an entirely new paradigm for network management. I like to describe it this way: traditional networks have their own language for configuration and operation; data center networks exist to carry applications, and applications have a language of their own. Much of an IT professional's job is translating the language of applications into the language of the network, and that is exactly where errors creep in, information is lost, and money is wasted. With ACI's policy model we set out to change the language of the network, bringing it closer to the language of applications in order to reduce errors, time, and cost. ACI policies describe the conversations between servers regardless of where they sit on the network, whether they are bridged or routed, or how many of them there are, decoupled as much as possible from their IP addresses. The result is a more automated and more secure way to configure and manage data center networks. Today we call this mechanism intent-based networking, or IBN.
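A simplified sketch of what "policy in the application's language" looks like: endpoints are grouped by role (ACI calls these endpoint groups, or EPGs), and contracts describe which groups may converse and on what service. The class and field names below are illustrative, not ACI's actual object model; the point is that nothing depends on IP addresses or physical placement.

```python
from dataclasses import dataclass, field

@dataclass
class EPG:
    """Endpoint group: a named application role, with opaque endpoint ids."""
    name: str
    members: set = field(default_factory=set)

@dataclass
class Contract:
    """An allowed conversation: consumer EPG may reach provider EPG on a port."""
    provider: str
    consumer: str
    port: int

def allowed(contracts, epgs, src_ep, dst_ep, port):
    """May src_ep talk to dst_ep on this port, according to the contracts?"""
    src_groups = {e.name for e in epgs if src_ep in e.members}
    dst_groups = {e.name for e in epgs if dst_ep in e.members}
    return any(c.consumer in src_groups and c.provider in dst_groups
               and c.port == port for c in contracts)

web = EPG("web", {"vm-1", "vm-2"})
db = EPG("db", {"vm-9"})
contracts = [Contract(provider="db", consumer="web", port=5432)]

print(allowed(contracts, [web, db], "vm-1", "vm-9", 5432))  # True
print(allowed(contracts, [web, db], "vm-9", "vm-1", 5432))  # False: db may not initiate to web
```

Moving a VM, renumbering it, or scaling a tier only changes EPG membership; the contract (the intent) stays the same, which is the decoupling the paragraph above describes.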

After five years of developing and shipping ACI, we have learned a great deal along the way. It has been much harder than we expected for our customers to take advantage of programmatic, API-driven network management; this skills gap, while narrowing, still exists among many customers. We probably should have delivered a software-only version of ACI sooner, so that it could reach more environments earlier. Our object model is not perfect: it is powerful, but it takes a long time to learn. Though not by design, we have essentially made the VxLAN overlay disappear: administrators do not need to configure or manage it, which is probably more important and valuable than many people realize. I think we built more support for embedding layer 4-7 services into the network than was actually needed, and it consumed too much of our engineering resources; we should probably have led the industry toward a small number of standard ways to integrate these services.

Developing ACI-related technologies and ACI itself over the years has brought endless technical challenges, countless problem-solving thrills, the sense of accomplishment in changing an industry, and the sheer joy of having a positive impact on my colleagues, employees, and customers. Although there is still a long way to go, it has been a great journey so far and has provided me with a very satisfying career.

Tom Edsall

Cisco Fellow/Father of ACI
