Some cold, sober thoughts on SD-WAN from the rim of the crater

SDWAN will be the most valuable investment hot spot of 2018. This is not a prediction, because it was already the case in 2017; Gartner may simply not have seen it yet. Once even Gartner can draw a curve for it, it is no longer an opportunity.

Since ancient times, however, the real hot spot has always been the crater of a volcano. Only a tiny fraction of the carbon turns into diamond under the momentary high temperature and pressure; most of the red-hot magma turns into dust and soil within 24 hours.

In 2017, while I was wrestling with SDWAN, I ran into many problems, solved many of them, and earned the chance to wrestle with new ones. But one question always lingered in my mind: faced with an Internet that cannot be described by any mathematical model, are model-based control algorithms of any use?

Last night I finally figured it out: the answer is no.

If such algorithms were useful, they would have dominated the Internet back in 1997, with no need to wait 20 years. Moreover, in their insight into problems and their ability to solve them, the first generation of Internet pioneers was far smarter than those who came later. To date, apart from virtualization and cloud computing, we cannot find anything comparable to the foundational Internet technologies. As of 2018, the whole Internet is still developing along the trend that had already formed by 1988, only faster and faster. Even AI has merely added to the acceleration and is far from changing the trend.

Any successful Internet product succeeds because it is tightly coupled to one generation of users. When that generation ages, the product withers with it. Facebook, Twitter, WeChat, QQ: all of them will be buried with us.

Yet in the TCP/IP protocol stack, far away from users, TCP, IP and their derivatives have safely accompanied several generations. They show no sign of decline so far, and outside academic circles no one has ever considered them outdated.

The basic protocols in the TCP/IP stack can be counted on two hands. Why are they able to escape the curse of this life cycle?

The reason is human nature, or in other words, the greatest common divisor of human nature.

Since the earliest written records, the common parts of human nature have hardly changed, and they come in contradictory pairs: greed and fear, following the crowd and going it alone...

The core TCP/IP protocols have lasted so long, with no one doubting their durability, precisely because their design is tightly coupled to this greatest common divisor of human nature, keeps to the middle way, and is loosely coupled to, or even isolated from, anything personalized.

The Internet products we deal with every day, however, are designed on exactly the opposite principle.

How a technology chooses between the commonality of human nature and the individuality of a specific group determines its lifespan and the form in which it survives. That choice, in turn, is dictated by where the product or technology is positioned. Application-layer technologies and products can only pursue the personalized needs of a specific group, while the underlying technologies must keep their distance from personalization.

So the question is: where should SDWAN, a schizophrenic technology, make its home?

To begin with, SDWAN is a genuine network technology, yet what it serves is extremely personalized customer needs. The common needs of the network are already well met by the network infrastructure and have no need for SDWAN. SDWAN's living space lies in the gaps that the giant SPs and CPs are unable, or too busy, to take care of.

I have always thought that the SDWAN technology serving operators and large Internet companies should be classified as basic network technology. The first consideration for this type of technology is not personalization and flexibility but stability and availability. Its character is defensive rather than offensive: it fills the gap between the growing general needs of the majority of users and the lagging service capabilities of the infrastructure, rather than first satisfying the personalized needs of a few users.

This is why the three major domestic operators attach great importance to SDWAN at the strategic level but are divided, even contradictory, at the tactical level. Defense or offense: that is the question.

The advantage of a startup SDWAN company is that it does not need to worry about multiple choices. It can only attack. Defense means death.

To keep attacking, SDWAN has only one choice: to search continuously, while on the move, for controllable intersections between personalized needs and complex network states, and to extract resources from them quickly.

There are many SDWAN startups at home and abroad, but from what I have observed and understood, the ones that are genuinely profitable are masters of mobile and guerrilla warfare, with crude technology but agile moves. The companies with sophisticated technology and clumsy moves have not made any money.

More and more people are now talking about SDWAN at all kinds of grand occasions. Roughly speaking, they fall into two schools: architecture and algorithms. Unfortunately, the least valuable things in this industry happen to be architecture and algorithms.

Let's start with architecture. Most people think architecture means planning, and that with a good architecture you are more than halfway to success. This idea is at least 30 years out of date. Every architecture has to be built on a correct understanding of the network: the more you correctly understand, the more refined the architecture can be; the less you understand, the rougher it gets, to the point that it may look like no architecture at all. But for a system as huge and complex as the Internet, no one can acquire enough correct understanding in advance; often not even a pitiful amount is available. On such an empty foundation, where would an architecture come from?

If you pay a little attention to the people who love talking about architecture, you will find that they are often obsessed with weaving an exquisite, flawless architecture diagram while knowing almost nothing about what is actually happening in the network. They would not dare to use such an architecture themselves. In the Internet field, architecture has only ever been a summary of established facts, and only of established facts with broad enough impact. This is why the SDWAN papers Google has published over the past four years have become more and more cautious, looking less and less like SDWAN and more like a write-up of one company's internal technical transformation project.

As for algorithms, some people are considering using AI to rescue network engineers. But don't forget that the lifeblood of every algorithm is its input data. The more complex the algorithm, the stricter its requirements for accurate and comprehensive input. The Internet's dilemma is that neither operators nor Internet companies have enough data to let even slightly more complex algorithms replace network engineers.

These macro-level difficulties, however, do not stop SDWAN from making its way in the world.

As mentioned above, SDWAN is a schizophrenic technology: it has to keep up with the personalized needs of users and at the same time adapt to the complex characteristics of the wide area network. That is a disadvantage. But a schizophrenic technology has its own way of playing the game, which is to capture temporary, local certainty within macro-level uncertainty.

Now, back to the topic: a few of my sober thoughts.

First of all, the core of SDWAN is not control but management, and the core of management is the ability to maintain the consistency of network status.

When countless people talk about the SDWAN control plane on all sorts of occasions, anyone with a little common sense should naturally think of the following questions.

First, the core of control is the algorithm, that is, a model-based solution. But a model-based solution only works on problems that can be modeled. Can the problems these control algorithms face actually be modeled?

Second, if algorithms are the core of the problem, why didn’t engineers twenty years ago think of using these algorithms? Were our predecessors stupid?

Third, most of the control algorithms being hotly discussed today were already tried at least ten years ago, before SDWAN existed. Why did they fail then? Have the factors that caused the failure been eliminated today?

The answer to all three questions is, for the most part, no. The control plane can do nothing by itself to obtain input data with the accuracy and timeliness its algorithms require; it has to rely on the management plane. The effectiveness of the management plane in turn depends on the means available for collecting and measuring network state. Over the past few decades these means have progressed very slowly, and they are constantly caught in the conflict between measuring fast and measuring accurately. In most SDWAN architectures you can see today, the management plane plays only a supporting role, and the technology it uses is on a par with traditional network management. I really cannot see how such a design could do better than a traditional network management system.
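To make the conflict between fast and accurate measurement concrete, here is a minimal sketch of my own, not taken from any SDWAN product. It assumes packet loss is estimated from active probes and that losses are independent, which real networks often violate; the function name and all parameter values are hypothetical. Under that assumption the standard error of the estimate shrinks only with the square root of the number of probes, while the time needed to collect them grows linearly.

```python
import math


def loss_estimate_tradeoff(true_loss, probe_interval_s, n_probes):
    """Return (measurement window in seconds, standard error of the loss estimate).

    Assumes independent probe losses, so the estimate behaves like a
    binomial proportion with standard error sqrt(p * (1 - p) / n).
    """
    window_s = n_probes * probe_interval_s
    std_err = math.sqrt(true_loss * (1 - true_loss) / n_probes)
    return window_s, std_err


# Hypothetical numbers: 1% true loss, one probe every 100 ms.
for n in (10, 100, 1000):
    window_s, std_err = loss_estimate_tradeoff(0.01, 0.1, n)
    print(f"{n:5d} probes: window {window_s:6.1f} s, std error {std_err:.4f}")
```

With these assumed numbers, a one-second window gives an error larger than the 1% being measured, while bringing the error down by a factor of ten takes a window one hundred times longer, by which time the network state may already have changed.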

Collecting network information is only one part of maintaining network state consistency. If the collected information is incomplete or inaccurate, control merely becomes less efficient; no real harm is done. The other part is keeping the control state consistent between the control plane and the data plane, and this must be done right; there is no room for doing it badly. Once a zombie flow table or forwarding table entry appears in the data plane, one that should have been revoked but was not, it is a time bomb you have buried for yourself. When enough of them accumulate, they can bring down the entire network. Unfortunately, this issue has also been widely ignored.
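One common way to hunt for such zombie entries is a periodic audit that compares what the controller intends with what a device actually holds. The sketch below is only an illustration under simple assumptions: both sides are modeled as plain dictionaries from a destination prefix to a forwarding action, and the function name, prefixes and tunnel names are all hypothetical rather than any real controller API.

```python
def reconcile(intended, installed):
    """Compare controller intent with the entries a device actually holds.

    Both arguments map a match key (here a destination prefix string)
    to a forwarding action. Returns (to_revoke, to_install):
    to_revoke lists entries the device holds but the controller no longer
    wants (zombies); to_install holds entries that are missing or whose
    action has drifted from the intent.
    """
    to_revoke = sorted(key for key in installed if key not in intended)
    to_install = {
        key: action
        for key, action in intended.items()
        if installed.get(key) != action
    }
    return to_revoke, to_install


# Hypothetical state for one forwarding device.
intended = {"10.1.0.0/16": "tunnel-a", "10.2.0.0/16": "tunnel-b"}
installed = {
    "10.1.0.0/16": "tunnel-a",
    "10.2.0.0/16": "tunnel-c",  # drifted: action differs from the intent
    "10.9.0.0/16": "tunnel-z",  # zombie: revoked by the controller, still on the device
}
to_revoke, to_install = reconcile(intended, installed)
print(to_revoke, to_install)
```

An audit like this only works if the device's real state can be read back reliably, which is exactly the management-plane capability argued for here.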

What does good network state consistency maintenance look like? Within the SDWAN field, Google does it best. But if you widen your view, those old protocols leave Google far behind. Take, for example, the much-criticized distributed routing protocols.

Almost every gorgeous SDWAN system architecture begins by criticizing distributed routing protocols as stupid, clumsy and slow. The truth is just the opposite: the design wisdom in distributed routing protocols is far beyond anything in existing SDWAN, because these protocols have a firm grip on the essence of maintaining network state consistency.

Take OSPF as an example. The core of the protocol is not Dijkstra's algorithm but the question of which routers should exchange what information, and when. That information is the network topology. Because the implementation is distributed, no router knows whether the topology information it holds is complete, current and accurate, nor how fast it should measure and update that information for the best result. This is OSPF's dilemma.

OSPF's design answers it with two clever mechanisms. The first is to exploit the excellent penetration of broadcast-style flooding, so that the reachability of signaling messages is not affected by topology changes or route failures; this builds an independent, resilient signaling plane. The second is to simply keep computing: whatever the LSDB is or ought to be, any change triggers a recalculation. A single calculation may be full of errors, but this continuous recomputation will sooner or later converge to the correct state, and problems such as zombie routing tables never arise. OSPF can therefore maintain the consistency of network state without knowing the true network state; it always tends in the right direction and it cleans up after itself.

In SDWAN, by contrast, the reliable delivery of control messages depends almost entirely on pre-provisioned dedicated lines, or on luck. The most common consistency mechanism is for a forwarding device that has lost contact with the controller to keep its forwarding policy and then delete it completely after a certain period of time. That's all.
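The "recompute on every change" idea described above can be sketched in a few lines. This is a heavily simplified toy of my own, not OSPF itself: it has no areas, no LSA aging and no acknowledgments, and the names `Router`, `receive_lsa` and `shortest_path_routes` are hypothetical. It assumes each router advertises its own links with a sequence number, and that a link is used only when both ends advertise it, loosely modeled on the back-link check in OSPF's SPF calculation. The point it illustrates is that the routing table is rebuilt from scratch after every accepted change, so a stale route has nowhere to survive.

```python
import heapq


def shortest_path_routes(topology, source):
    """Rebuild {destination: next_hop} from scratch with Dijkstra's algorithm.

    topology maps router -> {neighbor: cost}. A link is used only if both
    ends advertise it.
    """
    dist = {source: 0}
    routes = {}
    heap = [(0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # outdated heap entry, a shorter path was already found
        if hop is not None:
            routes[node] = hop
        for nbr, cost in topology.get(node, {}).items():
            if node not in topology.get(nbr, {}):
                continue  # the neighbor does not advertise the link back
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                first_hop = nbr if node == source else hop
                heapq.heappush(heap, (nd, nbr, first_hop))
    return routes


class Router:
    def __init__(self, name):
        self.name = name
        self.lsdb = {}    # origin -> (sequence number, {neighbor: cost})
        self.routes = {}  # destination -> next hop

    def receive_lsa(self, origin, seq, links):
        """Accept an advertisement only if it is newer, then recompute everything.

        Returns True if the advertisement was new and should be flooded on.
        """
        current = self.lsdb.get(origin)
        if current is not None and current[0] >= seq:
            return False  # old or duplicate: ignore and stop flooding it
        self.lsdb[origin] = (seq, dict(links))
        topology = {o: adv for o, (_, adv) in self.lsdb.items()}
        self.routes = shortest_path_routes(topology, self.name)
        return True


a = Router("A")
a.receive_lsa("A", 1, {"B": 1, "C": 10})
a.receive_lsa("B", 1, {"A": 1, "C": 1})
a.receive_lsa("C", 1, {"B": 1, "A": 10})
print(a.routes)  # C is reached through B

a.receive_lsa("B", 2, {"A": 1})  # B reports that its link to C is gone
print(a.routes)  # C is now reached over the direct, more expensive link

a.receive_lsa("A", 2, {"B": 1})  # A's own link to C fails as well
print(a.routes)  # no route to C remains, although C's old LSA is still stored
```

In the last step C's outdated advertisement is still sitting in the database, yet no route to C is installed, because nothing is carried over from the previous table. A design that installs and revokes entries one at a time has to get every single revocation right to reach the same result.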

This means that SDWAN has obvious weaknesses both in obtaining network state and in keeping control policy consistent. When the network is less than perfect, an SDWAN system can neither provide accurate, real-time input to the control plane nor suppress and clear erroneous calculation results. Still less can it steer itself in the right direction before the network returns to a steady state.

Most SDWANs do not work in a perfect environment. They are either perched on a precarious wall as an overlay or renting dedicated line resources that are not entirely reliable. In this setting, the fragile management plane has not received enough attention only because existing SDWAN systems are still very limited in scale and can be cleaned up by hand. Once a system grows beyond a certain size, however, the problems of the management plane will inevitably surface. Companies like Google, whether for SDN inside the data center or for DCI, rely heavily on the design ideas of distributed routing protocols, while segment routing, Juniper's pride, has gone back to source routing because the management-plane bottleneck is so hard to break through.

Any commercial system must, as a prerequisite, deliver stable and predictable output. That stability obviously depends on management rather than control, just as the key to AI is data rather than algorithms.

Second, the ability to maintain network status consistency does not always need to rely on a controller. In small and medium-sized networks, manual efficiency may even be higher.

Over the past two years, when I talked to peers about SDWAN, the question I was asked most often was not what problems we solve with SDWAN or what special needs we meet, but "Is your controller based on ODL or ONOS?" This reflects a deep-rooted misconception: that to do SDWAN you must first have a controller, and that the controller is the core of the system. In my experience, the controller is not only not the core; it can be entirely absent. Especially in small and medium-sized SDWAN systems, hiring one experienced network engineer pays off far better than hiring a team of coders to develop a complex control system. I can easily cite more than three such examples, and they are all among the top money-makers in this industry. Once you have grasped a characteristic of the network, and that characteristic is stable over the long term, all that remains is to exploit and control it in the cheapest way possible, which is much simpler than people imagine.

Of course, if you want to quickly attract attention, you can also cover yourself with the SDWAN coat.

Many companies that were only focused on making money only realized it after being enlightened by a master, ┗|`O′|┛ Oh~~, it turns out that what I do is SDWAN!

Architecture, therefore, can only be summarized after the fact; it cannot be used for planning.

If a startup tells you that it is not making money yet but is working hard to develop the SDWAN controller, then there is at least a 100% chance that the company is not far from death.

However, once an SDWAN grows beyond a certain scale, the disadvantages of manual work become apparent and the importance of operations automation inevitably emerges. That critical point arrives when network state consistency becomes too hard to maintain by hand. But even then, a controller is still not a necessity.

Third, the way for SDWAN to survive is to expand its exposure to real problems in production environments and thereby curb unrealistic ideas.

The most important task of SDWAN may well be to capture temporary, local certainty in a macro network environment full of uncertainty. Algorithms cannot be relied on for this, because every algorithm depends on deterministic inputs and a deterministic model. The most efficient approach is to rush in first and get fully exposed to real problems in the production environment, so as to tell the principal contradictions from the secondary ones and, along the way, work out which things are really profitable and which merely look good.

Everyone knows the principle of Occam's razor: do not multiply entities beyond necessity. Yet the complexity of the environment SDWAN lives in, and of the problems it faces, makes the field feel ad hoc and full of novel problems. Which of those problems are worth solving still has to be tested in the production environment.

For this reason alone, it is hard for academia to achieve anything in the SDWAN field unless it works closely with industry, or builds an SDWAN system that meets real user needs and lets the experimental data speak for itself.

As of today, it is 2018. Many people are preparing to join the SDWAN gold rush, or to ride the SDWAN hype to write a few good papers and win a few lucrative projects. Standing at this crater, I think we should first stay calm, because the maturity of this field, in both technology and industry, far exceeds what the media imagine. This is neither a no man's land nor untouched ground. Giant beasts and small carnivores have already filled every track. If latecomers want to gain a foothold and survive, they will need a more pragmatic attitude and better skills to open up new tracks.
