Today, many enterprises are digitally transforming their operations with new distributed applications that leverage core and edge clouds. These enterprise applications impose different demands and consume large amounts of data center and network resources in unpredictable ways. The rapid growth in the number and logical distribution of these applications across cloud data centers is now challenging network operations, especially during the ongoing COVID-19 pandemic, and the pressure is expected to keep intensifying. The question that network operators are increasingly asking is: "How can we keep up with the growing number of cloud-based applications without excessively increasing the cost of designing, building, and operating new data center network environments?"

The only way for enterprises to ensure that the operating model keeps pace and scales cost-effectively is to use end-to-end network automation. Specifically, closed-loop automation continuously monitors the network, traffic, and available resources and adjusts them automatically according to predetermined intent, ensuring optimal service quality and resource utilization. Achieving this kind of intent-based network through closed-loop automation requires a solid foundation that makes network resources as consumable as compute and storage.

Evolving Network Operating Systems

Traditionally, data center networks have been somewhat of a "black box" to the applications that run on them. Ideally, data center operators would like to consume network resources on demand, just as they consume compute and storage resources, with the network operating invisibly to support applications. To achieve this, however, the network operating system (NOS) needs to be architected differently. Traditional closed, vendor-proprietary NOSs provide limited visibility into their operations and give applications higher up the stack little control.
Automation toolkits provided by network equipment vendors offer only a limited set of tools and force operations teams to write any network applications in the proprietary language of the vendor NOS. This requires additional resources, narrows the scope of automation, and creates inconveniences such as recompiling automation applications with each new version of the vendor NOS.

More open NOSs, often based on Linux, have evolved to address some of these issues. They use standardized functionality, leverage the work of the open source community, and reduce the amount of custom coding required. However, they have their own limitations and are difficult to customize, integrate, and automate. Their DIY mentality works for some, but the learning curve is steep and demands investment in the expertise needed to collect and test modules and to write applications that automate operations.

Infrastructure as Code

An emerging trend across the industry combines the best of both worlds. In this model, NOS vendors build on the open Linux base, while their toolkits are often built from a combination of proprietary vendor modules and open source modules, customized and integrated in a way that makes them flexible and consumable. The key to this approach is the use of declarative, intent-based automation and operational toolkits based on container orchestration systems such as Kubernetes. This aligns with the increased adoption of "infrastructure as code", which is important for solutions that span on-premises and off-premises hybrid clouds. The network should be able to integrate tightly with these cloud ecosystems, following the needs of the applications and remaining invisible until problems arise. To that end, the fabric operations platform must adopt a loosely coupled, cloud-native approach that enables plug-and-play integration with software-defined data center (SDDC) stacks, such as those based on VMware or Kubernetes.
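The declarative, intent-based model described above can be illustrated with a small sketch. It is not taken from any vendor toolkit or from the Kubernetes API; the `Action` type, the `reconcile` function, and the VLAN-style resource names are all hypothetical, chosen only to show the idea. The operator declares the desired state, and the software works out which changes are needed to bring the observed state in line with it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """One change needed to move the fabric toward the declared intent (hypothetical type)."""
    verb: str                    # "create", "update", or "delete"
    name: str                    # resource name, e.g. an illustrative VLAN identifier
    spec: Optional[dict] = None  # target configuration; None for deletions


def reconcile(desired: dict, observed: dict) -> list:
    """Compare declared intent with observed state and return the actions that close the gap."""
    actions = []
    # Anything declared but missing or different must be created or updated.
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(Action("create", name, spec))
        elif observed[name] != spec:
            actions.append(Action("update", name, spec))
    # Anything observed but no longer declared must be removed.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(Action("delete", name))
    return actions


# Illustrative data only: these resources and subnets are made up for the example.
desired = {
    "vlan-10": {"subnet": "10.0.10.0/24"},
    "vlan-20": {"subnet": "10.0.20.0/24"},
}
observed = {
    "vlan-10": {"subnet": "10.0.10.0/24"},
    "vlan-20": {"subnet": "10.0.99.0/24"},
    "vlan-30": {"subnet": "10.0.30.0/24"},
}

for action in reconcile(desired, observed):
    print(action.verb, action.name)
# prints:
# update vlan-20
# delete vlan-30
```

In a closed loop, a function like this would run repeatedly against fresh observations, so that drift from the declared intent is corrected without the operator scripting each individual step.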
Template Intent

In this new framework, data center operations teams use fabric design models that have been tested for stability and validated in network vendor labs. The "fabric intent" is abstracted to a level where operations teams do not need to understand the underlying network details or rely on trained and certified personnel to provision services. The fabric is made up of network virtualization constructs such as a "logical distributed switch" or a "logical distributed router". The abstraction focuses on the common structure of the data center infrastructure, such as the number of racks, the number of servers per rack, and dual-homing, and uses it to automatically design and deploy a standard Border Gateway Protocol (BGP)-based IP fabric. Network automation can be applied to both virtual and physical resources. For physical switching and routing resources, this has the added advantage of eliminating human error in the configuration of the data center stack.

Maximize Agility and Minimize Risk

This evolution to infrastructure as code brings the network fabric more in line with data center operational philosophies such as DevOps, which uses extensible automation platforms to simplify continuous integration and continuous development. The new intent-based approach to network fabric automation allows rapid and frequent changes, so that distributed applications are integrated and developed in sync with the network fabric required to support them. Among other changes, this calls for a network digital sandbox: a digital twin, in the language of software development, of a real production network. Network equipment vendors have traditionally developed and tested various scenarios in their physical labs. However, not every scenario can be created or validated there, and it is not always feasible to secure lab resources quickly.
A digital sandbox allows operations teams to quickly experiment with, test, and validate automation steps and, more importantly, to validate failure scenarios and the associated closed-loop automation without having to try them on the production network.

Observability and Automation

Automation and observability can go hand in hand. Unfortunately, the traditional approach of simply collecting all kinds of data and pushing it in bulk to the operations team without interpretation complicates the operator's task while providing little useful information. This is telemetry. What operators need instead of raw data is contextual insight that lets them understand the root cause of a problem and mitigate it. Modern data center operations platforms must implement an insight database that consolidates configuration and observability data and presents contextual insights in an easy-to-understand manner. These insights must also enable operators to perform closed-loop automation in a programmable manner. As the collected data grows more random and complex, applying standard business logic is not enough. Instead, advanced machine learning baselines and analytics can provide deeper insights to human operators. In this way, operations software can empower operators to perform the complex operations that modern data centers require.

Conclusion

To achieve the scalability and flexibility required of the modern data center, closed-loop automation is key to making network resources as consumable as compute and storage. Today's most advanced NOSs enable automation through abstract intent and innovative network virtualization, which lets the network become invisible in the ecosystem when needed. Following DevOps best practices, these NOSs include a digital sandbox that enables operations teams to design network automation with failure in mind.
They combine the best features of an open approach with plug-and-play integration and, most importantly, tightly integrate observability with automation. This combination of capabilities has been proven in the field and provides a solid foundation for operations teams to deploy much-needed closed-loop automation in their data centers.