What do you do when your SD-WAN has a problem or you suspect it’s causing application problems? Troubleshoot, of course.
But SD-WAN troubleshooting requires IT teams to have a very good understanding of the network devices, connections, and topologies they are dealing with, as well as many other factors. Here are some helpful monitoring and practical troubleshooting steps that IT teams can follow when dealing with SD-WAN issues.

The first step in SD-WAN troubleshooting is to recognize when the network is not functioning properly. In most cases, monitoring an SD-WAN is not much different from monitoring a regular network. Physical components are usually the easiest to monitor: they either work or they don't. Logical functions can be more challenging, because abstraction can make multiple network links appear as one.

Monitoring SD-WAN

1. Event handling. The most useful element of a good network management architecture is the examination of events from network devices, including SD-WAN devices. Think of events as the network letting you know that something noteworthy has happened. Event handling does not require polling, and it scales as the network grows. I prefer syslog events to Simple Network Management Protocol (SNMP) traps because syslog does not require a specific management information base (MIB) to be loaded into the management system before you can view the details.

IT teams should configure SD-WAN devices to send events to a common event processing system where they can be stored, correlated, and acted upon. Organizations with limited budgets can use open source collectors such as syslog-ng, along with various analysis tools, to summarize the large number of events that the network can generate. Organizations with larger budgets can look into the ELK stack: Elasticsearch, Logstash, and Kibana. If you need vendor support, commercially supported versions of ELK are available, as are products from equipment vendors and log processing vendors. Event handling systems should be configured to automatically generate trouble tickets or send real-time alerts to the IT organization when critical events are detected.
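As an illustration of the event-handling step, here is a minimal Python sketch of the triage a collector performs before deciding whether to alert. It assumes events arrive as raw syslog lines that begin with the standard `<PRI>` field, where PRI = facility * 8 + severity. The function names, the "critical or worse" alert threshold, and the sample message are illustrative assumptions. They are not features of syslog-ng, ELK, or any SD-WAN product.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Severity names from RFC 5424, indexed by severity number (0 is most severe).
SEVERITY_NAMES = ("emergency", "alert", "critical", "error",
                  "warning", "notice", "informational", "debug")

# Both RFC 3164 and RFC 5424 messages begin with a "<PRI>" field.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


@dataclass
class SyslogEvent:
    facility: int
    severity: int
    text: str

    @property
    def severity_name(self) -> str:
        return SEVERITY_NAMES[self.severity]


def parse_syslog(line: str) -> Optional[SyslogEvent]:
    """Extract facility and severity from a raw syslog line; None if malformed."""
    match = PRI_PATTERN.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # facilities 0-23 and severities 0-7 give PRI values 0-191
        return None
    facility, severity = divmod(pri, 8)  # PRI = facility * 8 + severity
    return SyslogEvent(facility, severity, line[match.end():].strip())


def route_event(event: SyslogEvent, alert_threshold: int = 2) -> str:
    """Hypothetical routing rule: critical or worse raises an alert, the rest is summarized."""
    if event.severity <= alert_threshold:
        return "alert"    # e.g. open a trouble ticket or notify the on-call engineer
    return "summary"      # stored for the daily or weekly summary report


if __name__ == "__main__":
    sample = "<170>Jun  1 10:00:00 edge-01 wan0: link down"  # hypothetical message
    event = parse_syslog(sample)
    print(event.severity_name, route_event(event))  # prints: critical alert
```

In a real deployment this decision would live in the collector's own rule language (for example, syslog-ng filters or a Logstash pipeline). The sketch only shows the logic those rules encode.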
All events should be reported in daily or weekly summaries so that missed events are eventually seen. For example, it would be good to know that half of a redundant design is not working.

2. Active link test. SD-WAN uses multiple links to provide reliable end-to-end service. Active link monitoring lets you verify that the SD-WAN is actually delivering the required reliability. Multiple tests may be required to verify the paths for different types of traffic, such as real-time data versus batch data. As the number of SD-WAN sites increases, ease of deploying these tests becomes critical to a successful implementation. Make sure each test is configured to simulate actual application traffic, including packet size, transmission rate, and quality of service (QoS) markings.

Active link testing simulates realistic application traffic and exercises the entire end-to-end system, including link selection. One advantage is that it can detect problems outside of normal business hours, when there is no application traffic. IT teams can also use this type of test during a proof-of-concept evaluation by disabling each WAN link in turn and monitoring how the test results change. This is particularly useful for determining how well an inexpensive broadband link can handle high-priority or real-time traffic when a low-latency path is down. Configure the test to run continuously so you can also get a sense of how well an application is likely to perform at different times of day. You may also want to know the performance level when other applications are running, such as backups or database synchronization, or when the broadband network is busy.

3. Physical state. SD-WAN appliances are typically based on x86 systems with internal CPUs, memory, interfaces, power supplies, and cooling. Network events (usually syslog) should report problems with these components. SNMP monitoring can provide additional data about how these resources are being used.
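The active link test described above can be approximated in a few lines of Python. This is a minimal sketch that assumes a UDP echo responder is listening at the far end of the path. That responder is hypothetical here and is not part of any SD-WAN product. The sketch sends probes with a configurable size, rate, and DSCP marking, then reports loss, average round-trip latency, and a simple jitter estimate. The name `run_probe`, the default DSCP of 46 (Expedited Forwarding, commonly used for voice), and the jitter formula are all illustrative choices.

```python
import socket
import statistics
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    sent: int
    received: int
    rtts_ms: List[float]

    @property
    def loss_pct(self) -> float:
        return 100.0 * (self.sent - self.received) / self.sent if self.sent else 0.0

    @property
    def avg_latency_ms(self) -> Optional[float]:
        return statistics.mean(self.rtts_ms) if self.rtts_ms else None

    @property
    def jitter_ms(self) -> Optional[float]:
        # Mean absolute difference between consecutive RTTs (a simple jitter estimate).
        if len(self.rtts_ms) < 2:
            return None
        diffs = [abs(b - a) for a, b in zip(self.rtts_ms, self.rtts_ms[1:])]
        return statistics.mean(diffs)


def run_probe(host: str, port: int, count: int = 20, size: int = 200,
              interval_s: float = 0.02, dscp: int = 46,
              timeout_s: float = 1.0) -> ProbeResult:
    """Send `count` UDP probes of `size` bytes to an echo responder and time the replies."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # DSCP occupies the upper six bits of the IPv4 TOS byte.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
        sock.settimeout(timeout_s)
        rtts: List[float] = []
        for seq in range(count):
            # A 4-byte sequence number, padded to the simulated application packet size.
            payload = seq.to_bytes(4, "big").ljust(size, b"\x00")
            start = time.perf_counter()
            sock.sendto(payload, (host, port))
            try:
                # Discard late replies to earlier probes until this probe's reply arrives.
                while True:
                    data, _ = sock.recvfrom(65535)
                    if data[:4] == payload[:4]:
                        rtts.append((time.perf_counter() - start) * 1000.0)
                        break
            except socket.timeout:
                pass  # no reply within the timeout: counted as a lost probe
            time.sleep(interval_s)  # controls the transmission rate
        return ProbeResult(sent=count, received=len(rtts), rtts_ms=rtts)
    finally:
        sock.close()
```

You could run this from a scheduler every few minutes around the clock to get the off-hours coverage and time-of-day comparison mentioned above. A production test would also pin probes to a specific WAN link, which this sketch does not attempt. Its jitter figure is also simpler than the smoothed interarrival jitter defined in RFC 3550.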
The default configuration for parameters such as buffering is usually correct, but sometimes you need to change the number of buffers to suit your application's traffic characteristics, such as processing a large number of very small packets. Make sure you can modify the queue depth as needed.

You should verify that the SD-WAN controller provides alerts and reporting when there are issues with the physical link. It should be able to detect flapping links, interface errors, packet loss due to congestion, and duplex mismatches. Duplex mismatches are still a common problem, so use auto-negotiation whenever possible. Use daily or weekly reports to identify alerts that may have been overlooked.

4. Topology diagram. Understanding the topology is important when troubleshooting, but manually updating the topology map is a time-consuming and error-prone process. Look for SD-WAN control systems that provide dynamic mapping of the physical and logical topology. A baseline serves as a source of truth for the SD-WAN physical topology, and understanding the difference between the actual state and the desired state can make SD-WAN troubleshooting much easier.

Identify the problem

The key to troubleshooting network problems is to be methodical. Start at one end and work toward the other, or use a divide-and-conquer strategy. Based on the symptoms, determine what type of problem may exist. The Open Systems Interconnection (OSI) model makes it easier to classify the problem and point troubleshooting in the right direction.
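The divide-and-conquer strategy amounts to a binary search along the path. The following is a minimal sketch with two assumptions. First, you have some pass/fail test you can run against each hop or segment, such as a ping to that hop. Second, failures are monotone: once the path breaks, everything beyond that point also fails. The hop names and the `passes` callback are hypothetical placeholders.

```python
from typing import Callable, Optional, Sequence


def first_failing_hop(hops: Sequence[str],
                      passes: Callable[[str], bool]) -> Optional[str]:
    """Return the first hop whose test fails, or None if every hop passes.

    Assumes failures are monotone along the path: once a hop fails,
    every hop beyond it also fails (as with end-to-end reachability).
    """
    lo, hi = 0, len(hops)  # invariant: hops[:lo] pass, hops[hi:] fail
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(hops[mid]):
            lo = mid + 1   # the break is further along the path
        else:
            hi = mid       # the break is at mid or earlier
    return hops[lo] if lo < len(hops) else None


if __name__ == "__main__":
    # Hypothetical path from a branch user to an application server.
    path = ["branch-lan", "edge-01", "isp-gw", "hub-edge", "dc-core", "app-server"]
    broken_from = path.index("hub-edge")  # simulated fault location
    print(first_failing_hop(path, lambda hop: path.index(hop) < broken_from))
    # prints: hub-edge
```

With n hops, this needs roughly log2(n) tests instead of n. That matters when each test is a slow or manual check.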
If some of the data passes the test, the lower-layer functionality is likely working properly, so you can focus on the higher layers.

SD-WAN Troubleshooting Steps

The analysis of the problem usually includes the following points:
The command line interface (CLI) is useful when you need low-level details. Typical commands include show commands for checking system status and test commands such as ping and traceroute. Learn how to apply them to testing individual links as well as application flows.

Packet capture may be necessary to diagnose application problems that would otherwise be impossible to understand. Wireshark's TCP stream graphs, which plot TCP sequence numbers over time, are a useful tool that works from packet capture files.

WAN carrier and link problems

You need to understand the link characteristics: packet loss, latency, and jitter. Do they comply with the policies you define? Does the link perform according to the service-level agreement (SLA) defined by the link provider? An MPLS link may have an SLA, while a cheap broadband link may not.

A divide-and-conquer approach may be necessary here. Selectively enable only one physical link at a time and verify that it is functioning properly. Then try combinations of links, eventually reaching the point where all links are working together.

Don't forget to check that the policy is correct. Link characteristics may change, making those links unacceptable for every policy. A good approach is to generate a weekly report on link characteristics and usage. For large SD-WAN implementations the full report would be too large to be useful, so filter the results to show only the links whose characteristics do not match any policy.

Check for MTU mismatches. Applications that use small packets may work, while those that require larger packets may not. If ping and terminal connections succeed, but file transfers, backups, and database synchronization fail, consider an MTU issue.

Duplex mismatch. Check the interface statistics to determine whether there is a duplex mismatch, even if you cannot check the configuration of each interface on the Ethernet link.
On a mismatched link, the full-duplex interface will show received runt packets, and the half-duplex interface will show late collisions. These counters typically hold small values that keep incrementing while the link is active.

In conclusion

Troubleshooting is half art and half science. I recommend learning how a specific SD-WAN product works, and which troubleshooting tools exist for it, during the initial proof-of-concept phase. Create a simple text document that describes the basic steps to take for that vendor's SD-WAN. This will simplify troubleshooting when problems arise in the network.

Original link: https://searchnetworking.techtarget.com/feature/A-deep-dive-into-SD-WAN-troubleshooting-and-monitoring