1. Overview

As networks grow rapidly in scale, network quality has become directly tied to an enterprise's daily revenue. Every second of failure leads to lost users and lost income. Enterprises keep improving their network monitoring methods, but building a monitoring system inevitably involves a number of difficulties.
Therefore, quickly discovering network problems and locating abnormal traffic at the lowest possible monitoring cost has become a priority for large enterprises, and many network traffic analysis technologies have emerged in response. sFlow is one such efficient and flexible solution. It uses traffic sampling to extract part of the information in each sampled packet, which makes continuous monitoring of very large volumes of network traffic feasible. sFlow is also flexible to configure and scalable: it can be customized to actual needs and supports a variety of network devices and protocols. These advantages have made sFlow widely used in modern network monitoring and management.

2. Common Network Traffic Collection Technologies

Mainstream network traffic collection falls into two types: full traffic collection and sampled traffic collection.

2.1 Full traffic collection

Full traffic collection includes methods such as port mirroring and optical splitting. In a network with very heavy traffic, port mirroring not only adds latency to the whole link but also puts more pressure on network equipment under high throughput. Optical splitters reduce link latency, but they are expensive to purchase. In addition, because large enterprises run large-scale IDCs, the volume of full traffic data grows sharply. Analyzing full traffic data entirely with self-developed tooling requires substantial storage and computing resources as well as a software development cycle, which works against getting a project built quickly.

2.2 Sampled traffic collection

When no traffic analysis system is in place yet, the advantages of sampling-based analysis become apparent.
Compared with full traffic collection, it has a low deployment cost and a low data analysis cost, and it is well suited to quickly locating abnormal traffic and analyzing traffic trends and proportions within the network. Of the two common sampling methods, sFlow and NetFlow, sFlow covers a wider range of traffic. In an IDC internal environment that meets the hardware requirements, using sFlow for sampled traffic monitoring effectively reduces the load on network equipment and provides real-time traffic monitoring for sudden network anomalies.

3. System Design Based on sFlow

3.1 Basic design

When the hardware conditions are met, the basic sFlow-based system design is very simple. The entire data loop can be closed with sFlow agent + sFlow collector + sFlow analyser.

sFlow agent: Enable sFlow on the relevant network devices, set parameters such as the sampling ratio, and specify the address of the collector; traffic received and sent on the ports can then be collected. The more important question on the agent side is which network devices to collect from. Rather than deploying indiscriminately on every network device, it is more meaningful to deploy on the border core devices, because all external traffic must eventually pass through them. This gives better visibility into external traffic anomalies while also reducing the data storage burden.

sFlow collector: Collects and parses the sFlow datagrams sent by the agents.

sFlow analyser: Analyzes the formatted data and displays it visually so that network administrators can observe and analyze it effectively.

3.2 Open Source + Self-developed: Advanced Architecture

After settling on the basic architecture, we can consider how to select components and extend them with customized functions.
The open source solution ElastiFlow provides a good example. We extended it to support more customized functions.

sFlow agent: Sample port traffic and report it to a unified VIP (officially, the sampling ratio must be 2^n). The VIP's load balancing capability distributes sFlow messages evenly to a fixed port on the collectors. Setting different sampling ratios for different network lines reduces data storage while keeping higher accuracy for important lines.

sFlow collector: Using the ELK suite for data collection and visual analysis is one of the more mature technical solutions, so we use Logstash on the collection side to receive and parse the raw packets. ElastiFlow uses Logstash's native UDP sFlow parsing component. In our testing, this approach produced well-structured data, but its parsing performance was very poor: under a large data volume it dropped many packets, which reduced data accuracy. sflowtool, by contrast, is written in C and performs very well; a single physical machine (32 cores, 64 GB RAM) can reach more than 100,000 TPS. Although the data structure it produces after parsing sFlow packets is weaker, the data can be cleaned and structured in the subsequent analysis module. The data parsed by sflowtool is sent to the Kafka message queue via Logstash.

sFlow analyser: Consumes data from Kafka in real time, cleans and structures it, and uses third-party metadata to software-define the parsed data for subsequent storage and analysis.

database + display: Use Elasticsearch + Kibana for storage and visualization, and use Metricbeat to monitor Logstash's collection performance.
Kibana, as a BI-style data visualization solution, offers most of its charts and dashboards for free, which is enough for good visual analysis.

3.3 Analysis-side software definition

With the raw data alone, we can already perform basic session traffic analysis based on the IP five-tuple and similar fields. However, traffic data can reflect far more value than that. Combining it with other internal platforms such as the CMDB makes it considerably more valuable.

Network device dimension: From the switch address and the inbound and outbound ports in the data, the direction of the traffic can be determined using the switch port index configured for collection. Other attributes such as channel, line, and device name can also be assigned based on the network device IP.

IP dimension: The IP five-tuple opens up more ways to explore the data. We can determine the project, department, and other information an IP address belongs to, and can also reverse-associate it with a domain name. This makes it possible to locate the responsible business party quickly when analyzing abnormal traffic, which greatly improves operations efficiency.

3.4 Self-developed compressed storage and visualization

Because Elasticsearch's own data compression is not ideal, the data we store over the long term becomes huge and bloated. The OLAP database Druid handles this problem well. Sampled data goes through strict structuring on the analysis side, which lets it compress well in Druid. Druid's built-in pre-aggregation capability also helps reduce the precision of historical data and relieve storage pressure. Switching the storage engine means that Kibana can no longer be used for general display. A self-developed web service framework can handle flexible requirements and enable more customized analysis.
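The effect of pre-aggregation on historical data can be illustrated with a small sketch outside Druid. This is not Druid's implementation or API; it only mimics the idea in plain Python. The record fields (`ts`, `dept`, `dst_port`, `bytes`, `sampling_rate`) and the `rollup` function are hypothetical names chosen for illustration. Scaling sampled bytes by the sampling rate is the usual way to estimate real traffic from samples and is an assumption here, not something taken from the system described above.

```python
from collections import defaultdict


def rollup(records, bucket_seconds, dimensions):
    """Pre-aggregate sampled flow records into coarser time buckets.

    Rows sharing the same time bucket and dimension values collapse into
    one row, so a larger bucket means fewer rows and lower time precision.
    """
    totals = defaultdict(lambda: {"bytes": 0, "samples": 0})
    for rec in records:
        # Truncate the timestamp to the start of its bucket.
        bucket = rec["ts"] - rec["ts"] % bucket_seconds
        key = (bucket,) + tuple(rec[d] for d in dimensions)
        # Assumption: each sample stands for roughly `sampling_rate` packets.
        totals[key]["bytes"] += rec["bytes"] * rec["sampling_rate"]
        totals[key]["samples"] += 1
    return [
        {"ts": key[0], **dict(zip(dimensions, key[1:])), **agg}
        for key, agg in sorted(totals.items())
    ]


# Hypothetical cleaned and tagged records (epoch seconds).
records = [
    {"ts": 1700000040, "dept": "search", "dst_port": 443, "bytes": 1500, "sampling_rate": 1024},
    {"ts": 1700000075, "dept": "search", "dst_port": 443, "bytes": 500, "sampling_rate": 1024},
    {"ts": 1700000130, "dept": "search", "dst_port": 443, "bytes": 100, "sampling_rate": 2048},
]

minute_rows = rollup(records, 60, ["dept", "dst_port"])    # recent data
hour_rows = rollup(records, 3600, ["dept", "dst_port"])    # historical data
```

Here three raw records become two rows at minute granularity and a single row at hour granularity, which is the trade of precision for storage that the paragraph above describes.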
3.5 Lightweight stream processing model based on Celery

Although the traffic data has been sampled and reduced in precision, the overall data volume is still huge. Efficient stream processing can keep overall system latency within 30 seconds, which helps network managers find problems more quickly. Besides traditional stream processing tools, Celery can also be used to build a lightweight, efficient, and scalable distributed stream processing cluster.

Celery is a simple, flexible, and reliable distributed system for processing large numbers of messages. It focuses on asynchronous task queues for real-time processing and also supports task scheduling. Building on Celery's real-time asynchronous processing, we designed a consumption chain of celery beat → watcher → producer → consumer to perform stream processing.

celery beat: Acts as the trigger for scheduled tasks, dispatching a new task to the watcher queue every second.

Watcher worker: After taking a task from the queue, forwards it to the producer and applies congestion control to the producer queue based on the configured maximum queue length.

Producer worker: After taking a task from the queue, fetches collected traffic data from Kafka, sends it to the consumer queue in batches of the configured batch size, and applies congestion control to the consumer queue based on the configured maximum queue length.

Consumer worker: After taking a task from the queue, cleans the collected data, adds business tags based on the business information in the local or shared cache, and writes the result to another Kafka queue or directly to the database.

Roles and nodes communicate through the Celery broker, which allows distributed cluster deployment. Consumer workers can be started with eventlet as coroutines to sustain highly concurrent consumption across the cluster.

4. Application Scenarios

4.1 Traffic Analysis at the Data Center Dimension

By matching IP addresses against the network CMDB and summarizing traffic data at the data center level, we obtain a view of each data center's overall inbound and outbound traffic. When the IDC exchanges traffic with the outside world, the overall traffic trend is a direct measure of bandwidth usage.

4.2 Network line information association

By mapping the logical information of network devices based on IP + ifIndex, the core channel lines can be displayed in aggregate. When problems occur, such as public network line anomalies or dedicated lines running at full bandwidth, the moment the fault began can be located directly and accurately from the line analysis.

4.3 IP session information mining

Although sFlow captures only the header information of a packet and not its payload, the IP five-tuple itself provides great value for network traffic analysis. Using session information, we can accurately locate the IP address responsible for abnormal traffic. With IP + service port, we can even locate the specific service and process generating the abnormal traffic and decide what to do next. IP addresses can also be linked with the enterprise CMDB to find the resource group an IP belongs to, which yields the share of traffic generated by different departments or administrative groups. This also helps the affected business learn about abnormal traffic as soon as it appears, so that notification and control can follow.

4.4 IP Location Analysis

Beyond internal information, the location information provided by operators lets us see where IP access originates, perform location analysis, and build dashboards.

5. Conclusion

Comprehensive, real-time monitoring and analysis of the network depends on advanced and effective monitoring protocols and technologies that can keep up with growing business needs. sFlow-based traffic analysis is lightweight to build and can respond quickly to abnormal traffic based on traffic trends and distribution ratios. However, sFlow samples do not contain the packet payload, so sFlow cannot precisely locate or resolve some network security issues such as SQL injection and data security. Full traffic analysis should therefore remain an indispensable part of future traffic analysis systems. Combining the two provides more comprehensive and fine-grained traffic monitoring to protect the network security of the data center.

6. Future Outlook

Although sFlow is widely used in network performance monitoring and management, it will need more capabilities to cope with larger-scale traffic scenarios in the future:

1. Support more protocols and applications: The sFlow monitoring approach applies not only to network traffic but also to application traffic, virtualized environments, cloud platforms, and so on. In the future, sFlow should support more protocols and applications to adapt to new network environments.

2. Adaptive traffic collection: sFlow's traffic collection is fixed-period, and as network traffic changes, fixed-period collection may not accurately reflect the real-time state of the network. In the future, sFlow monitoring should support adaptive collection that automatically adjusts the collection cycle to actual traffic changes.

3. Convenient management functions: Today, sFlow configuration relies largely on network administrators configuring switches by hand, and it cannot provide one-click distribution, automatic discovery, or rapid adjustment of sampling ratios. In the future, an sFlow management platform will be needed that can conveniently issue commands and hot-load configuration changes.
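As a rough illustration of what adaptive collection might look like, the sketch below picks a sampling ratio from the observed packet rate so that the number of samples per second stays near a target. It adjusts the sampling ratio rather than a collection cycle, which is only one possible reading of the idea above. It is not part of sFlow or of any existing management platform. The function name `choose_sampling_ratio`, its parameters, and the upper bound of 2^20 are assumptions made for illustration, and the ratio is kept to a power of two to match the 2^n constraint mentioned in section 3.2.

```python
def choose_sampling_ratio(packets_per_second, target_samples_per_second,
                          max_ratio=2 ** 20):
    """Return a power-of-two ratio N, meaning "sample 1 in N packets".

    Picks the smallest power of two that keeps the expected sample rate
    (packets_per_second / N) at or below the target, capped at max_ratio.
    """
    if target_samples_per_second <= 0:
        raise ValueError("target_samples_per_second must be positive")
    ratio = 1
    # Double the ratio until the expected sample rate fits the target.
    while ratio * target_samples_per_second < packets_per_second and ratio < max_ratio:
        ratio *= 2
    return ratio


# Hypothetical line loads, all aiming at about 1000 samples per second.
quiet_line = choose_sampling_ratio(500, 1000)
busy_line = choose_sampling_ratio(1_000_000, 1000)
exact_line = choose_sampling_ratio(4_096_000, 1000)
```

A management platform of the kind described in item 3 could run such a calculation periodically per line and push the result to the devices. How the new ratio is delivered to a switch is device-specific and is not shown here.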