The Internet is like this: Network optimization practice for quick payment transaction scenarios

Introduction

In recent years, with the development of mobile payment technology and the spread of payment scenarios, "scan and pay" has become part of everyday life. In key areas such as daily consumption, transportation, social security, and medical care, quick payment brings great convenience. As a key link in the mobile payment infrastructure, the bank's quick payment system directly affects the payment experience of every consumer, so its stability is critical.

Given the frequent interactions of the quick payment business, the quick payment system has to handle a huge volume of transaction requests every day. At the network level this traffic is characterized by high-frequency, short-lived connections: a connection is established, used, and released quickly. The connections here are TCP (Transmission Control Protocol) connections. In recent years, Bank G has continuously optimized the quick payment network to raise the transaction success rate and improve the quick payment customer experience.

Introduction to the TCP Protocol

TCP is a connection-oriented, reliable transport protocol. Its operation is divided into three stages: connection establishment, data transfer, and connection termination. The five-tuple we often talk about, namely the source IP address, source port, destination IP address, destination port, and transport-layer protocol (such as TCP), uniquely identifies a connection.

To ensure reliable delivery, TCP assigns a sequence number to each segment, which also lets the receiving entity deliver data in order. The receiver sends an acknowledgment (ACK) for the bytes it has successfully received. If the sender does not receive that acknowledgment within a timeout derived from the measured round-trip time (RTT), it retransmits the corresponding data.
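
As a minimal illustration (the field names below are our own, not taken from any particular library), the identity of a connection can be modeled as such a five-tuple:

```python
from collections import namedtuple

# A TCP connection is identified by this five-tuple; two live connections
# with identical five-tuples cannot coexist at the same time.
FiveTuple = namedtuple(
    "FiveTuple",
    ["src_ip", "src_port", "dst_ip", "dst_port", "protocol"],
)

conn = FiveTuple("203.0.113.10", 50000, "198.51.100.20", 443, "TCP")
print(conn)
```

This notion of connection identity matters later: a new connection that happens to reuse all five values of a recently closed one can collide with it.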

First, TCP establishes a connection through a "three-way handshake", as shown in the following figure (a small socket sketch follows the numbered steps):

Figure 1 TCP three-way handshake

1. The client sends a TCP segment with the SYN flag set to the server and enters the SYN-SENT state;

2. After receiving the client's SYN, the server replies with a segment carrying the SYN and ACK flags and enters the SYN-RCVD state;

3. After receiving the server's SYN+ACK segment, the client sends back a segment with the ACK flag and enters the ESTABLISHED state. Once the server receives this ACK, it also enters the ESTABLISHED state, and both parties begin exchanging data.
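
In practice, the handshake is carried out by the operating system kernel: an application triggers it simply by calling connect() and accept(). A minimal sketch in Python (the loopback address and port are placeholders chosen for illustration):

```python
import socket

# Server side: listen() lets the kernel complete three-way handshakes;
# accept() returns once a client's handshake has finished.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 9000))   # placeholder address and port
server.listen()

# Client side: connect() sends SYN, receives SYN+ACK, replies with ACK,
# and returns when the connection reaches the ESTABLISHED state.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", 9000))

conn, peer = server.accept()
print("connection established with", peer)

client.close()
conn.close()
server.close()
```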

After the three-way handshake completes, the client and server have established a TCP connection and begin transferring data. Once the data transfer is finished, TCP terminates the connection through a "four-way handshake". The process is shown in the following figure, taking the case where the client actively initiates the close (a sketch of the resulting TIME-WAIT state follows the steps):

Figure 2 TCP four-way handshake (connection termination)

1. The client actively sends a TCP segment with the FIN flag set and enters the FIN-WAIT-1 state;

2. After receiving the client's FIN segment, the server replies with a segment carrying the ACK flag and enters the CLOSE-WAIT state;

3. After receiving the server's ACK, the client enters the FIN-WAIT-2 state. When the server is ready to release the connection, it sends a segment with the FIN and ACK flags to the client and enters the LAST-ACK state;

4. After receiving the server's FIN+ACK segment, the client sends a segment with the ACK flag back to the server and enters the TIME-WAIT state. The server closes the connection as soon as it receives this ACK. The client closes the connection only after the TIME-WAIT timer, two maximum segment lifetimes (2MSL), expires.
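
On a Linux host, the TIME-WAIT state of the actively closing side can be observed directly. The sketch below (loopback address and port are placeholders, and the ss utility is assumed to be installed) closes a short-lived connection from the client side first and then lists sockets in TIME-WAIT:

```python
import socket
import subprocess

# Build a short-lived local connection.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 9100))      # placeholder port
srv.listen()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", 9100))
conn, _ = srv.accept()

# The side that closes first (here the client) ends up in TIME-WAIT
# once both FIN/ACK exchanges complete.
cli.close()
conn.close()
srv.close()

# List sockets currently in TIME-WAIT (Linux ss utility assumed).
out = subprocess.run(["ss", "-tan", "state", "time-wait"],
                     capture_output=True, text=True)
print(out.stdout)
```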

Introduction to Quick Payment Architecture

The business flow of the quick payment system is usually as follows. A merchant initiates a quick payment transaction request, which reaches the DMZ area of the bank's third-party intermediary business zone over the carrier's dedicated line and is then processed by the bank's firewall, load balancing, and encryption/decryption devices before interacting with the bank's quick payment system. Each transaction establishes two connections. First, the merchant establishes a TCP connection with the bank's encryption/decryption device through a three-way handshake (connection 1 in Figure 3) and sends the transaction request, which the encryption/decryption device decrypts. The encryption/decryption device then works in proxy mode (reusing the merchant's source address and port) and establishes a new connection with the bank's cardless quick-payment front-end server through another three-way handshake (connection 2 in Figure 3); after the bank's back end processes the transaction, the result is returned to the merchant. Once the transaction completes, the merchant terminates connection 1 with the encryption/decryption device through a four-way handshake, and the encryption/decryption device then terminates connection 2 with the back-end server.
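
To illustrate the "proxy mode" in which the encryption/decryption device reuses the merchant's source address and port when it opens connection 2, the sketch below shows one generic way to do this on Linux via the IP_TRANSPARENT socket option. This is only an illustration under assumed addresses, not the vendor device's actual implementation, and it requires sufficient privileges (CAP_NET_ADMIN) plus matching TPROXY/routing rules to actually run:

```python
import socket

MERCHANT_ADDR = ("203.0.113.10", 50000)   # hypothetical merchant source IP and port
FRONTEND_ADDR = ("198.51.100.20", 8443)   # hypothetical front-end server (second-layer LB VS)

# Connection 2: bind to the merchant's (non-local) source address and port
# before connecting, so the back end sees the merchant's address in the five-tuple.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_IP, socket.IP_TRANSPARENT, 1)  # permit binding to a non-local address
sock.bind(MERCHANT_ADDR)
sock.connect(FRONTEND_ADDR)
```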

To keep the quick payment business running safely and smoothly, Bank G has deployed a network traffic analysis tool that captures and analyzes traffic in real time at multiple network nodes simultaneously and monitors quick payment transactions. Whenever a failed transaction is found, it can be analyzed retrospectively at the packet level, providing a basis for continuous optimization and improvement.

Figure 3 Schematic diagram of quick payment architecture

Table 4 shows the source and destination addresses and ports of the two connections of a quick payment transaction of Bank G as they traverse the various network paths. The bank's firewall controls access for the transaction and translates the destination address (i.e., the address of the bank's encryption/decryption device); the first-layer load balancing device distributes load across the encryption/decryption devices, and the second-layer load balancing device distributes load across the cardless quick-payment front-end servers.

Table 4 Quick payment TCP connection source and destination addresses and ports

Rapid source port reuse on the load balancing device and its optimization

During early routine operation and maintenance, the operations staff reported that a small number of quick payment transactions failed at the connection level every day. Packet captures by the network administrators showed that the encryption/decryption device sometimes actively sent a Reset (RST) message to the merchant to tear down the connection, which is inconsistent with the merchant-initiated connection closure described above. The network staff then analyzed the abnormal connections packet by packet on the network traffic analysis platform. They first examined the traffic on the encryption/decryption device side, where the disconnection alarm was raised, and confirmed that the device had actively sent a Reset message to the merchant, as shown in Figure 5:

Figure 5 The encryption and decryption device actively sends a Reset message to the merchant

To further confirm the root cause, the technicians went on to analyze the traffic at a load balancing node. There they noticed the load balancer's source port translation mechanism: because of this translation, a source port used by a previous connection was reused again shortly afterwards, as shown in Figure 6:

Figure 6 Rapid source port reuse at the first-layer load balancing

The cause of the rapid reuse is as follows. After the merchant terminates connection 1 with the encryption/decryption device through the four-way handshake, it sends a Reset message so that the connection is reclaimed quickly; the first-layer load balancer and the encryption/decryption device therefore release connection 1 almost immediately. After the release, the next new connection has some probability of reusing the source port of the previous connection. Because the encryption/decryption device has already reclaimed the previous connection 1, a new connection 1 that reuses the same source port can still be established normally, as shown in Figure 7:

Figure 7 Connection 1 that quickly reuses the source port is established normally

As described above, once the new connection is established, the encryption/decryption device uses the source address and source port of connection 1 to initiate connection 2 to the back-end cardless quick-payment front-end server (the destination address is the second-layer load balancing virtual server address). Because connection 2 is closed by a standard four-way handshake, the encryption/decryption device sends the final ACK segment to the server and enters the TIME-WAIT state. While it waits in TIME-WAIT for the connection to be reclaimed, it cannot accept a new connection with the same five-tuple. If such a connection appears during this window, it cannot be established, and the encryption/decryption device sends a Reset to the merchant to tear the transaction down. The larger the transaction volume, the higher the chance of a five-tuple collision and the more transactions fail. At the business level, this appears as a quick payment transaction initiated by the customer failing and having to be re-initiated, hurting the customer experience.
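
Conceptually, the device's decision can be thought of as a lookup against the set of five-tuples still parked in TIME-WAIT. The sketch below is a simplified simulation with invented timings and addresses, not the device's real logic:

```python
TIME_WAIT_SECONDS = 1.0          # assumed TIME-WAIT duration on the device
time_wait_table = {}             # five-tuple -> time at which it entered TIME-WAIT

def active_close(five_tuple, now):
    """The side that closes first parks the five-tuple in TIME-WAIT."""
    time_wait_table[five_tuple] = now

def can_accept(five_tuple, now):
    """A new connection with the same five-tuple is refused (reset)
    while the old one is still within its TIME-WAIT window."""
    entered = time_wait_table.get(five_tuple)
    if entered is not None and now - entered < TIME_WAIT_SECONDS:
        return False             # five-tuple conflict -> Reset sent to the merchant
    time_wait_table.pop(five_tuple, None)
    return True

t = ("203.0.113.10", 50000, "198.51.100.20", 8443, "TCP")  # hypothetical connection 2 tuple
active_close(t, now=0.0)
print(can_accept(t, now=0.2))    # False: still in TIME-WAIT, conflict
print(can_accept(t, now=1.5))    # True: TIME-WAIT has expired
```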

Once the cause and mechanism of the problem were clear, the network optimization goal was to prevent connection 2 from failing to establish because of TIME-WAIT, while ensuring that the optimization steps themselves would not introduce new risks to production transactions. After weighing the various factors, the network technicians carried out the optimization in stages. First, the source port translation function of the first-layer load balancing device was switched to non-translation mode, so that the source port of connection 1 initiated by the merchant is no longer rewritten; this reduces the probability of a connection 1 that quickly reuses the source port of a previous connection. After this change, the number of daily connection resets decreased noticeably, as shown in Figure 8.

Figure 8 Changes in connection reset transactions after optimizing the source port handling of the first-layer load balancing device

Table 9 shows the source and destination addresses and ports of two connections in a quick payment transaction flowing through each network path after optimization.

Table 9 Quick payment TCP connection source and destination addresses and ports

Rapid source port reuse on the merchant side and its optimization

After source port translation was turned off on the first-layer load balancer, the network technicians found that connections were still being reset every day. They therefore continued to analyze the traffic at nodes closer to the merchant side and found that source ports were also being reused rapidly when the merchants issued transaction requests, as shown in Figure 10:

Figure 10 Merchant-side source port fast reuse

Two optimization ideas were available at this point. The first was to follow the same approach used for the first-layer load balancer's source port reuse, i.e. to reduce the probability of two connection 1s sharing the same five-tuple. However, the rapid source port reuse on the merchant side is outside the maintenance scope of Bank G's network technicians and is difficult to change directly. Since the merchant establishes connection 1 with the bank's encryption/decryption devices, the same effect can instead be pursued from the destination end.

The first-layer load balancing device, which distributes the traffic of connection 1 across the encryption/decryption devices at the destination end, uses the "least connections" algorithm by default: based on the current state of the back-end servers, it dynamically selects the server with the fewest outstanding connections and forwards the merchant's transaction to one of the encryption/decryption devices. In contrast, the "round robin" algorithm assigns requests to the back-end servers in turn, so consecutive connections are given different destination addresses, which in principle reduces the probability of two connection 1s with the same five-tuple. In the cardless quick payment scenario, the connection handling on every encryption/decryption device is identical, so after switching to round robin the number of connections on each device also stays reasonably balanced.
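
A minimal sketch of the two scheduling strategies (device names are hypothetical, and real load balancers add health checks, weights, and persistence on top of this):

```python
from itertools import cycle

servers = ["crypto-dev-1", "crypto-dev-2", "crypto-dev-3"]   # hypothetical devices
active_connections = {s: 0 for s in servers}

def pick_least_connections():
    """Least connections: choose the server with the fewest active connections."""
    return min(servers, key=lambda s: active_connections[s])

_rotation = cycle(servers)

def pick_round_robin():
    """Round robin: hand out servers in a fixed rotation, so consecutive
    connections land on different devices."""
    return next(_rotation)

# In this simplified sketch the connection counts never change, so
# least-connections keeps picking the same first device, while round robin
# guarantees the next request goes elsewhere, lowering the chance that a
# reused source port meets the same destination (same five-tuple) again.
print([pick_least_connections() for _ in range(4)])  # same device every time
print([pick_round_robin() for _ in range(4)])        # rotates through all devices
```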

After thorough testing, the network technicians carried out the second phase of optimization and changed the first-layer load balancing algorithm to "round robin". After the change, the number of daily transactions affected by connection resets dropped further, showing that this step was also effective, as shown in Figure 11:

Figure 11 Changes in connection reset transactions after optimization of the first-layer load balancing algorithm

However, a very small number of connection resets still remained. The reason is the sheer transaction volume of quick payment: even after the first-layer load balancing algorithm was changed to "round robin", a very small number of connections that quickly reused a source port were still rotated onto the same encryption/decryption device. The network technicians therefore turned to the second optimization idea: shortening the time it takes for connection 2 to be reclaimed. If connection 2 can be reclaimed sooner, then even when a connection 1 with the same five-tuple appears, connection 2 can still be established normally. As described above, connection 2 is closed by the encryption/decryption device through the four-way handshake, so the device holds a TIME-WAIT timer, and the length of that timer determines how long connection 2 takes to be reclaimed. After thorough testing and a phased pilot, the network technicians reduced the TIME-WAIT time of the encryption/decryption device from 1 second to 100 milliseconds. After this third stage of optimization, the number of daily connection reset transactions dropped to zero, as shown in Figure 12:
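
A back-of-the-envelope calculation shows why a shorter TIME-WAIT helps: the chance that a reused source port collides with a five-tuple still being held scales with how many recently closed connections are parked in TIME-WAIT at any moment. The rate and port range below are invented purely for illustration:

```python
closes_per_second = 500          # hypothetical connection-2 close rate on one device
ephemeral_ports = 28_000         # hypothetical usable source port range

for time_wait in (1.0, 0.1):     # seconds: before vs. after the optimization
    parked = closes_per_second * time_wait
    collision_chance = parked / ephemeral_ports
    print(f"TIME-WAIT {time_wait:.1f}s: ~{parked:.0f} tuples parked, "
          f"~{collision_chance:.2%} chance a reused port hits one of them")
```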

Figure 12 Changes in connection reset transactions after TIME-WAIT time optimization of encryption and decryption devices

Table 13 shows the source and destination addresses and ports of two connections in a quick payment transaction flowing through each network path after overall optimization.

Table 13 Quick payment TCP connection source and destination addresses and ports

Summary

At the application and network levels, the quick payment business is characterized by short-lived connections and high-frequency transactions. After three key stages of network optimization, the transaction success rate of Bank G's quick payment system has improved significantly, strongly supporting the large volume of daily concurrent transactions as well as the traffic peaks of e-commerce promotions such as "Double Eleven" and "Double Twelve".

As the financial business continues to transform and pursue high-quality development, the Information Technology Department of Bank G continues to deepen its "123+N" digital banking development system, leverage the momentum of technology, and meet new development challenges through fintech. The network team will likewise keep looking at things from the business perspective, paying attention to key transaction scenarios, optimizing network transmission paths under the distributed architecture, improving transaction response speed, and continuously strengthening the underlying infrastructure, so as to improve customer service and user experience and help the banking business reach a new level.
