Java performance optimization RPC network communication

The core of the service framework

  1. The core of a large service framework: RPC communication
  2. The core of microservices is remote communication and service governance
  • Remote communication provides a bridge for communication between services, and service governance provides logistical support for services.

The split of services increases the cost of communication, so remote communication can easily become a system bottleneck.

  • Under the premise of meeting certain service governance requirements, the performance requirements for remote communications are the main influencing factors for technology selection.

Service communication in many microservice frameworks is based on RPC communication

  • Without component expansion, Spring Cloud implements RPC communication based on Feign components (based on HTTP+JSON serialization)
  • Dubbo is based on SPI and extends many RPC communication frameworks, including RMI, Dubbo, Hessian, etc. (the default is Dubbo+Hessian serialization)

Performance Testing

Tested on Dubbo 2.6.4: a single TCP long connection with Protobuf serialization achieves better response time and throughput than short-connection HTTP with JSON serialization.

RPC Communication

Architecture Evolution

Whether it is microservices, SOA, or RPC architecture, they are all distributed service architectures, and they all need to achieve mutual communication between services. This kind of communication is usually collectively referred to as RPC communication.

Concept

  1. RPC: Remote Procedure Call, a technique for requesting services from a program on a remote computer over the network
  2. The RPC framework encapsulates underlying network communication and serialization technologies
  3. You only need to introduce the interface packages of each service in the project, and you can call the RPC service in the code (just like calling a local method)

RMI

  1. RMI: Remote Method Invocation
  2. RMI is the RPC communication framework that comes with JDK. It has been maturely applied to EJB and Spring and is the core solution for pure Java network distributed application systems.
  3. RMI enables a virtual machine application to call remote methods in the same way as calling local methods. RMI encapsulates the specific details of remote communication.

Implementation principle

The RMI remote proxy object is the core component of RMI. In addition to the virtual machine where the object itself is located, other virtual machines can also call methods of this object.

These virtual machines can be distributed on different hosts, and through remote proxy objects, remote applications can communicate using network protocols and services.
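As a minimal sketch of that pattern (the class, registry port, and binding name here are illustrative, not from the original), a remote interface plus the JDK's built-in registry is all a call needs:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiSketch {

    // The remote interface: every method must declare RemoteException
    public interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    // The implementation lives in the server's JVM
    static class GreeterImpl implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    public static String demo() throws Exception {
        // Server side: export the object and bind its stub in an RMI registry
        Greeter stub = (Greeter) UnicastRemoteObject.exportObject(new GreeterImpl(), 0);
        Registry registry = LocateRegistry.createRegistry(2099);
        registry.bind("greeter", stub);

        // Client side: look up the stub and call it like a local method;
        // RMI hides the socket communication and serialization underneath
        Greeter remote = (Greeter) LocateRegistry.getRegistry("localhost", 2099).lookup("greeter");
        return remote.greet("RMI");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());  // prints "Hello, RMI"
    }
}
```

The stub returned by `lookup` ships arguments over TCP using Java default serialization, which is exactly where the bottlenecks discussed below come from.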

Performance bottleneck under high concurrency

Java default serialization

RMI serialization uses Java default serialization, which has poor performance and does not support cross-language serialization.

TCP short connection

RMI is implemented based on TCP short connections. In high-concurrency situations, a large number of requests will lead to the creation and destruction of a large number of TCP connections, which consumes a lot of performance.

Blocking Network IO

In socket programming, the traditional IO model is used. In high-concurrency scenarios, network communication based on short connections is prone to IO blocking, which greatly reduces performance.

Optimize Path

TCP/UDP

The network transmission protocols are TCP and UDP, both of which are based on Socket programming

Socket communication based on the TCP protocol is connection-oriented.

A connection must first be established through the three-way handshake, which is what makes the transmission reliable. The transmitted data has no message boundaries and is delivered as a byte stream.

With Socket communication based on the UDP protocol, the client does not need to establish a connection; it only creates a socket and sends data to the server.

Socket communication based on the UDP protocol is unreliable.

UDP sends data in datagram mode: each UDP datagram carries its own length, which is sent to the server together with the data.

In order to ensure the reliability of data transmission, TCP protocol is usually used.

In a local area network, if there is no requirement for data transmission reliability, you can consider using the UDP protocol, which is more efficient than the TCP protocol.
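The contrast is easy to see with the JDK's plain socket APIs. A minimal loopback sketch (all names are illustrative) sends a single UDP datagram with no connection set up beforehand:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpSketch {
    public static String echoOnce(String message) throws Exception {
        try (DatagramSocket server = new DatagramSocket(0);   // bind an ephemeral port
             DatagramSocket client = new DatagramSocket()) {
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            // Connectionless send: each datagram carries its own destination and length
            client.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getLoopbackAddress(), server.getLocalPort()));
            // Datagram mode: the message arrives as one bounded unit, not a byte stream
            DatagramPacket received = new DatagramPacket(new byte[1024], 1024);
            server.receive(received);                          // blocks until one datagram arrives
            return new String(received.getData(), 0, received.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("ping"));  // prints "ping"
    }
}
```

A TCP version would need `connect`/`accept` (the three-way handshake) before any data moves, which is the extra cost that buys reliability.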

Long Connection

Communication between services differs from communication between clients and servers.

Because clients are numerous, requests based on short connections keep clients from occupying connections for long periods and wasting system resources.

For communication between services, the number of connected consumers is far smaller than the number of clients, but the volume of requests from consumers to providers is just as large.

Implementing communication over long connections saves a large number of TCP connection setup and teardown operations, reducing the system's performance overhead and saving time.

Optimizing Socket Communication

Traditional Socket communication mainly has problems such as IO blocking, thread model defects and memory copying. Netty4 has made many optimizations to Socket communication programming.

Implementing non-blocking IO: The multiplexer Selector implements non-blocking IO communication

Efficient Reactor thread model

Netty uses the master-slave Reactor multi-threaded model

Main (boss) Reactor thread: handles client connection requests. Once a connection is established successfully, it listens for the accept event and creates a channel for the new link.

The new channel is then registered with an IO worker thread, and that worker thread is responsible for all subsequent IO operations on the connection.

The Reactor thread model solves the problem that a single NIO thread cannot monitor a large number of clients and meet a large number of IO operations in high concurrency situations.
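Netty itself is a third-party framework, but the underlying mechanism can be sketched with the JDK's own Selector: one event loop multiplexes accept and read events across channels. This is the single-threaded form that the master-slave Reactor model splits between a boss loop (accepts) and worker loops (reads/writes); all names here are illustrative:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class ReactorSketch {
    public static String receiveOne() throws Exception {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            // A plain blocking client in another thread, just to drive the event loop
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("localhost", port)) {
                    s.getOutputStream().write("hello".getBytes(StandardCharsets.UTF_8));
                } catch (IOException ignored) { }
            });
            client.start();

            while (true) {
                selector.select();                       // blocks until some channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {            // boss duty: accept, register for reads
                        SocketChannel ch = server.accept();
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {       // worker duty: non-blocking read
                        SocketChannel ch = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(64);
                        int n = ch.read(buf);
                        ch.close();
                        client.join();
                        if (n <= 0) return "";
                        buf.flip();
                        return StandardCharsets.UTF_8.decode(buf).toString();
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receiveOne());
    }
}
```

Because `select()` watches many channels at once, one thread never blocks on a single idle connection; Netty's contribution is organizing such loops into boss and worker groups.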

Serial design

After receiving the message, the server has link operations such as encoding, decoding, reading and sending.

If these operations are implemented in parallel, it will undoubtedly lead to serious lock contention, which will lead to a decrease in system performance.

To improve performance, Netty adopts a lock-free serial design for link operations: it provides the Pipeline to carry them out in sequence, without switching threads along the way.

Zero Copy

When data is sent from memory to the network, it is copied twice, first from user space to kernel space, and then from kernel space to network IO.

The ByteBuffer provided by NIO supports Direct Buffer mode:

It allocates physical memory outside the heap, so data can be written toward kernel space directly, without a second copy through a heap byte array.
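A short sketch of the difference (names are illustrative): a heap buffer is backed by a `byte[]` inside the JVM heap, so sending it over a socket forces an extra copy into native memory first, while a direct buffer already lives off-heap.

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {
    public static boolean demo() {
        ByteBuffer heap = ByteBuffer.allocate(1024);        // backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // off-heap physical memory
        direct.put("payload".getBytes());
        direct.flip();                                       // ready for a channel to read it
        return !heap.isDirect() && direct.isDirect() && direct.remaining() == 7;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Writing a direct buffer to a `SocketChannel` lets the kernel read the memory without the intermediate heap-to-native copy.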

Optimized TCP parameter configuration to improve network throughput; in Netty these options can be set via ChannelOption:

  • TCP_NODELAY: Used to control whether to enable the Nagle algorithm
  • The Nagle algorithm combines small data packets into a large data packet by caching, thereby avoiding sending a large number of small data packets and causing network congestion.
  • In latency-sensitive application scenarios, you can choose to turn off this algorithm.
  • SO_RCVBUF / SO_SNDBUF: The size of the socket receive buffer and send buffer
  • SO_BACKLOG: Specifies the size of the client connection request buffer queue
  • The server processes client connection requests in sequence and can only process one client connection at a time.
  • When multiple clients arrive at once, connection requests that cannot be handled immediately are placed in this queue for later processing
  • SO_KEEPALIVE: probes connections on which the client has sent no data for a long time; once the server detects that the client has disconnected, it reclaims the connection
  • Setting this value smaller improves how quickly dead connections are reclaimed.
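The same options exist on plain JDK sockets, which makes the mapping easy to sketch (the class and helper names below are illustrative, not Netty API):

```java
import java.net.ServerSocket;
import java.net.Socket;

public class TcpOptionsSketch {
    // Client-side socket with the options Netty exposes via ChannelOption
    public static Socket tunedClientSocket() throws Exception {
        Socket socket = new Socket();
        socket.setTcpNoDelay(true);             // TCP_NODELAY: disable Nagle for latency-sensitive traffic
        socket.setSendBufferSize(64 * 1024);    // SO_SNDBUF: send buffer size
        socket.setReceiveBufferSize(64 * 1024); // SO_RCVBUF: receive buffer size (OS may adjust)
        socket.setKeepAlive(true);              // SO_KEEPALIVE: probe idle peers, reclaim dead connections
        return socket;
    }

    // Server-side socket: the second constructor argument is the backlog,
    // i.e. the pending-connection queue that SO_BACKLOG configures in Netty
    public static ServerSocket tunedServerSocket() throws Exception {
        return new ServerSocket(0, 128);
    }
}
```

In Netty the equivalent calls go through `ServerBootstrap.option(...)` for the server channel and `childOption(...)` for accepted connections.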

Customized message format

Design a set of messages that describe verification, operations, data transmission, and so on.

To improve transmission efficiency, tailor the design to the actual requirements, aiming for small messages, sufficient functionality, and easy parsing.

| Field | Length (bytes) | Remarks |
| --- | --- | --- |
| Magic number | 4 | Protocol identifier, similar to the bytecode magic number; usually a fixed value |
| Version number | 1 | |
| Serialization algorithm | 1 | Protobuf / Thrift |
| Instruction | 1 | Similar to the create/read/update/delete operations in HTTP |
| Data length | 4 | |
| Data | N | |
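The header layout above can be encoded and decoded with a plain ByteBuffer. This sketch uses a made-up magic number and illustrative names:

```java
import java.nio.ByteBuffer;

public class ProtocolSketch {
    static final int MAGIC = 0x52504301; // hypothetical fixed protocol identifier

    // Header: magic(4) + version(1) + serializer(1) + instruction(1) + length(4), then payload
    public static ByteBuffer encode(byte version, byte serializer, byte instruction, byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 1 + 1 + 4 + data.length);
        buf.putInt(MAGIC)
           .put(version)
           .put(serializer)
           .put(instruction)
           .putInt(data.length)
           .put(data);
        buf.flip();
        return buf;
    }

    public static byte[] decode(ByteBuffer buf) {
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("bad magic number");
        }
        buf.get();                         // version
        buf.get();                         // serialization algorithm
        buf.get();                         // instruction
        byte[] data = new byte[buf.getInt()];
        buf.get(data);                     // payload of exactly 'data length' bytes
        return data;
    }
}
```

The explicit length field is what gives the byte stream message boundaries, so a decoder can tell where one request ends and the next begins.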

Codec

To implement a communication protocol, it needs to be paired with a good serialization framework.

If only simple data objects are transmitted, Protobuf serialization, which performs relatively well, is a good choice and helps improve network communication performance.

TCP parameter settings for Linux

Three-way handshake (connection establishment)

Four-way handshake (connection teardown)

Configuration items

1. fs.file-max=194448 / ulimit: Linux limits the number of files a single process can open (the default is 1024); a socket is also a file, so raise this limit to support large numbers of connections.

2. net.ipv4.tcp_keepalive_time: serves the same purpose as Netty's SO_KEEPALIVE configuration item.

3. net.ipv4.tcp_max_syn_backlog: the length of the SYN queue; increasing it allows more connections to wait for establishment.

4. net.ipv4.ip_local_port_range: when a client connects to a server, a source port is allocated dynamically; this item sets the port range available for outbound connections.

5. net.ipv4.tcp_max_tw_buckets: closing a connection requires the four-way handshake, so under heavy request volume the consumer side accumulates many connections in the TIME_WAIT state. This parameter caps the number of TIME_WAIT connections; once the cap is exceeded, TIME_WAIT entries are cleared immediately and a warning is printed.

6. net.ipv4.tcp_tw_reuse: each client connection normally obtains a fresh source port to keep the connection unique, so a large number of TIME_WAIT connections keeps ports occupied longer. Since a connection in TIME_WAIT is already closed, this option lets newly created connections reuse its port number.
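As a sketch, these settings could live in a sysctl fragment. Apart from fs.file-max=194448, which the text mentions, every value below is an illustrative example, not a recommendation:

```shell
# Illustrative /etc/sysctl.d/ fragment — tune each value for the actual workload.
fs.file-max = 194448                        # system-wide open file handles (sockets count as files)
net.ipv4.tcp_keepalive_time = 600           # seconds of idle before keepalive probes start
net.ipv4.tcp_max_syn_backlog = 8192         # longer SYN queue for bursts of new connections
net.ipv4.ip_local_port_range = 1024 65535   # source-port range for outbound connections
net.ipv4.tcp_max_tw_buckets = 180000        # cap on connections held in TIME_WAIT
net.ipv4.tcp_tw_reuse = 1                   # allow new outbound connections to reuse TIME_WAIT ports

# Apply with: sysctl --system   (or sysctl -p <file>)
```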
