Which parameters need to be tuned to support millions of long connections?

File descriptor limits

  • System-level limit: the operating system sets a global file descriptor limit that controls the maximum number of files that can be open simultaneously across the entire system.
  • User-level limit: each user has a file descriptor limit that controls the maximum number of files that user can open simultaneously.
  • Process-level limit: each process also has a file descriptor limit that controls the maximum number of files a single process can open simultaneously (commands to inspect each level are sketched below).
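
As a quick reference, the three levels can be inspected with standard Linux commands; the PID 1234 below is only a placeholder:

 # System-level limit (total file descriptors the whole system may open)
 $ cat /proc/sys/fs/file-max
 # Limit for the current user's shell session (soft and hard)
 $ ulimit -Sn
 $ ulimit -Hn
 # Limit of a specific running process (1234 is a placeholder PID)
 $ cat /proc/1234/limits | grep "open files"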

Maximum number of TCP connections to the server

What is the maximum number of connections that a server-side TCP network application can theoretically support?

The server IP and server port are fixed (it is the listening TCP program), so the theoretical upper limit on the number of connections is the number of distinct (client IP, client port) combinations: 2^32 client IPs × 2^16 client ports ≈ 2^48.

Of course, if the server program listens on all port numbers from 1 to 65535, the theoretical upper limit grows by a further factor of 65535, i.e. roughly 2^48 × 65535 combinations.

Of course, in reality, this upper limit cannot be reached for three reasons:

  • IP addresses include classful addresses (A, B, C), private intranet addresses, and reserved addresses (D, E); the latter two cannot be used for public-network communication.
  • Some ports are reserved for well-known services, such as DNS (53) and HTTPS (443).
  • Server memory is finite, and every TCP socket is associated with memory buffers, a file descriptor, and other resources.

In summary, the maximum number of connections that a server-side TCP network application can support mainly depends on its memory size (assuming that the kernel parameters have been tuned).
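
One rough way to see where memory becomes the limiting factor is to watch the kernel's socket statistics while connections are held open; /proc/net/sockstat reports socket counts and the memory consumed by TCP buffers (the "mem" field is in pages):

 # Snapshot of socket usage; the "mem" value on the TCP line is in pages
 $ cat /proc/net/sockstat
 # TCP memory limits currently in effect (min, pressure, max, also in pages)
 $ cat /proc/sys/net/ipv4/tcp_mem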

How to test?

How to test the scenario of millions of connections when the test equipment is insufficient? The core idea is to break through the TCP four-tuple limitation.

  • Configure multiple IP addresses on the client, so that each IP address contributes roughly 64K usable port numbers, and bind a different local IP address before initiating connections to the server.
  • Have the server listen on multiple port numbers, so the client only needs to connect to different server ports (a minimal sketch of both approaches follows this list).
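
As a rough illustration (the interface name eth0, the addresses, the port range, and the ./server flag are placeholders, and whether a local IP can be bound per connection depends on your load-generation tool):

 # Approach 1: give the client machine several secondary IP addresses,
 # each of which adds its own ~64K ephemeral ports
 $ sudo ip addr add 192.168.1.101/24 dev eth0
 $ sudo ip addr add 192.168.1.102/24 dev eth0
 # ...the load generator then binds a different local IP per batch of connections

 # Approach 2: make the server listen on several ports, e.g. 8000-8009,
 # and spread client connections across them
 $ ./server --listen-ports 8000-8009   # hypothetical flag of your own server program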

too many open files

First, let's look at a "classic problem" in a high-concurrency scenario: too many open files. The root cause of this problem is that a large number of network (file) connections are opened in a short period of time, exceeding the operating system's limit on the number of file descriptors allowed to be opened by a single process.
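
When this error appears, it helps to compare how many descriptors the process actually holds against its limit; the PID 1234 below is a placeholder:

 # How many descriptors does the process currently hold?
 $ ls /proc/1234/fd | wc -l
 # Alternatively, list them with lsof (this also counts memory maps, so the number is a bit higher)
 $ lsof -p 1234 | wc -l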

What parameters need to be tuned if a single machine is to support 1 million connections?

Solution

The soft "open files" limit is a Linux per-process setting that determines the maximum number of file handles a single process can have open.

 $ ulimit -n   # prints 1024 or 65535 by default
 1024

This means that a single process can maintain a maximum of 1024 network (such as TCP) connections at the same time.

You can increase this parameter to support a larger number of network connections.

(1) Temporary Adjustment

This is only valid in the current session (terminal); it is lost after logging out or rebooting.

 $ ulimit -HSn 1048576   # set both the hard (-H) and soft (-S) limits in the current shell

(2) Permanent settings

Modify the configuration file /etc/security/limits.conf:

 $ sudo vim /etc/security/limits.conf

 # Append the following (e.g. to support one million connections)
 # nofile = maximum number of files a single process can open
 # Different values can be configured for different users, although in practice
 # a network application usually has the whole host/container to itself
 # Note: the smaller of the soft and hard values takes effect,
 # so the simplest approach is to keep the two identical
 * soft nofile 1048576
 * hard nofile 1048576
 root soft nofile 1048576
 root hard nofile 1048576

The new limits apply to sessions started after the change (log in again or reboot); they persist across reboots.

(3) Other settings

The number of file descriptors opened by a single process also cannot exceed the system-wide limit for all processes (/proc/sys/fs/file-max), so the corresponding kernel values need to be raised as well:

 $ sudo vim /etc/sysctl.conf

 # Add/modify the following
 # fs.file-max: total number of files that all processes in the system may open
 # Note: this limit only constrains non-root users; root is not affected
 fs.file-max = 16777216
 # fs.nr_open: maximum number of files a single process may open
 # It can also be set to a value slightly larger than soft/hard nofile
 fs.nr_open = 16777216

Run the sysctl -p command to make the settings effective. The settings will still be effective after reboot.

(4) View configuration

 $ cat /proc/sys/fs/file-nr
 1344 0 1048576
 # First number:  file descriptors currently in use system-wide
 # Second number: allocated but released (unused) file descriptors
 # Third number:  equal to file-max

Linux kernel parameter tuning

To support 1 million connections on a single machine, in addition to tuning the file descriptor number parameter just mentioned, you also need to tune some kernel parameters.

Open the system configuration file /etc/sysctl.conf and add (or modify) the following configuration data. The parameter names and their functions are written in the comments.

 # Maximum number of TIME_WAIT sockets the system will keep; once this value is
 # exceeded, new TIME_WAIT sockets are closed immediately instead of waiting 2MSL
 net.ipv4.tcp_max_tw_buckets = 1048576

 # Reuse sockets in TIME_WAIT state for new connections when the new connection's
 # timestamp is newer than the latest timestamp of the old connection.
 # These two parameters only help the side that actively initiates connections
 # (the connecting/client side); they have no effect on the passive (listening) side.
 net.ipv4.tcp_tw_reuse = 1
 # Must be enabled together with tcp_tw_reuse
 net.ipv4.tcp_timestamps = 1

 # Enable fast recycling of TIME_WAIT resources
 # net.ipv4.tcp_tw_recycle = 1
 # tcp_tw_recycle reclaims TIME_WAIT sockets faster, but it makes clients behind NAT
 # time out, so it should be set to 0: when different hosts behind the same public IP
 # try to connect to the server, the server rejects the new connections because of
 # mismatched timestamps, since the kernel treats them as retransmissions of old connections.
 # This option was removed in Linux 4.12; on later kernels reading or setting it
 # reports "cannot stat /proc/sys/net/ipv4/tcp_tw_recycle"
 # net.ipv4.tcp_tw_recycle = 0

 # Reduce the number of keepalive probes sent before a dead connection is dropped
 net.ipv4.tcp_keepalive_probes = 3
 # Reduce the interval between keepalive probes
 net.ipv4.tcp_keepalive_intvl = 15
 # Reduce the idle time between the last data packet and the first keepalive probe,
 # i.e. how long a connection may sit idle before probing whether the peer still exists
 # (default is 7200 seconds)
 net.ipv4.tcp_keepalive_time = 600

 # Maximum number of retransmissions for an established TCP connection; if no
 # acknowledgement arrives after this many retries, the connection is terminated
 net.ipv4.tcp_retries2 = 10

 # Shorten how long a connection stays in FIN-WAIT-2 after sending FIN
 # (this has no effect on FIN-WAIT-1). It controls how long resources are held while
 # waiting for the peer to close; if the peer's FIN does not arrive in time the
 # connection is closed, so unresponsive connections are detected and released faster
 net.ipv4.tcp_fin_timeout = 15

 # Tune TCP memory and receive/send window sizes to improve throughput.
 # The three values are min, default, max; the kernel automatically adjusts
 # the TCP receive/send buffers within these bounds
 net.ipv4.tcp_mem = 8388608 12582912 16777216
 net.ipv4.tcp_rmem = 8192 87380 16777216
 net.ipv4.tcp_wmem = 8192 65535 16777216

 # Maximum length of the accept (full connection) queue of every listening port
 net.core.somaxconn = 65535

 # Increase the capacity of the SYN (half-open connection) queue.
 # Besides the kernel parameters (net.core.somaxconn, net.ipv4.tcp_max_syn_backlog),
 # the backlog argument set by the program also matters; the smallest of the three wins
 net.ipv4.tcp_max_syn_backlog = 65535

 # How should new connections be handled once the accept queue is full?
 # If set to 0 (the default):
 #   The client's final ACK is simply dropped and the server retransmits SYN+ACK.
 #   If the client's connect timeout is short, it times out right here and returns
 #   a "connection timeout" error, and nothing further happens.
 #   If the client's connect timeout is long, the retransmitted SYN+ACK makes it think
 #   its earlier ACK was lost, so it sends the ACK again; if the accept queue has freed
 #   up by then, the connection completes.
 #   Once the server reaches the retry limit (tcp_synack_retries) it sends RST to the client.
 #   By default tcp_synack_retries is 5 with exponential backoff, so the retry intervals
 #   are 1s, 2s, 4s, 8s, 16s (31s in total), plus another 32s after the 5th retry to learn
 #   that it also timed out: 1s + 2s + 4s + 8s + 16s + 32s = 63s altogether.
 # If set to 1:
 #   The server sends RST to the client immediately ("connection reset by peer").
 #   This avoids the SYN+ACK retransmissions, but the client can no longer tell from the
 #   RST why it was rejected: no program listening on that port, or accept queue full?
 # Besides net.core.somaxconn, the backlog argument set by the program also matters,
 # and the smaller of the two wins: accept queue size = min(backlog, somaxconn)
 net.ipv4.tcp_abort_on_overflow = 1

 # Increase the per-socket option buffer size
 net.core.optmem_max = 81920
 # Increase the maximum socket receive buffer size
 net.core.rmem_max = 16777216
 # Increase the maximum socket send buffer size
 net.core.wmem_max = 16777216

 # Increase the network interface queue length to avoid packet loss under heavy load:
 # the maximum number of packets queued when an interface receives packets faster
 # than the kernel can process them
 net.core.netdev_max_backlog = 65535

 # Increase the connection tracking table size to support more concurrent connections.
 # Note: if the firewall/conntrack module is not loaded, this reports
 # error: "net.netfilter.nf_conntrack_max" is an unknown key, which can be ignored
 net.netfilter.nf_conntrack_max = 1048576
 # Shorten the timeout of TIME_WAIT entries in the connection tracking table
 net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30

Run the sysctl -p command to make the settings effective. The settings will still be effective after reboot.
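
After reloading, individual values can be spot-checked, and the effect of the queue-related settings can be observed on a listening socket; for example (port 8080 is a placeholder, and ss/netstat come from the iproute2 and net-tools packages respectively):

 # Confirm that a value was applied
 $ sysctl net.core.somaxconn
 # For a listening socket, ss shows the current accept-queue usage (Recv-Q) and its limit (Send-Q)
 $ ss -lnt 'sport = :8080'
 # Cumulative counters for SYN/accept queue overflows
 $ netstat -s | grep -iE "overflow|listen"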

Precautions

If net.ipv4.tcp_syncookies is enabled, the net.ipv4.tcp_max_syn_backlog limit effectively stops mattering: once the SYN queue fills up, the kernel answers further SYNs with SYN cookies instead of dropping them.
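
To check whether SYN cookies are enabled on the current machine (1 means enabled, which is the default on most distributions):

 $ sysctl net.ipv4.tcp_syncookies
 net.ipv4.tcp_syncookies = 1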

Client Parameters

When the server itself acts as a client (for example, a proxy server connecting to backend servers), each outgoing connection needs an ephemeral (temporary) port number.

 # Query the configured ephemeral port range
 $ sysctl net.ipv4.ip_local_port_range
 # Widen the ephemeral port range
 $ sysctl -w net.ipv4.ip_local_port_range="10000 65535"
