Manually simulate and implement Docker container network!

Hello everyone, I am Fei Ge!

Server virtualization has matured to the point where many companies in the industry have migrated to containers, so the code we develop will most likely run inside one. A solid understanding of how container networks work is therefore very important; it will help you know how to deal with problems in the future.

Network virtualization, in fact, can be summarized in one sentence as using software to simulate a real physical network connection. For example, Docker is an independent network environment simulated on the host machine using pure software. Today we will build a virtual network by hand, access external network resources in this network, and monitor ports to provide external services.

After reading this article, I believe you will have a better understanding of Docker virtual network. Okay, let's get started!

1. Review of basic knowledge

1.1 veth, bridge and namespace

A veth in Linux is a pair of virtual network card devices, quite similar to the familiar lo device. After data is sent from one end, the kernel finds the other half of the pair, so the data can be received at the other end. However, veth only solves the problem of one-to-one communication. For details, see Easily understand the basics of Docker network virtualization - veth device!

If there are many veth pairs that need to communicate with each other, a virtual switch called bridge needs to be introduced. Each veth pair can connect one end to the interface of the bridge. The bridge can forward data between ports like a switch, so that the veths on each port can communicate with each other.
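To make the bridge's behavior concrete, here is a toy Python sketch of the MAC learning and forwarding a bridge performs. Everything here (ToyBridge, the port names) is invented for illustration; it is not a kernel API.

```python
# A toy model of what a Linux bridge does: learn source MACs per port,
# then forward frames out the learned port (or flood when unknown).
class ToyBridge:
    def __init__(self, ports):
        self.ports = ports      # e.g. ["veth1_p", "veth2_p"]
        self.fdb = {}           # forwarding database: MAC -> port

    def receive(self, in_port, src_mac, dst_mac):
        self.fdb[src_mac] = in_port          # learn where src lives
        if dst_mac in self.fdb:              # known destination:
            return [self.fdb[dst_mac]]       #   forward out one port
        # unknown destination: flood to every other port
        return [p for p in self.ports if p != in_port]

br = ToyBridge(["veth1_p", "veth2_p", "veth3_p"])
print(br.receive("veth1_p", "aa:aa", "bb:bb"))  # unknown dst: flood
print(br.receive("veth2_p", "bb:bb", "aa:aa"))  # learned: ["veth1_p"]
print(br.receive("veth1_p", "aa:aa", "bb:bb"))  # now: ["veth2_p"]
```

Once every veth pair has one end plugged into the bridge, this learn-and-forward loop is what lets all of them reach each other.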

Namespace solves the isolation problem. By default, every virtual network card device, process, socket, routing table and other network-stack object belongs to the default namespace init_net. But we want different virtualized environments to be isolated: taking Docker as an example, container A must not be able to use, or even see, container B's devices, routing table, sockets and other resources. Only then can different containers share the machine's resources without affecting each other's normal operation.

Through veth, namespace and bridge, we can virtualize multiple network environments on a Linux system, and they can communicate with each other and with the host machine.

However, after those three articles, one problem remains: communication between the virtualized network environment and the external network. Take a Docker container as an example: a service you start inside the container will likely need to access an external database, and you may also need to expose port 80 to provide services to the outside world. In Docker, this is exactly what port mapping with the -p option of docker run achieves for a web service on the container's port 80.

Our article today mainly solves these two problems: one is to access the external network from the virtual network, and the other is to provide services in the virtual network for the external network to use. To solve them, routing and NAT technologies are needed.

1.2 Routing

When Linux sends data packets, it involves the routing process. This includes both sending data packets locally and forwarding data packets passing through the current machine.

Let's first look at the local sending of data packets. We have discussed the local sending in the article "25 pictures, 10,000 words, disassembling the Linux network packet sending process".

Routing is actually very simple: it decides which network card (including virtual network card devices) the data should be written to. How is the card selected? The rules are specified in routing tables. Linux can have multiple routing tables; the most important and commonly used ones are local and main.

The local routing table records the routing rules for the IP addresses of the local network card devices in the current network namespace.

    # ip route list table local
    local 10.143.*.* dev eth0 proto kernel scope host src 10.143.*.*
    local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1

Other routing rules are generally recorded in the main routing table. You can view them with ip route list table main, or with the shorter route -n command.
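The selection rule a routing table applies is longest-prefix match. Below is a minimal Python sketch of that lookup using only the standard ipaddress module; the table entries and the function name are illustrative, not kernel code.

```python
# Sketch of the longest-prefix-match lookup a routing table performs.
import ipaddress

# (destination network, gateway, output device)
routes = [
    (ipaddress.ip_network("192.168.0.0/24"), None, "veth1"),  # on-link
]

def lookup(dst):
    dst = ipaddress.ip_address(dst)
    matches = [r for r in routes if dst in r[0]]
    if not matches:
        return None  # no route: the kernel would return -ENETUNREACH
    # the most specific (longest) prefix wins
    return max(matches, key=lambda r: r[0].prefixlen)

print(lookup("10.153.1.1"))   # None: network is unreachable

# add a default route via a gateway, like `route add default gw ...`
routes.append((ipaddress.ip_network("0.0.0.0/0"), "192.168.0.1", "veth1"))
print(lookup("10.153.1.1"))   # now matched by the default route
```

Note that after the default route is added, an address inside 192.168.0.0/24 still matches the more specific /24 entry, because the longest prefix wins.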

Let's look at the forwarding of data packets passing through the current machine. In addition to local transmission, forwarding also involves the routing process. If Linux receives a data packet and finds that the destination address is not a local address, it can choose to forward the data packet from one of its network card devices. At this time, just like local transmission, it also needs to read the routing table. According to the configuration of the routing table, it chooses which device to forward the packet from.

However, it is worth noting that the forwarding function on Linux is disabled by default. That is, if the destination address is not the local IP address, the packet will be discarded by default. Some simple configuration is required before Linux can do the same work as a router and forward data packets.

1.3 iptables and NAT

The Linux kernel network stack is basically a pure kernel-mode thing in terms of operation, but in order to cater to various user-level needs, the kernel opens some holes for user-level intervention. Among them, iptables is a very commonly used tool for intervening in kernel behavior. It has buried five hook entrances in the kernel, which are commonly known as the five chains.

When Linux receives data, it enters ip_rcv at the IP layer for processing, then makes a routing decision. If the packet is found to be for the local host, it enters ip_local_deliver for local delivery and is finally handed up to the TCP protocol layer. Two hooks are embedded in this path. The first is PREROUTING, where the rules that the various iptables tables registered on the PREROUTING chain are executed. Once the packet is determined to be for local delivery, LOCAL_IN runs, executing the INPUT rules configured in iptables.

When sending data, after the routing table lookup finds the exit device, the packet travels to the device layer through functions such as __ip_local_out and ip_output. In these two functions it passes through the rules of the OUTPUT and POSTROUTING chains respectively.

In the forwarding case, Linux receives a packet and finds that it is not addressed to the local host, so it searches its routing table for a suitable device to forward it out of. The packet then goes from ip_rcv to the ip_forward function for processing and is finally sent out in ip_output. Along this path it passes through the PREROUTING, FORWARD and POSTROUTING chains respectively.

To sum up, the positions of the five iptables chains in the kernel network modules are as follows: the receive path passes through PREROUTING and INPUT, the send path through OUTPUT and POSTROUTING, and the forward path through PREROUTING, FORWARD and POSTROUTING. Keeping this picture in mind makes the relationship between iptables and the kernel much clearer.
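For quick reference, the three traversal paths can also be written out as plain data (a sketch; the chain names are spelled the way iptables spells them):

```python
# The five netfilter chains and the three paths through them.
CHAINS = ("PREROUTING", "INPUT", "FORWARD", "OUTPUT", "POSTROUTING")

PATHS = {
    "receive": ("PREROUTING", "INPUT"),
    "send":    ("OUTPUT", "POSTROUTING"),
    "forward": ("PREROUTING", "FORWARD", "POSTROUTING"),
}

# SNAT hooks into POSTROUTING and DNAT into PREROUTING, so both are
# visited by forwarded traffic, which is what our virtual network needs.
print(PATHS["forward"])
```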

In iptables, there are four tables according to the different functions implemented. They are raw, mangle, nat and filter. The nat table implements the NAT (Network Address Translation) function we often say. NAT is divided into SNAT (Source NAT) and DNAT (Destination NAT).

SNAT solves the problem of intranet addresses accessing external networks. It is achieved by modifying the source IP in POSTROUTING.

DNAT solves the problem of making services on the intranet accessible to the outside world. It works by modifying the destination IP in PREROUTING.
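As a mental model, here is a toy Python sketch of what SNAT and DNAT do to a packet's addresses. It is purely illustrative: real NAT is performed by the kernel's conntrack/nat modules, and HOST_IP below is an assumed placeholder address.

```python
# Toy view of SNAT and DNAT address rewriting; not real kernel code.
HOST_IP = "10.162.0.9"   # assumed host address, replace with your own

def snat(pkt, conntrack):
    # POSTROUTING: rewrite the private source address to the host IP,
    # remembering the original tuple so replies can be translated back.
    orig = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
    conntrack[(HOST_IP, pkt["sport"], pkt["dst"], pkt["dport"])] = orig
    return {**pkt, "src": HOST_IP}

def dnat(pkt, port_map):
    # PREROUTING: rewrite the public destination to the container address
    target = port_map.get(pkt["dport"])       # e.g. 8088 -> (ip, 80)
    if target is None:
        return pkt
    return {**pkt, "dst": target[0], "dport": target[1]}

ct = {}
out_pkt = snat({"src": "192.168.0.2", "sport": 40000,
                "dst": "10.153.1.1", "dport": 80}, ct)
print(out_pkt["src"])                    # rewritten to HOST_IP

in_pkt = dnat({"src": "10.143.2.2", "sport": 51000,
               "dst": HOST_IP, "dport": 8088},
              {8088: ("192.168.0.2", 80)})
print(in_pkt["dst"], in_pkt["dport"])    # rewritten to the container
```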

2. Implementing communication between the virtual network and the external network

Based on the basic knowledge above, we will now build a Docker-like virtual network purely by hand, and make it communicate with the external network as well.

2.1 Experimental environment preparation

Let's create a virtual network environment with a namespace of net1. The host machine's IP is in the 10.162 network segment, which can access external machines. The virtual network is assigned the 192.168.0 network segment, which is private and cannot be recognized by external machines.

The process of building this virtual network is as follows: First create a netns and name it net1.

    # ip netns add net1

Create a veth pair (veth1 - veth1_p), put one end of veth1 in net1, configure an IP for it, and start it.

    # ip link add veth1 type veth peer name veth1_p
    # ip link set veth1 netns net1
    # ip netns exec net1 ip addr add 192.168.0.2/24 dev veth1
    # ip netns exec net1 ip link set veth1 up

Create a bridge and set an IP address for it. Then plug the other end of veth, veth1_p, into the bridge. Finally, start both the bridge and veth1_p.

    # brctl addbr br0
    # ip addr add 192.168.0.1/24 dev br0
    # ip link set dev veth1_p master br0
    # ip link set veth1_p up
    # ip link set br0 up

In this way, we have created a virtual network on Linux. The creation process is the same as in the article "Switch" implemented by software on Linux - Bridge!, except that for convenience today we create only one network, whereas that article created two.

2.2 Requesting external resources

Now suppose we want to access the external network in the network environment net1 above. The external network here refers to the network outside the virtual network host.

We assume that the IP address of the other machine it wants to access is 10.153.*.*. The last two parts of 10.153.*.* are hidden because they are my internal network. You can replace them with your own IP address during the experiment.

Let's try to access it directly.

    # ip netns exec net1 ping 10.153.*.*
    connect: Network is unreachable

It reports that the network is unreachable. What's going on? Let's search the kernel source code for this error string:

    //file: arch/parisc/include/uapi/asm/errno.h
    #define ENETUNREACH 229 /* Network is unreachable */

    //file: net/ipv4/ping.c
    static int ping_sendmsg(struct kiocb *iocb, struct sock *sk,
                            struct msghdr *msg, size_t len)
    {
        ...
        rt = ip_route_output_flow(net, &fl4, sk);
        if (IS_ERR(rt)) {
            err = PTR_ERR(rt);
            rt = NULL;
            if (err == -ENETUNREACH)
                IP_INC_STATS_BH(net, IPSTATS_MIB_OUTNOROUTES);
            goto out;
        }
        ...
    out:
        return err;
    }

If ip_route_output_flow returns -ENETUNREACH, the function exits, and the comment on the macro definition is exactly the error message we saw: "Network is unreachable".

This ip_route_output_flow is mainly used to perform routing selection. So we infer that there may be a problem with the routing, and take a look at the routing table of this namespace.

    # ip netns exec net1 route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 veth1

No wonder, it turns out that the default routing rule for the net1 namespace is only for the 192.168.0.* network segment. The IP we pinged is 10.153.*.*, and according to this routing table, no exit can be found. Naturally, the transmission fails.

Let's add a default route to net1: whenever no other rule matches, send the packet out through veth1, and specify that the next hop is the bridge it is connected to (192.168.0.1).

    # ip netns exec net1 route add default gw 192.168.0.1 veth1

Try pinging again.

    # ip netns exec net1 ping 10.153.*.* -c 2
    PING 10.153.*.* (10.153.*.*) 56(84) bytes of data.

    --- 10.153.*.* ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms

Hmm, still not connected. The route above correctly sends the packet from veth1 to the bridge, but next the host must forward it from the bridge to the eth0 network card. So we have to turn on the following two forwarding-related settings:

    # sysctl net.ipv4.conf.all.forwarding=1
    # iptables -P FORWARD ACCEPT

However, there is still a problem: external machines know nothing about the 192.168.0.* segment; they communicate only through routable addresses such as 10.153.*.*. Think about it: how do our office computers, which have no public IP of their own, access the Internet normally? The outside world only sees the public IP. Right, that is the NAT technology we mentioned above.

Our requirement this time is to let the internal virtual network access the external network, so we need SNAT. It replaces the source IP of the namespace's requests (192.168.0.2) with the host IP (in the 10.162 segment) that the external network knows, thereby achieving normal access to the external network.

    # iptables -t nat -A POSTROUTING -s 192.168.0.0/24 ! -o br0 -j MASQUERADE

Let’s try pinging again. Yay, it works!

    # ip netns exec net1 ping 10.153.*.*
    PING 10.153.*.* (10.153.*.*) 56(84) bytes of data.
    64 bytes from 10.153.*.*: icmp_seq=1 ttl=57 time=1.70 ms
    64 bytes from 10.153.*.*: icmp_seq=2 ttl=57 time=1.68 ms

At this point, we can run tcpdump to capture packets and check. The packets captured on the bridge still carry the original source and destination IPs.

If you check on eth0 again, you will find that the source IP has been replaced with the IP on eth0 that can communicate with the external network.

At this point, the container can access external network resources through the host's network card. To summarize the sending process: the packet leaves veth1 in net1, crosses the veth pair to veth1_p and onto the bridge, is routed by the host toward eth0, has its source IP rewritten to the host IP by the SNAT rule in POSTROUTING, and finally goes out through eth0.

2.3 Opening container ports

Let's consider another requirement, which is to provide the services in this namespace to the external network.

Just like the previous problem, the IP 192.168.0.2 of our virtual network environment is unknown to the outside world; only the host machine knows who it is. So we need the NAT function again.

This time we want the external network to access an internal address, so we need a DNAT configuration. One difference from SNAT is that DNAT requires you to specify explicitly which host port corresponds to which port in the container. For example, when using Docker, the port correspondence is specified through -p:

    # docker run -p 8000:80 ...

We configure the DNAT rules with the following command:

    # iptables -t nat -A PREROUTING ! -i br0 -p tcp -m tcp --dport 8088 -j DNAT --to-destination 192.168.0.2:80

This means: before routing, if a packet does not come in from br0 and is destined for TCP port 8088, the host rewrites its destination to 192.168.0.2:80 and forwards it there.

Start a server in the net1 environment:

    # ip netns exec net1 nc -lp 80

Then, from an external machine, say 10.143.*.*, telnet to port 8088 of the host at 10.162.*.*. It works!

    # telnet 10.162.*.* 8088
    Trying 10.162.*.*...
    Connected to 10.162.*.*.
    Escape character is '^]'.

Start capturing packets with # tcpdump -i eth0 host 10.143.*.*. You can see that in the request, the destination is the host machine's IP and port.

But after the packet reaches the host's protocol stack, it hits the DNAT rule we configured, and the host forwards it to br0. Since there is not much traffic on the bridge, you can capture there without any filter: # tcpdump -i br0.

The destination IP and port captured on br0 have been replaced.

Of course, the bridge knows that 192.168.0.2 is behind veth1_p, so the service listening on port 80 on veth1 receives the request from the outside world! To summarize the receiving process: the packet arrives on eth0 addressed to the host's port 8088, has its destination rewritten to 192.168.0.2:80 by the DNAT rule in PREROUTING, is forwarded to br0, crosses the veth pair, and is delivered to the listener inside net1.

Conclusion

Now many companies in the industry have migrated to containers. The code we develop is likely to run on containers. Therefore, it is very important to have a deep understanding of how container networks work. This will help you know how to deal with problems in the future.

At the beginning of this article, we briefly introduced the basic knowledge of veth, bridge, namespace, routing, iptables, etc. Veth implements connection, bridge implements forwarding, namespace implements isolation, routing table controls device selection when sending, and iptables implements NAT and other functions.

Then, based on the above basic knowledge, we built a virtual network environment in a purely manual way.

This virtual network can access external network resources and expose ports for the external network to call. These are the basic working principles of the Docker container network.

I packaged the whole experiment into a Makefile and put it here: https://github.com/yanfeizhang/coder-kung-fu/tree/main/tests/network/test07

Finally, let's expand on this. Today we are discussing the issue of Docker network communication. Docker containers provide external services through port mapping. When external machines access container services, they still need to access them through the container's host IP.

In Kubernetes, there are higher requirements for cross-host network communication, and containers between different hosts must be able to directly interconnect. Therefore, the network model of Kubernetes is also more complex.
