How does user-mode Tcpdump capture kernel network packets?

This article is reprinted from the WeChat public account "Kaida Neigong Xiuxian", written by Zhang Yanfei allen. To reprint this article, please contact the WeChat public account "Kaida Neigong Xiuxian".

Hello everyone, I am Fei Ge!

Today let’s talk about tcpdump, which is often used in our work.

In the process of sending and receiving network packets, most of the work is done in kernel mode. So how does tcpdump, a user-mode program, capture packets that live in the kernel? Some readers know that tcpdump is built on libpcap, but how does libpcap itself work? And if you were asked to write a packet capture program, would you know where to start?

In keeping with Fei Ge's style, I won't stop until I get to the bottom of the principle, so I analyzed the relevant source code in depth. After reading this article, you will have clear answers to the following questions.

  • How does tcpdump work?
  • Can tcpdump capture packets filtered by netfilter?
  • How would you start if you were asked to write a packet capture program yourself?

With these questions in mind, let's start today's journey of exploration!

1. Network packet receiving process

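In the article "Illustrated Linux Network Packet Receiving Process", we introduced in detail how a network packet travels from the network card to the user process. In short, the packet goes from the network card through the driver to the network device layer, then up through the protocol layer (IP, then TCP/UDP) to the socket's receive queue, where the user process reads it.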

Find the tcpdump packet capture point

The packet capture entry point for tcpdump is in the network device layer. The function __netif_receive_skb_core traverses the protocols registered on the ptype_all list, and tcpdump registers a virtual protocol on exactly that list (section 3 shows how), so its callback runs here. Let's look at the function:

    //file: net/core/dev.c
    static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
    {
        ......
        //Traverse ptype_all (tcpdump hangs its virtual protocol here)
        list_for_each_entry_rcu(ptype, &ptype_all, list) {
            if (!ptype->dev || ptype->dev == skb->dev) {
                if (pt_prev)
                    ret = deliver_skb(skb, pt_prev, orig_dev);
                pt_prev = ptype;
            }
        }
    }

In the above function, ptype_all is traversed and deliver_skb is used to call the callback function in the protocol.

    //file: net/core/dev.c
    static inline int deliver_skb(...)
    {
        return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
    }
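To make the traversal concrete, here is a small user-space model of the idea, not kernel code. All names starting with toy_ are made up for this sketch, and the model calls each callback immediately instead of deferring it through pt_prev as the kernel excerpt does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; every name here is illustrative. */
struct toy_dev { int ifindex; };
struct toy_skb { struct toy_dev *dev; int len; };

struct toy_packet_type {
    struct toy_dev *dev;                 /* NULL means "any device" */
    int (*func)(struct toy_skb *skb, struct toy_packet_type *pt);
    int delivered;                       /* how many packets this tap saw */
    struct toy_packet_type *next;
};

/* Head of the toy "ptype_all" list. */
static struct toy_packet_type *toy_ptype_all = NULL;

/* Register a tap at the front of the list (loosely like dev_add_pack). */
static void toy_add_pack(struct toy_packet_type *pt)
{
    assert(pt != NULL && pt->func != NULL);
    pt->next = toy_ptype_all;
    toy_ptype_all = pt;
}

/* Callback playing the role of packet_rcv: it only counts the packet. */
static int toy_packet_rcv(struct toy_skb *skb, struct toy_packet_type *pt)
{
    (void)skb;
    pt->delivered++;
    return 0;
}

/* Walk the list and call every tap bound to skb->dev or to any device.
 * Returns the number of callbacks that were invoked. */
static int toy_receive(struct toy_skb *skb)
{
    int calls = 0;
    for (struct toy_packet_type *pt = toy_ptype_all; pt; pt = pt->next) {
        if (!pt->dev || pt->dev == skb->dev) {
            pt->func(skb, pt);
            calls++;
        }
    }
    return calls;
}
```

A tap registered with dev set to NULL sees packets from every device, which corresponds to running tcpdump without binding to one interface; a tap bound to one toy_dev only sees packets whose skb->dev matches.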

For tcpdump, it will enter packet_rcv (we will talk about why it enters this function later). This function is in the net/packet/af_packet.c file.

    //file: net/packet/af_packet.c
    static int packet_rcv(struct sk_buff *skb, ...)
    {
        __skb_queue_tail(&sk->sk_receive_queue, skb);
        ......
    }

As you can see, packet_rcv puts the received skb on the receive queue of the packet socket. The captured packet can then be read when recvfrom is called later.

Find the netfilter filter point again

To explain the problem mentioned at the beginning, let's take a closer look at the protocol layer. In ip_rcv, we found a netfilter-related execution logic.

    //file: net/ipv4/ip_input.c
    int ip_rcv(...)
    {
        ......
        return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL,
                       ip_rcv_finish);
    }

If you search the source for NF_HOOK, you will find many more netfilter hook points; on the IPv4 path discussed here, they all sit at the IP protocol layer.

In the process of receiving packets, the data packets first pass through the network device layer and then reach the protocol layer.

That answers one of the opening questions. On the receive path, tcpdump works at the network device layer and therefore runs first: it captures the packet before any netfilter rule gets to filter it.

Therefore, on the receive path, netfilter filtering does not affect tcpdump's packet capture!

2. Network packet sending process

Now let's look at the sending side. In the article "25 pictures, 10,000 words, disassembling the Linux network packet sending process", we described the network packet sending process in detail. In short, the packet goes from the user process through the socket to the protocol layer (TCP/UDP, then IP), then down to the network device layer, the driver, and finally the network card.

Find the netfilter filter point

During the sending process, packets are also filtered by netfilter rules at the IP layer.

    //file: net/ipv4/ip_output.c
    int ip_local_out(struct sk_buff *skb)
    {
        //Execute netfilter filtering
        err = __ip_local_out(skb);
    }

    int __ip_local_out(struct sk_buff *skb)
    {
        ......
        return nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, skb, NULL,
                       skb_dst(skb)->dev, dst_output);
    }

In this file, you can also find several other netfilter hook points.

Find the tcpdump packet capture point

When the sending process is completed at the protocol layer and reaches the network device layer, there is also a tcpdump packet capture point.

    //file: net/core/dev.c
    int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
                            struct netdev_queue *txq)
    {
        ...
        if (!list_empty(&ptype_all))
            dev_queue_xmit_nit(skb, dev);
    }

    static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
    {
        list_for_each_entry_rcu(ptype, &ptype_all, list) {
            if ((ptype->dev == dev || !ptype->dev) &&
                (!skb_loop_sk(ptype, skb))) {
                if (pt_prev) {
                    deliver_skb(skb2, pt_prev, skb->dev);
                    pt_prev = ptype;
                    continue;
                }
                ......
            }
        }
    }

In the above code, we can see that in dev_queue_xmit_nit, the protocols in ptype_all are traversed and deliver_skb is called in sequence. This will execute the virtual protocol that tcpdump is hanging on.

In the process of sending network packets, the process is exactly the opposite of the receiving process, that is, the protocol layer processes it first and the network device layer processes it later.

If a netfilter rule drops the packet, it is discarded at the protocol layer, and tcpdump, which works at the lower network device layer, never gets to see it.

3. tcpdump startup

In the previous two sections, we saw that the kernel delivers packets to capture programs by traversing ptype_all. Now let's look at how user-mode tcpdump registers its protocol on the kernel's ptype_all list.

We use strace to trace the system calls made by tcpdump, and the output contains a socket system call. The secret of tcpdump is hidden in this one call.

    # strace tcpdump -i eth0
    socket(AF_PACKET, SOCK_RAW, 768)
    ......

The first parameter of the socket system call specifies the address family or protocol family that the new socket belongs to; its value starts with AF or PF. Linux supports many protocol families, and all of the definitions can be found in include/linux/socket.h. Here, a socket of the packet family is created.

Protocol family and address family: each protocol family has a corresponding address family. For example, the protocol family for IPv4 is PF_INET, and its address family is AF_INET. They correspond one to one and have exactly the same values, so the two names are often used interchangeably.

    //file: include/linux/socket.h
    #define AF_UNSPEC 0
    #define AF_UNIX 1 /* Unix domain sockets */
    #define AF_LOCAL 1 /* POSIX name for AF_UNIX */
    #define AF_INET 2 /* Internet IP Protocol */
    #define AF_INET6 10 /* IP version 6 */
    #define AF_PACKET 17 /* Packet family */
    ......

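In addition, the third parameter, 768, stands for ETH_P_ALL in network byte order: ETH_P_ALL is 0x0003, and htons(ETH_P_ALL) is 0x0300 = 768 on a little-endian host (in Python, socket.htons(0x0003) gives the same value).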
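The arithmetic behind 768 can be checked with a few lines of C. This is only a sketch: swap16 and network_bytes are helper names invented here, and TOY_ETH_P_ALL simply repeats the value 0x0003 so the block does not depend on Linux headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>      /* htons */

#define TOY_ETH_P_ALL 0x0003   /* same value as ETH_P_ALL */

/* Swap the two bytes of a 16-bit value, which is what htons does on a
 * little-endian host. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Copy htons(v) into out[2] so the in-memory byte order can be inspected.
 * On any host the result is the big-endian encoding of v. */
static void network_bytes(uint16_t v, unsigned char out[2])
{
    uint16_t n = htons(v);
    assert(out != NULL);
    memcpy(out, &n, sizeof n);
}
```

swap16(TOY_ETH_P_ALL) evaluates to 0x0300, which is 768 in decimal, and network_bytes always yields the bytes 00 03. strace prints the raw integer argument, so on a little-endian machine such as x86 it shows 768.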

Let's take a look at what happens during the creation of this packet type socket and find the socket creation source code.

    //file: net/socket.c
    SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
    {
        ......
        retval = sock_create(family, type, protocol, &sock);
    }

    int __sock_create(struct net *net, int family, int type, ...)
    {
        ......
        pf = rcu_dereference(net_families[family]);
        err = pf->create(net, sock, protocol, kern);
    }

sock_create calls __sock_create, which looks up the requested protocol family in net_families and calls its create method to complete the creation.

net_families is an array. Besides the commonly used PF_INET (IPv4), it holds many other protocol families, such as PF_UNIX, PF_INET6 (IPv6), and PF_PACKET. Each protocol family sits at its own index in the net_families array, and the create member of each entry points to that family's socket creation function.

For a packet socket, pf->create is the packet_create function. Let's go into this function, which is the key to understanding how tcpdump works!
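The lookup described here is an ordinary table of function pointers indexed by family number. Below is a user-space sketch of that pattern; the toy_ names, the table size, and the return values 100 and 200 are arbitrary choices for illustration and do not come from the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_AF_INET    2
#define TOY_AF_PACKET 17
#define TOY_NPROTO    46    /* table size chosen for this sketch only */

/* Toy version of a protocol-family entry: a family id plus a create hook. */
struct toy_proto_family {
    int family;
    int (*create)(int protocol);
};

/* The return values are arbitrary markers that show which hook ran. */
static int toy_inet_create(int protocol)   { (void)protocol; return 100; }
static int toy_packet_create(int protocol) { (void)protocol; return 200; }

static const struct toy_proto_family toy_inet_ops   = { TOY_AF_INET,   toy_inet_create };
static const struct toy_proto_family toy_packet_ops = { TOY_AF_PACKET, toy_packet_create };

/* Indexed by family number, in the spirit of net_families. */
static const struct toy_proto_family *toy_families[TOY_NPROTO] = {
    [TOY_AF_INET]   = &toy_inet_ops,
    [TOY_AF_PACKET] = &toy_packet_ops,
};

/* Look up the family and call its create hook; -1 if nothing is registered. */
static int toy_sock_create(int family, int protocol)
{
    if (family < 0 || family >= TOY_NPROTO || toy_families[family] == NULL)
        return -1;
    assert(toy_families[family]->create != NULL);
    return toy_families[family]->create(protocol);
}
```

Calling toy_sock_create(TOY_AF_PACKET, 768) dispatches to toy_packet_create, in the same way that family 17 in the strace output leads to the packet family's create function.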

    //file: net/packet/af_packet.c
    static int packet_create(struct net *net, struct socket *sock, int protocol,
                             int kern)
    {
        //proto is the protocol argument, already in network byte order
        __be16 proto = (__force __be16)protocol;
        ...
        po = pkt_sk(sk);
        po->prot_hook.func = packet_rcv;

        //Register the hook
        if (proto) {
            po->prot_hook.type = proto;
            register_prot_hook(sk);
        }
    }

    static void register_prot_hook(struct sock *sk)
    {
        struct packet_sock *po = pkt_sk(sk);
        dev_add_pack(&po->prot_hook);
    }

packet_create sets the callback function to packet_rcv and then completes the registration through register_prot_hook => dev_add_pack. After registration, a virtual protocol has been added to the global ptype_all linked list.

Let's take a look at how dev_add_pack registers the protocol on ptype_all. Looking back at the socket call we saw at the beginning, the third parameter, protocol, is ETH_P_ALL, so dev_add_pack ends up adding the hook to ptype_all. The code is as follows.

    //file: net/core/dev.c
    void dev_add_pack(struct packet_type *pt)
    {
        struct list_head *head = ptype_head(pt);
        list_add_rcu(&pt->list, head);
    }

    static inline struct list_head *ptype_head(const struct packet_type *pt)
    {
        if (pt->type == htons(ETH_P_ALL))
            return &ptype_all;
        else
            return &ptype_base[ntohs(pt->type) & PTYPE_HASH_MASK];
    }

We use ETH_P_ALL as the example throughout this article, but a packet socket can also be created for a specific protocol. In that case, the hook is registered in a ptype_base bucket instead of ptype_all. The protocols in ptype_base are likewise executed during the sending and receiving process.
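The bucket selection in ptype_head can be mirrored in a few lines of user-space C. This sketch assumes a hash table of 16 buckets (so the mask is 15); treat that size as an assumption about the kernel version at hand. The toy_ names are made up, and the function takes the protocol in host byte order, that is, the value of ntohs(pt->type).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ETH_P_ALL   0x0003
#define TOY_ETH_P_IP    0x0800
#define TOY_ETH_P_ARP   0x0806
#define TOY_HASH_SIZE   16                    /* assumed bucket count */
#define TOY_HASH_MASK   (TOY_HASH_SIZE - 1)

/* Returns -1 for the "all protocols" list, otherwise the index of the
 * ptype_base bucket the protocol would land in. */
static int toy_ptype_bucket(uint16_t proto_host_order)
{
    /* The mask trick only works when the table size is a power of two. */
    assert((TOY_HASH_SIZE & TOY_HASH_MASK) == 0);

    if (proto_host_order == TOY_ETH_P_ALL)
        return -1;
    return proto_host_order & TOY_HASH_MASK;
}
```

Under these assumptions, a hook for IPv4 (0x0800) lands in bucket 0 and a hook for ARP (0x0806) lands in bucket 6, while ETH_P_ALL goes to the separate list that every packet is matched against.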

Summary: The internal logic of tcpdump when it starts is actually very simple, it just registers a virtual protocol in ptype_all.

4. Conclusion

Now let’s go back to the issues mentioned at the beginning.

1. How does tcpdump work?

The user-mode tcpdump command uses the socket system call to register a virtual protocol on the kernel's ptype_all list. On both the receive and the send path, the network device layer traverses the protocols on ptype_all and executes their callbacks. This is the underlying principle that tcpdump relies on.

2. Can tcpdump capture packets filtered by netfilter?

For this question, we need to look at the receiving and sending processes separately. On the receive path, tcpdump can capture packets even if they match a netfilter drop rule, because its hook sits closer to the network card and sees the packet first.

On the send path, the opposite is true. The packet passes through the protocol layer first, and if netfilter drops it there, tcpdump at the lower layer never gets a chance to see it.

3. How to write a packet capture program yourself

If you want to write a packet capture program similar to tcpdump, you can use a packet socket. I wrote a simple demo in C that captures packets and parses the source and destination IP addresses.

Source code address: https://github.com/yanfeizhang/coder-kung-fu/blob/main/tests/network/test04/main.c
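The linked demo is not reproduced here. As a separate, minimal sketch of the same idea, the code below opens a packet socket with the same arguments seen in the strace output and extracts the IPv4 source and destination addresses from one frame. The function names open_capture_socket, parse_ipv4_addrs, and capture_one are hypothetical, and the parser assumes untagged Ethernet frames (no VLAN header).

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>         /* INET_ADDRSTRLEN */
#include <arpa/inet.h>          /* htons, ntohs, inet_ntop */
#include <linux/if_ether.h>     /* ETH_P_ALL, ETH_P_IP, ETH_HLEN */

/* Open a packet socket that receives every protocol; needs root or
 * CAP_NET_RAW. Returns the fd, or -1 on failure. */
static int open_capture_socket(void)
{
    return socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
}

/* Extract dotted-quad source and destination addresses from one Ethernet
 * frame. Returns 0 on success, -1 if the frame is not untagged IPv4. */
static int parse_ipv4_addrs(const unsigned char *frame, size_t len,
                            char src[INET_ADDRSTRLEN],
                            char dst[INET_ADDRSTRLEN])
{
    uint16_t ethertype;

    assert(frame != NULL && src != NULL && dst != NULL);
    if (len < (size_t)(ETH_HLEN + 20))       /* Ethernet + minimal IPv4 header */
        return -1;
    memcpy(&ethertype, frame + 12, sizeof ethertype);
    if (ntohs(ethertype) != ETH_P_IP)
        return -1;
    /* In the IPv4 header, the source address is at offset 12 and the
     * destination address at offset 16. */
    if (!inet_ntop(AF_INET, frame + ETH_HLEN + 12, src, INET_ADDRSTRLEN) ||
        !inet_ntop(AF_INET, frame + ETH_HLEN + 16, dst, INET_ADDRSTRLEN))
        return -1;
    return 0;
}

/* Read one frame from fd and parse it. Returns what parse_ipv4_addrs
 * returns, or -1 if the read fails. Blocks until a frame arrives. */
static int capture_one(int fd, char src[INET_ADDRSTRLEN],
                       char dst[INET_ADDRSTRLEN])
{
    unsigned char buf[2048];
    ssize_t n = recvfrom(fd, buf, sizeof buf, 0, NULL, NULL);
    if (n < 0)
        return -1;
    return parse_ipv4_addrs(buf, (size_t)n, src, dst);
}
```

A caller would open the socket once, call capture_one in a loop, print src and dst whenever it returns 0, and close the fd at the end. Keeping the parser separate from the socket code means it can be exercised with a hand-built frame and no special privileges.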

Compile main.c and run it; note that root privileges are required.

    # gcc -o main main.c
    # ./main

The running results are previewed as follows.

Finally, please watch it again and forward it!
