An article to introduce you to network protocols

Author | Cai Zhuliang

1. Directory

  • Network Protocol
  • HTTP
  • HTTPS

I hope this article helps readers understand what network protocols are, along with HTTP and HTTPS, the two protocols we encounter most often.

2. Network Protocol

Network protocols are rules, standards or conventions established for data exchange in computer networks.

As we all know, the Internet is a "big network" of computers that communicate with each other and exchange data. Computers from different manufacturers inevitably differ, so how do they overcome those differences to communicate? The answer is "language". People can talk to each other because they agree on what words mean: "apple", for example, refers to a specific fruit. Computers communicate by establishing the same kind of agreement. Note, however, that network protocols are not used only between computers; they apply to every device on the network (servers, personal PCs, switches, routers, firewalls, etc.).

Most networks use a layered architecture. Each layer is built on top of the layer below it, provides certain services to the layer above it, and hides from that upper layer the details of how the service is implemented (much like an interface in our code). The rules by which the nth layer on one device communicates with the nth layer on another device form the nth-layer protocol. Each layer has many protocols, and the sender and receiver must use the same protocol at a given layer; otherwise one party cannot recognize the information sent by the other. The layered model is as follows:

  • The OSI model (Open Systems Interconnection Reference Model) is a conceptual model proposed by the International Organization for Standardization. It is a standard framework intended to let all kinds of computers worldwide be interconnected into networks. It is divided into seven layers:
  • Application layer (Layer 7): the interface for application software, used for communication between applications.
  • Presentation layer (Layer 6): converts data into a format that the receiving system can use.
  • Session layer (Layer 5): built on top of the transport layer. It uses the interface the transport layer provides to let applications establish, maintain and synchronize sessions (a session can be loosely understood as a channel for data transmission).
  • Transport layer (Layer 4): adds the transport header (TH, which contains information such as the protocol used) to the data we want to transmit, forming a data packet.
  • Network layer (Layer 3): determines the path the data takes and how it is forwarded. It adds the network header (NH, which contains network data such as IP addresses) to the data packet.
  • Data link layer (Layer 2): responsible for network addressing, error detection and error correction.
  • Physical layer (Layer 1): ensures that raw data can be transmitted over various physical media.
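The way each layer wraps the data handed down from the layer above can be sketched in a few lines of Python. This is only an illustration: the bracketed text tags stand in for real binary headers, and the "DH" label for the data link header is a made-up name that does not appear in the list above.

```python
# Hypothetical header labels; real headers are binary structures, not text tags.
LAYERS = [
    ("transport", "TH"),
    ("network", "NH"),
    ("data link", "DH"),  # "DH" is an assumed label for this sketch
]


def encapsulate(payload: bytes) -> bytes:
    """Sender side: each lower layer prepends its own header."""
    frame = payload
    for _name, tag in LAYERS:
        frame = f"[{tag}]".encode() + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: strip the headers in the reverse order they were added."""
    for _name, tag in reversed(LAYERS):
        header = f"[{tag}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"expected {tag} header")
        frame = frame[len(header):]
    return frame


frame = encapsulate(b"hello world")
print(frame)               # b'[DH][NH][TH]hello world'
print(decapsulate(frame))  # b'hello world'
```

The receiver can only unwrap the frame if it expects the same headers in the same order, which is the sense in which both sides must use the same protocol at each layer.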

The similarities and differences between the TCP/IP protocol family layering and the OSI layering are shown in the following figure:

Next, let's walk through a simple network request scenario.

Scenario: I wrote a simple static page called "hello world" for the company and deployed it on the company's server. I used my own computer at home to access this static page through the public network. For example, the URL is "http://www.xxx.com".

What did the browser do when I visited this URL? Let's look at the following picture:

TCP

TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based, bidirectional transport layer protocol. Establishing a connection takes a three-way handshake, and data transmission does not start until the handshake is complete; terminating a connection takes a four-way wave. The details are as follows:

(1) Establishing a connection

Image source: Baidu Encyclopedia

Three-way handshake:

  • The client sends a SYN segment to the server and enters the SYN_SENT state
  • The server replies with SYN+ACK and enters the SYN_RECV state
  • After receiving the SYN+ACK from the server, the client replies with ACK

The client and server enter the Established state and can start sending and receiving data.
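The handshake can be modeled as a small state table. The sketch below is a simplified simulation, not real networking code: it ignores sequence numbers, timeouts and retransmission, and the dictionary layout is just one convenient way to write the transitions down. Note that the server's reply carries both the SYN and ACK flags.

```python
# (current state, incoming event) -> (next state, segment sent in response)
CLIENT = {
    ("CLOSED", "connect"): ("SYN_SENT", "SYN"),
    ("SYN_SENT", "SYN+ACK"): ("ESTABLISHED", "ACK"),
}
SERVER = {
    ("LISTEN", "SYN"): ("SYN_RECV", "SYN+ACK"),
    ("SYN_RECV", "ACK"): ("ESTABLISHED", None),
}


def handshake():
    """Run the three-way handshake and return both final states plus a trace."""
    client, server = "CLOSED", "LISTEN"
    trace = []

    client, segment = CLIENT[(client, "connect")]  # 1. client -> SYN
    trace.append(("client", segment))

    server, reply = SERVER[(server, segment)]      # 2. server -> SYN+ACK
    trace.append(("server", reply))

    client, segment = CLIENT[(client, reply)]      # 3. client -> ACK
    trace.append(("client", segment))

    server, _ = SERVER[(server, segment)]          # server receives the ACK
    return client, server, trace


client_state, server_state, trace = handshake()
print(trace)                       # [('client', 'SYN'), ('server', 'SYN+ACK'), ('client', 'ACK')]
print(client_state, server_state)  # ESTABLISHED ESTABLISHED
```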

(2) Terminate the connection

Image source: Baidu Encyclopedia

Four-way wave (note: either end can initiate the close; here the client goes first):

  • The client calls close first, performing an active close, and sends FIN to indicate that it has finished sending data. It enters the FIN_WAIT_1 state.
  • On receiving the FIN, the server performs a passive close, sends an ACK to the client and enters the CLOSE_WAIT state. When the client receives this ACK, it enters the FIN_WAIT_2 state.
  • Once the server has no more data to send, it sends its own FIN to the client and enters the LAST_ACK state.
  • The side that initiated the close is responsible for acknowledging the final FIN. In this example, the client receives the FIN, replies with ACK and enters the TIME_WAIT state. When the server receives the ACK, it enters the CLOSED state.
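The teardown can be written down as a transition table as well. As before, this is a simplified simulation under the assumption that the client closes first; it leaves out the TIME_WAIT timer and simultaneous-close cases.

```python
# (current state, event) -> (next state, segment sent in response)
TRANSITIONS = {
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),     # active closer
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "ACK"),  # passive closer
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}


def close_connection():
    """Client-initiated close; returns final states and the segments sent."""
    client = server = "ESTABLISHED"
    sent = []

    client, seg = TRANSITIONS[(client, "close")]     # 1. client -> FIN
    sent.append(("client", seg))
    server, seg = TRANSITIONS[(server, "recv FIN")]  # 2. server -> ACK
    sent.append(("server", seg))
    client, _ = TRANSITIONS[(client, "recv ACK")]

    server, seg = TRANSITIONS[(server, "close")]     # 3. server -> FIN
    sent.append(("server", seg))
    client, seg = TRANSITIONS[(client, "recv FIN")]  # 4. client -> ACK
    sent.append(("client", seg))
    server, _ = TRANSITIONS[(server, "recv ACK")]

    return client, server, sent


client_state, server_state, sent = close_connection()
print(sent)
print(client_state, server_state)  # TIME_WAIT CLOSED
```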

Why does closing take four waves?

When one party initiates the close and sends FIN, it only means that this party will no longer send data; it can still receive data. The other party therefore has to close its own direction and send its own FIN as well. Why are ACK and FIN sent separately? ACK tells the other party "I got your message", while FIN tells the other party "I have no more data to give you". In practice, the side that receives a FIN has not necessarily finished sending its own data, so the two have to be separate.
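The "I will not send any more, but I can still receive" behavior is visible through the socket API as a half-close. The sketch below uses `socket.socketpair()`, which gives two connected local sockets rather than a real TCP connection over the network; that substitution is an assumption made to keep the example self-contained, but `shutdown(SHUT_WR)` has the same meaning on a TCP socket, where it causes a FIN to be sent.

```python
import socket


def read_until_eof(sock: socket.socket) -> bytes:
    """Read until the peer signals it has finished sending."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:  # empty read: the peer closed its sending direction
            return b"".join(chunks)
        chunks.append(chunk)


def half_close_demo():
    a, b = socket.socketpair()
    try:
        a.sendall(b"request")
        a.shutdown(socket.SHUT_WR)        # a: "I have no more data to send"

        received_by_b = read_until_eof(b)  # b sees the data, then end-of-stream

        b.sendall(b"late reply")          # b can still send in its own direction
        b.shutdown(socket.SHUT_WR)        # b: now I am done as well

        received_by_a = read_until_eof(a)  # a could still receive after its own FIN
        return received_by_b, received_by_a
    finally:
        a.close()
        b.close()


print(half_close_demo())  # (b'request', b'late reply')
```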

HTTP

HTTP (HyperText Transfer Protocol) is implemented on top of TCP.

HTTP is a stateless protocol. When we simply browse a page as a visitor, statelessness keeps things simple and efficient. In e-commerce scenarios, however, the site needs to remember the user's login status or the contents of the shopping cart (e-commerce is only one example; back-office systems also need to track user state), so an additional mechanism such as cookies is needed.
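One common way cookies add state is a session ID: the server stores the user's state itself and gives the browser only an identifier to send back with each request. The sketch below is a minimal in-memory illustration using the standard library's `http.cookies`; the `SESSIONS` dictionary, the cookie name `session_id` and the two function names are assumptions made for this example, not part of any framework.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session id -> per-user state, kept on the server


def login(username: str) -> str:
    """Create a session and return the value for a Set-Cookie response header."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user": username, "cart": []}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    return cookie["session_id"].OutputString()  # "session_id=<hex value>"


def handle_request(cookie_header: str):
    """Look up the user's state from the Cookie request header, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None  # anonymous visitor: nothing to remember
    return SESSIONS.get(morsel.value)


set_cookie_value = login("alice")
# The browser stores the cookie and sends it back on every later request.
state = handle_request(set_cookie_value)
state["cart"].append("book")
print(handle_request(set_cookie_value))  # {'user': 'alice', 'cart': ['book']}
```

Each request is still independent as far as HTTP is concerned; the continuity comes entirely from the identifier the browser resends.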

HTTP message format

The structures of HTTP request and response messages are basically the same.

The message consists of three parts:

  • Start line: describes the basic information of the request or response, for example GET /** HTTP/1.1 or HTTP/1.1 200 OK.
  • Header fields: key-value pairs that describe the message (the request headers and response headers).
  • Message body: the content being transferred; it may be empty.
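To make the three parts concrete, here is a minimal parser that splits a raw HTTP/1.1 message into start line, headers and body. It is a sketch for well-formed messages only: it does not handle repeated header names, chunked bodies or malformed input, and the function name is made up for this example.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.1 message into (start line, headers, body)."""
    # A blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return start_line, headers, body


request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.xxx.com\r\n"
    b"Accept: text/html\r\n"
    b"\r\n"
)
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"hello world"
)

print(parse_http_message(request))
print(parse_http_message(response))
```

The same function handles both directions, which reflects the point that request and response messages share one structure and differ only in the start line.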

HTTPS

HTTP is implemented on top of TCP, and its messages are plain text. The whole transmission is completely transparent: it can easily be intercepted and modified at any hop along the way, which is very unsafe. This is why a secure version of HTTP, HTTPS, came into being. HTTPS is essentially HTTP running over SSL/TLS.

(1) SSL/TLS

SSL stands for Secure Sockets Layer, which was renamed TLS (Transport Layer Security) in 1999.

There are a few concepts to clarify first:

  • Symmetric encryption: encryption and decryption use the same "key".
  • Asymmetric encryption: there are two "keys", a public key and a private key. Data encrypted with the public key can only be decrypted with the private key; data encrypted with the private key can only be decrypted with the public key.
  • Digest algorithm: generates fixed-length output from input of arbitrary length. Common algorithms include MD5, SHA-1 and SHA-2.
  • Security: there is no absolute security. The data security we talk about always rests on a point of trust. If we accept that point as safe, the security built on it holds; otherwise there is no security at all. For example, we trust the asymmetric and symmetric encryption algorithms themselves, so we believe that the data is safe as long as the key is not leaked.
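The digest algorithm entry above is easy to try out with Python's `hashlib`. The snippet uses SHA-256 (a member of the SHA-2 family) and shows that the output length does not depend on the input length, and that a one-character change in the input produces a different digest.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_digest = digest(b"a")
long_digest = digest(b"a" * 1_000_000)

# 256 bits = 64 hexadecimal characters, whatever the input size.
print(len(short_digest), len(long_digest))  # 64 64

# A tiny change in the input gives a different digest.
print(digest(b"hello world") == digest(b"hello world!"))  # False
```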

(2) The HTTPS workflow is roughly as follows:

First complete the TCP three-way handshake, exactly as with HTTP. Then:

  • The browser sends its list of cipher suites to the server (that is, it tells the server which encryption algorithms it supports)
  • The server selects an encryption algorithm from the cipher suite list and sends its public key to the browser
  • After obtaining the public key, the browser randomly generates a key for the symmetric encryption algorithm, encrypts that key with the public key, and sends the ciphertext to the server
  • The server decrypts the ciphertext with its private key to recover the symmetric key. From then on, the content of the session is encrypted with this symmetric key before being transmitted
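The key exchange in the last two steps can be imitated with toy numbers. The sketch below is deliberately insecure and only shows the flow: it uses textbook RSA with tiny primes (p = 61, q = 53) and no padding, and a home-made XOR keystream in place of a real symmetric cipher. Every name in it is an assumption made for illustration; real HTTPS relies on vetted TLS implementations.

```python
import hashlib
import secrets

# Toy RSA key pair from tiny textbook primes (p=61, q=53). Insecure by design.
N, E, D = 3233, 17, 2753  # public key: (N, E); private key: (N, D)


def keystream_xor(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a keystream derived from the key.
    The same call both encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Browser: generate a random symmetric key and encrypt it with the public key.
session_key = secrets.randbelow(N - 2) + 2
encrypted_key = pow(session_key, E, N)

# Server: recover the symmetric key with the private key.
recovered_key = pow(encrypted_key, D, N)

# Both sides now hold the same key and switch to symmetric encryption.
ciphertext = keystream_xor(recovered_key, b"hello world")  # server encrypts
plaintext = keystream_xor(session_key, ciphertext)         # browser decrypts

print(recovered_key == session_key)  # True
print(plaintext)                     # b'hello world'
```

Only the short symmetric key goes through the slow asymmetric step; the page content itself uses the fast symmetric cipher.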

(3) Advantages

  • Asymmetric encryption ensures that the key sent by the browser cannot be recovered by an eavesdropper (the private key stays in the server's own hands and is never transmitted over the network)
  • Encrypting and decrypting the content with a symmetric algorithm is highly efficient

(4) Disadvantages

  • When the server transmits the public key to the browser, the browser has no way to confirm that the key really belongs to the server; someone in the middle could replace it with their own public key

Due to this shortcoming, we need to rely on third-party organizations to help make our HTTPS more secure and reliable.

The details are as follows:

  • In the step where the server sends its public key, send a digital certificate containing the public key instead of the bare public key
  • A digital certificate consists of:
  • Information about the public key's owner
  • The public key
  • Signature: a digest is computed by hashing the application information (public key, company information, domain name, etc.); the CA then encrypts this digest with its own private key, and the resulting ciphertext is the signature
  • CA information
  • Validity period
  • Certificate serial number

  • Digital certificates are issued by a third-party organization (a CA, or certificate authority)
  • The company information, system domain name and public key must be verified by the CA. Once verification passes, the CA issues us a certificate with the content listed above. Because the certificate is signed, its content cannot be tampered with undetected, so the public key owner information and the public key in the certificate can be trusted.
  • The reliability of certificates issued by CAs depends on root certificates, which are built into the operating system or browser (in other words, we have to trust the security of the operating system or browser)
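The sign-then-verify idea behind a certificate can be sketched with the same kind of toy RSA arithmetic. Everything here is an assumption for illustration: the key is built from two small Mersenne primes, there is no padding, and the certificate body is a made-up byte string. Real certificates use standardized formats and signature schemes.

```python
import hashlib

# Toy CA key pair built from two Mersenne primes. Far too weak for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                          # CA public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # CA private exponent (needs Python 3.8+)


def digest_as_int(content: bytes) -> int:
    """Hash the certificate content and reduce it to a number below N."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def ca_sign(content: bytes) -> int:
    """CA side: 'encrypt' the digest with the CA's private key."""
    return pow(digest_as_int(content), D, N)


def browser_verify(content: bytes, signature: int) -> bool:
    """Browser side: undo the signature with the CA's public key and
    compare it with a freshly computed digest."""
    return pow(signature, E, N) == digest_as_int(content)


cert_body = b"domain=www.xxx.com;public_key=SERVER_PUBLIC_KEY"  # hypothetical content
signature = ca_sign(cert_body)

print(browser_verify(cert_body, signature))  # True

tampered = cert_body.replace(b"SERVER_PUBLIC_KEY", b"ATTACKER_PUBLIC_KEY")
print(browser_verify(tampered, signature))   # False
```

An attacker who swaps in a different public key cannot produce a matching signature without the CA's private key, so the browser detects the substitution.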

To summarize, the security of HTTPS rests on trust in the root certificates and in the encryption algorithms; because we trust those, we consider HTTPS safe.

As mentioned above, security can only be discussed relative to some point of trust, so there is no absolute security. If a hacker hijacks the browser so that all your requests pass through them before reaching the server, then all the data you request also passes through the hacker first, and it is no longer safe. For example, many of the "ladders" (firewall circumvention tools) we use are proxies: the requests sent by the browser are relayed by the proxy to a server that can get past the firewall and request the resources, and the data naturally returns along the same route, so this transit server is in a position to do many things with it.

By now you should know that the layered network architecture we often talk about is generally defined as 5 or 7 layers, and that a network protocol is the communication protocol for a particular layer. This article used the most common protocols, HTTP and HTTPS, as examples, discussed the differences between them, and touched on network security along the way.
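This trust point is visible from code. Python's standard `ssl` module can build a client context that requires a valid certificate chain and checks the host name, using the root certificates of the system it runs on; the snippet below only inspects the context and does not open a connection. How many root certificates are loaded depends on the machine, so no specific number is shown.

```python
import ssl

# Default client-side context: verification is switched on out of the box.
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted root.
print(context.verify_mode == ssl.CERT_REQUIRED)  # True

# The certificate must also match the host name being visited.
print(context.check_hostname)  # True

# Counts of the trusted certificates currently loaded; system dependent.
print(context.cert_store_stats())
```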

About the Author

Cai Zhuliang, 51CTO community editor, has been engaged in Java backend development for 8 years. He has worked on traditional radio and television BOSS systems, and later devoted himself to Internet e-commerce, where he was responsible for orders, TMS, middleware, etc.
