HTTP interview, 99% of interviewers like to ask these questions


Differences between HTTP and HTTPS

HTTP is the Hypertext Transfer Protocol: an agreement and specification in the computer world for transmitting hypertext data such as text, pictures, audio, and video between two points.

The main content of HTTP is divided into three parts: Hypertext, Transfer, and Protocol.

  • Hypertext is more than plain text: it can include pictures, audio, and video, and it can link to other documents through hyperlinks on text or pictures.
  • Transfer is the process of moving this data from one end system to another across a series of physical media. We usually call the party that sends a request the requester, and the party that receives it and answers the responder.
  • Protocol refers to the rules for transmitting and managing information in networks (including the Internet). Just as people follow certain rules when communicating with each other, computers follow rules when they communicate, and these rules are called network protocols.

When talking about HTTP, we have to mention the TCP/IP network model, which is generally a five-layer model. As shown in the following figure:

However, it can also be divided into four layers, that is, the link layer and the physical layer are both represented as network interface layers:

Another is the OSI seven-layer network model, which adds the presentation layer and session layer on top of the five-layer protocol:

The full name of HTTPS is Hypertext Transfer Protocol Secure. As the name suggests, HTTPS adds security. HTTPS is not a new application layer protocol; it is a combination of HTTP and TLS/SSL, and the security guarantees come from TLS/SSL.

In other words, HTTPS is HTTP covered with a layer of SSL.

So, what are the main differences between HTTP and HTTPS?

  • The simplest one is that an HTTP address in the address bar starts with http://, while an HTTPS address starts with https://.
        http://www.cxuanblog.com/
        https://www.cxuanblog.com/
  • HTTP has no encryption. Its traffic is easy for attackers to monitor, its data is easy to steal, and senders and receivers are easy to impersonate. HTTPS is a secure protocol that addresses these problems by combining a key exchange algorithm, a signature algorithm, a symmetric encryption algorithm, and a digest algorithm.
  • The default port for HTTP is 80, and the default port for HTTPS is 443.
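
As a small illustration of the scheme and port difference, the following Python sketch works out which port a client would connect to for a given URL. The helper name `effective_port` is an assumption made for this example, not part of any library.

```python
from urllib.parse import urlsplit

# Default ports named in the text above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the default port of its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.cxuanblog.com/"))
print(effective_port("https://www.cxuanblog.com/"))
```

An explicit port in the URL, such as http://www.cxuanblog.com:8080/, overrides the default.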

Differences between HTTP GET and POST

HTTP defines many methods, and GET and POST are the two most commonly used. The vast majority of HTTP requests use one of them, so it is worth understanding both in more depth.

The GET method is generally used to retrieve resources. For example, when you enter www.cxuanblog.com in the browser address bar, you are actually sending a GET request that asks the server to return a resource. The POST method is generally used for submitting data, such as a form. The main differences are:

  • GET asks the server to return a resource, while submitting a form with POST sends information to the server and waits for the server to respond. GET is equivalent to a pull operation, while POST is equivalent to a push operation.
  • GET is less safe for sensitive data, because the request parameters are appended to the URL, where they appear in the address bar, the browser history, and server logs. This makes it easier for attackers to steal or forge your information.
        /test/demo_form.asp?name1=value1&name2=value2

The POST method puts the parameters in the request body, so they do not appear in the URL. Without HTTPS, however, the body is still sent in plain text.

        POST /test/demo_form.asp HTTP/1.1
        Host: w3schools.com

        name1=value1&name2=value2
  • The URL of a GET request has a length limit in practice (imposed by browsers and servers, not by the HTTP specification), while a POST request places parameters and values in the message body and has no such requirement on data length.
  • GET requests are actively cached by the browser, but POST requests are not cached unless this is set up explicitly.
  • A GET request is harmless when the user goes back or forward in the browser, while a POST causes the form to be submitted again.
  • It is often said that a GET request produces one TCP packet and a POST request produces two: for GET, the browser sends the headers and data together and the server responds with 200 (returning data); for POST, the browser sends the headers first, the server responds with 100 Continue, the browser sends the data, and the server responds with 200 OK (returning data). This is implementation behavior rather than part of the HTTP specification. The 100 Continue exchange only happens when the client sends an Expect: 100-continue header, and many clients send the headers and body of a POST together.
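
To make the difference in where parameters travel concrete, here is a minimal Python sketch that builds (but does not send) a GET and a POST request with the standard library. The host `example.com` and the helper names are assumptions for illustration; the path and parameters reuse the example above.

```python
from urllib.parse import urlencode
from urllib.request import Request

PARAMS = {"name1": "value1", "name2": "value2"}


def build_get(base_url: str, params: dict) -> Request:
    """GET: parameters are appended to the URL as a query string."""
    return Request(base_url + "?" + urlencode(params), method="GET")


def build_post(base_url: str, params: dict) -> Request:
    """POST: parameters are carried in the request body."""
    body = urlencode(params).encode("ascii")
    return Request(base_url, data=body, method="POST")


get_req = build_get("http://example.com/test/demo_form.asp", PARAMS)
post_req = build_post("http://example.com/test/demo_form.asp", PARAMS)
print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Nothing is sent over the network here; the objects only show that the same parameters end up in the URL for GET and in the body for POST.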

What is a stateless protocol? Is HTTP a stateless protocol? How to solve it?

A stateless protocol is one in which the server keeps no memory of previous transactions: each request is handled on its own. For example, if a client requests a web page, closes the browser, then restarts the browser and visits the website again, the server does not know that it is the same client or that the browser was closed in between.

HTTP is a stateless protocol. It has no memory of user operations. Most users may not believe this, because they do not need to re-enter their username and password every time they visit a website. That is not the work of HTTP, however, but of a mechanism called cookies, which gives the browser a memory.

In Chrome, you can check whether cookies are allowed at chrome://settings/content/cookies

If cookies are allowed, your browser's memory is switched on. When the server receives a request from you for the first time, it opens up a session space (creates a session object), generates a sessionId, and asks the client to set a cookie through the Set-Cookie: JSESSIONID=XXXXXXX response header. After the client receives the response, it stores the cookie JSESSIONID=XXXXXXX locally, and the cookie expires at the end of the browser session.

Next, each time the client sends a request to the same website, the request header will carry the cookie information (including sessionId). Then, the server reads the cookie information in the request header, obtains the value named JSESSIONID, and obtains the sessionId of this request. In this way, your browser has the ability to remember.
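
The cookie round trip described above can be sketched in a few lines of Python. This is a simplified in-memory model, not a real web server: the `sessions` dictionary and the `handle_request` function are hypothetical names, and only the cookie name JSESSIONID comes from the text.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: sessionId -> session data.
sessions = {}


def handle_request(cookie_header):
    """Return (session_id, set_cookie_value_or_None) for one request."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    morsel = jar.get("JSESSIONID")
    if morsel is not None and morsel.value in sessions:
        # Known client: the cookie carried a valid sessionId.
        return morsel.value, None
    # First visit: create a session and ask the client to store its id.
    session_id = secrets.token_hex(16)
    sessions[session_id] = {}
    out = SimpleCookie()
    out["JSESSIONID"] = session_id
    out["JSESSIONID"]["httponly"] = True
    return session_id, out["JSESSIONID"].OutputString()


first_id, set_cookie = handle_request(None)
second_id, nothing = handle_request("JSESSIONID=" + first_id)
print(set_cookie)
print(first_id == second_id, nothing)
```

The first call has no cookie, so the server creates a session and returns a Set-Cookie value; the second call presents the cookie and is recognized as the same session.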

Another way is to use JWT (JSON Web Token), which also gives your browser a memory. With a server-side session, the cookie only carries an ID that points to data held on the server. A JWT instead carries the user information itself in a signed token that is kept on the client. It is widely used in single sign-on. JWT has two characteristics:

  • The information in a JWT is stored on the client, not in server memory. The client sends the token with each request, and the server can validate it by checking its signature, without looking up a session. This saves server resources, and the same token can be verified many times.
  • JWT supports cross-domain authentication. A cookie is only sent to the domain that set it and its subdomains, so a request to a third node will not carry it. A JWT can be sent to any node that is able to verify its signature, so users can be authenticated across multiple nodes, which is what we often call cross-domain authentication.
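
To show why the server needs no session storage to check a JWT, here is a minimal sketch of HS256 signing and verification using only the Python standard library. It is an illustration under simplifying assumptions: the function names and the secret are made up, and expiry claims and other algorithms are not handled, so a maintained JWT library is the better choice in real code.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(parts)
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(digest)


def verify(token: str, secret: bytes):
    """Return the payload if the signature is valid, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signature = b64url_decode(signature_b64)
    except ValueError:
        return None
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    return json.loads(b64url_decode(payload_b64))


token = sign({"sub": "alice"}, b"demo-secret")
print(verify(token, b"demo-secret"))
print(verify(token, b"wrong-secret"))
```

Any node that knows the secret can run `verify` on its own, which is the property that makes a JWT usable across several services.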

Differences between UDP and TCP

TCP and UDP are both located in the transport layer of the computer network model, and they are responsible for transmitting data generated by the application layer. Let's talk about the characteristics and differences between TCP and UDP.

1. What is UDP

UDP stands for User Datagram Protocol. It does not require the so-called handshake operation, which speeds up communication and allows other hosts on the network to transmit data before the receiver agrees to communicate.

A datagram is the unit of transmission associated with packet-switched networks.

The main features of UDP are:

  • UDP can support bandwidth-intensive applications that tolerate packet loss
  • UDP has the characteristic of low latency
  • UDP can send large numbers of packets, since it has no connection setup to wait for
  • DNS lookups commonly run over UDP: DNS is an application layer protocol built on top of UDP.
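
The lack of a handshake is visible in code. The following Python sketch sends one datagram over the loopback interface: the sender never connects, it simply addresses the packet and sends it. The function name and the use of 127.0.0.1 with an OS-chosen port are assumptions for this example.

```python
import socket


def udp_round_trip(message: bytes) -> bytes:
    """Send one datagram to a local receiver and return what it received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2)            # do not wait forever if the packet is lost
        # No connect() and no handshake: just address the datagram and send it.
        sender.sendto(message, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(2048)
        return data


print(udp_round_trip(b"hello over UDP"))
```

On a real network the datagram could be lost or reordered, and UDP itself would not notice; the timeout is there because nothing guarantees delivery.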

2. What is TCP

TCP stands for Transmission Control Protocol. It establishes and manages connections between hosts and controls the data transmission between them. A TCP connection is established through a three-way handshake, which is the process used to start and confirm a TCP connection. Once the connection is established, data can be sent. When the data transmission is completed, the connection is torn down by closing the virtual circuit.

The main features of TCP are:

  • TCP establishes a connection first and ensures that data packets are delivered
  • TCP supports retransmission of lost or damaged packets
  • TCP supports congestion control, which can delay sending when the network is congested.
  • TCP provides error checking and can identify corrupted packets.
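
For comparison with UDP, here is a minimal TCP sketch over the loopback interface. The three-way handshake happens inside `create_connection` and `accept`; only after it completes can data be sent. The function names and the upper-casing echo behavior are assumptions made for this example.

```python
import socket
import threading


def echo_once(server: socket.socket) -> None:
    """Accept one connection and send back the received bytes in upper case."""
    conn, _address = server.accept()      # returns once the handshake has completed
    with conn:
        conn.sendall(conn.recv(1024).upper())


def tcp_round_trip(message: bytes) -> bytes:
    with socket.create_server(("127.0.0.1", 0)) as server:
        server.settimeout(2)
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # create_connection performs the three-way handshake before returning.
        with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


print(tcp_round_trip(b"hello over tcp"))
```

Leaving the `with` blocks closes both sockets, which triggers the connection release described in the next section.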

3. Differences between TCP and UDP

Below are some differences between TCP and UDP for your convenience.

TCP three-way handshake and four-way wave

The TCP three-way handshake and four-way wave are also popular interview topics. They correspond to the TCP connection and release processes respectively. Let's take a brief look at these two processes.

1. TCP three-way handshake

Before understanding the specific process, we need to understand a few concepts first

  • SYN: Its full name is Synchronize Sequence Numbers. It is the handshake signal used when TCP/IP establishes a connection, and it is the first segment sent when a TCP connection is established between a client and a server. When the client sends the SYN, it puts a random initial sequence number X in the segment.
  • SYN-ACK: After receiving the SYN, the server opens the connection for the client and replies with a SYN-ACK. The acknowledgment number is set to one more than the received sequence number, that is, X + 1. The server chooses another random number Y as its own sequence number.
  • ACK: Acknowledgment, which indicates that the data sent has been received correctly. Finally, the client sends an ACK to the server. Its acknowledgment number is set to one more than the server's sequence number, that is, Y + 1, and its own sequence number is X + 1.

If we use real life as an example, with Xiao Ming as the client and Xiao Hong as the server:

  • Xiao Ming calls Xiao Hong. After the call is connected, Xiao Ming says, "Hello, can you hear me?" This is equivalent to a connection request.
  • Xiao Hong responds to Xiao Ming, "Yes, I can hear you. Can you hear what I am saying?" This is equivalent to a response to the request, together with a request of her own.
  • After Xiao Ming hears Xiao Hong's response, he says, "OK," which is equivalent to a connection confirmation. After that, Xiao Ming and Xiao Hong can talk and exchange information.
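
The sequence and acknowledgment numbers in the three segments can be traced with a short Python sketch. It only models the arithmetic, not real packets; the function name and the dictionary layout are assumptions for illustration.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(x: int, y: int) -> list:
    """Return the three handshake segments for initial sequence numbers x and y."""
    syn = {"flags": "SYN", "seq": x}
    syn_ack = {"flags": "SYN-ACK", "seq": y, "ack": (syn["seq"] + 1) % SEQ_SPACE}
    ack = {
        "flags": "ACK",
        "seq": syn_ack["ack"],                      # X + 1
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,    # Y + 1
    }
    return [syn, syn_ack, ack]


for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

In a real stack X and Y are chosen unpredictably by each side; fixed values are used here only so the arithmetic is easy to follow.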

2. TCP four-way wave

In the connection termination phase, four waves are used, and each end of the connection will terminate independently. Let's describe this process below.

  • First, the client application decides to terminate the connection (the server can also choose to disconnect here). This causes the client to send a FIN to the server and enter the FIN_WAIT_1 state. When the client is in the FIN_WAIT_1 state, it waits for an ACK response from the server.
  • Then in the second step, when the server receives the FIN message, it will immediately send an ACK confirmation message to the client.
  • When the client receives the ACK response from the server, it enters the FIN_WAIT_2 state and waits for the FIN message from the server.
  • After the server sends the ACK confirmation message, it will send a FIN message to the client after a period of time (after it can be closed), informing the client that it can close.
  • When the client receives the FIN message sent from the server, the client changes from the FIN_WAIT_2 state to the TIME_WAIT state. The client in the TIME_WAIT state is allowed to resend ACK to the server to prevent information loss. The time the client spends in the TIME_WAIT state depends on its implementation. After waiting for a period of time, the connection is closed and all resources on the client (including port numbers and buffer data) are released.
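
The client-side states named in these steps form a small state machine, which the following Python sketch walks through. The event names are assumptions chosen for readability, and only the simple case from the text is modeled (for example, a simultaneous close by both sides is not covered).

```python
# (current state, event) -> next state, for the side that closes first.
CLIENT_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",   # the client sends the final ACK here
    ("TIME_WAIT", "timeout"): "CLOSED",
}


def run_client_close(events: list) -> list:
    """Apply events in order and return every state the client passes through."""
    state = "ESTABLISHED"
    history = [state]
    for event in events:
        state = CLIENT_TRANSITIONS[(state, event)]
        history.append(state)
    return history


print(run_client_close(["send FIN", "recv ACK", "recv FIN", "timeout"]))
```

An event that is not valid in the current state raises a `KeyError`, which makes an out-of-order step easy to spot.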

We can still use the above call example to describe it:

  • Xiao Ming says to Xiao Hong, "I have said everything and I am going to hang up now."
  • Xiao Hong says, "Got it, but there are still some things I haven't said."
  • After a few seconds, Xiao Hong finishes speaking and says, "I'm done, you can hang up now."
  • After receiving the message, Xiao Ming waits for some time and then hangs up the phone.

Briefly describe the differences between HTTP1.0/1.1/2.0

1. HTTP 1.0

HTTP 1.0 was introduced in 1996, and since then its adoption has been phenomenal.

  • HTTP 1.0 only provides the most basic authentication, in which the username and password are not encrypted, so they are easily snooped.
  • HTTP 1.0 is designed around short-lived connections: every data transmission goes through TCP's three-way handshake and four-way wave, which is relatively inefficient.
  • HTTP 1.0 only uses the If-Modified-Since and Expires headers as the criteria for cache expiration.
  • HTTP 1.0 does not support resumable downloads, which means that the whole page or resource is transmitted each time.
  • HTTP 1.0 assumes that each server is bound to a single IP address and hosts a single site, so the request message does not carry the host name.

2. HTTP 1.1

HTTP 1.1 was first standardized in 1997 (RFC 2068) and revised in 1999 (RFC 2616), three years after HTTP 1.0. It made the following changes:

  • HTTP 1.1 adds digest authentication, which sends a digest computed from the password instead of the password itself.
  • HTTP 1.1 uses a persistent connection by default. A persistent connection is established once and then used for multiple requests and responses, and it only needs to be closed once after the transmission is completed. The lifetime of the persistent connection can be set with the Keep-Alive header.
  • HTTP 1.1 adds new cache control headers such as ETag, If-Unmodified-Since, If-Match, and If-None-Match to control cache expiration.
  • HTTP 1.1 supports resuming downloads from a breakpoint, which is achieved by using the Range header in the request.
  • HTTP 1.1 supports virtual hosting. Multiple virtual hosts (Multi-homed Web Servers) can exist on one physical server and share an IP address, and the Host request header tells the server which one is meant.
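
Several of these HTTP 1.1 features show up directly in the request text. The Python sketch below builds the raw bytes of a request that names its virtual host, keeps the connection open, and asks for part of a resource. The function name, the host, and the path are assumptions for illustration; nothing is sent over the network.

```python
def build_range_request(host: str, path: str, start: int, end=None) -> bytes:
    """Build a raw HTTP/1.1 GET that asks for bytes start..end of a resource."""
    byte_range = "bytes=%d-%s" % (start, "" if end is None else end)
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,              # selects the virtual host on a shared IP
        "Connection: keep-alive",       # the HTTP 1.1 default, shown explicitly
        "Range: %s" % byte_range,       # resume a download from a breakpoint
        "",                             # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_range_request("example.com", "/big-file.zip", 1024).decode("ascii"))
```

A server that supports ranges would typically answer such a request with 206 Partial Content; one that does not may simply return the whole resource with 200.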

3. HTTP 2.0

HTTP 2.0 is a standard developed in 2015. Its main changes are as follows:

  • Header compression. In HTTP 1.1, header fields such as User-Agent, Cookie, Accept, Server, and Range may take up hundreds or even thousands of bytes, while the body is often only a few dozen bytes, so the headers outweigh the body. HTTP 2.0 compresses headers with the HPACK algorithm.
  • Binary format. HTTP 2.0 uses binary framing, which is closer to TCP/IP, instead of the ASCII text format of HTTP 1.1, which improves parsing efficiency.
  • Stronger security. Since security has become a top priority, HTTP 2.0 generally runs over HTTPS.
  • Multiplexing. Multiple requests share a single connection. Each request is carried on its own stream identified by an ID, so many requests can be in flight on one connection at the same time.
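
The binary format and the per-request stream ID can both be seen in the fixed 9-byte header that starts every HTTP/2 frame: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. The Python sketch below packs and parses that header; the function names are assumptions, and frame payloads and HPACK are out of scope.

```python
import struct

FRAME_HEADER_LEN = 9
STREAM_ID_MASK = 0x7FFFFFFF  # the top bit of the last field is reserved


def pack_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Pack the 9-byte HTTP/2 frame header."""
    return length.to_bytes(3, "big") + struct.pack(
        ">BBI", frame_type, flags, stream_id & STREAM_ID_MASK
    )


def parse_frame_header(data: bytes) -> dict:
    """Parse the first 9 bytes of a frame into its fields."""
    if len(data) < FRAME_HEADER_LEN:
        raise ValueError("need at least 9 bytes")
    length = int.from_bytes(data[:3], "big")
    frame_type, flags, raw_stream = struct.unpack(">BBI", data[3:9])
    return {
        "length": length,
        "type": frame_type,
        "flags": flags,
        "stream_id": raw_stream & STREAM_ID_MASK,
    }


# A HEADERS frame (type 0x1) with END_HEADERS (0x4) on stream 3.
header = pack_frame_header(16, 0x1, 0x4, 3)
print(header.hex(), parse_frame_header(header))
```

Because every frame names its stream, frames belonging to different requests can be interleaved on one connection and reassembled by stream ID.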

Please tell me about the common HTTP request headers

This question is relatively open, because there are many HTTP request headers, and only a few examples are given here.

HTTP headers are divided into four types: general headers, entity headers, request headers, and response headers. Let's introduce them one by one.

1. General headers

There are three commonly seen general headers: Date, Cache-Control and Connection

(1) Date

Date is a general header that can appear in both requests and responses. Its basic representation is as follows

    Date: Wed, 21 Oct 2015 07:28:00 GMT

It represents Greenwich Mean Time, which is eight hours behind Beijing time.
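
As a small illustration, the Python standard library can produce this format from any time zone-aware timestamp. The helper name `http_date` is an assumption for this sketch.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def http_date(moment: datetime) -> str:
    """Format an aware datetime as an HTTP Date header value in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


beijing = timezone(timedelta(hours=8))
# 15:28 in Beijing is 07:28 GMT on the same day.
print(http_date(datetime(2015, 10, 21, 15, 28, 0, tzinfo=beijing)))
```

The input must carry time zone information; a naive datetime would be interpreted in the local zone of the machine running the code.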

(2) Cache-Control

Cache-Control is a general header that can appear in both requests and responses. It has many directives. Although it is a general header, some directives are only valid in requests and some only in responses. The main categories are cacheability, expiration, revalidation and reloading, and other features.

(3) Connection

Connection determines whether the network connection will be closed after the current transaction (one request and its response) is completed. There are two types of Connection. One is a persistent connection, which means that the network connection is not closed after a transaction is completed.

    Connection: keep-alive

The other is a non-persistent connection, which means that the network connection is closed after a transaction is completed.

    Connection: close

Other common headers of HTTP1.1 are as follows:

2. Entity Header

Entity headers are HTTP headers that describe the content of the message body. Entity headers are used in HTTP requests and responses. The headers Content-Length, Content-Language, Content-Encoding are entity headers.

  • The Content-Length entity header indicates the size of the entity body, in bytes, sent to the recipient.
  • The Content-Language entity header describes the natural language or languages that the content is intended for.
  • The Content-Encoding entity header indicates what encoding, usually compression, has been applied to the entity body, so that the recipient knows how to decode it to obtain the original media type.

Common content encodings include: gzip, compress, deflate, identity. Content-Encoding can appear in request and response messages, and the client announces which encodings it accepts with Accept-Encoding.

    Accept-Encoding: gzip, deflate  // request header
    Content-Encoding: gzip  // response header
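
The negotiation between Accept-Encoding and Content-Encoding can be sketched as follows. This is a simplified model: the function name is an assumption, only gzip is supported, and quality values such as q=0 are ignored.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str):
    """Compress the body with gzip if the client accepts it.

    Returns (body_to_send, extra_response_headers).
    """
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        compressed = gzip.compress(body)
        return compressed, {
            "Content-Encoding": "gzip",
            "Content-Length": str(len(compressed)),
        }
    return body, {"Content-Length": str(len(body))}


page = b"hello " * 50
sent, headers = encode_body(page, "gzip, deflate")
print(headers, len(page), len(sent))
```

Note that Content-Length describes the bytes actually sent, that is, the compressed size when Content-Encoding is applied.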

Here are some entity header fields

3. Request headers

(1) Host

The Host request header specifies the server's domain name (for virtual hosts), and (optionally) the TCP port number that the server is listening on. If no port number is given, the default port for the requested service is used automatically (for example, requesting an HTTP URL automatically uses port 80).

    Host: developer.mozilla.org

Accept, Accept-Language, and Accept-Encoding are all request headers that belong to content negotiation.

(2) Referer

The HTTP Referer attribute is part of the request header. When a browser sends a request to a web server, it usually includes a Referer to tell the server which page the web page was linked from, so that the server can obtain some information for processing.

    Referer: https://developer.mozilla.org/testpage.html

(3) If-Modified-Since

If-Modified-Since is usually used with If-None-Match to confirm the validity of local resources owned by the proxy or client. The update date and time of the resource can be determined by confirming the header field Last-Modified.

In plain words, if the resource on the server has been updated after the time given in If-Modified-Since, the server responds with 200 and the new content. If the resource has not been updated since then, the server returns 304.

    If-Modified-Since: Mon, 18 Jul 2016 02:36:04 GMT
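
The server-side decision can be sketched in Python by comparing the two dates. The function name and the second timestamp are assumptions for illustration; a real server would also handle malformed dates and other conditional headers.

```python
from email.utils import parsedate_to_datetime


def status_for_modified_since(last_modified: str, if_modified_since: str) -> int:
    """Return 200 if the resource changed after the client's copy, else 304."""
    resource_time = parsedate_to_datetime(last_modified)
    client_time = parsedate_to_datetime(if_modified_since)
    return 200 if resource_time > client_time else 304


client_copy = "Mon, 18 Jul 2016 02:36:04 GMT"
print(status_for_modified_since("Mon, 18 Jul 2016 02:36:04 GMT", client_copy))
print(status_for_modified_since("Tue, 19 Jul 2016 00:00:00 GMT", client_copy))
```

The first call models an unchanged resource, so the client can keep using its cached copy; the second models a resource that was updated a day later.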

(4) If-None-Match

The If-None-Match HTTP request header makes the request conditional. For GET and HEAD methods, the server sends back the requested resource with a 200 status only if the resource's current ETag does not match any of the listed values; otherwise it returns 304. For other methods, the request is processed only if the ETag of the existing resource does not match any of the listed values.

    If-None-Match: "c561c68d0ba92bbeb8b0fff2a9199f722e3a621a"
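
For GET and HEAD, the check amounts to comparing the current ETag against the listed ones. The following Python sketch shows the idea; the function name is an assumption, and the weak-validator prefix W/ is simply ignored during comparison.

```python
def strip_weak(tag: str) -> str:
    """Drop the W/ prefix so weak and strong forms of a tag compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def status_for_etag(current_etag: str, if_none_match: str) -> int:
    """Return 304 if the client's cached ETag is still current, else 200."""
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        return 304  # "*" matches any existing representation
    if strip_weak(current_etag) in {strip_weak(tag) for tag in candidates}:
        return 304
    return 200


etag = '"c561c68d0ba92bbeb8b0fff2a9199f722e3a621a"'
print(status_for_etag(etag, etag))
print(status_for_etag(etag, '"an-older-version"'))
```

A 304 response carries no body, so the client reuses its cached copy.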

(5) Accept

The Accept request header tells the server which MIME types the client understands.

(6) Accept-Charset

The Accept-Charset request header tells the server which character sets the client can handle.

Commonly used character sets are: UTF-8 - Unicode character encoding; ISO-8859-1 - character encoding for the Latin alphabet

(7) Accept-Language

The Accept-Language header field is used to inform the server of the natural language sets (such as Chinese or English) that the user agent can handle, as well as the relative priority of the natural language sets. Multiple natural language sets can be specified at one time.
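
The relative priority is expressed with q values between 0 and 1, where a missing q means 1. The Python sketch below orders the languages in such a header; the function name is an assumption, and the parsing is deliberately simple (it only understands an optional `q=` parameter).

```python
def parse_accept_language(header: str) -> list:
    """Return language tags ordered from most to least preferred."""
    ranked = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag:
            continue
        quality = 1.0  # default priority when no q value is given
        params = params.strip()
        if params.startswith("q="):
            quality = float(params[2:])
        ranked.append((tag, quality))
    # sorted() is stable, so equal priorities keep their original order.
    ranked = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    return [tag for tag, quality in ranked if quality > 0]


print(parse_accept_language("zh-CN,zh;q=0.9,en;q=0.8"))
```

A server would walk this list and pick the first language it can actually serve.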

That is a brief introduction to these request headers. A later article will dig into all of the headers in detail. The following is a summary of the request headers based on HTTP 1.1

4. Response Headers

(1) Access-Control-Allow-Origin

A returned HTTP header may have Access-Control-Allow-Origin, which specifies an origin, telling the browser to allow resource access from that origin.

(2) Keep-Alive

Keep-Alive indicates how long a persistent connection should be kept open, which can be specified.

(3) Server

The Server header contains information about the software that the origin server used to handle the request.

Overly long and detailed Server values should be avoided because they may reveal internal implementation details, which may make it easy for attackers to discover and exploit known security vulnerabilities.

    Server: Apache/2.4.1 (Unix)

(4) Set-Cookie

Set-Cookie is used by the server to send cookies, such as the session ID, to the client.

(5) Transfer-Encoding

The header field Transfer-Encoding specifies the encoding method used when transmitting the message body.

In HTTP/1.1, transfer encoding is only used for chunked transfer encoding.

(6) X-Frame-Options

HTTP header fields are self-expandable, so various non-standard header fields may appear in Web server and browser applications.

The header field X-Frame-Options belongs to the HTTP response header and is used to control the display of website content in the Frame tag of other websites. Its main purpose is to prevent clickjacking attacks.

The following is a summary of the response headers, based on HTTP 1.1

What happens when you enter the URL in the address bar?

This question is also a frequently asked interview question. Let's discuss the process from when you enter the URL to the response.

First, you need to enter the URL you want to visit in the browser, as follows:

You shouldn't be able to access it, right?

Then, the browser checks whether it already has a cached DNS record for the domain name in the URL you entered. Different browsers have different DNS cache settings. If the domain name is cached, the browser uses the cached IP address directly. If it is not cached, the browser makes a system call to check whether the local hosts file has an IP address configured for the name. If one is found, it is returned directly. If not, a DNS query is sent out to the network.

First, let's take a look at what DNS is. There are two ways to identify hosts on the Internet, through host names and IP addresses. We humans like to remember by name, but the routing in the communication link prefers fixed-length, hierarchical IP addresses. Therefore, a service that can convert host names to IP addresses is needed, and this service is provided by DNS. The full name of DNS is Domain Name System. DNS is a distributed database implemented by hierarchical DNS servers. DNS runs on UDP and uses port 53.

DNS is a hierarchical database. Its servers fall into three main levels: root DNS servers, top-level domain DNS servers, and authoritative DNS servers.

These three types make up the general hierarchy of domain name servers. In addition, there is another important type of DNS server, the local DNS server. Strictly speaking, the local DNS server does not belong to the hierarchy, but it is crucial. Each ISP (Internet Service Provider), such as the ISP of a residential area or of an institution, has a local DNS server. When a host connects to an ISP, the ISP gives the host an IP address along with the IP addresses of one or more of its local DNS servers. Users can easily find the IP address of their DNS server in the network connection settings. When the host sends a DNS request, the request goes to the local DNS server, which acts as a proxy and forwards the request into the DNS server hierarchy.

First, the query goes to the local DNS server, which checks whether it already knows the IP address. If the local DNS server cannot answer, it sends a DNS query to a root domain name server.

Note: DNS involves two query methods, recursive query and iterative query.

  • In a recursive query, the server that receives the query takes over the work: it contacts other servers as needed and returns the final answer. The query from a host to its local DNS server is usually recursive.
  • In an iterative query, the server that receives the query does not resolve the name itself. It replies with the next server to ask (for example, the root server points to a top-level domain server), and the requester sends the query there. The queries from the local DNS server to the root, top-level domain, and authoritative servers are usually iterative.

After going from the root domain name server -> top-level domain name server -> authoritative DNS server, the authoritative server tells the local server the target IP address, and then the local DNS server tells the user the IP address they need to access.
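
The walk from root to top-level domain to authoritative server can be modeled with a few dictionaries. This Python sketch is a toy simulation, not a real resolver: every server name, the host name www.example.com, and the address 203.0.113.10 (from a range reserved for documentation) are made up for illustration.

```python
# Hypothetical data standing in for the three levels of DNS servers.
ROOT = {"com": "tld-com"}                                      # TLD -> TLD server
TLD_SERVERS = {"tld-com": {"example.com": "ns-example"}}       # zone -> authoritative server
AUTHORITATIVE = {"ns-example": {"www.example.com": "203.0.113.10"}}


def resolve_iteratively(hostname: str):
    """Act as the local DNS server: ask each level in turn and follow referrals."""
    labels = hostname.split(".")
    tld = labels[-1]
    zone = ".".join(labels[-2:])
    trace = []

    tld_server = ROOT[tld]                          # root refers us to the TLD server
    trace.append(("root", tld_server))

    auth_server = TLD_SERVERS[tld_server][zone]     # TLD refers us to the authoritative server
    trace.append((tld_server, auth_server))

    address = AUTHORITATIVE[auth_server][hostname]  # authoritative server gives the answer
    trace.append((auth_server, address))
    return address, trace


address, trace = resolve_iteratively("www.example.com")
print(address)
for asked, answer in trace:
    print(asked, "->", answer)
```

Each step returns a referral rather than the final answer, which is what makes these queries iterative; from the host's point of view the local DNS server did all the work, which is the recursive part.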

  • Next, the browser establishes a TCP connection with the target server, which requires the three-way handshake. For the specific handshake process, please refer to the answer above.
  • After the connection is established, the browser sends an HTTP GET request for the URL to the target server. Since HTTP 1.1, persistent connections are used by default, so a single handshake is enough to transmit data multiple times.
  • If the target is just a simple page, the server returns it directly. Large websites, however, often do not return the page at the requested host name but redirect instead. The status code is then not 200 but a redirection code starting with 3, such as 301 or 302. The browser reads the redirect address from the Location header of the response and starts again from the first step with that address.
  • The browser then sends the request again with the new URL, and the server returns a status code of 200 OK together with the requested content.
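
The redirect loop in the last two steps can be sketched as follows. To keep the example runnable without a network, the `fetch` argument is a hypothetical callable that returns a status code and headers for a URL, and `FAKE_SITE` is made-up data standing in for a server.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url: str, fetch, max_hops: int = 5):
    """Follow Location headers until a non-redirect response is reached."""
    visited = [url]
    for _ in range(max_hops + 1):
        status, headers = fetch(url)
        if status in REDIRECT_CODES and "Location" in headers:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, headers["Location"])
            visited.append(url)
            continue
        return status, visited
    raise RuntimeError("too many redirects")


# Hypothetical responses: URL -> (status code, response headers).
FAKE_SITE = {
    "http://example.com/": (301, {"Location": "https://example.com/"}),
    "https://example.com/": (302, {"Location": "/home"}),
    "https://example.com/home": (200, {}),
}

status, visited = follow_redirects("http://example.com/", FAKE_SITE.__getitem__)
print(status, visited)
```

The hop limit is there because a misconfigured site can redirect in a circle; browsers apply a similar cap.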

How HTTPS works

We have described the working principle of HTTP above. Now let's talk about the working principle of HTTPS. We know that HTTPS is not a new protocol, but a combination of HTTP and SSL/TLS. Therefore, when we discuss the handshake process of HTTPS, it is actually the handshake process of SSL/TLS.

TLS is an encryption protocol designed to provide communication security for the Internet. The TLS handshake is the process of initiating and using a TLS encrypted communication session. During the TLS handshake, the communicating parties on the Internet exchange information, verify the cipher suite, and exchange session keys.

A TLS handshake occurs every time a user navigates to a specific website over HTTPS and sends a request. In addition, a TLS handshake also occurs whenever any other communication uses HTTPS, including API calls and DNS queries over HTTPS.

The specific handshake process of TLS will vary depending on the type of key exchange algorithm used and the cipher suites supported by both parties. We will discuss this process using RSA asymmetric encryption. The entire TLS communication flow chart is as follows:

  • Before any TLS messages are exchanged, the TCP three-way handshake is performed. After the TCP connection is established, the TLS handshake begins.
  • ClientHello: The client initiates the handshake by sending a hello message to the server. This message contains the TLS versions supported by the client (for example TLS 1.0, TLS 1.2, TLS 1.3), the cipher suites supported by the client, and a client random number.
  • ServerHello: The server replies with a message that contains the cipher suite selected by the server and a random number generated by the server.
  • Authentication: The server sends a Certificate message containing its public key certificate, and the client checks this SSL certificate against the certificate authorities it trusts. Finally, the server sends ServerHelloDone to show that its part of the hello exchange is finished. The first part of the handshake ends here.
  • Encryption phase: After the first phase is completed, the client sends ClientKeyExchange, which contains a key string called the premaster secret, encrypted with the public key from the certificate above. The server decrypts it with its private key. The client then sends ChangeCipherSpec to tell the server that its following messages will be encrypted with the negotiated key, and then sends Finished to tell the server that its side of the handshake is complete.

The session key is not sent over the network. Both sides derive it from the client random number, the server random number, and the premaster secret.

Then, the server sends its own ChangeCipherSpec and Finished to confirm that it has derived the same key. The handshake is now complete: RSA asymmetric encryption has protected the key exchange, and the data that follows is encrypted symmetrically with the session key.
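
In application code the handshake is normally left to a TLS library. The Python sketch below shows how a client would typically set this up with the standard `ssl` module; the function names are assumptions, the TLS 1.2 minimum is a choice made for this example, and `describe_tls` needs network access, so it is defined but not called here. Note that the library negotiates the key exchange itself, and modern servers often use a Diffie-Hellman variant rather than the RSA exchange walked through above.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client context that verifies the server certificate and host name."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


def describe_tls(host: str, port: int = 443, timeout: float = 5.0):
    """Connect to host, run the TLS handshake, and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:   # TCP handshake
        with context.wrap_socket(raw, server_hostname=host) as tls:        # TLS handshake
            return tls.version(), tls.cipher()[0]


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Calling `describe_tls` with a host name returns the negotiated protocol version and cipher suite name, which correspond to the choices made in the ServerHello step.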
