Is the backend a bit cumbersome? Go to the client!

Hello everyone, I am Xiaolin.

Among Internet roles, back-end development attracts by far the most applicants and is the most competitive, while client-side development receives very few. Candidates who apply for back-end positions are sometimes picked up by client-side departments for interviews instead. So if you can't land a back-end offer but still want to join a big company, consider applying for client-side development: the interviews are less competitive, and the salary and benefits match those of back-end roles.

Today I'd like to share one candidate's experience across the first, second, and third rounds of a Kuaishou client-side interview. The candidate's own stack is C++ back-end, but the interviews did not touch back-end content; the questions revolved around C++, operating systems, network protocols, and algorithms. Compared with back-end interviews there is less to prepare, since MySQL, Redis, message queues, and other back-end components are not covered, but the computer-science fundamentals are probed in greater depth.

In the third round especially, hand-writing the two scenario coding questions was still quite difficult.

Kuaishou First Interview

Congestion Control Introduction

When the network is congested, if large numbers of packets keep being sent, packets may be delayed or lost. TCP will then retransmit the data, but retransmission puts an even heavier burden on the network, causing greater delays and more packet loss. The situation becomes a vicious cycle that keeps amplifying itself.

Therefore, TCP cannot ignore what happens on the network. It is designed as a selfless protocol. When the network is congested, TCP will sacrifice itself and reduce the amount of data sent.

So, there is congestion control, the purpose of which is to prevent the "sender"'s data from filling up the entire network.

In order to regulate the amount of data to be sent on the "sender side", a concept called "congestion window" is defined.

Congestion control mainly consists of four algorithms:

  • Slow Start
  • Congestion Avoidance
  • Congestion occurs
  • Fast recovery

Slow Start

After TCP has just established a connection, it first has a slow start process. This slow start means increasing the number of data packets sent little by little. If a large amount of data is sent right away, wouldn't this cause congestion to the network?

The slow start algorithm only requires one rule to be remembered: each time the sender receives an ACK, the size of the congestion window cwnd increases by 1.

Here it is assumed that the congestion window cwnd and the send window swnd are equal. Here is an example:

  • After the connection is established, cwnd = 1 is initialized at the beginning, indicating that data of an MSS size can be transmitted.
  • When an ACK is received, cwnd is increased by 1, so 2 packets can be sent at a time.
  • After receiving 2 ACKs, cwnd increases by 2, so 2 more packets can be sent than before, so 4 packets can be sent this time.
  • When these 4 ACKs arrive, each ACK increases cwnd by 1, so the 4 ACKs increase cwnd by 4 in total. 4 more packets can be sent than before, so 8 can be sent this time.

The change process of the slow start algorithm is as follows:

Slow start algorithm

It can be seen that with the slow start algorithm, the number of packets sent increases exponentially.
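To make this concrete, here is a tiny simulation (a sketch, not kernel code) of how cwnd grows when every packet sent in a round trip is ACKed:

    #include <cstdio>

    int main() {
        int cwnd = 1;                  // congestion window, in MSS units
        for (int rtt = 1; rtt <= 4; rtt++) {
            int acks = cwnd;           // one ACK per packet sent this round
            cwnd += acks;              // +1 per ACK, so cwnd doubles per RTT
            printf("after RTT %d: cwnd = %d\n", rtt, cwnd);
        }
        // prints 2, 4, 8, 16: exponential growth
    }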

So when does slow start stop?

There is a state variable called the slow start threshold, ssthresh.

  • When cwnd < ssthresh, the slow start algorithm is used.
  • When cwnd >= ssthresh, the "congestion avoidance algorithm" will be used.

Congestion Avoidance

When the congestion window cwnd "exceeds" the slow start threshold ssthresh, the congestion avoidance algorithm will be entered.

Generally speaking, the size of ssthresh is 65535 bytes.

Then after entering the congestion avoidance algorithm, its rule is: every time an ACK is received, cwnd increases by 1/cwnd.

Continuing with the previous slow start example, let's assume that ssthresh is 8:

  • When 8 ACKs arrive, each ACK increases cwnd by 1/8; the 8 ACKs together increase cwnd by 1 in total, so 9 MSS of data can be sent this time. The growth has become linear.

The change process of the congestion avoidance algorithm is as follows:

Congestion Avoidance

Therefore, we can find that the congestion avoidance algorithm changes the exponential growth of the original slow start algorithm into linear growth. It is still in the growth stage, but the growth rate is slower.
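Expressed as code, the per-ACK update of the two phases might look like the following sketch (cwnd is kept as a floating-point MSS count for clarity; real stacks track it in bytes with integer arithmetic):

    #include <cstdio>

    double cwnd = 1;      // congestion window, in MSS
    double ssthresh = 8;  // slow start threshold (value assumed, as in the example)

    void on_ack() {
        if (cwnd < ssthresh)
            cwnd += 1;           // slow start: +1 MSS per ACK (exponential)
        else
            cwnd += 1.0 / cwnd;  // congestion avoidance: about +1 MSS per RTT (linear)
    }

    int main() {
        for (int i = 0; i < 20; i++) on_ack();
        printf("cwnd = %.2f\n", cwnd);
    }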

As the traffic keeps growing, the network will gradually become congested, causing packet loss. At this time, the lost data packets will need to be retransmitted.

When the retransmission mechanism is triggered, the "congestion occurrence algorithm" is entered.

Congestion occurs

When the network is congested, data packets will be retransmitted. There are two main retransmission mechanisms:

  • Timeout retransmission
  • Fast Retransmit

When a "timeout retransmission" occurs, the congestion occurrence algorithm will be used.

At this time, the values of ssthresh and cwnd change:

  • ssthresh is set to cwnd/2,
  • cwnd is reset to 1 (the initial value; here we assume the initial cwnd is 1)

The changes in the congestion occurrence algorithm are shown in the following figure:

Congested sending - timeout retransmission

Then slow start restarts, which suddenly reduces the data flow. Once a "timeout retransmission" occurs, it is like being knocked back to square one overnight. This reaction is too drastic, however, and tends to stall the network.
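Continuing the earlier on_ack() sketch (same cwnd and ssthresh variables), the timeout reaction can be written as:

    // Reaction to a retransmission timeout (RTO):
    void on_timeout() {
        ssthresh = cwnd / 2;  // remember half of the current window
        cwnd = 1;             // back to the initial window; slow start restarts
    }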

There is a better way. We covered the "fast retransmit" algorithm earlier: when the receiver finds that an intermediate packet is lost, it sends the ACK of the last in-order packet three times, so the sender retransmits quickly without waiting for a timeout.

TCP considers this situation not serious because most of the packets are not lost, only a small part is lost. Then the changes of ssthresh and cwnd are as follows:

  • cwnd = cwnd/2, which means it is set to half of the original value;
  • ssthresh = cwnd;
  • Enter the fast recovery algorithm

Fast recovery

Fast retransmit and fast recovery are generally used together. The fast recovery algorithm reasons that if the sender can still receive 3 duplicate ACKs, the network cannot be in terrible shape, so there is no need to react as drastically as to an RTO timeout.

As mentioned before, before entering fast recovery, cwnd and ssthresh are updated:

  • cwnd = cwnd/2, which means it is set to half of the original value;
  • ssthresh = cwnd;

Then, enter the fast recovery algorithm as follows:

  • Congestion window cwnd = ssthresh + 3 (3 means that 3 data packets are confirmed to have been received);
  • Retransmit lost packets;
  • If a duplicate ACK is received, cwnd increases by 1;
  • If an ACK for new data is received, cwnd is set back to the ssthresh value from the first step. The new-data ACK means all the data sent before the duplicate ACKs has now been received, so the recovery process is over and the connection enters the congestion avoidance state again.

The change process of the fast recovery algorithm is as follows:

Fast retransmit and fast recovery


In other words, unlike "timeout retransmission", cwnd does not collapse back to square one overnight; it stays at a relatively high value and then grows linearly from there.
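Sketched in code (again extending the earlier cwnd/ssthresh example):

    // Three duplicate ACKs arrived: enter fast recovery,
    // following the order of updates given in the text.
    void on_three_dup_acks() {
        cwnd = cwnd / 2;      // halve the window
        ssthresh = cwnd;      // threshold tracks the halved window
        cwnd = ssthresh + 3;  // +3: three packets are known to have left the network
        // ...retransmit the lost packet here (omitted)
    }

    void on_dup_ack_during_recovery() {
        cwnd += 1;            // one more packet has left the network
    }

    void on_new_ack_during_recovery() {
        cwnd = ssthresh;      // deflate and re-enter congestion avoidance
    }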

What is the difference between http and https?

  • HTTP is the Hypertext Transfer Protocol, and its messages are transmitted in plain text, which poses a security risk. HTTPS solves HTTP's insecurity by adding the SSL/TLS security protocol between TCP and HTTP, so that messages are transmitted in encrypted form.
  • It is relatively simple to establish an HTTP connection. After the TCP three-way handshake, HTTP message transmission can be carried out. However, after the TCP three-way handshake, HTTPS also needs to go through the SSL/TLS handshake process before it can enter the encrypted message transmission.
  • The default ports for the two are different. The default port number for HTTP is 80, and the default port number for HTTPS is 443.
  • The HTTPS protocol requires applying for a digital certificate from a CA (certificate authority) to ensure that the server's identity is credible.

What are the four steps of the HTTPS encryption (TLS handshake) process?

The TLS handshake process based on the RSA algorithm is relatively easy to understand, so here we will use this to show you the TLS handshake process, as shown below:

HTTPS connection establishment process

Detailed process of establishing TLS protocol:

1. ClientHello

First, the client initiates an encrypted communication request to the server, which is a ClientHello request.

In this step, the client mainly sends the following information to the server:

(1) The TLS protocol version supported by the client, such as TLS 1.2.

(2) A random number generated by the client (Client Random), later used as one of the inputs for generating the "session key".

(3) A list of cipher suites supported by the client, such as the RSA encryption algorithm.

2. ServerHello

After receiving the client's request, the server sends a response to the client, which is the ServerHello. The server's response contains the following:

(1) Confirmation of the TLS protocol version. If the client's version is not supported, the server shuts down encrypted communication.

(2) A random number generated by the server (Server Random), likewise one of the inputs used to generate the "session key" later.

(3) A list of confirmed cipher suites, such as the RSA encryption algorithm.

(4) The server’s digital certificate.

3. Client response

After receiving the response from the server, the client first confirms the authenticity of the server's digital certificate through the CA public key in the browser or operating system.

If there is no problem with the certificate, the client will extract the server's public key from the digital certificate, and then use it to encrypt the message and send the following information to the server:

(1) A random number (pre-master key). This random number will be encrypted by the server public key.

(2) Notification of change in encryption algorithm, indicating that all subsequent communications will be encrypted using the "session key".

(3) Client handshake end notification, indicating that the client's handshake phase has ended. This item also summarizes the data of all previous contents for verification by the server.

The random number in the first item above is the third random number in the entire handshake phase and will be sent to the server, so this random number is the same for both the client and the server.

The server and the client have these three random numbers (Client Random, Server Random, pre-master key), and then use the encryption algorithm agreed upon by both parties to generate a "session key" for this communication.
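As a rough illustration of that derivation, here is a simplified sketch. The prf() below is a toy stand-in for the HMAC-based PRF that TLS 1.2 actually uses, and is only meant to show how the three random values flow into the session key:

    #include <cstdio>
    #include <string>
    #include <functional>

    // Toy stand-in for the negotiated key-derivation function.
    // NOT cryptographically secure; it only illustrates the data flow.
    std::string prf(const std::string& secret, const std::string& label,
                    const std::string& seed) {
        size_t h = std::hash<std::string>{}(secret + label + seed);
        return std::to_string(h);
    }

    std::string derive_session_key(const std::string& client_random,
                                   const std::string& server_random,
                                   const std::string& pre_master) {
        // Both sides hold the same three inputs, so both derive the same key.
        std::string master = prf(pre_master, "master secret",
                                 client_random + server_random);
        return prf(master, "key expansion", server_random + client_random);
    }

    int main() {
        printf("%s\n", derive_session_key("CR", "SR", "PMS").c_str());
    }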

4. Final response from the server

After the server receives the third random number (pre-master key) from the client, it calculates the "session key" for this communication through the negotiated encryption algorithm.

Then, send the final message to the client:

(1) Notification of a change in the encryption communication algorithm, indicating that all subsequent communications will be encrypted using the "session key".

(2) Server handshake end notification, indicating that the server's handshake phase has ended. This item also summarizes the data of all previous contents for client verification.

At this point, the entire TLS handshake phase is over. Next, the client and server enter into encrypted communication, using the normal HTTP protocol, but using the "session key" to encrypt the content.

What is the difference between get and post?

  • According to the RFC specification, the semantics of GET is to obtain the specified resource from the server. This resource can be static text, web pages, pictures, videos, etc. The parameter position of the GET request is generally written in the URL. The URL stipulates that it can only support ASCII, so the parameters of the GET request only allow ASCII characters, and the browser will limit the length of the URL (the HTTP protocol itself does not make any regulations on the length of the URL).
  • According to the RFC specification, the semantics of POST is to process the specified resource according to the request payload (message body), and the specific processing method varies depending on the resource type. The location of the data carried by the POST request is generally written in the message body. The data in the body can be in any format as long as the client and the server negotiate, and the browser will not limit the body size.

What is the difference between a process and a thread?

  • Resource usage: Each process has an independent address space, file descriptors and other system resources. The resources between processes are independent of each other, while threads share the address space and resources of their own processes, including file descriptors, signal processing, etc.
  • Scheduling and switching: Processes are the basic units for scheduling and allocating resources in the system, and the switching overhead between processes is relatively large. Threads are executed within processes, and the switching overhead of threads is relatively small.
  • Communication and synchronization: The communication between processes includes pipes, message queues, shared memory, etc., and the communication between processes is relatively complex. Threads share the memory space of the process, and communication and synchronization can be achieved by directly reading and writing shared data.
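The address-space difference is easy to demonstrate on Linux (a minimal POSIX sketch; compile with -pthread):

    #include <cstdio>
    #include <thread>
    #include <unistd.h>
    #include <sys/wait.h>

    int counter = 0;

    int main() {
        // A thread shares the process's address space: its write is visible.
        std::thread t([] { counter = 1; });
        t.join();
        printf("after thread: counter = %d\n", counter);  // prints 1

        // A forked child gets its own (copy-on-write) address space:
        // its write is NOT visible to the parent.
        counter = 0;
        if (fork() == 0) { counter = 1; _exit(0); }
        wait(nullptr);
        printf("after fork:   counter = %d\n", counter);  // still prints 0
    }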

What resources does a thread have and what is stored in the stack?

Threads have some specific resources in the operating system, including:

  • Thread Control Block (TCB): used to save thread status information, such as program counter (PC), register value, thread ID, thread priority, etc.
  • Stack: Each thread has its own stack space, which is used to store local variables of function calls, function return addresses, and other temporary data. The stack is private to the thread, and the stacks of different threads are independent of each other.
  • Registers: Threads use registers during execution, including general registers (such as EAX, EBX, etc.), program counter (PC), etc. Registers save the current execution state of the thread.
  • Shared resources: Threads can share the resources of the process to which they belong, such as open files, signal handlers, etc. These resources can be shared and accessed between threads.

The stack mainly stores the following contents:

  • Local variables of function calls: When a function is called, its local variables are saved in the stack. These local variables are destroyed after the function is executed.
  • Function return address: When a function is executed, it needs to return to the address that called the function. The return address will be saved in the stack so that the function can return correctly after execution.
  • Temporary data during function calls: During the execution of a function, it may be necessary to save some temporary data, such as function parameters, intermediate calculation results, etc. These data will be saved in the stack.

When calling a function, what happens when the stack is pushed?

When a function is called, the following stack operations are performed:

  • Save the return address: Before calling a function, the call instruction will push the address of the next instruction (that is, the address that needs to be continued after the function is called) into the stack so that it can return to the calling point correctly after the function is executed.
  • Save the caller's frame pointer: in the called function's prologue, the caller's frame (base) pointer is pushed onto the stack so that the caller's stack frame can be restored after the function returns.
  • Passing parameters: when a function is called, argument values are pushed onto the stack one by one (on modern ABIs, the first few arguments may instead be passed in registers), and the function accesses them from the stack inside its body.
  • Allocate local variable space: When a function is called, space is allocated for local variables, which are stored on the stack. The stack pointer moves accordingly to accommodate the new local variable space.
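A small experiment makes the frame layout visible (a sketch; exact addresses vary by platform, but on a typical downward-growing stack the callee's locals sit below the caller's):

    #include <cstdio>

    void callee() {
        int local = 0;
        // This frame was pushed after (below) main's frame.
        printf("callee local: %p\n", (void*)&local);
    }

    int main() {
        int local = 0;
        printf("main   local: %p\n", (void*)&local);
        callee();  // the call instruction pushes the return address first
    }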

What is the difference between static link libraries and dynamic link libraries?

  • Linking method: The static link library will be completely copied into the executable file during compilation and linking, becoming a part of the executable file; while the dynamic link library will only include a reference to the library in the executable file during compilation and linking, and the actual library file will be dynamically loaded by the operating system at runtime.
  • File size: Static link libraries will increase the size of the executable file because the library code is copied completely into the executable file; dynamic link libraries will not increase the size of the executable file because the library code is not loaded until runtime.
  • Memory usage: Static link libraries will be completely loaded into memory at runtime, occupying a fixed memory space; dynamic link libraries will only be loaded at runtime and can be shared between multiple processes, reducing memory usage.
  • Scalability: Dynamic link libraries have better scalability and can replace or add new library files without modifying the executable file, while static link libraries need to be recompiled and linked.

How to load dynamic link library into memory?

mmap maps the library file directly into the address space of every process that uses it. Each process believes the library is loaded into its own address space, but in fact there is only one copy of the library in physical memory. In this way, mmap is what makes shared dynamic link libraries work.
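On Linux this mechanism is exposed through the dlopen API, which maps a shared library into the calling process (a minimal sketch; link with -ldl, and "libm.so.6" is assumed to exist, as on typical glibc systems):

    #include <cstdio>
    #include <dlfcn.h>  // POSIX dynamic loading API

    int main() {
        // dlopen mmaps the library into this process's address space;
        // its code pages are shared with other processes using the library.
        void* handle = dlopen("libm.so.6", RTLD_LAZY);
        if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        // Look up a symbol and call it through a function pointer.
        double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
        if (cosine) printf("cos(0) = %f\n", cosine(0.0));

        dlclose(handle);
    }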


Introduction to virtual memory

  • First, virtual memory allows a process to use more memory than physical memory provides. Because program execution follows the principle of locality, the CPU tends to access memory in a highly repetitive pattern, so infrequently used pages can be swapped out of physical memory to, for example, the swap area on the hard disk.
  • Second, since each process has its own page table, the virtual memory space of each process is independent of each other. There is no way for a process to access the page tables of other processes, so these page tables are private, which solves the problem of address conflicts between multiple processes.
  • Third, in addition to the physical address, the page table entry in the page table also has some bits that mark attributes, such as controlling the read and write permissions of a page, marking whether the page exists, etc. In terms of memory access, the operating system provides better security.

What is an interrupt?

In an operating system, an interrupt is an event triggered by hardware or software that temporarily stops the currently executing program and executes the interrupt-related handler instead. Interrupts can be internal, such as a division error or out-of-bounds access, or external, such as an input/output request from a hardware device or a clock interrupt.

The role of interrupts is to provide a mechanism to handle and respond to various events, such as processing input/output requests from hardware devices, handling exceptions, performing clock scheduling, etc. When an interrupt occurs, the operating system determines the interrupt handler to be executed based on the interrupt type, and resumes the original program execution after handling the interrupt.

The interrupt handler can perform a series of operations, such as saving the context of the current process, handling interrupt events, communicating with devices, scheduling other processes, etc. By using interrupts, the operating system can achieve multitasking, respond to external events in real time, and improve the reliability and stability of the system.

Operating system locks: implement a read-write lock yourself

Readers only read data and do not modify it, while writers can both read and modify data.

Reader-Writer Problem Description:

  • "Read-Read" allows: multiple readers to read at the same time
  • "Read-Write" Mutual Exclusion: Readers can read when there are no writers, and writers can write when there are no readers
  • "Write-Write" Mutual Exclusion: A writer can only write when there are no other writers

Next, several solutions are proposed for analysis.

Solution 1

Use semaphore to try to solve the problem:

  • Semaphore wMutex: Mutex semaphore that controls write operations, with an initial value of 1;
  • Reader count rCount: the number of readers performing read operations, initialized to 0;
  • Semaphore rCountMutex: controls the mutual exclusive modification of the rCount reader counter, with an initial value of 1;

Next, let’s look at the implementation of the code:

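The original article shows the code as an image; below is a sketch of the same reader-priority logic using POSIX semaphores (sem_init calls are omitted; each semaphore is assumed to start at the initial value listed above):

    #include <semaphore.h>

    // P()/V() as used in the text, implemented with POSIX semaphores.
    void P(sem_t& s) { sem_wait(&s); }
    void V(sem_t& s) { sem_post(&s); }

    sem_t wMutex;       // mutual exclusion for writing, initial value 1
    sem_t rCountMutex;  // protects rCount, initial value 1
    int rCount = 0;     // number of readers currently reading

    void writer() {
        P(wMutex);
        // write data...
        V(wMutex);
    }

    void reader() {
        P(rCountMutex);
        if (rCount == 0) P(wMutex);  // the first reader locks out writers
        rCount++;
        V(rCountMutex);

        // read data...

        P(rCountMutex);
        rCount--;
        if (rCount == 0) V(wMutex);  // the last reader lets writers in
        V(rCountMutex);
    }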

The implementation above is a reader-priority strategy: as long as some reader is reading, later readers can enter directly, so if readers keep arriving, writers will starve.

Solution 2

Since there is a reader-first strategy, there is also a writer-first strategy:

  • As long as there is a writer ready to write, the writer should perform the write operation as soon as possible, and subsequent readers must block;
  • If there are writers who keep writing, readers will be starved;

Based on Solution 1, the following variables are added:

  • Semaphore rMutex: The mutex semaphore that controls the reader's entry, with an initial value of 1;
  • Semaphore wDataMutex: The mutex semaphore that controls the writer's write operation, with an initial value of 1;
  • Writer count wCount: records the number of writers, the initial value is 0;
  • Semaphore wCountMutex: controls the mutex modification of wCount, the initial value is 1;

The specific implementation code is as follows:
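The original code figure is likewise an image; here is a sketch of the writer-priority logic with POSIX semaphores (initialization omitted, same P()/V() helpers as in Solution 1):

    #include <semaphore.h>

    void P(sem_t& s) { sem_wait(&s); }
    void V(sem_t& s) { sem_post(&s); }

    sem_t rMutex;       // gate for arriving readers, initial value 1
    sem_t wDataMutex;   // mutual exclusion for the data, initial value 1
    sem_t rCountMutex;  // protects rCount, initial value 1
    sem_t wCountMutex;  // protects wCount, initial value 1
    int rCount = 0, wCount = 0;

    void writer() {
        P(wCountMutex);
        if (wCount == 0) P(rMutex);  // the first writer closes the reader gate
        wCount++;
        V(wCountMutex);

        P(wDataMutex);               // writers still write one at a time
        // write data...
        V(wDataMutex);

        P(wCountMutex);
        wCount--;
        if (wCount == 0) V(rMutex);  // the last writer reopens the reader gate
        V(wCountMutex);
    }

    void reader() {
        P(rMutex);                   // blocks here once any writer is waiting
        P(rCountMutex);
        if (rCount == 0) P(wDataMutex);  // the first reader holds off the write
        rCount++;
        V(rCountMutex);
        V(rMutex);

        // read data...

        P(rCountMutex);
        rCount--;
        if (rCount == 0) V(wDataMutex);  // the last reader wakes the writer
        V(rCountMutex);
    }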

Note the role of rMutex here. Initially, several readers are reading and have all entered the reader queue. When a writer arrives and executes P(rMutex), subsequent readers are blocked on rMutex and cannot enter the reader queue, while subsequent writers can all enter the writer queue. This ensures writer priority.

At the same time, after the first writer executes P(rMutex), it cannot start writing immediately; it must wait until all readers already in the reader queue finish reading, and the last of them wakes the writer via V(wDataMutex).

Solution 3

Since both the reader-first strategy and the writer-first strategy will cause starvation, let's implement a fair strategy.

Fair strategy:

  • Readers and writers have equal priority;
  • Writers and readers have mutually exclusive access;
  • Only one writer can access the critical section;
  • Multiple readers can access critical resources simultaneously;

Specific code implementation:

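Again in place of the original code image, a sketch of the fair strategy (same helpers, initialization omitted):

    #include <semaphore.h>

    void P(sem_t& s) { sem_wait(&s); }
    void V(sem_t& s) { sem_post(&s); }

    sem_t flag;         // FIFO gate both roles must pass, initial value 1
    sem_t wDataMutex;   // mutual exclusion for the data, initial value 1
    sem_t rCountMutex;  // protects rCount, initial value 1
    int rCount = 0;

    void writer() {
        P(flag);        // queue up behind earlier readers and writers
        P(wDataMutex);  // wait for in-progress readers to drain
        // write data...
        V(wDataMutex);
        V(flag);
    }

    void reader() {
        P(flag);        // a waiting writer blocks later readers right here
        P(rCountMutex);
        if (rCount == 0) P(wDataMutex);  // the first reader holds off writers
        rCount++;
        V(rCountMutex);
        V(flag);

        // read data...

        P(rCountMutex);
        rCount--;
        if (rCount == 0) V(wDataMutex);  // the last reader wakes the writer
        V(rCountMutex);
    }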

After reading the code, I wonder if you have such a question: Why does adding a semaphore flag achieve fair competition?

Comparing with the reader priority strategy of Solution 1, it can be found that in reader priority, as long as there are readers arriving later, the readers can enter the reader queue, while the writers must wait until no readers arrive.

If no readers arrive, the reader queue will be empty, that is, rCount==0, and the writer can enter the critical section to perform write operations.

The role of the flag here is to prevent the reader from having this special permission (the special permission is that as long as the reader arrives, he can enter the reader queue).

For example, at the beginning, some readers come to read data and they all enter the reader queue. At this time, a writer comes and executes the P(flag) operation, causing subsequent readers to be blocked on the flag and unable to enter the reader queue. This will gradually empty the reader queue, that is, rCount is reduced to 0.

The writer cannot start writing immediately (because the reader queue is not empty at this time), and will be blocked on the semaphore wDataMutex. After all readers in the reader queue have finished reading, the last reader process executes V(wDataMutex) to wake up the previous writer, and the writer continues to start the write operation.

Algorithm Questions

  • Build a binary tree from its level-order traversal and test it (see the sketch below)
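A possible solution (a sketch; it assumes -1 marks a missing child in the level-order sequence):

    #include <cstdio>
    #include <queue>
    #include <vector>

    struct TreeNode {
        int val;
        TreeNode *left = nullptr, *right = nullptr;
        explicit TreeNode(int v) : val(v) {}
    };

    // Build a binary tree from a level-order array; -1 means "no node".
    TreeNode* buildLevelOrder(const std::vector<int>& vals) {
        if (vals.empty() || vals[0] == -1) return nullptr;
        TreeNode* root = new TreeNode(vals[0]);
        std::queue<TreeNode*> q;
        q.push(root);
        size_t i = 1;
        while (!q.empty() && i < vals.size()) {
            TreeNode* node = q.front(); q.pop();
            if (vals[i] != -1) { node->left = new TreeNode(vals[i]); q.push(node->left); }
            if (++i >= vals.size()) break;
            if (vals[i] != -1) { node->right = new TreeNode(vals[i]); q.push(node->right); }
            ++i;
        }
        return root;
    }

    int main() {
        TreeNode* root = buildLevelOrder({1, 2, 3, -1, 4});
        // Level-order print to test the construction (prints: 1 2 3 4).
        std::queue<TreeNode*> q;
        q.push(root);
        while (!q.empty()) {
            TreeNode* n = q.front(); q.pop();
            if (!n) continue;
            printf("%d ", n->val);
            q.push(n->left);
            q.push(n->right);
        }
        printf("\n");
    }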

Kuaishou Second Interview

The concepts of pass-by-value and pass-by-pointer/reference

  • Value passing means copying the value of a parameter and passing it to a function or method for operation. In value passing, the function or method can modify the parameter without affecting the original variable value.
  • Passing by pointer or reference means passing the memory address of the argument to a function or method, so the function can directly access and modify the original variable. Modifications made to the parameter inside the function are reflected in the original variable.
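A minimal sketch of the difference:

    #include <cstdio>

    void by_value(int x)    { x = 100; }   // modifies a private copy only
    void by_pointer(int* x) { *x = 100; }  // modifies the caller's variable

    int main() {
        int a = 1;
        by_value(a);
        printf("%d\n", a);  // still 1
        by_pointer(&a);
        printf("%d\n", a);  // now 100
    }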

What causes precision loss when forcibly casting between int, double, and string?

  • Integer to floating point: Integer types are represented exactly, while floating point types are represented approximately, with a fixed number of significant digits. When converting an integer to a floating point number, a loss of precision may occur because the floating point number cannot exactly represent all the digits of the integer.
  • Floating point to integer: Floating point types have a fractional part and an exponent part, while integer types can only represent integer values. When converting a floating point number to an integer, the fractional part will be discarded, which may result in loss of precision.
  • String to numeric type: String represents data in text form, while numeric type represents data in numeric form. When converting a string to a numeric type, if the string cannot be parsed into a valid numeric value, or the numeric value represented by the string exceeds the range of the target type, it will cause precision loss or produce an incorrect result.
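Two of these cases are easy to reproduce (a small demonstration):

    #include <cstdio>

    int main() {
        // int -> double: a double has about 15-16 significant decimal digits,
        // so integers above 2^53 are not all exactly representable.
        long long big = 9007199254740993LL;  // 2^53 + 1
        double d = (double)big;
        printf("%lld -> %.0f\n", big, d);    // the trailing 3 becomes a 2

        // double -> int: the fractional part is simply discarded.
        double x = 3.99;
        printf("%f -> %d\n", x, (int)x);     // prints 3, not 4
    }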

What is void*?

void* is a general pointer type, known as an "untyped pointer". It can be used to represent a pointer to any type, because a void* pointer does not specify a specific data type.

Since void* carries no type information, it cannot be dereferenced directly, nor can pointer arithmetic be performed on it. A void* pointer must first be converted to a concrete pointer type before it can be used.

void* pointers are often used where generic handling of different types is needed, such as passing pointers of arbitrary type to a function, or in dynamic memory allocation.

How do you convert the void* returned by malloc to int*?

You can use type conversion to convert a void* pointer to an int* pointer. The following is a sample code to convert a void* pointer to an int* pointer:

    void* voidPtr = malloc(sizeof(int));  // allocate memory; malloc returns a void* pointer
    int* intPtr = (int*)voidPtr;          // convert the void* pointer to an int* pointer
    *intPtr = 42;                         // now the int data can be accessed via intPtr

In the above example, the malloc function is used to allocate the memory required to store an int type data and returns a void* pointer. Then, by converting the void* pointer to an int* pointer, it is assigned to the intPtr variable. Now, the int type data can be accessed and operated through the intPtr pointer.

Algorithm Questions

  • Find a number in an array that is larger than every element to its left and smaller than every element to its right (see the sketch below).
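One common O(n) approach uses a prefix maximum and a suffix minimum (a sketch, interpreting the condition as "no smaller element to its left and no larger element to its right"):

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Returns the index of an element no smaller than everything to its
    // left and no larger than everything to its right, or -1 if none.
    int findPivot(const std::vector<int>& a) {
        int n = (int)a.size();
        if (n == 0) return -1;
        std::vector<int> suffixMin(n);       // suffixMin[i] = min(a[i..n-1])
        suffixMin[n - 1] = a[n - 1];
        for (int i = n - 2; i >= 0; i--)
            suffixMin[i] = std::min(a[i], suffixMin[i + 1]);

        int prefixMax = a[0];                // max(a[0..i]) while scanning
        for (int i = 0; i < n; i++) {
            prefixMax = std::max(prefixMax, a[i]);
            if (a[i] >= prefixMax && a[i] <= suffixMin[i]) return i;
        }
        return -1;
    }

    int main() {
        std::vector<int> a = {3, 1, 4, 6, 8, 7, 9};
        printf("index = %d\n", findPivot(a));  // prints 2 (value 4)
    }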

Kuaishou Third Interview

  • n threads print numbers in order (see the sketch after this list)
  • Design a snake game, snake data structure and snake body update and collision detection
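For the first question, a common pattern is a shared counter plus a condition variable (a sketch; compile with -pthread):

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex mtx;
    std::condition_variable cv;
    int turn = 0;  // the next number to be printed

    // Thread `id` prints id, id+n, id+2n, ... so overall output is 0, 1, 2, ...
    void worker(int id, int n, int maxNum) {
        for (int num = id; num < maxNum; num += n) {
            std::unique_lock<std::mutex> lock(mtx);
            cv.wait(lock, [&] { return turn == num; });
            printf("thread %d prints %d\n", id, num);
            turn++;           // hand the turn to the next number
            cv.notify_all();
        }
    }

    int main() {
        const int n = 3, maxNum = 10;
        std::vector<std::thread> threads;
        for (int id = 0; id < n; id++)
            threads.emplace_back(worker, id, n, maxNum);
        for (auto& t : threads) t.join();
    }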

The third interview asked no rote recitation questions at all; it went straight to two scenario coding questions and leaned much more on hands-on ability. It was genuinely hard.
