How to implement online documents for multi-person collaboration

Due to business needs, I came into contact with online documents and online Excel at work. During the research stage I found few relevant articles, so I am writing a few posts that analyze, based on my work practice and my own thinking, some technical solutions for implementing online documents and online Excel. To avoid disclosing company-internal details, the design of some data structures and non-critical scenarios is kept deliberately brief. This article introduces how to implement online documents for multi-person collaboration from three angles: requirements analysis, solution design, and technology selection.

Requirements Analysis

We use domain-driven design ideas to conduct the requirements analysis. The requirements involve two entities: "person" and "document". The main attributes of "person" are the user ID and user name. The main attributes of "document" are the document ID, document content, creator, and creation time. The relationship between the two is simple: one person can own multiple documents, and each document belongs to exactly one person, a one-to-many relationship.

Because document content cannot be read or modified arbitrarily, permission management is also required. "Permission" is a value object whose possible values are read and edit; a person can hold read or edit permission on a document.

Another key concept is "collaboration": multiple "people" editing a document at the same time. During collaboration, the content edited by multiple people needs to be merged into the final saved document content, and each editor needs to see the current "collaborators" and their "real-time edits".

To implement the above functions, we split the system into five modules: personnel management, document management, permission management, collaboration, and the front-end document editor.

Solution Design

Personnel Management

Because personnel management is not the focus of this article, we only sketch its design here, mainly to support the explanations that follow.

The main fields of the table structure are:
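
The user table itself is not reproduced here; as a minimal sketch, with field names assumed from the attributes listed above rather than taken from the original schema, a user record could look like this in TypeScript:

    // Minimal sketch of a user record; the field names are assumptions
    // based on the attributes listed in the requirements analysis.
    interface UserRecord {
      userId: string;       // unique user ID generated by the server
      userName: string;
      passwordHash: string; // store a hash, never the plain password
      phone: string;
      createdAt: Date;
    }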

The main logic of this module is introduced below.

User Registration

  1. The front end encrypts the user name, password, mobile phone number and other information filled in by the user and sends it to the server.
  2. The server gets the data and stores it in the table together with the generated unique user ID.

User login

  1. The front end asks the user to enter the username and password and sends them to the server, which verifies that they are correct.
  2. After the verification is passed, a time-limited Token is generated based on "user name + password + key + timestamp" and returned to the client.
  3. After logging in, all requests from the front end carry Token information. The server obtains the current logged-in user information based on the Token and determines whether the request is legal.
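
As a minimal sketch of the token step (HMAC-based signing in TypeScript; the exact fields and lifetime are simplified assumptions, not the original "user name + password + key + timestamp" recipe):

    import { createHmac } from "crypto";

    const SECRET = process.env.TOKEN_SECRET ?? "change-me"; // server-side key (assumption)
    const TTL_MS = 2 * 60 * 60 * 1000;                      // assumed 2-hour validity

    // Issue a time-limited token: "userId.expiry" plus an HMAC signature over it.
    function issueToken(userId: string): string {
      const expiresAt = Date.now() + TTL_MS;
      const payload = `${userId}.${expiresAt}`;
      const sig = createHmac("sha256", SECRET).update(payload).digest("hex");
      return `${payload}.${sig}`;
    }

    // Verify the token carried by each request and recover the logged-in user ID.
    function verifyToken(token: string): string | null {
      const [userId, expiresAt, sig] = token.split(".");
      if (!userId || !expiresAt || !sig) return null;
      const expected = createHmac("sha256", SECRET)
        .update(`${userId}.${expiresAt}`)
        .digest("hex");
      if (sig !== expected || Date.now() > Number(expiresAt)) return null;
      return userId;
    }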

Document Management

Document table structure design:
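
The document table itself is not reproduced here; as a minimal sketch, with field names assumed from the attributes above, a document record could look like:

    // Minimal sketch of a document record; field names are assumptions.
    interface DocumentRecord {
      docId: string;      // unique document ID
      name: string;       // document name sent by the front end
      content: string;    // full document content
      creatorId: string;  // user ID taken from the token
      createdAt: Date;    // server time at creation
    }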

The main logic of this module is introduced below.

Create a document

  1. The front end sends the document name and content to the server
  2. The server generates a unique document ID, obtains the user ID from the token, obtains the server time, and then stores the data in the database
  3. The server returns the document ID to the front end

Modify the document

The modification here refers to modifying the content of the document. To save the user's edits promptly, we need to send data to the server in real time while the user is editing. Sending the entire document on every change would keep the server-side logic simple, but each request would carry a lot of redundant data and waste bandwidth. It is therefore better to send only the changed content and let the server merge it with the current document content to produce the latest version.

How do we describe the changed content? We can divide the user's operations on the document into three categories: add, update, and delete. Adding inserts content into the document, updating modifies a section of it, and deleting removes a section. We can represent these three operations in JSON:

    {
      op: "",     // operation type: add, update, delete
      start: 0,   // start position index
      end: 0,     // end position index
      text: ""    // content to insert or replace
    }

The process of modifying a document is:

  1. The front end generates modified data and sends it to the server
  2. The server obtains the document content from the database, then merges the operations based on the user's behavior, and finally saves it to the database.
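
As a minimal sketch of the merge step on the server (the plain-string document model and the names below are assumptions):

    type DocOp = { op: "add" | "update" | "delete"; start: number; end: number; text: string };

    // Apply a single operation to the current document content.
    function applyOp(content: string, o: DocOp): string {
      if (o.op === "add") {
        // insert text at position `start`
        return content.slice(0, o.start) + o.text + content.slice(o.start);
      }
      if (o.op === "update") {
        // replace the range [start, end) with `text`
        return content.slice(0, o.start) + o.text + content.slice(o.end);
      }
      // "delete": remove the range [start, end)
      return content.slice(0, o.start) + content.slice(o.end);
    }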

When a user edits an article, data often has to be transferred many times. Although JSON expresses the semantics well, each message carries extra bytes and wastes bandwidth, and JSON serialization and deserialization are relatively slow. We can use Google's Protobuf instead: a binary transport format that beats JSON in both payload size and parsing speed.

    message Doc {
      enum Op {
        ADD = 0;
        UPDATE = 1;
        DELETE = 2;
      }
      required Op op = 1;       // operation type
      required int32 start = 2; // start position index
      required int32 end = 3;   // end position index
      optional string text = 4; // content to insert or replace
    }

The protocol here is quite simple, and you could even concatenate strings according to agreed rules instead. Considering the extensibility of later features, however, Protobuf is recommended.

There is also an ordering problem when modifying documents. Suppose the user performs the following operations:

  1. The user first deleted the five characters "12345"
  2. Then typed the five words "one two three four five"
  3. Then changed the first two words to "Hello"

With the normal order, the final result is "Hello three four five". But if the server executes them in the order 3, 1, 2, the result becomes "one two three four five", which is clearly not what the user expects. So we must ensure that the server processes the requests sent by the front end in order.

There are several solutions to ensure sequential execution:

  1. Make front-end requests synchronous instead of asynchronous: send the next change only after the previous one has been acknowledged.
  2. The front end attaches a continuously increasing ID to each request; if the server detects a gap in the IDs, it waits briefly for the missing request.
  3. Operations on the same document are proxied to the same server; a single process receives the requests and stores them in an ordered queue, and the queue consumer processes them in order.

Solution 1 is inefficient when there are many requests and hurts the user experience, so it can be ruled out. Solution 2 relies on the client generating an incrementing ID and is a good option. Solution 3 relies on a single process and an ordered queue to guarantee ordering; a single process struggles under high concurrency, but if load balancing is done by document ID, the traffic per process is manageable. After all, the QPS for modifying a single document is not that high.
Of course, Solutions 2 and 3 can also be combined, as in the sketch below.
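
A minimal sketch of that combination, reusing the DocOp type from the earlier sketch (the buffering strategy and names are assumptions):

    // Per-document ordering: buffer out-of-order operations by their sequence ID
    // and release them to the consumer strictly in order.
    class DocOpSequencer {
      private nextSeq = 1;                        // next expected sequence ID
      private pending = new Map<number, DocOp>(); // buffered out-of-order operations

      constructor(private consume: (op: DocOp) => void) {}

      push(seq: number, op: DocOp): void {
        this.pending.set(seq, op);
        // Drain every operation that is now contiguous with what has been consumed.
        while (this.pending.has(this.nextSeq)) {
          this.consume(this.pending.get(this.nextSeq)!);
          this.pending.delete(this.nextSeq);
          this.nextSeq++;
        }
      }
    }

    // One sequencer would be held per document ID, e.g. by the process
    // that the document's traffic is routed to.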

View a document

  1. The front end sends the document ID to be viewed to the server
  2. The server returns the document content based on the document ID

Delete a document

  1. The front end sends the document ID to be deleted to the server
  2. The server deletes the corresponding document according to the document ID

Permission Management

The permission requirements here fit the "ABAC" (attribute-based access control) model well:

  User attributes: any normally logged-in user
  Environment attributes: normal document content
  Operation attributes: reading and writing documents
  Object attributes: the document

Therefore, the main fields of the table structure where we store permission information are:
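
As with the other tables, a minimal sketch of a permission record (field names are assumptions):

    // Minimal sketch of a permission record; field names are assumptions.
    interface PermissionRecord {
      docId: string;                // which document the permission applies to
      userId: string;               // who holds the permission
      permission: "read" | "edit";  // permission type
      grantedAt: Date;
    }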

The main logic of this module is introduced below.

Grant permissions

  1. The front end sends the document ID and permission type (read/write) to the server
  2. The server adds a record to the permission table based on the document ID and the user ID in the token, and returns a success message.

Remove permissions

  1. The front end sends the document ID and permission type (read/write) to the server
  2. The server deletes the record in the permission table based on the document ID and the user ID in the token, and returns a success message.

Verify permissions

We can implement a middleware that, when a document is requested, checks whether the user is the creator; if not, it queries the permission table to see whether the user has permission to view or edit the document. The same logic applies to modifying document content, so I will not repeat it.
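
A minimal sketch of such a middleware (Express-style TypeScript; the in-memory maps stand in for the real database queries and all names are assumptions):

    import type { Request, Response, NextFunction } from "express";

    // In-memory stand-ins for the document and permission tables (sketch only).
    const docs = new Map<string, { creatorId: string }>();
    const perms = new Map<string, "read" | "edit">(); // key: `${docId}:${userId}`

    // Allow the request through if the user is the creator or holds a sufficient permission.
    function requirePermission(required: "read" | "edit") {
      return (req: Request, res: Response, next: NextFunction) => {
        const userId = (req as any).userId;     // set earlier by the token middleware
        const docId = req.params.docId;
        const doc = docs.get(docId);
        if (!doc) {
          res.status(404).end();
          return;
        }
        if (doc.creatorId === userId) return next();              // the creator can always access
        const perm = perms.get(`${docId}:${userId}`);
        if (perm === "edit" || perm === required) return next();  // edit implies read
        res.status(403).end();
      };
    }

    // Usage (assumed route): app.get("/docs/:docId", requirePermission("read"), handler);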

Collaboration

Merge Conflicts

When multiple people modify a document at the same time, there are several ways to handle content conflicts:

  1. Document lock: When someone modifies a document, the entire document is locked, and others can only view it but not edit it. Although it is simple to implement, the collaborative experience will be particularly poor.
  2. Diff+patch merging algorithm: Diff+patch is a commonly used document content comparison and merging algorithm. Linux itself provides diff and patch commands to support file comparison and merging. Git also uses the diff+patch method to merge files. When conflicts cannot be resolved, the conflict will be thrown to the user for manual merging.
  3. OT algorithm: Compared with diff+patch, OT algorithm can often bring better merging results. However, the implementation of OT algorithm is also more complicated. Currently, Google Docs, Tencent Docs, Graphite Docs, etc. all use OT algorithm. We will write a separate article to talk about diff+patch and OT algorithm later.

Collaboration Notification

For better collaboration, a document editor needs to see who else is editing the document at the same time, as well as the content others have modified, in order to reduce conflicts and make the collaboration work.

People open the document editing page at different times. For everyone to "see each other" and see each other's "modified content", the server needs to push messages to the clients proactively. A persistent (long-lived) connection is the right fit for this scenario.

As mentioned above, when documents are modified frequently, data is transmitted very often. With plain HTTP requests, every request carries headers, connection setup, and other overhead, so the reporting of modified document content can also go over the persistent connection. Meanwhile, the server maintains a "collaboration list" that stores every document currently being edited and the online users of each document, comparable to a chat room.

Document editor joins

  1. When the front end opens a document, it sends a request to the server, and the server checks whether the current document is in the collaboration list.
  2. If yes, add the current user to the document modifiers list; if no, add the current document to the collaboration list and write the current user ID into it.
  3. The server pushes a message over the persistent connection to all other users in the document's user list, informing everyone that a new user has joined the collaboration.

The logic for a collaborator leaving is basically the same as for joining, so I will not go into details.
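
A minimal server-side sketch of the collaboration list over WebSocket (using the Node "ws" package; the message shape and names are assumptions):

    import { WebSocketServer, WebSocket } from "ws";

    // docId -> (userId -> socket): online users per document.
    const collabList = new Map<string, Map<string, WebSocket>>();

    const wss = new WebSocketServer({ port: 8080 });

    wss.on("connection", (ws) => {
      ws.on("message", (raw) => {
        // Assumed message shape: { type: "join", docId: string, userId: string }
        const msg = JSON.parse(raw.toString());
        if (msg.type !== "join") return;

        let users = collabList.get(msg.docId);
        if (!users) {
          users = new Map();
          collabList.set(msg.docId, users); // first editor creates the document's entry
        }
        users.set(msg.userId, ws);

        // Notify everyone else on this document that a new collaborator joined.
        for (const [uid, sock] of users) {
          if (uid !== msg.userId) {
            sock.send(JSON.stringify({ type: "user-joined", docId: msg.docId, userId: msg.userId }));
          }
        }
      });
    });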

「Modify content」

  1. The front end sends the modified data to the server
  2. The server temporarily stores the operations of multiple users, merges the user operations according to the OT algorithm, and finally merges them with the document content stored in the database
  3. Save the merged document content to the database
  4. The server reads the users in the collaboration list based on the document ID and sends the merged result to all users.
  5. The client merges the merge result with the local document content

Document editor

「Edit function」

The document editor needs to support content editing, text style adjustment, image insertion, link insertion, and a series of other functions. The options for making content editable are the textarea tag and the contenteditable attribute. Since textarea cannot easily support the other requirements and is hard to control, we choose contenteditable="true" to implement document content editing.
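
A minimal sketch of wiring up such an editable region (the element ID and the logging are placeholders):

    // Assume the page contains: <div id="editor" contenteditable="true"></div>
    const editorEl = document.getElementById("editor") as HTMLElement;

    // The `input` event fires on every change inside the editable region;
    // this is where the (debounced) change reporting described below hooks in.
    editorEl.addEventListener("input", () => {
      console.log("content changed:", editorEl.innerHTML);
    });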

Reporting function

When a document is modified, the editor is responsible for reporting the content changes. Reporting on every keystroke is not feasible: such frequent transmission would put a lot of pressure on the server and is unnecessary, so the client's reporting needs to be debounced. Reporting may also fail because of network interruptions and similar problems, in which case a retry is required.
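
A minimal sketch of debounced reporting with a simple retry (the endpoint, delay, and retry policy are assumptions):

    // Debounce: only send after the user has paused typing for `delayMs`.
    function debounce<T extends unknown[]>(fn: (...args: T) => void, delayMs: number) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      return (...args: T) => {
        clearTimeout(timer);
        timer = setTimeout(() => fn(...args), delayMs);
      };
    }

    // Send a change to the server, retrying a few times on network failure.
    async function sendWithRetry(payload: unknown, retries = 3): Promise<boolean> {
      for (let i = 0; i <= retries; i++) {
        try {
          await fetch("/api/doc/ops", { method: "POST", body: JSON.stringify(payload) }); // assumed endpoint
          return true;
        } catch {
          await new Promise((r) => setTimeout(r, 500 * (i + 1))); // back off, then retry
        }
      }
      return false; // caller can fall back to local storage (next paragraph)
    }

    const reportChange = debounce((ops: unknown) => void sendWithRetry(ops), 300);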

Reporting can still fail when the network is down for a long time or the server is abnormal. In that case, the front end can first store the user's changes in the browser's LocalStorage, bearing in mind that local storage usually has a size limit of about 5 MB. Besides covering network failures, locally stored operations are also useful when implementing Ctrl+Z, since they record the user's previous operations.
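
A minimal sketch of that local fallback (the storage key and flush strategy are assumptions):

    const PENDING_KEY = "doc:pendingOps"; // assumed LocalStorage key

    // Buffer an operation locally when reporting fails.
    function bufferLocally(op: unknown): void {
      const buffered: unknown[] = JSON.parse(localStorage.getItem(PENDING_KEY) ?? "[]");
      buffered.push(op);
      try {
        localStorage.setItem(PENDING_KEY, JSON.stringify(buffered));
      } catch {
        // QuotaExceededError: LocalStorage is typically capped around 5 MB
      }
    }

    // When the connection recovers, flush the buffered operations in order.
    async function flushLocalBuffer(send: (op: unknown) => Promise<void>): Promise<void> {
      const buffered: unknown[] = JSON.parse(localStorage.getItem(PENDING_KEY) ?? "[]");
      for (const op of buffered) await send(op);
      localStorage.removeItem(PENDING_KEY);
    }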

"Long Link"

A separate module is needed to manage the persistent connection: it uniformly handles the reported data and the data pushed by the server, takes care of packing and parsing the data, and gives the user timely feedback about successful connection, successful saves, connection errors, and so on.
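
A minimal client-side sketch of such a module (the reconnect policy and JSON framing are assumptions):

    // Thin wrapper around the browser WebSocket with auto-reconnect and JSON framing.
    class DocConnection {
      private ws?: WebSocket;

      constructor(private url: string, private onMessage: (msg: unknown) => void) {
        this.connect();
      }

      private connect(): void {
        this.ws = new WebSocket(this.url);
        this.ws.onmessage = (e) => this.onMessage(JSON.parse(e.data));   // parse pushed data
        this.ws.onclose = () => setTimeout(() => this.connect(), 1000);  // simple reconnect
      }

      send(msg: unknown): void {
        if (this.ws && this.ws.readyState === WebSocket.OPEN) {
          this.ws.send(JSON.stringify(msg)); // pack outgoing data
        }
      }
    }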

「Loading large documents」

Large documents need to be loaded asynchronously: based on the scroll position, the front end fetches more data from the server on demand.
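
A minimal sketch of scroll-driven loading (the endpoint, chunk size, and threshold are assumptions):

    const CHUNK_SIZE = 5000; // characters per request (assumption)
    let loadedUpTo = 0;
    let loading = false;

    // When the user scrolls near the bottom, fetch the next chunk and append it.
    window.addEventListener("scroll", async () => {
      const nearBottom = window.innerHeight + window.scrollY >= document.body.offsetHeight - 200;
      if (!nearBottom || loading) return;
      loading = true;
      const res = await fetch(`/api/doc/content?offset=${loadedUpTo}&limit=${CHUNK_SIZE}`); // assumed endpoint
      const chunk = await res.text();
      document.getElementById("editor")!.insertAdjacentHTML("beforeend", chunk);
      loadedUpTo += chunk.length;
      loading = false;
    });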

「Other functions」

We can split other modules of the front-end editor according to their functional scope: for example, modules that control text size and color; modules that control text alignment; modules that control content insertion; modules that support Ctrl+C, Ctrl+V, Ctrl+Z, etc. After splitting, you can implement them according to their functions, so I won't analyze them one by one here.

Technology Selection

"storage"

In terms of storage, a relational database suits the current scenario. We can choose a database based on the number of documents and users: MySQL is usually sufficient for data volumes in the tens of millions of rows; for larger volumes, TiDB or sharded MySQL are options.
Of course, MongoDB and PostgreSQL can also meet the needs; choose according to the company's DBA operation and maintenance capabilities.

In addition, if document content needs to be searchable, a single storage engine is not enough; a separate index must be built for documents. An Elasticsearch (ES) cluster can be used to build a full-text index. Since building the index is more time-consuming than MySQL inserts, updates, and deletes, it is usually done asynchronously; after all, users rarely search for a document immediately after creating it.

「Long connection」

Currently, the common options for a persistent connection are "HTTP/2 + SSE" and WebSocket. WebSocket is more mature and is the preferred choice.

"Message Queue"

RocketMQ is recommended because:

  1. RocketMQ supports synchronous/asynchronous disk flushing and synchronous/asynchronous copy writing, which ensures the reliability of messages.
  2. RocketMQ supports ordered messages.

Of course, its write performance is weaker than Kafka's, but in the online document scenario, message reliability and ordering matter more.

Architecture Design

Based on the above analysis, the deployment architecture diagram we designed is as follows

The access layer is responsible for user authentication and maintaining the persistent connections; the other modules each handle their own functions. Document modifications are sent through MQ and consumed by the document management module. Redis stores the mapping between documents and their collaborating users. Of course, when the data volume is small, Redis can temporarily stand in for MQ.

Summary

The above is my analysis and design of a multi-person collaborative online document system, covering the front-end/back-end interaction flow, document storage, and service deployment. To keep the focus on the main problems and logic, many design and technical points are only touched on briefly. If anything causes confusion, you can search for the keywords or leave a comment to discuss.
