1. IntroductionThe fulfillment time is the lifeline of e-commerce and is directly related to the user's consumption experience. According to Xinhuanet's [5] 2022 Double Eleven report, 37.4% of respondents hope for next-day delivery, and 29.91% hope for same-day delivery. Compared with other items, respondents have higher requirements for the logistics timeliness of mobile phones, computers, and digital products, and prefer to receive the goods on the same day or within 1-2 days. In the Dewu fulfillment scenario, the main stages include production in the warehouse and delivery by third-party carriers. When the user pays, Dewu will give the user a promised time limit based on the warehouse's production situation and transportation resources. 1.1 Why do we need to predict the carrier’s route efficiency?During the fulfillment process, Dewu needs to monitor the flow of orders and promptly discover orders that may be overdue (compared with the time limit promised to the user). This includes monitoring of warehouse production and third-party distribution. In the actual process, we found that when the distribution node changes, the forecast given by the carrier is conservative. In the following example, the carrier only gave a more accurate estimated delivery time when it arrived at the sales department, so it is easy to misreport when using the carrier's estimated delivery time at the sorting center.
The figure below is the relaxation index of the estimated delivery time returned by the carrier interface. It can be seen that the promised time is more accurate when approaching the destination. 2. How does the carrier network work?Before building a carrier network, you need to understand how the carrier network works. The following is a schematic diagram of the delivery from point A to point E, which is divided into the following contents: (1) Nodes, including collection and delivery outlets and sorting centers. (2) Lines, including trunk lines and branch lines. For example, the line from a network point to a sorting center is a branch line, and the line from a sorting center to another sorting center is a trunk line. (3) Shifts: In order to balance costs and timeliness, carriers will set up production shifts. After arriving at the sorting center, the goods need to be sorted according to the destination. When a certain amount of goods arrives, they will depart from the sorting center and head to the next node. When setting shifts, carriers will consider the number of orders, as well as the cost and timeliness of transportation. In the above picture: Taking purple as an example, at outlet A, the order is closed at 8 am, that is, the goods handed over to the carrier before 8 am will be sealed at around 8:20 am, and then depart from the outlet to sorting center B. The time to arrive at sorting center B is 11:40, and at this time it catches up with the shift with a cut-off time of 12 o'clock at sorting center B. Sorting center B will complete the sorting at 12:30 and go to the next sorting center, and so on to complete the entire delivery process. When building a carrier's network, modeling is required. In addition to nodes, routes, and shifts, the core also includes the following two models: (5) Finished product line, that is, from point A to point E through all nodes. In the above figure: point A-sorting center B-sorting center C-sorting center D-point E constitutes a finished product line. (6) Finished product line wave: Because there are waves at the nodes, there are also waves at the finished product line. In fact, the wave number of the finished product line is the same as the wave number of the first node. 3. How to build a carrier networkAfter understanding how the carrier network works, you need to start building the carrier network. The carrier will push the tracking information to Dewu, and the content is similar to the following text. [ 3.1 Structured cleaningThe text of the track needs to be cleaned structurally to get the meaning of the track. For each waybill, its track will pass through many nodes, and the data type of each node is as follows: 1. waybill_no indicates the waybill number. The same waybill number may have multiple node records. 3.2 Is there really flight information in the trajectory?The working principle of the carrier network mentions that the carrier will produce in shifts. Can we find evidence of shift production from the trajectory results? Through analysis, we guess that the time when the same flow (for example, from sorting center A to sorting center B) leaves a sorting center (for example, leaving sorting center A) should be relatively concentrated. In real time, some simple clustering methods confirmed our conjecture. In the figure below, the horizontal axis represents the hours of delivery from the sorting center, and each point represents a shipping order in history. The vertical axis has no business meaning and is only for the convenience of display. The kmeans clustering algorithm is used to draw the above graph. The kmeans clustering algorithm requires the number of clusters to be specified. Therefore, it is necessary to use algorithms such as Knee/Elbow to detect the number of clusters. At the same time, it is sensitive to outliers, so DBSCAN is finally used in the implementation. 3.3 How to choose clustering parametersAlthough DBSCAN does not require the number of clusters to be specified, it does require the distance between points and the density of points to be specified. After repeated adjustments, the two core parameters are finally determined as follows: clustering = DBSCAN(eps=0.25, min_samples=max(5, int(x.size * 0.02)), metric=metric).fit(x_after_reshape) The eps is 0.25, which is 15 minutes. The point density is 5 and the maximum value is 2% of the total. 3.4 How to solve the problem of cross-dayFrom the cluster diagram above, we can see that the points of the same wave may appear across days, that is, some points may leave the distribution center at 23:50, while some points of the distribution center may be at 00:10. The Euclidean distance between these two points is relatively large, so we need to rewrite the distance metrics function. def metric ( x , y ) : 3.5 How are the lines connected in series?It is not enough to analyze the production shifts of the nodes and the shifts of the lines. They also need to be connected in series to obtain the shifts of the finished product lines, so that they can be applied in pre-sales or sales. Some simplifications are made in the processing here. On the one hand, the sorting waves of the sorting center cannot be identified, and on the other hand, the sorting waves of the sorting center do not need to be paid attention to. In practice, the process of connecting the finished product line shifts is as follows: The core code is as follows: for ( int i = 1 ; i < tmp .getResourceList ( ) .size ( ) ; ++ i ) { 3.6 How is the relationship between the fourth-level address and the delivery outlets established?From the application perspective, the input condition is the buyer's fourth-level address, but the end point of the carrier network is the delivery site, so it is necessary to establish a mapping relationship between the carrier's delivery site and the fourth-level address. The establishment of the mapping relationship is relatively simple. Among the sites responsible for delivering the fourth-level address in the past period of time, the one with the largest number of orders delivered to the address is selected. 4. Challenges of project implementationPart 3 is more like a theorist's endless talk. How to implement it in engineering? This includes the development of ODPS SQL, UDF and DDD. In short, it requires all kinds of skills. 4.1 How to perform simple machine learning in ODPSIn the process of shift analysis, the DBSCAN clustering algorithm is used. What if these algorithms are used on odps? In fact, the DBSCAN algorithm has been implemented in Python, and odps supports writing UDFs in Python. However, the current odps operating environment does not have DBSCAN-related packages installed, so it needs to be installed manually. For installation tutorials, refer to Alibaba Cloud's official documentation 4.2 Issues with Online ServiceThe above cleaning process needs to be run every day or at least once a week. The data of a time window in the past is selected for training to obtain the carrier's network, so that the changes in the carrier's network can be perceived in a timely manner. This means that the information of finished product lines, finished product line waves, and node waves will be updated regularly. In the process of online service, we directly store these data in redis. In order not to take up too much memory, some memory optimization is carried out by using hash data structure. Of course, one disadvantage of hash is that it is impossible to set a timeout for a field, which means that a field data of a certain key is actually expired data, but it will not be deleted, which will cause leakage, but this leakage can be solved by other technical means. 5. Progress and planningWe have built a third-party carrier network, with the accuracy of the first point prediction being around 65% and the accuracy of the last sorting prediction being around 85%. Future optimization points include: shift aggregation (for some routes with sparse data, shift aggregation is required), time decay (data cleaning requires selecting data from the past period of time, and data that is too old should be decayed to make its contribution to the results smaller), etc. We believe that the accuracy can be further improved. 6. References[1]. Knee/Elbow Point Detection [2]. arvkevi/kneed [3].https://datascience.stackexchange.com/questions/46106/kmeans-vs-dbscan [4]. https://redis.io/docs/management/optimization/memory-optimization/ [5]. User survey: This year’s 11.11 consumers are most concerned about “certainty”, and JD.com is the first choice for 80% of users - Xinhua Daily Telegraph |
>>: Distributed ID Solution Detailed Explanation
In 2019, Samsung and Apple were the first to intr...
Regardless of whether people are pessimistic or o...
According to the information disclosed by the 201...
"Carrier cloud" represented by China Te...
As data centers upgrade to 100Gbps at an accelera...
The emergence of 5G technology has the potential ...
Suddenly, 5G has truly come into our lives. With ...
V5.NET mainly provides independent server rental ...
HostDare is a foreign VPS hosting company founded...
At the RSA Conference, HPE Aruba Networking annou...
By Jin Gang, Chief of Information Department, Thi...
Imported from abroad, confirmed locally, the sudd...
From November 7 to 9, the fifth World Internet Co...
Friendhosting has released a new promotional offe...
Data center companies often encounter hardware an...