Improving time efficiency and accuracy: Carrier routing network mining

Improving time efficiency and accuracy: Carrier routing network mining

1. Introduction

The fulfillment time is the lifeline of e-commerce and is directly related to the user's consumption experience. According to Xinhuanet's [5] 2022 Double Eleven report, 37.4% of respondents hope for next-day delivery, and 29.91% hope for same-day delivery. Compared with other items, respondents have higher requirements for the logistics timeliness of mobile phones, computers, and digital products, and prefer to receive the goods on the same day or within 1-2 days.

In the Dewu fulfillment scenario, the main stages include production in the warehouse and delivery by third-party carriers. When the user pays, Dewu will give the user a promised time limit based on the warehouse's production situation and transportation resources.

1.1 Why do we need to predict the carrier’s route efficiency?

During the fulfillment process, Dewu needs to monitor the flow of orders and promptly discover orders that may be overdue (compared with the time limit promised to the user). This includes monitoring of warehouse production and third-party distribution. In the actual process, we found that when the distribution node changes, the forecast given by the carrier is conservative. In the following example, the carrier only gave a more accurate estimated delivery time when it arrived at the sales department, so it is easy to misreport when using the carrier's estimated delivery time at the sorting center.


Outlets

Departure time

Carrier Estimated Delivery Time

xxx outlets

2022-12-02 07:05:47

2022-12-10 22:00:00

A Cargo Collection and Sorting Center

2022-12-02 14:09:19

2022-12-10 22:00:00

B Cargo Collection and Sorting Center

2022-12-04 07:42:03

2022-12-10 22:00:00

C Bulk Cargo Sorting Center

2022-12-05 04:58:28

2022-12-09 22:00:00

D Sales Department

2022-12-05 08:47:58

2022-12-05 15:00:00

The figure below is the relaxation index of the estimated delivery time returned by the carrier interface. It can be seen that the promised time is more accurate when approaching the destination.

2. How does the carrier network work?

Before building a carrier network, you need to understand how the carrier network works. The following is a schematic diagram of the delivery from point A to point E, which is divided into the following contents:

(1) Nodes, including collection and delivery outlets and sorting centers.

(2) Lines, including trunk lines and branch lines. For example, the line from a network point to a sorting center is a branch line, and the line from a sorting center to another sorting center is a trunk line.

(3) Shifts: In order to balance costs and timeliness, carriers will set up production shifts. After arriving at the sorting center, the goods need to be sorted according to the destination. When a certain amount of goods arrives, they will depart from the sorting center and head to the next node. When setting shifts, carriers will consider the number of orders, as well as the cost and timeliness of transportation.

In the above picture: Taking purple as an example, at outlet A, the order is closed at 8 am, that is, the goods handed over to the carrier before 8 am will be sealed at around 8:20 am, and then depart from the outlet to sorting center B. The time to arrive at sorting center B is 11:40, and at this time it catches up with the shift with a cut-off time of 12 o'clock at sorting center B. Sorting center B will complete the sorting at 12:30 and go to the next sorting center, and so on to complete the entire delivery process.

When building a carrier's network, modeling is required. In addition to nodes, routes, and shifts, the core also includes the following two models:

(5) Finished product line, that is, from point A to point E through all nodes. In the above figure: point A-sorting center B-sorting center C-sorting center D-point E constitutes a finished product line.

(6) Finished product line wave: Because there are waves at the nodes, there are also waves at the finished product line. In fact, the wave number of the finished product line is the same as the wave number of the first node.

3. How to build a carrier network

After understanding how the carrier network works, you need to start building the carrier network. The carrier will push the tracking information to Dewu, and the content is similar to the following text.

 [
{
"code" : "180" ,
"desc" : "Express delivery arrived at [xxx Sales Department]" ,
"location" : {
"city" : "xxx city" ,
"district" : "xxx County" ,
"point" : {
"latitude" : xxx ,
"longitude" : xxx
} ,
"province" : "xxx"
} ,
"node" : "Received" ,
"opeTitle" : "Site Packing" ,
"time" : "2022-09-04 17:29:27"
} ,
{
"code" : "xxx" ,
"desc" : "Receive express delivery" ,
"location" : {
"city" : "xxx" ,
"district" : "xxx" ,
"point" : {
"latitude" : 28.65 ,
"longitude" : 120.07
} ,
"province" : "xx"
} ,
"node" : "Received" ,
"opeTitle" : "The deliveryman completes the collection" ,
"time" : "2022-09-04 17:29:27"
}
]

3.1 Structured cleaning

The text of the track needs to be cleaned structurally to get the meaning of the track. For each waybill, its track will pass through many nodes, and the data type of each node is as follows:

 1. waybill_no indicates the waybill number. The same waybill number may have multiple node records.
2. station_index indicates the index of the current node
3. station_enum indicates the type of this node, whether it is a sorting center or a delivery point
4. station_name indicates the name of the node, such as xxx Sales Department in the above example
5. station_status indicates the status of this node, such as whether it is entering or leaving
6. operate_time indicates the operation time of the current node

3.2 Is there really flight information in the trajectory?

The working principle of the carrier network mentions that the carrier will produce in shifts. Can we find evidence of shift production from the trajectory results? Through analysis, we guess that the time when the same flow (for example, from sorting center A to sorting center B) leaves a sorting center (for example, leaving sorting center A) should be relatively concentrated.

In real time, some simple clustering methods confirmed our conjecture. In the figure below, the horizontal axis represents the hours of delivery from the sorting center, and each point represents a shipping order in history. The vertical axis has no business meaning and is only for the convenience of display.

The kmeans clustering algorithm is used to draw the above graph. The kmeans clustering algorithm requires the number of clusters to be specified. Therefore, it is necessary to use algorithms such as Knee/Elbow to detect the number of clusters. At the same time, it is sensitive to outliers, so DBSCAN is finally used in the implementation.

3.3 How to choose clustering parameters

Although DBSCAN does not require the number of clusters to be specified, it does require the distance between points and the density of points to be specified. After repeated adjustments, the two core parameters are finally determined as follows:

clustering = DBSCAN(eps=0.25, min_samples=max(5, int(x.size * 0.02)), metric=metric).fit(x_after_reshape)

The eps is 0.25, which is 15 minutes. The point density is 5 and the maximum value is 2% of the total.

3.4 How to solve the problem of cross-day

From the cluster diagram above, we can see that the points of the same wave may appear across days, that is, some points may leave the distribution center at 23:50, while some points of the distribution center may be at 00:10. The Euclidean distance between these two points is relatively large, so we need to rewrite the distance metrics function.

 def metric ( x , y ) :
ret = abs ( x [ 0 ] - y [ 0 ] )
if ret > 12 :
ret = abs ( 24 - ret )
return ret

3.5 How are the lines connected in series?

It is not enough to analyze the production shifts of the nodes and the shifts of the lines. They also need to be connected in series to obtain the shifts of the finished product lines, so that they can be applied in pre-sales or sales. Some simplifications are made in the processing here. On the one hand, the sorting waves of the sorting center cannot be identified, and on the other hand, the sorting waves of the sorting center do not need to be paid attention to.

In practice, the process of connecting the finished product line shifts is as follows:

The core code is as follows:

 for ( int i = 1 ; i < tmp .getResourceList ( ) .size ( ) ; ++ i ) {
List <NetworkResourceWaveDTO>
next = tmp .getResourceList ( ) .get ( i )
.getWaveList ( ) ;
next .sort ( Comparator .comparing ( NetworkResourceWaveDTO :: getOffTime ) ) ;
boolean match = false ;
for ( NetworkResourceWaveDTO nextWave : next ) {
if ( nextWave .getOffTime ( ) > p .getEndTime ( ) ) {
match = true ;
duration += nextWave .getDurationDay ( ) ;
p = nextWave ;
break ;
}
}
if ( ! match ) {
duration += next .get ( 0 ) .getDurationDay ( ) + 1 ;
p = next .get ( 0 ) ;
}
productLineWave .add ( p ) ;
}

3.6 How is the relationship between the fourth-level address and the delivery outlets established?

From the application perspective, the input condition is the buyer's fourth-level address, but the end point of the carrier network is the delivery site, so it is necessary to establish a mapping relationship between the carrier's delivery site and the fourth-level address. The establishment of the mapping relationship is relatively simple. Among the sites responsible for delivering the fourth-level address in the past period of time, the one with the largest number of orders delivered to the address is selected.

4. Challenges of project implementation

Part 3 is more like a theorist's endless talk. How to implement it in engineering? This includes the development of ODPS SQL, UDF and DDD. In short, it requires all kinds of skills.

4.1 How to perform simple machine learning in ODPS

In the process of shift analysis, the DBSCAN clustering algorithm is used. What if these algorithms are used on odps? In fact, the DBSCAN algorithm has been implemented in Python, and odps supports writing UDFs in Python. However, the current odps operating environment does not have DBSCAN-related packages installed, so it needs to be installed manually. For installation tutorials, refer to Alibaba Cloud's official documentation

4.2 Issues with Online Service

The above cleaning process needs to be run every day or at least once a week. The data of a time window in the past is selected for training to obtain the carrier's network, so that the changes in the carrier's network can be perceived in a timely manner. This means that the information of finished product lines, finished product line waves, and node waves will be updated regularly. In the process of online service, we directly store these data in redis. In order not to take up too much memory, some memory optimization is carried out by using hash data structure. Of course, one disadvantage of hash is that it is impossible to set a timeout for a field, which means that a field data of a certain key is actually expired data, but it will not be deleted, which will cause leakage, but this leakage can be solved by other technical means.

5. Progress and planning

We have built a third-party carrier network, with the accuracy of the first point prediction being around 65% and the accuracy of the last sorting prediction being around 85%. Future optimization points include: shift aggregation (for some routes with sparse data, shift aggregation is required), time decay (data cleaning requires selecting data from the past period of time, and data that is too old should be decayed to make its contribution to the results smaller), etc. We believe that the accuracy can be further improved.

6. References

[1]. Knee/Elbow Point Detection

[2]. arvkevi/kneed

[3].https://datascience.stackexchange.com/questions/46106/kmeans-vs-dbscan

[4]. https://redis.io/docs/management/optimization/memory-optimization/

[5]. User survey: This year’s 11.11 consumers are most concerned about “certainty”, and JD.com is the first choice for 80% of users - Xinhua Daily Telegraph

<<:  To lead the high-quality development of information technology innovation, the Shenzhou Cloud Technology National Tour is officially launched!

>>:  Distributed ID Solution Detailed Explanation

Recommend

Wi-Fi 7 is on the way, how powerful is it?

In 2019, Samsung and Apple were the first to intr...

5G+AI win-win symbiosis, artificial intelligence has great potential!

Regardless of whether people are pessimistic or o...

How Should Operators Carry Out Cross-industry Integration?

According to the information disclosed by the 201...

The Why and How of a Two-Tier Network Monitoring Topology

As data centers upgrade to 100Gbps at an accelera...

Investigating the environmental and social impacts of 5G technology

The emergence of 5G technology has the potential ...

Four leading geese: the starting point for large-scale commercial use of 5G to B

Suddenly, 5G has truly come into our lives. With ...

HostDare: 35% off KVM in Los Angeles, starting at $25.99/year

HostDare is a foreign VPS hosting company founded...

Why do many colleagues recommend Ether IPL? Until this hospital expansion...

By Jin Gang, Chief of Information Department, Thi...

Ruijie Cloud Desktop supports Beijing's COVID-19 fight

Imported from abroad, confirmed locally, the sudd...

How to reduce the incidence of human error in data centers

Data center companies often encounter hardware an...