[51CTO.com original article] On July 21-22, 2017, the WOTI2017 Global Innovation Technology Summit with the theme of artificial intelligence, hosted by 51CTO, was grandly held at the Beijing Renaissance Hotel. During the summit, 30+ AI stars, dozens of wonderful speeches and roundtable forums on the theme of artificial intelligence were slowly unveiled. In addition to the wonderful speeches in the venue, there were also hands-on laboratories and technology experience areas built specifically for AI enthusiasts outside the venue, all of which made this conference full of highlights.
On the afternoon of July 21, at the main venue of WOTI2017, Zhang Hao, Vice President of Ele.me, gave a wonderful speech entitled "Food Delivery Drivers Sent by AI". The following is the transcript of the speech, let's take a sneak peek! [[197646]] Zhang Hao, Vice President of Ele.me Good afternoon, everyone! I am very happy to have the opportunity to share with you, and I also thank WOTI for providing this sharing platform. The industry has moved from the earliest takeaways to local life. I will start my speech below. My sharing today is divided into three stages: First, a brief introduction to Ele.me. Second, talk about the application of AI in Ele.me. The third is my theme, four cases, to share with you the application examples of operations optimization and machine learning.
When you open the app, the first thing you see is the trading platform, just like what we see on Taobao and JD.com, except that you don’t search for clothes, but mainly for trading, search, recommendation, food mining, etc. Today, I want to talk to you more about the second part, the local logistics network. Logistics has been around for many years, and this industry has applied everything from operations research to statistical optimization to machine learning. These three together are of great help to our modern logistics system. We now have 260 million users, 1.3 million B-side merchants nationwide, and more importantly, our delivery staff. The number of delivery staff registered on our platform has reached 3 million, covering 2,000 cities across the country.
In the Internet industry, Taobao, Ctrip, Didi, and Ele.me represent four directions, namely clothing, food, housing, and transportation. I would like to lay the groundwork for this industry because of its characteristics. What are the differences and unique challenges compared to the other three companies? First, Taobao is purely based on users and merchants. When you place an order offline, you go to the open platform offline. The timeliness is usually calculated in days, so its challenge is to maximize the conversion rate and find what you want in a very short time. Ctrip is also a trading platform that is purely based on online users and merchants. There are basically no offline orders. Whether it is booking a hotel or something else, it is done online. Relatively speaking, Didi and Ele.me are two relatively large aspects of O2O. The difference here is that Didi's recommended search is not particularly important. It will not go up to search for this car or not like to search for another car. It is more recommended to you through the system's optimal results. This is a relatively large difference. Another relatively large difference is the time requirement. Generally, we require a few minutes for the car to arrive, but after getting on the car, it is the driver and the passenger's problem. The platform will not be responsible for this. Finally, there is Ele.me. Its online business is mainly composed of users and merchants, but the difference is that its offline business is more complicated. What is more important is timeliness, such as on-time delivery, 30 minutes, or 40 minutes. Once this time is exceeded, the platform will compensate the user. This is a big challenge for me.
In the local logistics industry, or instant delivery, it is better to say instant delivery because the logistics are so huge. We hope that 30 minutes after you place an order, the entire transaction is not only delivered to you, but we hope to control the entire process within 30 minutes. So there are several major differences here. For pure online transactions, the first thing is operational optimization, the second is learning, and more importantly, big data. We use offline calculations and real-time algorithms.
This is a rough overview. The algorithms in our business are divided into three parts. The first is based on LBS. We search for the local delivery range based on your location, so LBS is our most important service. The second is machine learning and optimization. For us, the site grid delivery range is the most important. Only with these can the delivery have the most reasonable results. The second is site selection. In addition to restaurants, many peers are also running independent restaurants. Franchisees are building brands in our kitchens, so site selection is very critical.
The rest is easy. We recommend search, and we are a trading platform. Next is supply and demand forecasting, order and waybill forecasting. Finally, our goods are delivered to you by people, whether they are walking or riding electric bikes. We need to make plans one quarter or even half a year in advance. We need to infer based on historical data and algorithms. For example, it often rains in Beijing in summer, so how much transportation capacity and waybill are needed, so supply and demand forecasting and waybill forecasting are very important. User-merchant stratification is a refined operation that must be achieved. The following intelligent subsidies and route planning are very important in actual logistics. Dynamic pricing means that when we match orders, we must adjust and control the flow according to the current transportation capacity and plan the route at the same time. The estimation of meal preparation time and delivery time are the key points that I will talk about in detail in the algorithm module below, so I will skip them first. I will divide it into two parts. The first part is about the application of machine learning in the Ele.me business. The second part is about the application of operations optimization and machine learning in Ele.me based on machine learning.
The first is the meal delivery time estimate. When you order a meal, we hope it can be delivered within 30 minutes, which includes the restaurant's delivery time. Therefore, the restaurant's delivery time estimate is the most difficult part of our delivery process, because we have no control over it and we don't know when it will be ready. This is the difference from Didi's departure time estimate. As you can see in the picture on the left of Didi's scenario, you usually hope that a car will arrive within one to two kilometers at that place, but this platform doesn't need to worry about it.
The right side is the Ele.me scenario. The restaurant meal preparation time is affected by many factors, including the factors of dine-in, the type of food, the cooking method, and the size of the order. In reality, the restaurant will not tell you that the food is ready after it is ready. Machine learning methods must be used to predict it. How do we do it? We have gone through three versions from the beginning to now. Of course, anyone who does machine learning will start with a linear model. There is no doubt that there is nothing special to say.
In terms of error, we adjusted a lot of features at the time, and the average was about 6 minutes. Later, we used a nonlinear model, and the error was about 240 seconds. The simplest statistical features include the time it took for this restaurant to serve food in the past, how many people ate there, whether it was Friday or Sunday, and the types of dishes in this restaurant. For example, if it is Yunnan cuisine, it will take longer to serve its dishes. If it is McDonald's, it will definitely be served in 30 seconds. These features are all included. Another feature we use is the weather, especially the weather. These will affect the error. After we achieved 240 seconds, it was difficult to improve it for a long time.
Lastly, we started using deep learning this year, because we can treat the food delivery method as a time series, using RNN and LSTN models. Now the error is 3 minutes. The average time does not mean anything, because when the restaurant is overwhelmed or in special circumstances, the error will be larger. This is the model we use, and it will be different in practice. We use a two-layer RNN model, each layer has about 1,500 (English), and uses 65% dropout. The picture in the upper right corner is our formula, which will be randomly captured according to the probability at that time.
The second is the travel time estimation, which is relatively easier. When the rider picks up the meal, the whole time from the restaurant to the customer is the travel time estimation. Because we have GPS sampling, we know how long he spends in the process. What is the challenge here? The challenge here is that data is very difficult to collect. Because we are different from other travel industries, one-third to one-half of our entire process, especially in big cities, is in the building. During rush hour and working hours, our white-collar workers are also in the building. GPS positioning in the building is very inaccurate.
So we cooperate with other companies to use WiFi to improve the accuracy of positioning. In terms of meal delivery time, the more difficult thing is walking. Unlike Didi and Uber, we can drive on the road. Sometimes we walk, wait for the elevator to go up and down the stairs, and other methods. In addition, the traffic in the building is very complicated. The figure on the left is a clustering algorithm figure. You can see these points. These points are the GPS locations we received. The error here is very large. If all GPS locations are used, point O and point D, in the transportation industry, point O is the starting point and point D is the end point.
We first have a clustering to remove those with large errors, and then after clustering, we get the GPS location to the POI and match it to a point. We do the same thing to remove most of the noise. The second is trajectory clustering. When you know the starting point and end point of the trip, you must know the trajectory. The picture on the right happens to be a river, which is very annoying. We often don’t know and can’t predict how the rider will go. Some may take a small road, and some may not know to take the main road, so it is difficult to predict. There are also some experiences in trajectory clustering.
We removed the noise points of the GPS location to make the trajectory more accurate. The next one is about the order opening scenario and the problem of combination optimization. You will not take only one order from a point, but many orders. When you decide who to assign the order to, you have to rely on the service. There are also travel time estimates, bad weather, various activities, and holidays. If it is a weekend or holiday, these will also have an impact.
Second, the combination of machine learning and operations optimization algorithms in the application scenario of Ele.me is the most important intelligent order splitting. What is intelligent order splitting? Before intelligent order splitting, order splitting was done by people. The entire order splitting process was based on a local network, so the order would not be split throughout the process. In the past, it was done by people. When you may only have dozens or hundreds of orders a day, there is a map that can see who is closer to the order and solve the problem quickly. However, when the volume increases, people are unreliable and cannot achieve the best. This is very difficult when the volume is large. Compared with other recommendation systems, it has more roles. In addition to merchants, there are riders. Riders also have differences between users and teams, and the complexity is also higher.
You can look at the picture on the right. A rider usually delivers five to ten meals at the same time, so it is an exponential problem when splitting orders. We need to know if he already has three to five orders, and whether to give him another five orders. The requirements for timeliness and accuracy are relatively high. This is the first version we made. When we got this problem, we naturally thought that this is a very classic path planning problem, because you have to go through so many orders from this point, and finally return to the starting point, because usually everyone will gather in one place. This is a very traditional vehicle path planning problem. The input here is orders, riders, rider capacity, and costs. The output is the match between orders and riders and the walking route. The optimization goal is to minimize time or driving distance, constraints, such as the number of orders carried by riders, the number of riders, the latest arrival time, etc. When you place an order, we say 40 minutes to arrive. This is the latest arrival time. We hope that 99% of them can arrive within 40 minutes.
The VRP problem is that after you arrive at the destination, you still need to pull your things back. There is the TSPB problem on the left, and the TSPTW problem. When you have multiple vehicles, it is MTSP. On this basis, the VRP variants are evolved, such as the maximum distance that cannot exceed a certain value. These are all variants of the problem, and they are not new to us at all.
One of the methods we use is the simulated annealing algorithm. Let's take a look at the picture on the left and let me explain the background. This is a very traditional operations research optimization problem. There are many methods. When we have a large amount of data, such as dynamic optimization, we all know that it is difficult to find the optimal solution for VRP problems or general combinatorial problems, so more people use (English). This is a random iterative algorithm. There are other algorithms. There are a lot of literature here. If you search, there are thousands of articles and many books. If you are interested, you can take a look. One of the algorithms we use is the simulated annealing algorithm, which is a random optimal algorithm.
Look at the figure on the right. First, some solutions are randomly generated. We define the objective function. The most important thing here is to randomly find the optimal solution for the current solution, just like physical annealing. It doesn’t matter if there is no optimal solution. There is an upper limit on the number. This is a probabilistic algorithm, which may not necessarily find the optimal one. Therefore, the following judgment is whether the number of iterations has been reached, because we may solve it infinitely and terminate after reaching it. This is the process of the entire algorithm. Now this simulation algorithm has matured. More challenges are how to make distributed computing more efficient when the scale is relatively large. There are also many literatures to read about this. There are also many solutions used, such as large-scale simulation, production scheduling, control engineering, machine learning, neural networks, signal control, etc.
Our solution now uses 2.0, which is based on the optimization problem of the cost function. What are the challenges encountered by the VRP solution? I just talked about the problem of computational complexity, which is only one aspect. What is more important is that for each (English), the entire result is completely illogical, which is the biggest difficulty we encounter. The inaccuracy of time estimation causes the estimated time from point A to point B to be inaccurate. The results at this time are often very poor, especially in small towns. Big cities are okay because there is enough sample size, such as Beijing, which can roughly estimate the time. But small cities are difficult. Many times POIs are at the town government instead of the restaurant, so our VRP is completely useless in this case. In addition, there are basic food delivery habits. Many times, the riders will not deliver according to your recommendations. They will deliver this first and that later, which has a greater impact.
So our current solution is this. Simply put, this is a cost matrix. We have N order packages and N riders. This is a two-dimensional matrix. We hope to assign each order package to one person. The middle is the cost matrix. We hope to output the optimal match between the order package and the rider. The cost definition and calculation method here have also gone through several iterations. At the beginning, there is no doubt that when you decide to assign an order to a rider, the cost is that other orders will not be affected. You don’t want him to run too long, because an electric car running two kilometers to deliver one order is definitely the worst choice. So we started with the rule method, through a lot of offline analysis, such as about 20 features, how much weight each of these features has, after calculating it, the next is the optimization problem, and the final result is the matrix on the right. An order is assigned to this rider. What algorithm is used for matching? We use the more mature optimal matching KM algorithm. The KM algorithm seeks the maximum weight matching under complete matching. The KM algorithm also has many processes, which is close to the initial VRP solution and is continuously optimized. At the beginning, the algorithm process initializes the value of the feasible top mark and uses the Hungarian algorithm to find a complete match.
Of course, we still have many shortcomings. We have roughly solved the basic problems, but later we realized that if an order comes in and it is immediately assigned to a rider to pick up, this is often not the best choice. If you wait two more minutes, there will be more orders on the same route for the same restaurant, so there is water storage. Maybe wait two minutes, and wait for more orders. After water storage, we packaged the two orders out. This is version 2.1. There are still many shortcomings when packaging. Any machine learning must use rules to determine what kind of packages can be put together and picked up and delivered at the same time. If the GPS is inaccurate, the result of picking up and delivering at the same time is that the order cannot be completed, such as Building A in this building and Building B in this building. Therefore, in version 2.2, the packaging rules are removed. We use machine learning methods to learn what kind of orders are divided into what kind of packages and models during manual scheduling, and use order similarity models, but this is not enough.
Because we found in the process of promotion that there are different habits in different places, and this habit will have a great impact on the satisfaction of riders and the difficulty of promotion. So we have come up with a new model called order and rider matching. We decide what is good. What we just talked about is more about our offline calculations. It is impossible to use a formula to match the order distribution logic of thousands of stations across the country. This rule will definitely not work well. So we made an order and rider matching model, took out the data according to the local station, and we studied and learned the manual scheduling habits. We used machine learning to grasp this feature and truly localize it. This is our 2.3 standard. Each station and each region has a unique model. We are working on version 3.0, enhanced learning, and everyone is familiar with this.
It was difficult to improve the model at 2.3, but in any case, the model update must be an offline process, which may take a week or a day. We hope to make it faster, but how can we make it faster? Reinforcement learning is now popular. We use online feedback to check whether the rider likes the order. If the rider doesn't like the order, he will change the order. We have captured this signal before, but we have compensated it offline and learned it automatically online through reinforcement learning, which is still under development.
The whole algorithm process is like this, but the biggest problem is our basic data. I mentioned POI points repeatedly just now. No matter how the algorithm divides, it sees that we tell it to go from this point to that point. The accuracy of POI has always been the biggest challenge. For any company that provides LBS services, POI is equivalent to an address library, just like why Google map is valuable, because its POI is well done. The navigation algorithm is optimized by you. Algorithms alone are useless. The focus is on the accuracy of POI. Of course, Google is not only in the United States, but also in the world. This is its wealth, that is, data. We are also constantly accumulating POI. ETA, meal delivery time prediction, rider model, restaurant portrait, all of these need long-term improvement. The rider model talks about the rider's habits, his ability, and his familiarity with the route, including, for example, whether he can deliver five orders on time, when the order can be assigned to a certain person, and he is more reliable. Restaurant portrait is also very important.
The last one is the optimal restaurant location problem. This is different from Cainiao's warehousing, but the meaning is the same. For example, Cainiao decides to build a main station network across the country. Which area will the warehouse be responsible for? How to design the route from warehouse A to warehouse B? This is a classic FACILITY LOCATION PROBLEM. In the city, we hope that restaurants are staggered to cover the largest number of users. The formula on the left shows that the cost between the two points is also the smallest. This is the business we are still operating now, called Future Restaurant. We hope to be able to choose the cheapest location to maximize the potential GMV growth.
Finally, let me summarize. Many friends don’t know that the local life scene algorithm is actually very large. There may be more food delivery, but imagine it is not food delivery, but operations optimization, especially because it is a local life circle problem that leads to many irregular things, so the challenges are very, very many, no less than any Internet company. The first problem is the integrity and accuracy of the basic data. No one can help us with this because the industry characteristics are to rely on ourselves and collect these data through people for a long time. Restaurants, riders, and even the difficulty of elevators have an impact on our algorithms.
There is also the understanding of human behavior. When we split the order, we think this is the best, but in actual operation we encounter a lot of resistance. Others don’t like to do this. For example, chasing orders, I took an order on the fifth floor, and just went downstairs to split the same restaurant order. The rider didn’t like it. He didn’t want to pick up the order after finally coming down. We found that the total time will be less if you go back at this time, but the rider doesn’t like it. He thinks it’s troublesome to run up, and often has to wait for the elevator, or there are all kinds of behaviors he doesn’t like. These are what we realized later. Many times, we need to combine with the business to have a complete solution. The last one is the combination of optimization algorithms and machine learning. In logistics scenarios, it is more about cost optimization, and machine learning is more about learning factors. Therefore, only by combining optimization algorithms and machine learning can we perfectly solve these problems for us. This is my content sharing, thank you!
51CTO reporters will continue to bring you exciting reports from the WOTI2017 Global Innovation Technology Summit, so stay tuned!
[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites] |