WOT Li Jian: The evolution of the eleme container platform

[51CTO.com original article] The Global Software and Operation Technology (WOT) Summit hosted by 51CTO was held in Beijing from May 18 to 19, 2018. Technical elites from companies around the world gathered to discuss the forefront of software technology and explore the new boundaries of operation technology. In addition to the star-studded main forum, each of the twelve sub-forums had its own distinctive focus. At the "Microservice Architecture Design" sub-forum on the afternoon of the 19th, Li Jian, a senior engineer from the Ele.me Computing Power Delivery Department, delivered a speech.

As the development leader of multiple container-based cloud computing projects within Ele.me, Li Jian has many years of rich experience in container system construction and has promoted the containerization process of the Ele.me platform. He is particularly good at implementing the agility and standardization of containers at the enterprise level. In order to cope with the opportunities and challenges brought by continuous computing in the era of the Internet of Everything, Li Jian is committed to creating more convenient computing services and promoting the in-depth integration of multiple computing models such as high-performance computing, big data and cloud computing.


Li Jian, senior engineer of Ele.me computing power delivery department, gave a speech

The theme of Li Jian's speech was the "Ele.me container platform", a container-based management system. Although there are many public clouds today, Li Jian believes the scenario everyone ultimately has to face is the hybrid cloud. His speech was divided into four parts: computing power delivery, technology selection, computing power takeaway, and an expansion solution based on Kubernetes. He admitted that during the initial rapid growth of the business, resources scaled up very quickly, resulting in a large number of server types and a heavy management burden; and because the platform has to adapt to the business, delivery requirements are also diverse.

1. Computing power delivery

Li Jian believes that computing power delivery is an abstraction: an output distilled from physical resources and handed to developers. Ele.me has a large amount of physical and information resources but limited manpower, and the team cannot expand indefinitely, so they wanted a standardized approach that greatly reduces costs and makes it easier to manage more machines. The Computing Power Delivery Department was born out of this need. The team believes that every delivery is ultimately the delivery of an application, so the key questions for Ele.me became how to deliver an application and how to manage it.

This is where container technology comes in. Container technology has existed for a long time, and Docker made a great contribution to it: it is truly application-oriented, portable, and cross-platform. In particular, its image format gives every service a unified packaging method, which amounts to a standard for applications. Based on this standard, an application can run on any platform we choose. Only then can we go further toward reducing labor costs and improving resource utilization, whether through automated operations, AIOps, or big data.

Concretely, the team delivers three things. The first is applications: you hand over the application, and the platform deploys it and keeps the service running. The second is one-click delivery of standard environments. For example, big data needs an environment, or a certain department needs one; an environment contains many services, such as A, B, and C. These services are isolated from other services, connections must be established between them, and the whole environment must be reproducible. The third is the delivery of servers: when development needs a server, computing power can simply be delivered in the form of a server.

2. Technology Selection

There are many technology options today, and Kubernetes is a popular one. Choosing Kubernetes means adopting a standard, and the benefit that brings is cost reduction: when you use the same things as everyone else, your costs go down, and the advantage when solving problems is self-evident. If a small or medium-sized company chooses a very popular project, then when a problem arises it can at least be searched for on Google, and there is a community to rely on. Another consideration is whether what you actually need fits the selection, which is well worth examining. In addition, scalability, the development of the ecosystem, and the ease with which large companies build an ecosystem around a project are forward-looking factors.

3. Computing power takeaway

Li Jian talked about computing power delivery with great interest. According to him, when Ele.me does delivery, many things naturally map onto food; for example, a conference room might be called "Durian Pastry". What computing power delivery needs is a scene, such as going to a restaurant for dinner. Another example is a development environment, which can be destroyed after use; or one can order a set meal, an "environment" bundle, just as a takeaway set meal contains many items.

At the application level, Li Jian described an environment as being much like a meal: it contains various types of services. So there is a "box", which can be abstracted as a table of dishes or a takeaway box. Each service in the box is called through its domain name, and the box itself is replicable: no matter how many boxes are created from the template, when calling a given service in each box, its domain name is unique.

For example, suppose the first service needs to call the second. Inside the box the second service has a fixed internal domain name, so the first service simply calls it by that name, which reduces complexity for developers. When a service is started, the platform automatically generates a unique network identifier for it, which may be an IP or a domain name. The developer does not need to change the configuration: it is enough to pull up the environment and the application can run, because in every box the internal identifiers of the corresponding services are the same.
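The naming scheme described above can be sketched as follows: inside every box a service is reachable by the same short internal name, while each box instance generates a globally unique external identifier at startup. All concrete names here (the box IDs, the `.svc.local` suffix) are illustrative assumptions, not Ele.me's actual naming scheme.

```python
import uuid


class Box:
    """One replicable environment stamped out from a template."""

    def __init__(self, services):
        self.box_id = uuid.uuid4().hex[:8]  # unique per box instance
        self.services = set(services)

    def internal_name(self, service):
        # Stable name used in application config; identical in every box.
        return service

    def external_name(self, service):
        # Unique network identifier generated when the service starts.
        if service not in self.services:
            raise KeyError(service)
        return f"{service}.{self.box_id}.svc.local"


box_a = Box(["web", "db"])
box_b = Box(["web", "db"])

# Configs in both boxes just say "db" ...
assert box_a.internal_name("db") == box_b.internal_name("db") == "db"
# ... while the generated identifiers never collide across boxes.
assert box_a.external_name("db") != box_b.external_name("db")
```

Because application configuration only ever mentions the stable internal name, the same template can be stamped out any number of times without any configuration change.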

4. Expansion solution based on Kubernetes

The last part of Li Jian's speech was the "Expansion Solution Based on Kubernetes". The concrete implementation uses Kubernetes as the underlying container engine. Within each internal environment, a service can run as a set of pod replicas with load balancing across them. There may be dependencies between these services: for example, service A depends on service B, and if service B goes down, service A needs to do some handling. Where service A depends on service B, there is also a startup order, and the same mechanism is applied inside the box.
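The startup ordering implied by such dependencies can be sketched as a topological sort over the service dependency graph: a service starts only after everything it depends on is up. The service names and the dependency table below are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Map each service to the set of services it depends on;
# "A depends on B" means B must be started before A.
deps = {
    "A": {"B"},        # service A needs service B
    "B": set(),
    "C": {"A", "B"},   # service C needs both A and B
}

# static_order() yields services with all dependencies first.
order = list(TopologicalSorter(deps).static_order())

# Every service appears after all of its dependencies.
for svc, needed in deps.items():
    assert all(order.index(d) < order.index(svc) for d in needed)
print(order)  # e.g. ['B', 'A', 'C']
```

A cycle in the table (A needs B, B needs A) would raise `graphlib.CycleError`, which is exactly the situation where no valid startup order exists.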

Of course, in terms of technology, many people think that new systems should have no dependencies between services. However, the problems the team faces today do involve dependencies. In this scenario, the platform must be compatible with current projects and existing habits in order to push standardization forward.

Some services need work at start and stop, which is initialization: for example, after certain services complete, another pod is invoked to do the initialization, and some public services need data passed to them. External services are reached by converting internal identifiers to external ones; the internal identifiers never change, and external relationships are linked through service discovery mechanisms, so there is no need to worry about configuration changes, service discovery, and similar issues.

This is the simplest way of abstracting services through the takeaway metaphor. Of course, the team also had to consider the scale of its services and the problems that appear as that scale grows. Kubernetes relies on etcd, and etcd cannot support such a large scale in a single Kubernetes cluster, while there are also stability requirements. So the only option was to split, and to split as much as possible.

If the platform is split by hand into three or four Kubernetes clusters, and the split is too fine, resource utilization drops: some clusters sit nearly empty while others are overloaded. The goal was a way to let services drift and be scheduled across the different Kubernetes and etcd clusters, which solves both the resource-efficiency problem and the reliability problem.
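The cross-cluster idea above can be sketched as two-level scheduling: a top-level scheduler first picks a Kubernetes cluster by free capacity, and only then does that cluster's own scheduler place the pods. The cluster names and capacity numbers are illustrative assumptions.

```python
clusters = {
    "cluster-a": {"capacity": 100, "used": 80},
    "cluster-b": {"capacity": 100, "used": 40},
    "cluster-c": {"capacity": 50,  "used": 10},
}


def pick_cluster(requested, clusters):
    """First-level scheduling: choose the cluster with the most free room."""
    free = {
        name: c["capacity"] - c["used"]
        for name, c in clusters.items()
        if c["capacity"] - c["used"] >= requested
    }
    if not free:
        raise RuntimeError("no cluster has enough free capacity")
    # Spreading work onto the emptiest cluster avoids leaving some
    # clusters nearly empty while others are overloaded.
    return max(free, key=free.get)


target = pick_cluster(requested=30, clusters=clusters)
clusters[target]["used"] += 30  # hand off to that cluster's own scheduler
print(target)  # cluster-b (60 free, the most headroom)
```

Real placement would also weigh affinity and failure domains; the point of the sketch is only the split into a cluster-picking level above the normal in-cluster scheduler.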

In the architecture shown in the figure, everything except the yellow part consists of original Kubernetes components; the team developed a service similar to the Kubernetes API Server. The original idea was to split scheduling into two levels, so that after scheduling completed, resources could simply be handed to cluster A or cluster B, but this turned out to be hard in practice. The team also made a small modification to the Docker version in use: when logs are written very frequently (at millisecond intervals), they arrive out of order, making it difficult for the business side to investigate. Therefore, a sequence number was added to Docker's log output.
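The sequence-number fix works because millisecond timestamps alone cannot order a burst of log lines written in the same millisecond, while a monotonically increasing counter assigned at write time can. The log record format below is an illustrative assumption, not Docker's actual format.

```python
import itertools

_seq = itertools.count(1)  # monotonically increasing write-time counter


def emit(timestamp_ms, message):
    # Lines sharing one millisecond stay distinguishable via "seq".
    return {"ts": timestamp_ms, "seq": next(_seq), "msg": message}


# Three lines written within the same millisecond.
burst = [emit(1000, "first"), emit(1000, "second"), emit(1000, "third")]

# Sorting by (ts, seq) restores the original write order even though
# all three lines carry an identical timestamp.
restored = sorted(burst, key=lambda line: (line["ts"], line["seq"]))
assert [l["msg"] for l in restored] == ["first", "second", "third"]
```

Sorting by timestamp alone would leave the relative order of the three lines undefined, which is exactly the out-of-order symptom the patch addressed.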

Li Jian said frankly that the company's software environment grew up organically. The existing landscape cannot be changed just because of one piece of open source software; instead, the team hopes to adapt the open source software to the current state of the system. The same applies to Docker monitoring: for example, several problems and bugs were discovered through monitoring.

In addition, when problems come up in container management, such as containerizing traditional businesses, improvements are made. Everything runs in the container, and the container's internal startup process is customized by the team: it switches certain directories and environment variables managed by the platform layer, and performs macro replacement on the basic configuration files. Some migrated services were previously driven by configuration files, and rewriting them to read configuration from environment variables would incur some cost. So the platform provides a way to write the variables directly into the configuration file, where they are automatically replaced according to the container's environment variables. This matters especially in a container environment, where the container's IP is not known before it starts, so the problem has to be solved inside the startup process.
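The macro-replacement step can be sketched with simple template substitution: placeholders in a config file are filled in from the container's environment variables at startup, so values unknown before start (such as the container's IP) never need a manual edit. The variable names `POD_IP` and `PORT` are illustrative assumptions.

```python
import os
from string import Template

# Config file as shipped, with macros instead of concrete values.
config_template = "listen = ${POD_IP}:${PORT}\nlog = /var/log/app.log\n"

# In a real container these would be injected by the runtime at start.
os.environ["POD_IP"] = "10.0.0.7"
os.environ["PORT"] = "8080"

# Startup process renders the final config before launching the service.
rendered = Template(config_template).substitute(os.environ)
print(rendered)  # listen = 10.0.0.7:8080 ...
```

Because the substitution happens inside the container's startup process, the same image and config template work unchanged wherever the container lands.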

The speeches of the speakers at this WOT Summit are compiled and edited by 51CTO. If you want to know more, please log in to WWW.51CTO.COM to view them.

[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites]
