High-quality content and customized services enhance the core competitiveness of enterprises

Affected by the epidemic in 2020 and driven by the slogan "classes suspended, learning continues", the online education market grew rapidly, reaching 485.8 billion yuan. After several years of rapid development, the online education industry has become relatively mature, and users now have different demands of different types of online education institutions. As a result, traffic alone can no longer win loyal users. For the education industry, core competitiveness still lies in high-quality content and services. Enterprises can achieve long-term development only by combining high-quality course content, personalized plans based on each learner's habits and foundation, and a high-quality, stable product experience with efficient business operations. Across the online education industry, amid constant adjustment, the companies that ultimately survive will be those that return to the essence of education and grow through high-quality products, content and services.

Combining artificial intelligence for distinctive teaching

After further industry adjustment, online education companies will gradually shift their focus from incremental growth to content construction. However, in the current environment, syllabi are the same and teaching methods are broadly similar. Even where courses differ, they are rarely remarkable, so most companies cannot rely on content alone to stand out.

Product experience is the key, but improving system stability is a challenge

As Liulishuo's business grew rapidly, its user base expanded significantly, from several million users at the start to more than 200 million.
Traffic swings between peak and off-peak periods, growing business complexity and increasingly difficult analysis have brought huge challenges to operations and maintenance (O&M). Across the Internet industry, user experience is one of the most critical competitive advantages. According to statistics, every additional second of delay causes an average user loss of 7%.

Liulishuo has no separate O&M department, so the O&M system of its basic platform is developed mainly by the cloud-infra team. The team's core demands cover SLA, performance monitoring, alerting and data for problem location. They also cover the operation of cloud-infra's technical value, such as utilization, cost savings and the business relationship network. Under these core demands, the requirements for the intelligent O&M platform are:

One-stop intelligent O&M solution, connecting the entire chain from data collection to computing

The intelligent O&M platform built by Liulishuo must process time-series data. It must also calculate and analyze core business availability data from various logs. Therefore, two data solutions had to be selected: one for logs and one for metrics. Both community and commercial options exist for these two data types, such as ES, Loki, SLS, Prometheus, OpenTSDB and InfluxDB. In the end, Alibaba Cloud SLS was selected as the log solution, and Prometheus+SLS as the time-series solution. The main reasons are as follows:

At the same time, to maximize automation, Alibaba Cloud Log Service (SLS) developed a dynamic discovery mechanism for IaaS and PaaS resources suited to cloud scenarios. Newly purchased and created resources are added to monitoring and collection in real time, which avoids most manual operations.
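The dynamic discovery idea can be illustrated with a simple reconciliation step. The step compares the resources that currently exist against those already under monitoring, registers the new ones and drops the released ones. The Python below is a hypothetical sketch only: the `Resource` type, the `reconcile` function and the resource IDs are illustrative assumptions, and it does not describe how SLS implements discovery internally. A real system would populate `inventory` from the cloud provider's resource APIs or change events.

```python
"""Hypothetical sketch of resource auto-discovery for monitoring.

Assumption (not from the source): an inventory source returns every current
IaaS/PaaS resource; Resource and reconcile are illustrative names only.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    resource_id: str
    kind: str  # illustrative labels such as "ecs" or "rds"


def reconcile(inventory: set[Resource], monitored: set[Resource]) -> tuple[set[Resource], set[Resource]]:
    """Return (to_add, to_remove) so that the monitored set matches the inventory."""
    to_add = inventory - monitored     # newly purchased or created resources
    to_remove = monitored - inventory  # resources that have since been released
    return to_add, to_remove


if __name__ == "__main__":
    inventory = {Resource("i-001", "ecs"), Resource("rm-001", "rds"), Resource("i-002", "ecs")}
    monitored = {Resource("i-001", "ecs"), Resource("i-999", "ecs")}
    add, remove = reconcile(inventory, monitored)
    print(sorted(r.resource_id for r in add))     # ['i-002', 'rm-001']
    print(sorted(r.resource_id for r in remove))  # ['i-999']
```

Because the step is a pure set difference, running it again with unchanged inputs is harmless. It can therefore be triggered on every resource-change event, or on a short timer, to keep the monitored set close to real time.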
In each data scenario, Alibaba Cloud Log Service (SLS) was also customized to meet Liulishuo's needs:

1. Logs: Logs from different businesses are collected directly into separate Logstores through SLS Logtail. Not all logs need long-term storage and indexing, so the logs are classified. Logs that require auditing are delivered to OSS for long-term storage. Logs used for business troubleshooting are kept for only 2 weeks, with full-text indexing enabled. AccessLog has indexing enabled only on selected fields, which saves a large amount of indexing cost.

2. Data monitoring: Prometheus was chosen as the monitoring solution. For Liulishuo's scenarios, we developed several Exporters to collect metrics from various cloud products and self-built components.

3. Indicator calculation: Some core indicators are derived from NGINX AccessLog. At the entry point, the QPS, error rate and latency (average, Pxx percentiles, etc.) of each business can be obtained without any intrusion into the business. Indicators for resource utilization, middleware and infrastructure are derived from the time-series store that Prometheus writes to. Based on the Catalog, the indicators of each department and business can be aggregated and calculated. The resulting indicator data is very small, so it can easily be stored in MySQL and ES, with a copy sent to OSS for backup.

Build a unified intelligent O&M platform, turning O&M from a cost center into an innovative productivity tool

Today, this intelligent O&M platform carries almost all of the company's core O&M work. It has run stably since launch and easily copes with sudden surges in data volume during various campaigns.
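The indicator calculation described in item 3 above can be sketched in a few lines. The Python below is a minimal, hypothetical illustration, not Liulishuo's implementation. It makes four assumptions:

- Each access-log record has already been parsed into NGINX's `$host`, `$status` and `$request_time` values.
- HTTP 5xx responses count as errors.
- The Pxx latency uses the nearest-rank method.
- A made-up `CATALOG` dictionary stands in for the real Catalog.

```python
"""Hypothetical sketch: per-business indicators from NGINX access-log records.

Assumptions (not from the source): each record holds the values of NGINX's
$host, $status and $request_time variables, and CATALOG is an illustrative
mapping from host to (department, business).
"""
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    host: str
    status: int
    request_time: float  # seconds, as logged by $request_time


# Hypothetical catalog: host -> (department, business)
CATALOG = {
    "api.example.com": ("learning", "course-api"),
    "pay.example.com": ("commerce", "payment"),
}


def percentile(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list (0 < p <= 100)."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def business_indicators(records, window_seconds: float, p: float = 99) -> dict:
    """Group records by catalog entry and compute QPS, 5xx error rate and latency."""
    groups = defaultdict(list)
    for r in records:
        # Hosts missing from the catalog are grouped under an "unknown" department.
        groups[CATALOG.get(r.host, ("unknown", r.host))].append(r)

    result = {}
    for key, rs in groups.items():
        latencies = sorted(r.request_time for r in rs)
        errors = sum(1 for r in rs if r.status >= 500)
        result[key] = {
            "qps": len(rs) / window_seconds,
            "error_rate": errors / len(rs),
            "avg_latency": sum(latencies) / len(latencies),
            f"p{p}_latency": percentile(latencies, p),
        }
    return result


def department_error_rates(records) -> dict[str, float]:
    """Roll records up to department level as a request-weighted 5xx error rate."""
    totals = defaultdict(lambda: [0, 0])  # department -> [requests, errors]
    for r in records:
        dept = CATALOG.get(r.host, ("unknown", r.host))[0]
        totals[dept][0] += 1
        if r.status >= 500:
            totals[dept][1] += 1
    return {d: errs / reqs for d, (reqs, errs) in totals.items()}


if __name__ == "__main__":
    sample = [
        AccessRecord("api.example.com", 200, 0.05),
        AccessRecord("api.example.com", 502, 0.30),
        AccessRecord("api.example.com", 200, 0.10),
        AccessRecord("api.example.com", 200, 0.15),
        AccessRecord("pay.example.com", 200, 0.20),
    ]
    for key, stats in business_indicators(sample, window_seconds=60).items():
        print(key, stats)
    print(department_error_rates(sample))  # {'learning': 0.25, 'commerce': 0.0}
```

The department rollup counts raw requests and errors rather than averaging per-business rates. This way a busy business weighs more than a quiet one, which matches how users experience the errors.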
The overall business value is mainly reflected in:

Monitoring: The primary value lies in all kinds of monitoring and alerting, especially SLA-related. The data is already associated with specific departments and business applications, so the SLA of each department and application is easy to obtain. This helps drive improvement across the company.

Unified troubleshooting and fault isolation: Based on Istio access logs combined with Catalog information, the call relationships between applications can be calculated. The business relationship graph is therefore generated in real time, and the quality of each relationship (edge) is known. With the business relationships understood, the root cause of a problem can be located and the fault isolated quickly.

Closing thoughts

In the cloud-native era, digitalization is driving business innovation in every industry. Enterprises can stand out only by improving user experience, accelerating innovation, updating infrastructure and architecture, and making good use of diverse data. The intelligent O&M platform launched by Alibaba Cloud aims to reduce engineers' workload and to free O&M engineers from mechanical, repetitive work. We take care of the "dirty and tiring work" and greatly reduce failure time. This lets O&M staff focus their creativity on digital and business innovation, making enterprises more competitive.
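The call-relationship analysis described under troubleshooting and fault isolation can be sketched as follows. This Python is hypothetical. It assumes each Istio access-log entry has already been reduced to a source workload, destination workload, response code and duration; the `MeshRecord` field names are illustrative. The root-cause step is a deliberately simple heuristic with an arbitrary 5% error threshold, not the method used by the platform.

```python
"""Hypothetical sketch: service call graph and edge quality from mesh access logs.

Assumptions (not from the source): each Istio access-log entry has already been
reduced to a MeshRecord; field names and the 5% threshold are illustrative.
"""
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshRecord:
    source: str       # calling workload
    destination: str  # called workload
    response_code: int
    duration_ms: float


@dataclass
class EdgeStats:
    requests: int = 0
    errors: int = 0
    total_ms: float = 0.0

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    @property
    def avg_ms(self) -> float:
        return self.total_ms / self.requests if self.requests else 0.0


def build_call_graph(records) -> dict[tuple[str, str], EdgeStats]:
    """Aggregate records into per-edge (caller, callee) request, error and latency stats."""
    graph = defaultdict(EdgeStats)
    for r in records:
        edge = graph[(r.source, r.destination)]
        edge.requests += 1
        if r.response_code >= 500:
            edge.errors += 1
        edge.total_ms += r.duration_ms
    return dict(graph)


def suspect_services(graph: dict[tuple[str, str], EdgeStats], threshold: float = 0.05) -> set[str]:
    """Heuristic: callees of unhealthy edges that have no unhealthy outgoing edges themselves."""
    unhealthy = {edge for edge, stats in graph.items() if stats.error_rate > threshold}
    callers_with_bad_calls = {caller for caller, _ in unhealthy}
    return {callee for _, callee in unhealthy if callee not in callers_with_bad_calls}


if __name__ == "__main__":
    records = [
        MeshRecord("gateway", "order", 500, 120.0),
        MeshRecord("gateway", "order", 200, 80.0),
        MeshRecord("order", "payment", 503, 300.0),
        MeshRecord("order", "payment", 200, 100.0),
        MeshRecord("order", "user", 200, 20.0),
    ]
    graph = build_call_graph(records)
    for (src, dst), s in graph.items():
        print(f"{src} -> {dst}: {s.requests} req, error rate {s.error_rate:.0%}, avg {s.avg_ms:.1f} ms")
    print(suspect_services(graph))  # {'payment'}
```

With the sample data, `payment` is flagged and `order` is not. The errors `order` returns can be explained by its own failing calls to `payment`. This kind of upstream-to-downstream reasoning is what a per-edge quality view makes possible.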