[51CTO.com original article] On September 15, the 23rd 51CTO Technology Salon was held at Jiacheng Impression, next to Lama Temple. The theme of this salon was "Accurate and Fast Operations and Maintenance Based on Big Data". To help operations engineers and developers solve the problems they run into in operations development, 51CTO invited four heavyweight guests: Cao Ronghai, General Manager of the Solution Center of Donghua Software Co., Ltd.; Wang Shishuai, AWS Solution Architect; Shen Jianlin, Senior Architect at JD Finance; and Zhao Huan, Senior Operations Engineer at Ele.me. Their talks drew more than a hundred developers and operations engineers to the venue. In the question-and-answer sessions, attendees competed to ask questions and the speakers answered them in detail, and the Saturday afternoon passed quickly in discussion. The editor has compiled the highlights of the four talks to share with a wider audience, in the hope that they offer a useful reference for solving problems and developing new ideas at work.

Cao Ronghai from Donghua Network Intelligence: Integrated operations management in the new era

Cao Ronghai first analyzed several major problems and challenges that users currently face in IT operations. As IT architecture becomes increasingly complex, how can problems be discovered before users notice them? Driven by new technologies, how can integrated operations break through the bottlenecks of traditional IT architecture? And in the context of integrated operations, how can the efficiency and standardization of IT service management be improved?
He said that with new technologies come difficulties in data collection, processing, analysis, and presentation. In particular, basic data and management tools are scattered, which makes integrated operations hard to achieve. In response to these practical challenges, Cao Ronghai introduced the "golden triangle" model of IT operations management, drawing on Donghua's many years of operations management experience. He described how to build an operations management system that integrates supervision, management, and control from three perspectives: efficient management of organizational goals, a rich set of monitoring methods, and a standardized operations process system.

Finally, Cao Ronghai introduced Donghua's integrated operations management solution. The approach starts from an application-centric view of integrated operations. It then establishes rules and regulations, standardizes the operations processes, consolidates the basic master data, and enables shared resource allocation. The last step is an integrated cloud management platform based on virtualization, together with an integrated intelligent inspection method.

According to Cao Ronghai, Donghua's integrated operations platform now covers comprehensive IT monitoring, application performance management, operations process management, full-lifecycle data management, a cloud management platform, and intelligent inspection. "Donghua's vision is to create a harmonious integrated operation and maintenance service ecosystem with customers," Cao Ronghai concluded.

Wang Shishuai from AWS: How to apply intelligent operations on the AWS platform

Wang Shishuai first explained what AIOps is and where it came from, the relationship between artificial intelligence, machine learning, and deep learning, and the scenarios where AIOps applies. He then described how to handle data collection and automation on the AWS platform, and how managed application hosting services and serverless services reduce the low-level, tedious daily work of operations engineers.

In Wang Shishuai's view, security is an important part of operations. He introduced two AWS services for secure operations: Amazon GuardDuty and Amazon Macie.

Amazon GuardDuty is a threat detection service that continuously monitors for malicious or unauthorized behavior to help protect users' AWS accounts and workloads. The service monitors for activity that indicates a possible account compromise, such as unusual API calls or potentially unauthorized deployments. GuardDuty also detects potentially compromised instances and reconnaissance by attackers.

Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS. It identifies sensitive data such as personally identifiable information (PII) or intellectual property, and provides dashboards and alerts that show how that data is accessed or moved. This fully managed service continuously monitors data access activity for anomalies and issues alerts when it detects unauthorized access or a risk of accidental data leakage.

At the end of his talk, Wang Shishuai shared the AI/ML technology stack that AWS offers. With the end-to-end machine learning platform Amazon SageMaker, users can quickly and easily build, train, and deploy machine learning models at any scale.
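
As a rough illustration of how an operations team might consume the GuardDuty findings mentioned in the talk, the following minimal sketch lists findings and keeps the high-severity ones. It is not code from the talk. The helper name `high_severity_findings` and the 7.0 threshold are assumptions made for this example, and `client` is assumed to behave like `boto3.client("guardduty")`, whose `list_detectors`, `list_findings`, and `get_findings` calls are used here.

```python
from typing import Any, Dict, List


def high_severity_findings(client: Any, min_severity: float = 7.0) -> List[Dict[str, Any]]:
    """Return GuardDuty findings with Severity >= min_severity.

    `client` is assumed to expose the same methods as boto3.client("guardduty").
    The 7.0 default is an illustrative threshold chosen for this sketch.
    """
    selected: List[Dict[str, Any]] = []
    # Only the first page of detectors is read; that is a simplification.
    for detector_id in client.list_detectors()["DetectorIds"]:
        token = None
        while True:
            kwargs: Dict[str, Any] = {"DetectorId": detector_id, "MaxResults": 50}
            if token:
                kwargs["NextToken"] = token
            page = client.list_findings(**kwargs)
            finding_ids = page.get("FindingIds", [])
            if finding_ids:
                # get_findings returns the full finding documents for these IDs.
                details = client.get_findings(
                    DetectorId=detector_id, FindingIds=finding_ids
                )
                selected.extend(
                    f for f in details["Findings"] if f["Severity"] >= min_severity
                )
            token = page.get("NextToken")
            if not token:
                break
    return selected
```

With real credentials configured, the function could be called as `high_severity_findings(boto3.client("guardduty"))` after `import boto3`. Passing the client in as a parameter keeps the sketch testable without an AWS account.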

Shen Jianlin from JD Finance: Precise operations under massive services

Shen Jianlin started from the operations pain points that medium and large enterprises currently face: an ever-growing number of services, increasingly complex production environments, and intricate service dependencies. He analyzed in detail what operations engineers need as a baseline, including automatic mapping of service dependencies, automatic topology generation, real-time call tracing, detailed exception analysis, tracing of call sources, real-time capacity planning, and root cause analysis.

He then shared how, through abstraction and modeling, monitoring and operations support can be delivered quickly for the R&D team's varied business scenarios while the business grows rapidly. He focused on the design ideas behind business monitoring models such as classification monitoring, ratio monitoring, and process monitoring. Finally, Shen Jianlin covered the design principles, key points, and difficulties of operations and monitoring systems, as well as the pitfalls he ran into during product iteration. The talk was packed with practical content.

Zhao Huan from Ele.me: Automated operations practice for Ele.me's multi-site active-active ZooKeeper

Zhao Huan first introduced what ZooKeeper is, its main functions, and how it works. He then described Ele.me's multi-site active-active deployment. Why go multi-site active-active? Zhao Huan explained that one reason is disaster recovery at the data center level: as the business grows, it reaches a tipping point beyond which the loss from a failure far exceeds the technical investment.
In addition, the capacity of a single data center is limited, and that physical limit has to be overcome. The second reason is to route user traffic. He described the overall architecture of Ele.me's multi-site active-active services in detail. Currently, the number of Ele.me's IDCWatch exceeds 100 million, and the number of multi-active nodes exceeds 1 million, which gives a sense of how hard the system is to operate.

"Since 90% of Ele.me users access the site through mobile terminals, the phone provides the latitude and longitude of the user's location. Once the user places an order, Ele.me divides the traffic through sharding to ensure fast distribution and a good user experience," Zhao Huan said. He also demonstrated four application scenarios of ZooKeeper at Ele.me, as well as the cross-data-center replication architecture.

After each talk, developers asked questions enthusiastically. The whole event was efficient and lively, and attendees left with plenty to take away. The 23rd 51CTO Technology Salon came to a successful close.

[51CTO original article. When reprinting on partner sites, please credit the original author and 51CTO.com as the source.]