EBS Lens, a powerful tool for block storage monitoring and service stress testing and tuning, is released

EBS Monitoring Status

Block storage (EBS) is the block device product that Alibaba Cloud provides for ECS cloud servers, featuring high performance and low latency. As Double Eleven approaches, disk IO is often a focus of operations and maintenance during the big promotion: if a disk is overloaded, critical business may stall or even crash. EBS monitoring currently has several problems:

1. The native monitoring provided by block storage is limited to the single-instance level. You can view performance monitoring for only one cloud disk at a time, and there is no view of the global cloud disk status. With many cloud disks, monitoring their status becomes very cumbersome.

2. With SLS Logtail, Telegraf, or the CloudMonitor agent, the status of all cloud disks of a single ECS instance can be monitored. However, these methods are invasive: installing agents, maintaining the monitored disks, controlling monitoring for individual cloud disks, and monitoring across ECS instances all carry a high learning and operating cost for users.

3. The analysis dimension is limited. In the scenarios above, monitoring and analysis are still keyed on cloud disk IDs, although the attributes of the cloud disk assets themselves carry a lot of information. For example, it is hard for users to see the big picture of all their cloud disk assets, such as the distribution of cloud disks across regions and the proportion of each cloud disk type.

To address these pain points, the SLS team and the EBS team released EBS Lens (the name Lens stands for insight into subtle changes in cloud products), which provides data analysis and resource monitoring for block storage. It helps users obtain block storage resource information and performance monitoring data on the cloud, manage block storage resources more efficiently, and analyze business fluctuations and resource performance consumption.

EBS Lens Product Features

Automated data collection

After EBS Lens is enabled, SLS automatically pulls the cloud disk list from the user's EBS assets. The first page shown after entering the app is the access management page, which gives a global management view of the EBS cloud disks, including the following information:

  • The total number of connected cloud disks, the number of cloud disks with data collection enabled, the regions of the cloud disks, and the number of target repositories.
  • EBS instance information, such as instance ID, tags, cloud disk type, availability zone, collection status, and collection operations. If the user creates, updates, or deletes an EBS cloud disk after enabling EBS Lens, SLS automatically updates the cloud disk list here.

Collection Configuration

After the EBS cloud disk assets are synchronized, users need to enable monitoring data collection for them. Two collection methods are provided: manual collection, for fine-grained management, and automatic collection, for global management when there are many EBS cloud disks.

Manual collection

Manual collection lets you manage the collection status of a single instance. Because there may be a large number of EBS instances, batch enable/disable operations are also supported within a single page.

Automated collection

When a user has hundreds or even thousands of cloud disks, manual collection management is no longer practical, so an automatic collection function is also provided. It offers a graphical configuration interface:

  • You can use attributes such as region, instance ID, payment type, disk type, and tags to set collection conditions.
  • In standard mode, the conditions are combined with AND. In advanced mode, you can flexibly combine and nest conditions.
  • After the configuration is saved, automatic collection takes effect immediately. Data collection is enabled automatically for all cloud disks that meet the conditions, eliminating manual operations. When instances are added or removed, automatic collection also detects the change and adjusts accordingly.

Repository information display

After cloud disk monitoring data collection is enabled, SLS pulls monitoring data from the EBS cloud disks and delivers it to the target repository configured by the user, where it is stored as time series data. The target repository tab supports the following functions:

  • View the region and data retention time of the target repository, and adjust the data retention time.
  • Click the target repository to enter the SLS project page and view the raw monitoring data.

After EBS cloud disk asset synchronization and data collection are enabled, EBS Lens has both asset data and monitoring data for the EBS cloud disks. Based on these two data sets, EBS Lens builds two monitoring dashboards: the resource overview page and the performance analysis page.

Multi-dimensional data aggregation and rich data indicator types

The resource overview page provides a global asset overview. By default, it shows statistics for all cloud disks under the user account, including:

  • Total number of cloud disks
  • Total capacity of cloud disks
  • Number of regions to which the cloud disks belong
  • Number of availability zones to which the cloud disks belong
  • Proportion of cloud disks with snapshots enabled
  • Proportion of encrypted cloud disks
  • Top 10 capacity regions
  • Top 10 capacity availability zones
  • Cloud disk type and capacity distribution
  • Capacity distribution by payment type

In addition to the account dimension, the page also supports filtering by region, payment type, and disk type to meet users' varied statistical needs.

High-precision data monitoring granularity

The performance analysis page provides a global cloud disk monitoring dashboard. By default, it collects statistics on the key indicators of all disks under the user account, including:

  • Throughput: the total throughput and its change curve, the top 100 instances by read/write throughput, and their throughput change curves.
  • IOPS: the total IOPS and its change curve, the top 100 instances by read/write IOPS, and their IOPS change curves.

The performance analysis page also supports filtering by region, payment type, cloud disk type, and cloud disk ID to meet the need for fine-grained monitoring. The cloud disk monitoring granularity is 10s and the monitoring delay is within 10s, so jitter scenarios can be monitored effectively.

Usage scenarios

EBS Lens offers convenient management and rich, multi-dimensional monitoring indicators. Below we walk through several common scenarios to explain its functions in detail:

Monitoring scenarios

Next, we simulate a common disk IO anomaly scenario to demonstrate the application of EBS Lens in monitoring scenarios.

Environment Preparation

1. Create a cloud disk, or use an existing one, and attach it to the ECS instance. For the attach operation, see: https://help.aliyun.com/document_detail/25446.html?spm=a2c6h.13066369.0.0.57b1e42fgsiBLE&source=5176.11533457&userCode=ffsbbyn0&type=copy. Note that after the cloud disk is attached to the ECS instance, partitions and a file system must be created before the cloud disk can be used.

2. Use automatic collection to cover all cloud disks under the account and enable monitoring data collection.

3. Open the performance analysis page and confirm that cloud disk monitoring data is arriving.
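
The partition and file system step mentioned in the environment preparation could look like the following sketch. It assumes the new cloud disk appears as /dev/vdb and is to be mounted at /mnt; the device name, the mount point, the choice of ext4, and the helper functions run and prepare_disk are all illustrative assumptions, not taken from the original setup. Because these commands erase the target device, the sketch only prints them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: device name, mount point, and file system are assumptions.
# DRY_RUN=1 (the default) prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# prepare_disk DEVICE MOUNTPOINT
prepare_disk() {
    dev="$1"
    mnt="$2"
    # Create a GPT label with a single partition spanning the disk.
    run parted -s "$dev" mklabel gpt mkpart primary 1MiB 100%
    # Create an ext4 file system on the first partition
    # (assumes the partition is named DEVICE followed by "1").
    run mkfs.ext4 "${dev}1"
    # Mount it so that test files can be written under the mount point.
    run mkdir -p "$mnt"
    run mount "${dev}1" "$mnt"
}
```

Calling `prepare_disk /dev/vdb /mnt` prints the four commands for review. Confirm the device name with lsblk before running with DRY_RUN=0 as root.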

Abnormal simulation

We log in to the ECS instance and use dd to simulate an abnormal burst of writes to the disk.
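
A minimal sketch of such a write burst, assuming the cloud disk is mounted at /mnt. The helper name simulate_write, the target file, the block size, and the count are illustrative choices rather than the values used in this test.

```shell
#!/bin/sh
# Sketch only: the target path, block size, and count are assumptions.
# simulate_write FILE COUNT_MIB
# Writes COUNT_MIB blocks of 1 MiB of zeros to FILE and flushes them to disk
# before dd exits, so the writes actually reach the cloud disk.
simulate_write() {
    dd if=/dev/zero of="$1" bs=1M count="$2" conv=fsync
}
```

For example, `simulate_write /mnt/dd_test 10240` writes about 10 GiB. With GNU dd, `oflag=direct` is an alternative that bypasses the page cache on every write instead of flushing once at the end.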

EBS Lens monitoring results

On the EBS Lens performance analysis page, we see that the throughput and IOPS of one disk quickly rise to first place on the dashboard. To view the detailed indicators of that disk, we enter its disk ID in the filter box and can then see how its throughput and IOPS change within the selected time range. This ID is exactly the disk on which we simulated the abnormal writes. In production, if a similar problem occurs, the next step is to locate the cause in detail, for example abnormal service log output or unreasonable flushing of data to disk. By adjusting the time range, you can display any data within the configured retention period (TTL) on this page, which is also very helpful for reviewing and analyzing faults.

With the alerting function of SLS, users can also monitor cloud disk performance automatically and accurately locate abnormal cloud disks.

Service stress testing and performance tuning

In addition to monitoring scenarios, EBS Lens also plays an important role in service stress testing and performance tuning. For any performance test, monitoring indicators are the most critical infrastructure. The EBS Lens performance analysis dashboard provides real-time performance indicators for cloud disks, which helps users quickly determine whether a cloud disk is the performance bottleneck. We simulate a simple write scenario: a large amount of data needs to be written to the disk as fast as possible.

Environment Preparation

1. We use the same ECS environment as above. In this scenario, we specify a fixed cloud disk for testing.

2. Enable monitoring data collection for the cloud disk on the EBS Lens page.

Scenario simulation

In the first version, we use fio to simulate a poorly performing implementation based on random writes:

fio -filename=/mnt/test1 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest

Through EBS Lens monitoring, we find that the throughput and IOPS of the cloud disk are relatively low, about 15MB/s and 900 respectively, far below the performance limit of the cloud disk. For the limits, refer to the block storage performance indicator document: https://help.aliyun.com/document_detail/25382.html

Therefore, we optimize the write script further and change the random write implementation to a better-performing sequential write implementation:

fio -filename=/mnt/test2 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest

EBS Lens monitoring shows that the throughput reaches 47MB/s and the IOPS reaches about 3000.

From the block storage performance indicator document, we know that the performance of an SSD cloud disk varies with the data block size: the smaller the data block, the lower the throughput and the higher the IOPS. Therefore, to improve throughput, we increase the data block size of each write:

fio -filename=/mnt/test2 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=64k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest

EBS Lens monitoring shows that the throughput reaches 143MB/s, while the IOPS drops to about 2300. This shows how convenient it is to test and tune disk IO performance with EBS Lens.
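
The three measurements are consistent with the relationship throughput ≈ IOPS × block size, which is why a larger block raises throughput even though IOPS falls. The following sketch does that arithmetic with a hypothetical helper, throughput_mib, which is not part of fio or EBS Lens. It assumes the MB/s figures on the dashboard are in units of 1024 KB.

```shell
#!/bin/sh
# throughput_mib IOPS BLOCK_KIB
# Prints the throughput in MiB/s implied by an IOPS figure and a block size in KiB.
throughput_mib() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.2f\n", iops * bs / 1024 }'
}

throughput_mib 2300 64   # third run, 64k sequential writes: prints 143.75
```

The same calculation gives roughly 46.9 MiB/s for 3000 IOPS at 16k and roughly 14.1 MiB/s for 900 IOPS at 16k, close to the observed 47MB/s and 15MB/s.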

Appendix

Notes

EBS Lens is currently in public beta. If you are interested in trying it out, you can contact us by submitting a ticket at https://selfservice.console.aliyun.com/ticket/category/sls/recommend/3868. If you have any questions during the trial, you can also contact us directly.
All EBS Lens functions are free of charge during the public beta. The end of the public beta will be announced in advance. After the public beta ends, see https://help.aliyun.com/document_detail/31694.html for how fees are calculated.

Reference Documentation

EBS Lens Help Document: https://help.aliyun.com/document_detail/338394.html
EBS Lens front-end entrance: https://sls.console.aliyun.com/lognext/profile
