1. Overview

This article explains resource tuning for MaxCompute Spark. The goal is to help users optimize the resources requested by Spark jobs, maximize resource utilization, and reduce costs while keeping Spark tasks running normally.

2. Sensor

Sensor provides a visual way to monitor a running Spark process. Each worker (Executor) and master (Driver) has its own status monitoring graph, and the entry can be found through Logview. After opening Sensor, you can see the CPU and memory usage of the Driver/Executor over its life cycle; the cpu_usage graph gives an intuitive view of the task's CPU utilization.

Memory metrics:

- mem_rss is the resident memory occupied by the process, that is, the memory actually used by the Spark task, and it usually deserves the user's attention. If it exceeds the amount of memory requested, OOM may occur and the Driver/Executor process may be terminated. The curve can also guide memory optimization: if actual usage is far below the requested amount, the memory request can be reduced to raise utilization and cut costs.
- mem_cache (page_cache) is used to cache data from disk in memory, thereby reducing disk I/O. It is usually managed by the system. If the physical machine has enough memory, mem_cache may grow large, but users do not need to worry about how this memory is allocated or reclaimed.

3. Resource parameter tuning

(1) Executor Cores

Related parameter: spark.executor.cores

(2) Executor Number

Related parameter: spark.executor.instances

(3) Executor Memory

Related parameter: spark.executor.memory

Symptoms of insufficient Executor memory (OOM):

- The first line of the Logview result after the task completes shows: The job has been killed by "OOM Killer", please check your job's memory usage.
- Memory usage is very high in Sensor.
- java.lang.OutOfMemoryError: Java heap space appears in the Executor log.
- Frequent GC messages appear in the Spark UI.
- Indirect manifestations of OOM: some Executors report errors such as No route to host: workerd********* or Could not find CoarseGrainedScheduler.

Possible causes and solutions:

(4) Driver Cores

Related parameter: spark.driver.cores

(5) Driver Memory

Related parameter: spark.driver.memory

Symptoms of insufficient Driver memory:

- The Spark application becomes unresponsive or stops directly.
- A Driver OutOfMemory error appears in the Driver log (Logview -> Master -> StdErr).

Possible causes and solutions:

(6) Local disk space

Related parameter: spark.hadoop.odps.cupid.disk.driver.device_size

Symptoms of insufficient disk space:

Solutions:

- The easiest way is to add more disk space directly by increasing spark.hadoop.odps.cupid.disk.driver.device_size.
- Repartition the data to resolve data skew.
- Reduce the task concurrency of a single Executor (spark.executor.cores).
- Reduce the concurrency of table reads (spark.hadoop.odps.input.split.size).
- Increase the number of Executors (spark.executor.instances).

Note: because the disk must be mounted before the JVM starts, this parameter has to be configured in the spark-defaults.conf file or in the DataWorks configuration item; it cannot be set in user code. Also note that the unit of this parameter is g, and the g cannot be omitted.
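To make the options under (6) concrete, the following is a minimal spark-defaults.conf sketch for a job that is running out of local disk space. The specific values (50g, 2 cores, 20 instances, 512 for the split size) are illustrative assumptions rather than recommendations from this article, and the unit of spark.hadoop.odps.input.split.size is assumed to be MB; tune everything against what Logview and Sensor show for your own job.

    # Illustrative spark-defaults.conf sketch; all values are placeholder assumptions.
    # Easiest fix: mount a larger local disk. Must be set here (or in the DataWorks
    # configuration item), not in user code, and the unit g cannot be omitted.
    spark.hadoop.odps.cupid.disk.driver.device_size 50g
    # Fewer concurrent tasks per Executor, spread over more Executor instances.
    spark.executor.cores 2
    spark.executor.instances 20
    # Larger input splits mean fewer concurrent table-read tasks (unit assumed to be MB).
    spark.hadoop.odps.input.split.size 512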
4. Conclusion

The above mainly introduces the resource shortage problems that may be encountered when using MaxCompute Spark and the corresponding ways to resolve them. To maximize resource utilization, it is first recommended to request resources for a single worker in a 1:4 ratio, that is, 1 core : 4 GB of memory. If OOM occurs, check the logs and Sensor to get an initial fix on the problem, then make the corresponding optimizations and resource adjustments. It is not recommended to give a single Executor too many cores; 2-8 cores per Executor is usually safe, and beyond 8 it is better to increase the number of instances instead. Appropriately increasing the off-heap memory (reserving some memory for the system) is also a common tuning method and in practice resolves many OOM problems. Finally, users can refer to the official documentation at https://spark.apache.org/docs/2.4.5/tuning.html, which covers more memory tuning techniques such as GC optimization and data serialization.
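As one concrete reading of the 1 core : 4 GB guideline and the off-heap headroom mentioned above, here is a possible spark-defaults.conf sketch. The numbers are assumptions for illustration only, and spark.executor.memoryOverhead / spark.driver.memoryOverhead are the standard Spark 2.4 settings for off-heap (overhead) memory; the article itself does not name a specific parameter, so treat that mapping as an assumption as well.

    # Illustrative only: 1 core : 4 GB per worker, more instances instead of very wide Executors.
    spark.executor.cores 4
    spark.executor.memory 16g
    spark.executor.instances 20
    spark.driver.cores 4
    spark.driver.memory 16g
    # Reserve some off-heap memory for the system
    # (parameter names assumed; these are the standard Spark overhead settings).
    spark.executor.memoryOverhead 4g
    spark.driver.memoryOverhead 4g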