Nginx log analysis: writing shell scripts for comprehensive log statistics

Nginx is a high-performance HTTP and reverse proxy server that can also act as an IMAP/POP3/SMTP proxy. It is widely used on sites of every size, from high-traffic websites to small personal blogs. In production, analyzing Nginx logs shows how a website is being accessed and helps you find and fix potential problems. This article builds a set of shell commands that together produce comprehensive statistics from Nginx access logs.

Nginx log format

First, we need to make sure that the Nginx log format is similar to the following:

 log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                 '$status $body_bytes_sent "$http_referer" '
                 '"$http_user_agent" "$http_x_forwarded_for"';

Assume that the log file is called access.log and that the shell variable LOG_FILE, which the commands below read, holds its path.
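Every command in this article depends on knowing which field holds which value, so it helps to check the field positions before counting anything. The following is a minimal sketch that splits one made-up log line (a hypothetical example in the format above, not real traffic) in two ways: on whitespace and on double quotes.

```shell
#!/bin/sh
# Hypothetical sample line in the "main" format (made-up data).
line='192.0.2.10 - - [10/Oct/2024:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 612 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)" "-"'

# Whitespace-separated fields: client IP, request URI, status, bytes sent.
ws_fields=$(printf '%s\n' "$line" | awk '{ print $1, $7, $9, $10 }')

# Quote-separated fields: request line, Referer, User-Agent.
quote_fields=$(printf '%s\n' "$line" | awk -F'"' '{ print $2 "|" $4 "|" $6 }')

echo "$ws_fields"
echo "$quote_fields"
```

With this format, whitespace splitting is enough for the IP, URI, status, and byte count, while values that may contain spaces (Referer, User-Agent) are easier to reach by splitting on double quotes.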

Shell script implementation

Next, we write a shell script that computes statistics from the Nginx log. The script covers the following:

  • Count requests by status code
  • Find the most frequent Referers
  • Find the most requested URIs
  • Find the most active client IPs and User-Agents
  • Report per-minute request count, traffic, request time, and status-code classes

The code for each part follows.

1. Count the number of various status codes:

 awk '
 {
     Arry[$9] += 1;
     total++;
 }
 END {
     for (s in Arry) {
         printf "%d\t%.4f\t%s\n", Arry[s], Arry[s] / total, s
     }
 }
 ' "$LOG_FILE" | sort -nr -k 1,1

① Arry[$9] += 1;:

  • $9 is the ninth whitespace-separated field. With the log format shown above, that field is the HTTP status code ($status): the quoted request normally splits into three fields ($6 to $8), and the status follows it. If your log format differs, adjust the field number.
  • Arry is an associative array keyed by status code; it accumulates how many times each code occurs.
  • Arry[$9] += 1; increments the count for the status code on the current line.

② total++;:

  • Record the total number of log lines.

③ for (s in Arry):

  • Iterate over each status code s in the array Arry.

④ printf "%d\t%.4f\t%s\n", Arry[s], Arry[s] / total, s:

  • Print the count, share, and status code for each entry.
  • Arry[s] is the number of occurrences of status code s.
  • Arry[s] / total is the share of all requests with that status code, as a fraction between 0 and 1 (multiply by 100 for a percentage).
  • s is the status code.
  • The output format is: count\tshare\tstatus code.
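To make the output format concrete, here is a small self-contained run of this counting pattern. It is only a sketch: the three log lines are made up, and it assumes the main format shown earlier, where the status code is whitespace field 9.

```shell
#!/bin/sh
# Three made-up log lines in the "main" format (hypothetical data).
LOG_FILE=$(mktemp)
cat > "$LOG_FILE" <<'EOF'
192.0.2.1 - - [10/Oct/2024:13:55:36 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/8.0" "-"
192.0.2.2 - - [10/Oct/2024:13:55:40 +0800] "GET /a HTTP/1.1" 200 100 "-" "curl/8.0" "-"
192.0.2.1 - - [10/Oct/2024:13:56:01 +0800] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0" "-"
EOF

# Count per status code (field 9 in this format), then sort by count, descending.
status_report=$(awk '
{ count[$9]++; total++ }
END {
    for (s in count)
        printf "%d\t%.4f\t%s\n", count[s], count[s] / total, s
}' "$LOG_FILE" | sort -nr -k 1,1)

echo "$status_report"
rm -f "$LOG_FILE"
```

For this sample, the report has two lines: status 200 with a count of 2 and a share of 0.6667, then status 404 with a count of 1 and a share of 0.3333.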

Running the command prints one line per status code (count, share, status code), sorted by count in descending order.

2. Count the most visited Referers

 awk -F\" '
 {
     Arry[$4] += 1;   # accumulate the count for each Referer ($4 when splitting on double quotes)
     total++;         # count all log lines
 }
 END {
     for (s in Arry) {   # iterate over every Referer seen
         # print the count, share, and the Referer itself
         printf "%d\t%.4f\t%s\n", Arry[s], Arry[s] / total, s
     }
 }
 ' "$LOG_FILE" | sort -nr -k 1,1   # sort by count, descending

The command prints one line per Referer (count, share, Referer), most frequent first.

3. Count the most visited URIs

 awk '
 {
     Arry[$7] += 1;   # accumulate the count for each URI ($7 in the log format above)
     total++;         # count all log lines
 }
 END {
     for (s in Arry) {   # iterate over every URI seen
         # print the count, share, and the URI itself
         printf "%d\t%.4f\t%s\n", Arry[s], Arry[s] / total, s
     }
 }
 ' "$LOG_FILE" | sort -nr -k 1,1   # sort by count, descending

The command prints one line per URI (count, share, URI), most requested first.
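URIs that differ only in their query string are counted as separate entries, which can scatter the counts for a single page. One possible refinement, sketched below on made-up data, is to strip everything from the first "?" before counting and to keep only the top ten entries. The sample lines and the choice of ten are assumptions for illustration.

```shell
#!/bin/sh
# Made-up log lines in the "main" format (hypothetical data).
LOG_FILE=$(mktemp)
cat > "$LOG_FILE" <<'EOF'
192.0.2.1 - - [10/Oct/2024:13:55:36 +0800] "GET /search?q=a HTTP/1.1" 200 612 "-" "curl/8.0" "-"
192.0.2.2 - - [10/Oct/2024:13:55:40 +0800] "GET /search?q=b HTTP/1.1" 200 100 "-" "curl/8.0" "-"
192.0.2.1 - - [10/Oct/2024:13:56:01 +0800] "GET /about HTTP/1.1" 200 153 "-" "curl/8.0" "-"
EOF

uri_report=$(awk '
{
    split($7, parts, "?")   # parts[1] is the path without the query string
    count[parts[1]]++
    total++
}
END {
    for (u in count)
        printf "%d\t%.4f\t%s\n", count[u], count[u] / total, u
}' "$LOG_FILE" | sort -nr -k 1,1 | head -n 10)

echo "$uri_report"
rm -f "$LOG_FILE"
```

Here both /search requests fall under one entry, so /search is listed with a count of 2 and /about with a count of 1.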

4. Count the most visited IP and User-Agent

(1) Count the number of IP visits

 awk '
 {
     Arry[$1] += 1;   # accumulate the count for each client IP
     total++;         # count all log lines
 }
 END {
     for (s in Arry) {   # iterate over every IP address seen
         # print the count, share, and the IP address itself
         printf "%d\t%.4f\t%s\n", Arry[s], Arry[s] / total, s
     }
 }
 ' "$LOG_FILE" | sort -nr -k 1,1

The command prints one line per client IP (count, share, IP address), most active first.

(2) Count the most visited User-Agents

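 awk -F\" '
 {
     Arry[$6] += 1;   # accumulate the count for each User-Agent ($6 when splitting on double quotes)
     total++;         # count all log lines
 }
 END {
     for (s in Arry) {   # iterate over every User-Agent seen
         # print the count, share, and the User-Agent itself
         printf "%d\t%.4f\t%s\n", Arry[s], Arry[s] / total, s
     }
 }
 ' "$LOG_FILE" | sort -nr -k 1,1   # sort by count, descending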

The command prints one line per User-Agent (count, share, User-Agent string), most frequent first.
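Sections 1 to 4 all follow one pattern: count the values of a single field, compute each value's share, and sort by count. A minimal sketch of that pattern as a reusable function is shown below. The name top_by_field and its argument order are hypothetical, the sample lines are made up, and the field numbers assume the main format shown earlier.

```shell
#!/bin/sh
# top_by_field SEP FIELD FILE  (hypothetical helper, not part of the original script)
# Counts the values of one field and prints count, share, and value,
# most frequent first. SEP=" " selects awk's default whitespace splitting.
top_by_field() {
    awk -F "$1" -v field="$2" '
    { count[$field]++; total++ }
    END {
        for (v in count)
            printf "%d\t%.4f\t%s\n", count[v], count[v] / total, v
    }' "$3" | sort -nr -k 1,1
}

# Made-up log lines in the "main" format (hypothetical data).
LOG_FILE=$(mktemp)
cat > "$LOG_FILE" <<'EOF'
192.0.2.1 - - [10/Oct/2024:13:55:36 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/8.0" "-"
192.0.2.1 - - [10/Oct/2024:13:55:40 +0800] "GET /a HTTP/1.1" 200 100 "-" "curl/8.0" "-"
192.0.2.2 - - [10/Oct/2024:13:56:01 +0800] "GET /b HTTP/1.1" 404 153 "-" "Mozilla/5.0 (X11; Linux x86_64)" "-"
EOF

ip_report=$(top_by_field ' ' 1 "$LOG_FILE")   # client IPs (whitespace field 1)
ua_report=$(top_by_field '"' 6 "$LOG_FILE")   # User-Agents (quote-separated field 6)

echo "$ip_report"
echo "$ua_report"
rm -f "$LOG_FILE"
```

Passing the separator as an argument lets one function serve both the whitespace-split fields (IP, URI, status) and the quote-split fields (Referer, User-Agent).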

5. Statistics on the number of requests per minute, traffic, request time, status code, etc.

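 # Note: unlike the commands above, this one reads a pipe-delimited log.
 # Judging by the fields it uses, field 2 is a timestamp whose characters
 # 12 to 16 are HH:MM (for example ISO 8601), field 9 is the status code,
 # field 13 is the response size in bytes, and field 15 is the request time.
 awk -F '|' '
 BEGIN {
     printf "Time\tCount\tTraffic[MB]\tReqTime\t20x\t30x\t40x\t50x\t60x\n"
 }
 {
     # extract the hour and minute from the timestamp
     minute = substr($2, 12, 5)
     # accumulate traffic, request count, and request time
     tms[minute] += $13
     cnt[minute] += 1
     reqt[minute] += $15
     # classify the status code
     status_code = $9
     if (status_code ~ /^2/) { sc20x[minute]++ }
     else if (status_code ~ /^3/) { sc30x[minute]++ }
     else if (status_code ~ /^4/) { sc40x[minute]++ }
     else if (status_code ~ /^5/) { sc50x[minute]++ }
     else { sc60x[minute]++ }
 }
 END {
     for (t in tms) {
         printf "%s\t%d\t%.4f\t%.4f\t%d\t%d\t%d\t%d\t%d\n",
             t, cnt[t], tms[t] / 1024 / 1024,
             (cnt[t] > 0 ? reqt[t] / cnt[t] : 0),
             sc20x[t], sc30x[t], sc40x[t], sc50x[t], sc60x[t]
     }
 }
 ' "$LOG_FILE"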

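The command prints a header followed by one row per minute: the request count, traffic in MB, average request time, and the number of responses in each status-code class (the last column counts anything that is not 2xx to 5xx).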
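The per-minute script above expects a pipe-delimited log. For logs in the main format shown at the start of this article, a rough equivalent might look like the sketch below. It assumes whitespace field 4 holds the timestamp, field 9 the status code, and field 10 the byte count. It leaves out request time, because the main format does not log it, and the sample lines are made up.

```shell
#!/bin/sh
# Made-up log lines in the "main" format (hypothetical data).
LOG_FILE=$(mktemp)
cat > "$LOG_FILE" <<'EOF'
192.0.2.1 - - [10/Oct/2024:13:55:36 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/8.0" "-"
192.0.2.2 - - [10/Oct/2024:13:55:40 +0800] "GET /a HTTP/1.1" 200 100 "-" "curl/8.0" "-"
192.0.2.1 - - [10/Oct/2024:13:56:01 +0800] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0" "-"
EOF

# Columns: minute, request count, bytes sent, 4xx/5xx responses.
per_minute=$(awk '
{
    # $4 looks like "[10/Oct/2024:13:55:36"; characters 2 to 18 are date, hour, minute.
    minute = substr($4, 2, 17)
    cnt[minute]++
    bytes[minute] += $10
    if ($9 ~ /^[45]/) errors[minute]++
}
END {
    for (m in cnt)
        printf "%s\t%d\t%d\t%d\n", m, cnt[m], bytes[m], errors[m]
}' "$LOG_FILE" | sort)

echo "$per_minute"
rm -f "$LOG_FILE"
```

The final sort orders rows as plain text, which is chronological only within a single day; logs that span several days or months would need a different timestamp format or sort key.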

Summary

With the shell commands above, we can analyze Nginx logs quickly and comprehensively to understand a website's traffic and performance. This helps uncover potential problems and provides solid data for later optimization work. In practice, you can extend and customize the script to fit your needs.

How to obtain the script

The scripts above have been uploaded to Gitee and are available to anyone who needs them. The repository mainly shares scripts commonly used at work. You can fork or watch it to keep up with updates.

Script Repository

Repository address: https://gitee.com/didiplus/script
