NGINX provides a built-in status page that allows you to monitor NGINX status. You can use Logtail plug-ins to collect NGINX monitoring logs. You can also perform various operations on the collected logs, such as querying and analyzing the logs and configuring alerts for the logs. This way, you can monitor your NGINX cluster in a comprehensive manner.
Prerequisites
Logtail is installed on your server. For more information, see Install Logtail on a Linux server or Install Logtail on a Windows server.
Step 1: Prepare the environment
Perform the following steps to enable the NGINX status module:
- Run the following command to check whether the NGINX status module is supported. For more information, see Module ngx_http_stub_status_module.
nginx -V 2>&1 | grep -o with-http_stub_status_module
If with-http_stub_status_module is returned, the NGINX status module is supported.
- Configure the NGINX status module. Enable the NGINX status module in the NGINX configuration file. The default NGINX configuration file is /etc/nginx/nginx.conf. The following sample code provides an example on how to enable the NGINX status module. For more information, see Enable Nginx Status Page.
Note In the following example, allow 10.10.XX.XX specifies that only a server whose IP address is 10.10.XX.XX can access the NGINX status module.
location /private/nginx_status {
    stub_status on;
    access_log off;
    allow 10.10.XX.XX;
    deny all;
}
- Run the following command to check whether the server on which Logtail is installed can access the NGINX status module:
curl http://10.10.XX.XX/private/nginx_status
If the following information is returned, the NGINX status module is enabled:
Active connections: 1
server accepts handled requests
 2507455 2507455 2512972
Reading: 0 Writing: 1 Waiting: 0
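The status page output is easier to reason about once each number has a name. The following Python sketch is illustrative only and is not part of Logtail: the StubStatus class and the parse_stub_status function are hypothetical names. It assumes the four-line layout shown above and the field meanings given in the ngx_http_stub_status_module documentation (accepts and handled are cumulative connection counters, requests is a cumulative request counter).

```python
from dataclasses import dataclass


@dataclass
class StubStatus:
    active: int    # current client connections, including waiting ones
    accepts: int   # total accepted client connections
    handled: int   # total handled connections
    requests: int  # total client requests
    reading: int   # connections whose request header is being read
    writing: int   # connections to which a response is being written
    waiting: int   # idle keep-alive connections

    @property
    def dropped(self) -> int:
        # accepts and handled normally match; a gap suggests that NGINX
        # dropped connections, for example because a resource limit was hit.
        return self.accepts - self.handled


def parse_stub_status(text: str) -> StubStatus:
    """Parse the four-line stub_status body into named counters."""
    lines = [line.strip() for line in text.strip().splitlines()]
    active = int(lines[0].split(":")[1])
    accepts, handled, requests = (int(n) for n in lines[2].split())
    parts = lines[3].split()  # ['Reading:', '0', 'Writing:', '1', 'Waiting:', '0']
    reading, writing, waiting = int(parts[1]), int(parts[3]), int(parts[5])
    return StubStatus(active, accepts, handled, requests, reading, writing, waiting)


# Sample body copied from the curl output above.
SAMPLE = """Active connections: 1
server accepts handled requests
 2507455 2507455 2512972
Reading: 0 Writing: 1 Waiting: 0
"""

status = parse_stub_status(SAMPLE)
print(status)
print("dropped connections:", status.dropped)
```

In this sample, accepts and handled are equal, so no connections were dropped.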
Step 2: Collect NGINX monitoring logs
- Log on to the Log Service console.
- In the Import Data section, click Custom Data Plug-in.
- Select the project and Logstore. Then, click Next.
- Create a machine group.
- If a machine group is available, click Use Existing Machine Groups.
- If no machine groups are available, perform the following steps to create a machine group. In this example, an Elastic Compute Service (ECS) instance is used.
- On the ECS Instances tab, select Manually Select Instances. Then, select the ECS instance that you want to use and click Create.
For more information, see Install Logtail on ECS instances.
Important If you want to collect logs from an ECS instance that belongs to a different Alibaba Cloud account, a server in an on-premises data center, or a server of a third-party cloud service provider, you must manually install Logtail. For more information, see Install Logtail on a Linux server or Install Logtail on a Windows server. After you manually install Logtail, you must configure a user identifier for the server. For more information, see Configure a user identifier.
- After Logtail is installed, click Complete Installation.
- In the Create Machine Group step, configure the Name parameter and click Next.
Log Service allows you to create IP address-based machine groups and custom identifier-based machine groups. For more information, see Create an IP address-based machine group and Create a custom identifier-based machine group.
- Select the new machine group from Source Server Groups and move the machine group to Applied Server Groups. Then, click Next.
Important If you apply a machine group immediately after you create the machine group, the heartbeat status of the machine group may be FAIL. This issue occurs because the machine group is not connected to Log Service. To resolve this issue, you can click Automatic Retry. If the issue persists, see What do I do if no heartbeat connections are detected on Logtail?
- In the Specify Data Source step, configure Config Name and Plug-in Config. Then, click Next.
- inputs specifies the collection configurations of your data source. This parameter is required. Important You can specify only one type of data source in the inputs parameter.
- processors specifies the processing configurations that are used to parse data. You can extract fields, extract log time, desensitize data, and filter logs. This parameter is optional. You can specify one or more processing methods. For more information, see Overview.
{
  "inputs": [
    {
      "type": "metric_http",
      "detail": {
        "IntervalMs": 60000,
        "Addresses": [
          "http://10.10.XX.XX/private/nginx_status",
          "http://10.10.XX.XX/private/nginx_status",
          "http://10.10.XX.XX/private/nginx_status"
        ],
        "IncludeBody": true
      }
    }
  ],
  "processors": [
    {
      "type": "processor_regex",
      "detail": {
        "SourceKey": "content",
        "Regex": "Active connections: (\\d+)\\s+server accepts handled requests\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+Reading: (\\d+) Writing: (\\d+) Waiting: (\\d+)[\\s\\S]*",
        "Keys": [
          "connection",
          "accepts",
          "handled",
          "requests",
          "reading",
          "writing",
          "waiting"
        ],
        "FullMatch": true,
        "NoKeyError": true,
        "NoMatchError": true,
        "KeepSource": false
      }
    }
  ]
}
The following table describes the key parameters of the metric_http input.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | String | Yes | The type of the data source. Set the value to metric_http. |
| IntervalMs | Int | Yes | The interval between two consecutive requests. Unit: milliseconds. |
| Addresses | Array | Yes | The URLs that you want to monitor. |
| IncludeBody | Boolean | No | Specifies whether to collect the response body. Default value: false. If you set this parameter to true, the body is collected and stored in the content field. |
The following example shows a collected log:
_address_:http://10.10.XX.XX/private/nginx_status
_http_response_code_:200
_method_:GET
_response_time_ms_:1.83716261897
_result_:success
accepts:33591200
connection:450
handled:33599550
reading:626
requests:39149290
waiting:68
writing:145
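Before you deploy the plug-in configuration, it can help to check offline that the Regex value really matches the status page body and that the capture groups line up with Keys. The following Python sketch is a rough check under two assumptions: Python's re module stands in for the regular expression engine that Logtail uses (the pattern relies only on constructs that common engines share), and the sample body is the curl output from Step 1. Note that the doubled backslashes belong to JSON escaping; after JSON decoding, the pattern contains single backslashes such as \d and \s.

```python
import json
import re

# The Regex and Keys values exactly as written in the JSON plug-in configuration.
CONFIG_FRAGMENT = r'''
{
  "Regex": "Active connections: (\\d+)\\s+server accepts handled requests\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+Reading: (\\d+) Writing: (\\d+) Waiting: (\\d+)[\\s\\S]*",
  "Keys": ["connection", "accepts", "handled", "requests", "reading", "writing", "waiting"]
}
'''

# Status page body as returned by NGINX (lines may end with trailing spaces).
BODY = (
    "Active connections: 1 \n"
    "server accepts handled requests\n"
    " 2507455 2507455 2512972 \n"
    "Reading: 0 Writing: 1 Waiting: 0 \n"
)

cfg = json.loads(CONFIG_FRAGMENT)   # JSON decoding turns \\d into \d
pattern = re.compile(cfg["Regex"])

# FullMatch is true in the configuration, so the whole body must match.
match = pattern.fullmatch(BODY)
if match is None:
    raise SystemExit("regex does not match the status page body")

# Pair each capture group with its key, as the configuration does by position.
fields = dict(zip(cfg["Keys"], match.groups()))
for key, value in fields.items():
    print(f"{key}:{value}")
```

The printed key-value pairs correspond to the non-underscore fields in the sample log above. The extracted values are strings at this stage.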
Step 3: Query and analyze logs
- Log on to the Log Service console.
- In the Projects section, click the project that you want to view.
- Choose . On the Logstores tab, click the Logstore that you want to view.
- Enter a query statement in the search box, and then specify a time range.
A query statement consists of a search statement and an analytic statement in the Search statement|Analytic statement format. For more information, see Search syntax and SQL syntax and functions.
- Query logs
- Query the information about an IP address.
_address_ : 10.10.0.0
- Query the requests whose response time is greater than 100 ms.
_response_time_ms_ > 100
- Query the requests whose HTTP status code is not 200.
not _http_response_code_ : 200
- Analyze logs
- Obtain the average numbers of waiting connections, reading connections, writing connections, and connections at 5-minute intervals.
*| select avg(waiting) as waiting, avg(reading) as reading, avg(writing) as writing, avg(connection) as connection, from_unixtime( __time__ - __time__ % 300) as time group by __time__ - __time__ % 300 order by time limit 1440
- Obtain the top 10 servers that have the largest number of waiting connections.
*| select max(waiting) as max_waiting, _address_, from_unixtime(max(__time__)) as time group by _address_ order by max_waiting desc limit 10
- Obtain the number of IP addresses.
* | select count(distinct(_address_)) as total
- Obtain the number of IP addresses from which failed requests are initiated.
not _result_ : success | select count(distinct(_address_))
- Obtain the IP addresses from which the most recent 10 failed requests are initiated.
not _result_ : success | select _address_ as address, from_unixtime(__time__) as time order by __time__ desc limit 10
- Obtain the total number of requests at 5-minute intervals.
*| select avg(handled) * count(distinct(_address_)) as total_handled, avg(requests) * count(distinct(_address_)) as total_requests, from_unixtime( __time__ - __time__ % 300) as time group by __time__ - __time__ % 300 order by time limit 1440
- Obtain the average request latency at 5-minute intervals.
*| select avg(_response_time_ms_) as avg_delay, from_unixtime( __time__ - __time__ % 300) as time group by __time__ - __time__ % 300 order by time limit 1440
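- Obtain the numbers of failed requests and successful requests. The first statement counts the requests whose HTTP status code is not 200, and the second statement counts the requests whose HTTP status code is 200.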
not _http_response_code_ : 200 | select count(1)
_http_response_code_ : 200 | select count(1)
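Several of the preceding analytic statements group logs by the expression __time__ - __time__ % 300, which rounds each log timestamp down to the start of its 5-minute (300-second) window. The following Python sketch illustrates the same bucketing and per-bucket averaging outside Log Service. The function names and the sample values are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from statistics import mean

BUCKET_SECONDS = 300  # 5 minutes, the same window size as in the queries


def bucket_start(ts: int, size: int = BUCKET_SECONDS) -> int:
    """Round a Unix timestamp down to the start of its bucket."""
    return ts - ts % size


def average_by_bucket(samples):
    """Average (unix_time, value) pairs per bucket, ordered by bucket start."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_start(ts)].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical (__time__, waiting) samples collected at 60-second intervals.
samples = [
    (1700000100, 60),
    (1700000160, 70),
    (1700000220, 80),
    (1700000400, 100),
]

result = average_by_bucket(samples)
for start, avg_waiting in result.items():
    print(start, avg_waiting)
```

The first three samples fall into the same window and are averaged together, and the fourth sample starts a new window. This mirrors what group by __time__ - __time__ % 300 does in the analytic statements.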