Prometheus alert rules for node CPU, memory, and disk usage, plus the alert rules to create for node-exporter, all written as YAML files with PromQL expressions
Posted: 2024-02-06 12:03:37
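OK. Below are example YAML configurations for alerting on node CPU, memory, and disk usage in Prometheus, and for the alert rules that should be created for node-exporter metrics: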
1. Prometheus alert rules for node CPU, memory, and disk usage
```yaml
groups:
  - name: prometheus_rules
    rules:
      # CPU: non-idle percentage averaged over all cores of each instance.
      # rate() is used rather than irate(): irate() only looks at the last two
      # samples and is intended for graphing, not alerting.
      - alert: NodeCPUUsageHigh
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage is high on instance {{ $labels.instance }}"
          description: "CPU usage is high on instance {{ $labels.instance }} (value: {{ $value }})."
      # Memory: MemAvailable (Linux 3.14+) estimates memory that can be given to
      # new workloads without swapping, including reclaimable page cache.
      - alert: NodeMemoryUsageHigh
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage is high on instance {{ $labels.instance }}"
          description: "Memory usage is high on instance {{ $labels.instance }} (value: {{ $value }})."
      # Disk: used percentage per filesystem, excluding tmpfs and rootfs.
      # The parentheses ensure the ratio is computed before comparing with 90.
      - alert: NodeDiskUsageHigh
        expr: 100 * (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|rootfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|rootfs"}) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage is high on instance {{ $labels.instance }} ({{ $labels.mountpoint }})"
          description: "Disk usage is high on instance {{ $labels.instance }}, mountpoint {{ $labels.mountpoint }} (value: {{ $value }})."
```
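Each rule in this group reduces to a utilization percentage compared against a fixed threshold. Written out for the disk rule, per filesystem (taking `node_filesystem_avail_bytes` as the free space, which is an assumption; `node_filesystem_free_bytes` also counts blocks reserved for root), with a worked number for a hypothetical 100 GiB filesystem that has 8 GiB available:

```latex
\text{usage}_{\%} = 100 \times \left(1 - \frac{\text{avail}}{\text{size}}\right),
\qquad
100 \times \left(1 - \frac{8}{100}\right) = 92 > 90
\;\Rightarrow\; \text{alert pending, firing once the condition has held for } 5\,\text{min}
```

Operator precedence matters here: written without the parentheses as `100 * size - avail > 90 * size`, the condition reduces to `avail < 10 * size`, which is true for every filesystem.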
2. Alert rules to create for node-exporter
```yaml
groups:
  - name: node_exporter_rules
    rules:
      - alert: NodeCPUUsageHigh
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage is high on instance {{ $labels.instance }}"
          description: "CPU usage is high on instance {{ $labels.instance }} (value: {{ $value }})."
      - alert: NodeMemoryUsageHigh
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage is high on instance {{ $labels.instance }}"
          description: "Memory usage is high on instance {{ $labels.instance }} (value: {{ $value }})."
      - alert: NodeDiskUsageHigh
        expr: 100 * (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|rootfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|rootfs"}) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage is high on instance {{ $labels.instance }} ({{ $labels.mountpoint }})"
          description: "Disk usage is high on instance {{ $labels.instance }}, mountpoint {{ $labels.mountpoint }} (value: {{ $value }})."
      # Network: combined receive + transmit throughput in bytes/s, excluding
      # loopback. 1e+08 bytes/s is about 100 MB/s (800 Mbit/s).
      - alert: NetworkTrafficHigh
        expr: sum by (instance) (rate(node_network_receive_bytes_total{device!="lo"}[5m]) + rate(node_network_transmit_bytes_total{device!="lo"}[5m])) > 1e+08
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Network traffic is high on instance {{ $labels.instance }}"
          description: "Network traffic is high on instance {{ $labels.instance }} (value: {{ $value }})."
      # Processes: total per instance, summed over all process states.
      # node_processes_state comes from node-exporter's processes collector,
      # which is disabled by default (enable it with --collector.processes).
      - alert: ProcessCountHigh
        expr: sum by (instance) (node_processes_state) > 600
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Process count is high on instance {{ $labels.instance }}"
          description: "Process count is high on instance {{ $labels.instance }} (value: {{ $value }})."
      # Load: absolute 1/5/15-minute thresholds that ignore the CPU core count;
      # tune them per host.
      - alert: SystemLoadHigh
        expr: node_load1 > 2.5 or node_load5 > 2 or node_load15 > 1.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "System load is high on instance {{ $labels.instance }}"
          description: "System load is high on instance {{ $labels.instance }} (value: {{ $value }})."
```
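Because `SystemLoadHigh` compares load averages with absolute numbers, the same thresholds are strict on a 1-core host and lax on a 32-core one. A sketch of a per-CPU variant follows; the group name, alert name, and the threshold of 2 runnable tasks per core are assumptions. The CPU count is derived by counting the `mode="idle"` series of `node_cpu_seconds_total`, one per core.

```yaml
groups:
  - name: node_load_per_cpu            # hypothetical group name
    rules:
      # 5-minute load average divided by the number of CPUs on the same host.
      # "count without (cpu, mode)" leaves {instance, job}, which matches the
      # labels of node_load5, so the division pairs series one-to-one.
      - alert: SystemLoadPerCPUHigh    # hypothetical alert name
        expr: node_load5 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"}) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Load per CPU is high on instance {{ $labels.instance }}"
          description: "5-minute load per CPU on instance {{ $labels.instance }} is {{ $value }}."
```

Here `{{ $value }}` is the load per core, so one threshold can be shared across hosts of different sizes.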
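Note that the PromQL expressions above are only simple examples; the thresholds and expressions should be tuned to your environment. To enable the rules, save the YAML to a file such as node-exporter-rules.yaml and list that file under `rule_files` in the Prometheus configuration file. The first three rules in the two groups are identical, so if both files are loaded, each CPU, memory, and disk alert fires twice; load only one copy of them.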
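A minimal sketch of the relevant part of `prometheus.yml`. The rule-file path, the node-exporter target `localhost:9100`, and the Alertmanager address `localhost:9093` are assumptions (9100 and 9093 are those components' default ports); adjust them to your deployment.

```yaml
# prometheus.yml (excerpt): paths and addresses below are assumptions.
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often alerting rules are evaluated

# Rule files to load; glob patterns such as rules/*.yaml are also accepted.
rule_files:
  - /etc/prometheus/rules/node-exporter-rules.yaml

# Where firing alerts are sent. Without an Alertmanager, alerts are only
# visible on the Prometheus web UI's Alerts page.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]   # node-exporter
```

Before reloading, `promtool check rules /etc/prometheus/rules/node-exporter-rules.yaml` validates the rule file. Prometheus picks up the change on restart, on `SIGHUP`, or via an HTTP `POST` to `/-/reload` when started with `--web.enable-lifecycle`.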