Prometheus Alertmanager: DingTalk alerts via webhook

I. Installation overview

  • Go environment (only needed to build from source; the prebuilt release tarballs used below do not require it)
  • Prometheus
  • node_exporter

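This article assumes the services above are already installed. The remaining steps are:

  1. Edit the Prometheus configuration and write the alerting rules
  2. Install the DingTalk webhook plugin
  3. Install Alertmanager
  4. Add the Alertmanager address to the Prometheus configuration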

II. Edit the Prometheus configuration

1. Open the Prometheus configuration file prometheus.yml

vim /opt/app/Prometheus/prometheus.yml

2. Under rule_files, add /opt/app/Prometheus/*.rules so that Prometheus loads the alerting rules from that directory

rule_files:
  - /opt/app/Prometheus/*.rules

3. Write the alerting rules

Create a file ending in .rules under /opt/app/Prometheus:

vim /opt/app/Prometheus/hoststats-alert.rules

Insert the following:

groups:
- name: cpu-node
  rules:
  - record: job_instance_mode:node_cpu_seconds:avg_rate5m
    expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
- name: base_monitor_rule
  rules:
  # Disk usage of the / mountpoint
  - alert: mountpoint_root_usage
    expr: ( (node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_free_bytes{mountpoint="/"}) / node_filesystem_size_bytes{mountpoint="/"} * 100 ) > 90
    labels:
      severity: warning
    annotations:
      description: "{{$labels.instance}}: mountpoint {{$labels.mountpoint}} usage is above 90%, current value: {{ $value }}"
  # Disk usage of the /home mountpoint
  - alert: mountpoint_home_usage
    expr: ( (node_filesystem_size_bytes{mountpoint="/home"} - node_filesystem_free_bytes{mountpoint="/home"}) / node_filesystem_size_bytes{mountpoint="/home"} * 100 ) > 90
    labels:
      severity: warning
    annotations:
      description: "{{$labels.instance}}: mountpoint {{$labels.mountpoint}} usage is above 90%, current value: {{ $value }}"
  # Disk usage of the /opt mountpoint
  - alert: mountpoint_opt_usage
    expr: ( (node_filesystem_size_bytes{mountpoint="/opt"} - node_filesystem_free_bytes{mountpoint="/opt"}) / node_filesystem_size_bytes{mountpoint="/opt"} * 100 ) > 90
    labels:
      severity: warning
    annotations:
      description: "{{$labels.instance}}: mountpoint {{$labels.mountpoint}} usage is above 90%, current value: {{ $value }}"
  # Host CPU usage above 90%
  - alert: HostHighCpuLoad
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 90
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "High CPU load (instance {{ $labels.instance }})"
      description: "CPU load > 90%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  # Less than 10% of memory available
  - alert: HostOutOfMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host out of memory (instance {{ $labels.instance }})"
      description: "Available memory is running low (< 10% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  # Host network interfaces receiving more than 100 MB/s
  - alert: HostUnusualNetworkThroughputIn
    expr: sum by (instance) (rate(node_network_receive_bytes_total[2m])) / 1024 / 1024 > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Unusual inbound network throughput (instance {{ $labels.instance }})"
      description: "Host network interfaces are probably receiving too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  # Host network interfaces sending more than 100 MB/s
  - alert: HostUnusualNetworkThroughputOut
    expr: sum by (instance) (rate(node_network_transmit_bytes_total[2m])) / 1024 / 1024 > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host unusual network throughput out (instance {{ $labels.instance }})"
      description: "Host network interfaces are probably sending too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  # Host disk read rate above 50 MB/s
  - alert: HostUnusualDiskReadRate
    expr: sum by (instance) (rate(node_disk_read_bytes_total[2m])) / 1024 / 1024 > 50
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Unusual disk read rate (instance {{ $labels.instance }})"
      description: "Disk read rate is high (> 50 MB/s)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  # Host disk write rate above 50 MB/s
  - alert: HostUnusualDiskWriteRate
    expr: sum by (instance) (rate(node_disk_written_bytes_total[2m])) / 1024 / 1024 > 50
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Unusual disk write rate (instance {{ $labels.instance }})"
      description: "Disk write rate is high (> 50 MB/s)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
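
To make the thresholds in these rules concrete, here is a small Python sketch of the arithmetic behind the mountpoint and throughput expressions. The function names and sample numbers are made up for illustration; Prometheus evaluates the real expressions itself. Note that dividing by 1024 twice yields MiB/s, which the rules loosely call MB/s.

```python
def disk_usage_percent(size_bytes: float, free_bytes: float) -> float:
    """Mirror of (size - free) / size * 100 from the mountpoint rules."""
    return (size_bytes - free_bytes) / size_bytes * 100


def bytes_per_sec_to_mib(rate_bytes_per_sec: float) -> float:
    """Mirror of the `/ 1024 / 1024` conversion in the network and disk rules."""
    return rate_bytes_per_sec / 1024 / 1024


# Hypothetical sample: a 100 GiB root filesystem with 8 GiB free.
GIB = 1024 ** 3
usage = disk_usage_percent(100 * GIB, 8 * GIB)
print(f"usage={usage:.1f}% fires={usage > 90}")

# Hypothetical sample: 120 MiB/s received, compared against the threshold of 100.
rate = bytes_per_sec_to_mib(120 * 1024 * 1024)
print(f"rate={rate:.0f} MiB/s fires={rate > 100}")
```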

Restart Prometheus for the rules to take effect:

systemctl restart prometheus
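
After the restart, one way to confirm that the rules were loaded is to ask the Prometheus HTTP API for its rule list. The sketch below assumes Prometheus listens on localhost:9090 (the default port, not stated in this article) and uses only the Python standard library.

```python
import json
import urllib.request


def alert_names(rules_response: dict) -> list:
    """Collect the names of alerting rules from a /api/v1/rules response."""
    names = []
    for group in rules_response["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") == "alerting":
                names.append(rule["name"])
    return names


def fetch_rules(base_url: str = "http://localhost:9090") -> dict:
    """Query a Prometheus server; the default address is an assumption."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=5) as resp:
        return json.load(resp)
```

Running `print(alert_names(fetch_rules()))` on the Prometheus host should list names such as HostHighCpuLoad if the rule file was picked up.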

III. Install the DingTalk webhook notification plugin

Download: https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz


Extract the archive and create a symlink:

tar zxvf /opt/app/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz -C /opt/app
ln -s /opt/app/prometheus-webhook-dingtalk-1.4.0.linux-amd64 /opt/app/webhook

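In DingTalk, create a robot and add the custom keyword "WARNING" in its security settings. DingTalk only delivers messages that contain the keyword; the template below prints the severity label in upper case, which is WARNING for every rule defined above.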
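
Before wiring up the plugin, it can help to check the robot on its own. The sketch below builds a plain DingTalk text message and posts it to the robot webhook. `KEYWORD`, `build_text_message` and `send_message` are illustrative names, and the keyword check simply mirrors the robot setting described above.

```python
import json
import urllib.request

KEYWORD = "WARNING"  # must match the robot's custom keyword


def build_text_message(content: str) -> dict:
    """Build a DingTalk text message, refusing content without the keyword."""
    if KEYWORD not in content:
        raise ValueError(f"message must contain the keyword {KEYWORD!r}")
    return {"msgtype": "text", "text": {"content": content}}


def send_message(webhook_url: str, message: dict) -> dict:
    """POST the message to the robot webhook and return the JSON reply."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return json.load(resp)
```

Calling `send_message(url, build_text_message("WARNING: robot test"))` with your own robot URL should make the message appear in the group; the reply normally carries an `errcode` of 0 when the message is accepted.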

Create the alert template:

vim /opt/app/webhook/contrib/templates/legacy/notice.tmpl

Add the following:

{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}

{{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
{{ end }}{{ end }}

{{ define "default.__text_alert_list" }}{{ range . }}
---
**Severity:** {{ .Labels.severity | upper }}

**Team:** {{ .Labels.team | upper }}

**Started at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

**Details:**
{{ range .Annotations.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}


{{ end }}

**Labels:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}
{{ end }}
{{ define "default.__text_alertresovle_list" }}{{ range . }}
---
**Severity:** {{ .Labels.severity | upper }}

**Team:** {{ .Labels.team | upper }}

**Started at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

**Ended at:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}

**Details:**
{{ range .Annotations.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}


{{ end }}

**Labels:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}
{{ end }}

{{/* Default */}}
{{ define "default.title" }}{{ template "__subject" . }}{{ end }}
{{ define "default.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ if gt (len .Alerts.Firing) 0 -}}

{{ template "default.__text_alert_list" .Alerts.Firing }}


{{- end }}

{{ if gt (len .Alerts.Resolved) 0 -}}
{{ template "default.__text_alertresovle_list" .Alerts.Resolved }}


{{- end }}
{{- end }}

{{/* Legacy */}}
{{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
{{ define "legacy.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{- end }}

{{/* Following names for compatibility */}}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
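
The template's `dateInZone "2006.01.02 15:04:05" ... "Asia/Shanghai"` call uses Go's reference-time layout, which can look cryptic. As a rough illustration of what it produces, the Python sketch below converts a UTC timestamp to Shanghai time in the same output format. It assumes a timestamp with whole seconds and a trailing `Z`; `format_alert_time` is an illustrative name, not part of the plugin.

```python
from datetime import datetime, timedelta, timezone

# Asia/Shanghai is UTC+8 with no daylight saving time, so a fixed offset
# is enough for this illustration.
SHANGHAI = timezone(timedelta(hours=8))


def format_alert_time(starts_at: str) -> str:
    """Render a UTC timestamp in the layout the template asks for."""
    utc_time = datetime.strptime(starts_at, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return utc_time.astimezone(SHANGHAI).strftime("%Y.%m.%d %H:%M:%S")


print(format_alert_time("2021-03-01T02:03:04Z"))  # prints 2021.03.01 10:03:04
```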

Edit the webhook configuration file:

vim /opt/app/webhook/config.yml

# Add the following, replacing url with your own DingTalk robot webhook address
templates:
  - '/opt/app/webhook/contrib/templates/legacy/notice.tmpl'
targets:
  webhook:
    url: https://oapi.dingtalk.com/robot/send?access_token=79349d937973ccc5c7520420bf60a82f11600cab28d1b621c351a2a8841739b1

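Create the start script:

vim /opt/app/webhook/start.sh

Add the following:

#!/bin/bash
source /etc/profile
# exec replaces the shell, so systemd tracks and signals the webhook process directly
exec /opt/app/webhook/prometheus-webhook-dingtalk --config.file=/opt/app/webhook/config.yml > /opt/app/webhook/webhook-dingtalk.log 2>&1

Make it executable:

chmod +x /opt/app/webhook/start.sh

Create the systemd unit file:

vim /usr/lib/systemd/system/webhook.service

Add the following:
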
[Unit]
Description=webhook.service
After=network.target
[Service]
Type=simple
ExecStart=/opt/app/webhook/start.sh

KillMode=mixed
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target

Reload systemd so that it picks up the new unit file, then start the webhook:

systemctl daemon-reload
systemctl start webhook
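
With the service running, the plugin can be exercised without Alertmanager by posting a hand-written payload to it. The sketch below assumes the plugin listens on port 8060 (the port used in the Alertmanager configuration later in this article) and that the target is named `webhook` as in config.yml. The alert contents are invented test data.

```python
import json
import urllib.request


def dingtalk_endpoint(target: str, host: str = "localhost", port: int = 8060) -> str:
    """URL that prometheus-webhook-dingtalk exposes for a configured target."""
    return f"http://{host}:{port}/dingtalk/{target}/send"


def sample_payload() -> dict:
    """A hand-written payload in the shape Alertmanager sends to webhooks."""
    alert = {
        "status": "firing",
        "labels": {"alertname": "TestAlert", "severity": "warning", "instance": "demo:9100"},
        "annotations": {"description": "manual test, safe to ignore"},
        "startsAt": "2021-03-01T02:03:04Z",
        "endsAt": "0001-01-01T00:00:00Z",
        "generatorURL": "http://localhost:9090/graph",
    }
    return {
        "version": "4",
        "groupKey": '{}:{alertname="TestAlert"}',
        "status": "firing",
        "receiver": "webhook",
        "groupLabels": {"alertname": "TestAlert"},
        "commonLabels": alert["labels"],
        "commonAnnotations": alert["annotations"],
        "externalURL": "http://localhost:9093",
        "alerts": [alert],
    }


def post_json(url: str, body: dict) -> int:
    """POST a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status
```

Calling `post_json(dingtalk_endpoint("webhook"), sample_payload())` on the webhook host should produce a message in the DingTalk group. If nothing arrives, /opt/app/webhook/webhook-dingtalk.log is the first place to look.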

IV. Install Alertmanager

Download: https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz


Extract the archive and create a symlink:

tar zxvf /opt/app/alertmanager-0.21.0.linux-amd64.tar.gz -C /opt/app/
ln -s /opt/app/alertmanager-0.21.0.linux-amd64 /opt/app/alertmanager

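Edit the configuration file, keeping a backup of the original:

mv /opt/app/alertmanager/alertmanager.yml{,.bak}
vim /opt/app/alertmanager/alertmanager.yml

# Add the following (if the webhook and Alertmanager run on different servers, change localhost below to the webhook server's address)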
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: http://localhost:8060/dingtalk/webhook/send
    send_resolved: true
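
The `group_by: ['alertname']` setting decides which alerts end up in the same notification. As a simplified model only (real Alertmanager also applies routing and the group_wait, group_interval and repeat_interval timers), the Python sketch below buckets some invented alerts by the configured labels.

```python
from collections import defaultdict


def group_alerts(alerts: list, group_by: list) -> dict:
    """Bucket alert label sets by the values of the group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Invented alerts: two hosts with high CPU load, one of them also low on memory.
alerts = [
    {"alertname": "HostHighCpuLoad", "instance": "a:9100"},
    {"alertname": "HostHighCpuLoad", "instance": "b:9100"},
    {"alertname": "HostOutOfMemory", "instance": "a:9100"},
]
for key, members in group_alerts(alerts, ["alertname"]).items():
    print(key, len(members))
```

With this configuration the two HostHighCpuLoad alerts share one DingTalk message, while HostOutOfMemory is sent separately.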

Create the start script:

vim /opt/app/alertmanager/start.sh

Add the following:

#!/bin/bash
source /etc/profile
# exec replaces the shell, so systemd tracks and signals the alertmanager process directly
exec /opt/app/alertmanager/alertmanager --config.file=/opt/app/alertmanager/alertmanager.yml --cluster.advertise-address=0.0.0.0:9093 > /opt/app/alertmanager/alertmanager.log 2>&1

Make it executable:

chmod +x /opt/app/alertmanager/start.sh

Create the systemd unit file:

vim /usr/lib/systemd/system/alertmanager.service

Add the following:

[Unit]
Description=alertmanager.service
After=network.target
[Service]
Type=simple
ExecStart=/opt/app/alertmanager/start.sh

KillMode=mixed
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target

Reload systemd so that it picks up the new unit file, then start the Alertmanager service:

systemctl daemon-reload
systemctl start alertmanager.service
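
To check the Alertmanager-to-DingTalk path before Prometheus is connected, a test alert can be pushed straight into Alertmanager's v2 API. The sketch below assumes Alertmanager listens on localhost:9093; the alert name and annotation are invented test values.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(now: datetime, minutes: int = 5) -> list:
    """Build a one-element alert list for POST /api/v2/alerts."""
    return [{
        "labels": {"alertname": "ManualTest", "severity": "warning"},
        "annotations": {"description": "manual test alert"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_test_alert(base_url: str = "http://localhost:9093") -> int:
    """POST the test alert to Alertmanager and return the HTTP status code."""
    body = build_test_alert(datetime.now(timezone.utc))
    request = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status
```

After `send_test_alert()` succeeds, the notification should reach DingTalk once the 10s group_wait has elapsed, and a resolved message should follow after the alert's end time because send_resolved is enabled.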

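V. Add Alertmanager to the Prometheus configuration

Edit the prometheus.yml configuration file:

vim /opt/app/Prometheus/prometheus.yml
# Add the Alertmanager address to the targets list under alerting, alertmanagers, static_configs (line 11 in the author's file)
Format: ['ip:port']
Example: ['172.15.35.23:9093']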

Restart Prometheus:

systemctl restart prometheus.service
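
Once Prometheus is back up, its HTTP API can confirm that it has discovered the Alertmanager. The sketch below assumes Prometheus listens on localhost:9090 (the default port, not stated in this article); the function names are illustrative.

```python
import json
import urllib.request


def active_alertmanagers(response: dict) -> list:
    """Extract the URLs of active Alertmanagers from /api/v1/alertmanagers."""
    return [am["url"] for am in response["data"]["activeAlertmanagers"]]


def fetch_alertmanagers(base_url: str = "http://localhost:9090") -> dict:
    """Query a Prometheus server; the default address is an assumption."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alertmanagers", timeout=5) as resp:
        return json.load(resp)
```

If `active_alertmanagers(fetch_alertmanagers())` returns an empty list, the targets entry in prometheus.yml is the first thing to recheck.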
