Monitoring Kubernetes with Prometheus

Prometheus follows a pull model: metrics are exposed by various exporters, and the Prometheus server scrapes those endpoints on a schedule and stores the samples.
There is also Prometheus Operator; although that project is still in beta, it makes setting up Prometheus-based monitoring on Kubernetes much easier. This post, however, walks through a manual deployment.
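
As a quick illustration of the pull model: every exporter simply serves plaintext metrics over HTTP, which you can inspect by hand. A minimal check, assuming a node exporter (installed later in this post) is listening on its default port 9100:

# Exporters expose metrics as plain text over HTTP; the Prometheus server
# "pulls" these endpoints on a schedule. Inspect one manually:
curl -s http://localhost:9100/metrics | head -n 5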

Create the namespace:

kubectl apply -f namespace.yaml

namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring

RBAC

rbac.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-k8s
  namespace: monitoring
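
After applying the manifest, one way to sanity-check the binding is to impersonate the ServiceAccount with kubectl's --as flag:

kubectl apply -f rbac.yaml

# Impersonate the ServiceAccount to confirm the ClusterRole took effect.
# Both commands should print "yes".
kubectl auth can-i list nodes --as=system:serviceaccount:monitoring:prometheus-k8s
kubectl auth can-i watch endpoints --as=system:serviceaccount:monitoring:prometheus-k8s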

Create the Prometheus configuration

The contents below are largely based on the example configuration in the official Prometheus documentation.

configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-core
  namespace: monitoring
data:
  prometheus.yaml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s
    scrape_configs:
    - job_name: 'kubernetes-apiservers'

      kubernetes_sd_configs:
      - role: endpoints
      scheme: https

      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

    # Scrape config for nodes (kubelet).
    - job_name: 'kubernetes-nodes'
      scheme: https

      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:
      - role: node

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    # Scrape config for Kubelet cAdvisor.
    - job_name: 'kubernetes-cadvisor'

      scheme: https

      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:
      - role: node

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

    - job_name: 'kubernetes-service-endpoints'

      kubernetes_sd_configs:
      - role: endpoints

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

    - job_name: 'kubernetes-services'

      metrics_path: /probe
      params:
        module: [http_2xx]

      kubernetes_sd_configs:
      - role: service

      relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-ingresses'

      metrics_path: /probe
      params:
        module: [http_2xx]

      kubernetes_sd_configs:
      - role: ingress

      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-pods'

      kubernetes_sd_configs:
      - role: pod

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
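
Note that Prometheus does not watch the ConfigMap for changes: Kubernetes syncs the mounted file into the pod after a short delay, but the server only re-reads it on reload or restart. Since the Deployment in the next section does not enable the lifecycle API, recreating the pod is the simplest way to pick up config changes:

kubectl apply -f configmap.yaml

# Recreate the pod so Prometheus restarts with the new configuration:
kubectl -n monitoring delete pod -l app=prometheus,component=core

# If Prometheus were started with --web.enable-lifecycle (it is not, in the
# manifest below), POSTing to the reload endpoint would also work:
# curl -X POST http://<prometheus-address>:9090/-/reload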

Deploy Prometheus

prometheus.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
  labels:
    app: prometheus
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: core
  template:
    metadata:
      name: prometheus-main
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      containers:
      - name: prometheus
        image: taobeier/prometheus:v2.6.0
        args:
        - '--storage.tsdb.retention=24h'
        - '--storage.tsdb.path=/prometheus'
        - '--config.file=/etc/prometheus/prometheus.yaml'
        ports:
        - name: webui
          containerPort: 9090
        resources:
          requests:
            cpu: 500m
            memory: 500M
          limits:
            cpu: 500m
            memory: 500M
        volumeMounts:
        - name: data
          mountPath: /prometheus
        - name: config-volume
          mountPath: /etc/prometheus
      volumes:
      - name: data
        emptyDir: {}
      - name: config-volume
        configMap:
          name: prometheus-core

Check the deployment status:

kubectl -n monitoring get all
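
To go a bit further than get all, you can wait for the rollout to finish and scan the startup logs for configuration errors (the exact log wording varies across Prometheus versions):

# Block until the Deployment's pods are ready.
kubectl -n monitoring rollout status deployment/prometheus-core

# Look for config-loading messages or errors in the startup logs.
kubectl -n monitoring logs deployment/prometheus-core | grep -i load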

Expose Prometheus with a Service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
    component: core
  name: prometheus
  namespace: monitoring
spec:
  ports:
  - protocol: TCP
    port: 9090
    targetPort: 9090
  selector:
    app: prometheus
    component: core
  type: NodePort
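
After applying this (the filename service.yaml is assumed here), Kubernetes assigns a port from the NodePort range (30000-32767 by default) on every node. Either look it up and open the web UI, or use a local port-forward while testing:

kubectl apply -f service.yaml

# Print the assigned NodePort; the UI is then at http://<any-node-ip>:<port>/
kubectl -n monitoring get svc prometheus -o jsonpath='{.spec.ports[0].nodePort}'

# Alternative for quick local access without touching node ports:
kubectl -n monitoring port-forward svc/prometheus 9090:9090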

Install Node exporter

node_exporter

When deploying Node exporter directly on a host, we need to guarantee at least two things:

1. The Prometheus server must be able to reach it (Prometheus pulls data rather than having it pushed).
2. The exporter process must be kept alive; if it dies, there is nothing to scrape.

So when we want to monitor every machine in the cluster, a DaemonSet is a natural fit: it deploys Node exporter onto every node directly.

daemonSet.yaml:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: monitoring
  labels:
    app: prometheus
    component: node-exporter
spec:
  selector:
    matchLabels:
      app: prometheus
      component: node-exporter
  template:
    metadata:
      name: prometheus-node-exporter
      labels:
        app: prometheus
        component: node-exporter
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - image: taobeier/node-exporter:v0.17.0
        name: prometheus-node-exporter
        ports:
        - name: prom-node-exp
          containerPort: 9100
          hostPort: 9100
      hostNetwork: true
      hostPID: true
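
Because the pods run with hostNetwork: true, each exporter binds to port 9100 of its node, so its metrics are reachable without going through the cluster network. A quick check (the node IP is a placeholder):

kubectl apply -f daemonSet.yaml

# Expect one pod per node; -o wide shows which node each pod landed on.
kubectl -n monitoring get pods -l app=prometheus,component=node-exporter -o wide

# The exporter listens on the host network, so its metrics are reachable
# directly on port 9100 of any node (substitute a real node IP):
curl -s http://<node-ip>:9100/metrics | head -n 5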

Have Prometheus scrape the data:

Here we simply add an annotation to the Service so that Prometheus automatically discovers the newly added exporter (or rather, the new resource) through Kubernetes service discovery.

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: prometheus-node-exporter
  namespace: monitoring
  labels:
    app: prometheus
    component: node-exporter
spec:
  clusterIP: None
  ports:
  - name: prometheus-node-exporter
    port: 9100
    protocol: TCP
  selector:
    app: prometheus
    component: node-exporter
  type: ClusterIP
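
Once this Service exists (the filename node-exporter-service.yaml is assumed here), the kubernetes-service-endpoints job defined earlier copies the Service name into the kubernetes_name label, so the new targets can be checked with a single query against the HTTP API (the address depends on how you exposed Prometheus):

kubectl apply -f node-exporter-service.yaml

# One result per node, each with value 1, means every exporter is scraped.
curl -sG 'http://<node-ip>:<node-port>/api/v1/query' \
  --data-urlencode 'query=up{kubernetes_name="prometheus-node-exporter"}'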

Prometheus supports many exporters covering all kinds of metrics, and Grafana works well as the visualization layer on top of this stack.
For dashboards, the Kubernetes cluster monitoring (via Prometheus) dashboard is a recommended starting point.

Beyond this, monitoring involves much more, including how to persist the data and whether to federate with Prometheus servers outside the cluster.
As for application monitoring, the Prometheus client SDKs can be used to instrument your own services.
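
For the federation option mentioned above, an external Prometheus would scrape this one's /federate endpoint with match[] selectors; previewing what it would receive is a one-liner (address placeholders again):

# Preview the series a federating Prometheus would pull from this server.
curl -sG 'http://<node-ip>:<node-port>/federate' \
  --data-urlencode 'match[]={job="kubernetes-nodes"}' | head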

References:
Prometheus
PromQL
Prometheus Operator