Prometheus 通过各种 exporter 暴露数据,并由 prometheus server 定时去拉数据,然后存储。
Prometheus Operator, 尽管该项目还处于 Beta 阶段,但是它给在 K8S 中搭建基于 Prometheus 的监控提供了很大的便利
创建 namespace:
1
| kubectl apply -f namespace.yaml
|
namespace.yml
1 2 3 4
| apiVersion: v1 kind: Namespace metadata: name: monitoring
|
RBAC
rbac.yaml
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
| apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: prometheus roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: prometheus subjects: - kind: ServiceAccount name: prometheus-k8s namespace: monitoring --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: prometheus rules: - apiGroups: [""] resources: - nodes - nodes/proxy - services - endpoints - pods verbs: ["get", "list", "watch"] - apiGroups: [""] resources: - configmaps verbs: ["get"] - nonResourceURLs: ["/metrics"] verbs: ["get"] --- apiVersion: v1 kind: ServiceAccount metadata: name: prometheus-k8s namespace: monitoring
|
创建 Promethes 的配置文件
其中的内容主要参考 Prometheus 官方提供的示例 和 Prometheus 官方文档
rbac.yaml
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146
| apiVersion: v1 kind: ConfigMap metadata: name: prometheus-core namespace: monitoring data: prometheus.yaml: | global: scrape_interval: 30s scrape_timeout: 30s scrape_configs: - job_name: 'kubernetes-apiservers' kubernetes_sd_configs: - role: endpoints scheme: https tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token relabel_configs: - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name] action: keep regex: default;kubernetes;https # Scrape config for nodes (kubelet). - job_name: 'kubernetes-nodes' scheme: https tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token kubernetes_sd_configs: - role: node relabel_configs: - action: labelmap regex: __meta_kubernetes_node_label_(.+) - target_label: __address__ replacement: kubernetes.default.svc:443 - source_labels: [__meta_kubernetes_node_name] regex: (.+) target_label: __metrics_path__ replacement: /api/v1/nodes/${1}/proxy/metrics # Scrape config for Kubelet cAdvisor. - job_name: 'kubernetes-cadvisor' scheme: https tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token kubernetes_sd_configs: - role: node relabel_configs: - action: labelmap regex: __meta_kubernetes_node_label_(.+) - target_label: __address__ replacement: kubernetes.default.svc:443 - source_labels: [__meta_kubernetes_node_name] regex: (.+) target_label: __metrics_path__ replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor - job_name: 'kubernetes-service-endpoints' kubernetes_sd_configs: - role: endpoints relabel_configs: - action: labelmap regex: __meta_kubernetes_service_label_(.+) - source_labels: [__meta_kubernetes_namespace] action: replace target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_service_name] action: replace target_label: kubernetes_name - job_name: 'kubernetes-services' metrics_path: /probe params: module: [http_2xx] kubernetes_sd_configs: - role: service relabel_configs: - source_labels: [__address__] target_label: __param_target - target_label: __address__ replacement: blackbox-exporter.example.com:9115 - source_labels: [__param_target] target_label: instance - action: labelmap regex: __meta_kubernetes_service_label_(.+) - source_labels: [__meta_kubernetes_namespace] target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_service_name] target_label: kubernetes_name - job_name: 'kubernetes-ingresses' metrics_path: /probe params: module: [http_2xx] kubernetes_sd_configs: - role: ingress relabel_configs: - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path] regex: (.+);(.+);(.+) replacement: ${1}://${2}${3} target_label: __param_target - target_label: __address__ replacement: blackbox-exporter.example.com:9115 - source_labels: [__param_target] target_label: instance - action: labelmap regex: __meta_kubernetes_ingress_label_(.+) - source_labels: [__meta_kubernetes_namespace] target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_ingress_name] target_label: kubernetes_name - job_name: 'kubernetes-pods' kubernetes_sd_configs: - role: pod relabel_configs: - action: labelmap regex: __meta_kubernetes_pod_label_(.+) - source_labels: [__meta_kubernetes_namespace] action: replace target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_pod_name] action: replace target_label: kubernetes_pod_name
|
部署 Prometheus
prometheus.yaml
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
| apiVersion: extensions/v1beta1 kind: Deployment metadata: name: prometheus-core namespace: monitoring labels: app: prometheus component: core spec: replicas: 1 template: metadata: name: prometheus-main labels: app: prometheus component: core spec: serviceAccountName: prometheus-k8s containers: - name: prometheus image: taobeier/prometheus:v2.6.0 args: - '--storage.tsdb.retention=24h' - '--storage.tsdb.path=/prometheus' - '--config.file=/etc/prometheus/prometheus.yaml' ports: - name: webui containerPort: 9090 resources: requests: cpu: 500m memory: 500M limits: cpu: 500m memory: 500M volumeMounts: - name: data mountPath: /prometheus - name: config-volume mountPath: /etc/prometheus volumes: - name: data emptyDir: {} - name: config-volume configMap: name: prometheus-core
|
查看部署情况:
1
| kubectl -n monitoring get all
|
使用 Service 将 Promethes 的服务暴露出来:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
| apiVersion: v1 kind: Service metadata: labels: app: prometheus component: core name: prometheus namespace: monitoring spec: ports: - protocol: TCP port: 9090 targetPort: 9090 selector: app: prometheus component: core type: NodePort
|
安装 Node exporter
node_exporter
当我们直接将 Node exporter 部署在宿主机上时,我们最起码需要保证两点:
1 2
| 1. Promethes 服务可与它正常通信(Promethes 采用 Pull 的方式采集数据) 2. 需要服务保活,如果 exporter 挂掉了,那自然就取不到数据
|
所以我们要监控集群中所有的机器时,DaemonSet 是一种很合适的的部署方式,可直接将 Node exporter 部署至集群的每个节点上
daemonSet.yaml
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
| apiVersion: extensions/v1beta1 kind: DaemonSet metadata: name: prometheus-node-exporter namespace: monitoring labels: app: prometheus component: node-exporter spec: template: metadata: name: prometheus-node-exporter labels: app: prometheus component: node-exporter spec: tolerations: - key: node-role.kubernetes.io/master effect: NoSchedule containers: - image: taobeier/node-exporter:v0.17.0 name: prometheus-node-exporter ports: - name: prom-node-exp containerPort: 9100 hostPort: 9100 hostNetwork: true hostPID: true
|
让 Promethes 抓取数据:
这里我们直接使用了添加 annotations 的方式,让 Promethes 自动的通过 Kubernetes SD 发现我们新添加的 exporter (或者说资源)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
| apiVersion: v1 kind: Service metadata: annotations: prometheus.io/scrape: 'true' name: prometheus-node-exporter namespace: monitoring labels: app: prometheus component: node-exporter spec: clusterIP: None ports: - name: prometheus-node-exporter port: 9100 protocol: TCP selector: app: prometheus component: node-exporter type: ClusterIP
|
Prometheus 支持多种 exporter 暴露各种指标,而且我们还可以使用 Grafana 作为我们监控的展示手段
如果要做 Dashboard 推荐使用 Kubernetes cluster monitoring (via Prometheus) 。
另外,监控其实涉及的内容很多,包括数据持久化方式。以及是否考虑与集群外的 Prometheus 集群做邦联模式等。
至于应用监控,也可使用它的 SDK 来完成。
参考链接:
prometheus
PromQL
Prometheus Operator