十二、Kubernetes 进阶之日志与监控篇

1、日志

在Kubernetes或者说Docker中都可以查看容器的日志,但如果直接通过命令行的方式去看的话会很麻烦,要不断的去敲命令,去找容器名称,很不方便操作!

在没有使用容器的项目中,我们也知道有ELK这一套组合拳是专门用来操作日志的,那K8S肯定也是可以用它们来进行日志收集、查看的。

1.1 日志查看

1.1.1 命令方式

1.1.1.1 docker

  • docker ps 查看 container id

    image.png
  • docker logs container_id 查看日志

    image.png

1.1.1.2 Kubernetes命令

  • kubectl 命令查看

    • kubectl logs -f <pod-name> [-c <container-name>]
    image.png
  • Pod日志

    • kubectl describe pod <pod-name>

      kubectl describe除了能够查看pod的日志信息,还能查看比如Node、RC、Service、Namespace等信息。

      注意 :要是想查看指定命名空间之下的,可以-n=namespace

    image.png
  • 组件服务级别

    比如kube-apiserver、kube-schedule、kubelet、kube-proxy、kube-controller-manager等都可以使用journalctl进行查看

    如果K8S中出现问题了,一般就可以通过这些命令来查看是不是组件出异常了

    image.png

1.1.2 ELK

Elasticsearch搜索引擎与名为Logstash的数据收集和日志解析引擎以及名为Kibana的分析和可视化平台一起开发。这三个产品被设计成一个集成解决方案,称为 Elastic Stack(ELK Stack)

日志采集可以有多种方式选择,ELK中的Logstash可以由其他具有相同功能的多个组件替换,如Logpilot、fluentd等,本章节我们采用 Logpilot 进行演示。

1.1.2.1 结构图

  • Pod中的日志通过mount挂载到宿主机中的某个目录
  • Logpilot采集这个目录下日志之后交给Elasticsearch去做搜索引擎的规则
  • 最后通过Kibana做可视化的展示
image.png

1.1.2.2 部署ELK

1.1.2.2.1 部署LogPilot
  • 准备YAML文件

    • log-pilot.yaml

    由于环境被我弄坏了,我这边重新安装了K8S集群,并且采用了最新版的1.18.2

    下面的yaml文件可能会不兼容旧版本,如果不兼容,大家网上搜一份修改下

    apiVersion: apps/v1
    kind: DaemonSet                   # 日志采集需要在所有节点都运行,所以采用DaemonSet资源
    metadata:
      name: log-pilot
      namespace: kube-system
      labels:
        k8s-app: log-pilot
        kubernetes.io/cluster-service: "true"
    spec:
      selector: 
        matchLabels: 
          k8s-app: log-es
          kubernetes.io/cluster-service: "true"
          version: v1.22
      template:
        metadata:
          labels:
            k8s-app: log-es
            kubernetes.io/cluster-service: "true"
            version: v1.22
        spec:
          tolerations:                # 这里的配置表示允许将资源部署到Master节点中
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          containers:
          - name: log-pilot
            image: registry.cn-hangzhou.aliyuncs.com/acs/log-pilot:0.9.5-filebeat
            resources:                # 资源限制
              limits: 
                memory: 200Mi
              requests:
                cpu: 100m
                memory: 200Mi
            env:                  # 定义与Elasticsearch通信的一些环境变量
              - name: "FILEBEAT_OUTPUT"
                value: "elasticsearch"
              - name: "ELASTICSEARCH_HOST"
                value: "elasticsearch-api"
              - name: "ELASTICSEARCH_PORT"
                value: "9200"
              - name: "ELASTICSEARCH_USER"
                value: "elastic"
              - name: "ELASTICSEARCH_PASSWORD"
                value: "elastic"
            volumeMounts:         # 挂载日志目录
            - name: sock
              mountPath: /var/run/docker.sock
            - name: root
              mountPath: /host
              readOnly: true
            - name: varlib
              mountPath: /var/lib/filebeat
            - name: varlog
              mountPath: /var/log/filebeat
            securityContext:
              capabilities:
                add:
                - SYS_ADMIN
          terminationGracePeriodSeconds: 30
          volumes:
          - name: sock
            hostPath:
              path: /var/run/docker.sock
          - name: root
            hostPath:
              path: /
          - name: varlib
            hostPath:
              path: /var/lib/filebeat
              type: DirectoryOrCreate
          - name: varlog
            hostPath:
              path: /var/log/filebeat
              type: DirectoryOrCreate
    
  • 创建资源

    [root@master-kubeadm-k8s log]# kubectl apply -f log-pilot.yaml
    daemonset.extensions/log-pilot created
    
  • 查看资源

    [root@master-kubeadm-k8s log]# kubectl get pods -n kube-system -o wide | grep log
    log-pilot-8f4nv                              1/1     Running            0          2m4s   192.168.221.88    worker02-kubeadm-k8s   <none>           <none>
    log-pilot-h25fc                              1/1     Running            0          2m4s   192.168.16.250    master-kubeadm-k8s     <none>           <none>
        
    [root@master-kubeadm-k8s log]# kubectl get daemonset -n kube-system
    NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
    calico-node   3         3         3       3            3           beta.kubernetes.io/os=linux   41d
    kube-proxy    3         3         3       3            3           <none>                        41d
    log-pilot     2         2         2       2            2           <none>                        26s
    
1.1.2.2.2 部署Elasticsearch
  • 准备YAML文件

    • elasticsearch.yaml

    注意这里的资源要求还是比较高的,自己注意下系统资源是否够用

    apiVersion: v1
    kind: Service             # 这里的service是为外部访问ElasticSearch提供
    metadata:
      name: elasticsearch-api # 这里的名称要与logPilot中的ElasticHost环境变量一致
      namespace: kube-system  
      labels:
        name: elasticsearch
    spec:
      selector:
        app: es
      ports:
      - name: transport
        port: 9200
        protocol: TCP
    ---
    apiVersion: v1
    kind: Service             # # 这里的service是为ElasticSearch集群之间通信提供的
    metadata:
      name: elasticsearch-discovery
      namespace: kube-system
      labels:
        name: elasticsearch
    spec:
      selector:
        app: es
      ports:
      - name: transport
        port: 9300
        protocol: TCP
    ---
    apiVersion: apps/v1
    kind: StatefulSet         # 希望Elasticsearch是有序启动的,所以使用StatefulSet
    metadata:
      name: elasticsearch
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
    spec:
      replicas: 3
      serviceName: "elasticsearch-service"
      selector:
        matchLabels:
          app: es
      template:
        metadata:
          labels:
            app: es
        spec:
          tolerations:            # 同样elasticsearch也部署到Master节点中
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
          initContainers:     # 这里是指在创建容器之前要先进行初始化操作
          - name: init-sysctl
            image: busybox:1.27
            command:
            - sysctl
            - -w
            - vm.max_map_count=262144
            securityContext:
              privileged: true
          containers:
          - name: elasticsearch
            image: registry.cn-hangzhou.aliyuncs.com/log-monitor/elasticsearch:v5.5.1
            ports:
            - containerPort: 9200
              protocol: TCP
            - containerPort: 9300
              protocol: TCP
            securityContext:
              capabilities:
                add:
                  - IPC_LOCK
                  - SYS_RESOURCE
            resources:
              limits:
                memory: 4000Mi
              requests:
                cpu: 100m
                memory: 2000Mi
            env:
              - name: "http.host"
                value: "0.0.0.0"
              - name: "network.host"
                value: "_eth0_"
              - name: "cluster.name"
                value: "docker-cluster"
              - name: "bootstrap.memory_lock"
                value: "false"
              - name: "discovery.zen.ping.unicast.hosts"
                value: "elasticsearch-discovery"
              - name: "discovery.zen.ping.unicast.hosts.resolve_timeout"
                value: "10s"
              - name: "discovery.zen.ping_timeout"
                value: "6s"
              - name: "discovery.zen.minimum_master_nodes"
                value: "2"
              - name: "discovery.zen.fd.ping_interval"
                value: "2s"
              - name: "discovery.zen.no_master_block"
                value: "write"
              - name: "gateway.expected_nodes"
                value: "2"
              - name: "gateway.expected_master_nodes"
                value: "1"
              - name: "transport.tcp.connect_timeout"
                value: "60s"
              - name: "ES_JAVA_OPTS"
                value: "-Xms2g -Xmx2g"
            livenessProbe:                    # 健康检查
              tcpSocket:
                port: transport
              initialDelaySeconds: 20
              periodSeconds: 10
            volumeMounts:
            - name: es-data
              mountPath: /data
          terminationGracePeriodSeconds: 30
          volumes:
          - name: es-data
            hostPath:
              path: /es-data
    
  • 创建资源

    [root@master-kubeadm-k8s log]# kubectl apply -f elasticsearch.yaml
    service/elasticsearch-api created
    service/elasticsearch-discovery created
    statefulset.apps/elasticsearch created
    
  • 查看资源

    # Pod会进行有序的创建
    [root@master-kubeadm-k8s log]# kubectl get pods -n kube-system -o wide | grep elastic
    elasticsearch-0                              1/1     Running           0          4m36s   10.244.221.69    worker02- kubeadm-k8s   <none>           <none>
    elasticsearch-1                              1/1     Running           0          4m33s   10.244.14.4      worker01-kubeadm-k8s   <none>           <none>
    elasticsearch-2                              0/1     PodInitializing   0          101s    10.244.16.194    master-kubeadm-k8s     <none>           <none>
    
    [root@master-kubeadm-k8s log]# kubectl get svc -n kube-system -o wide | grep elastic
    elasticsearch-api         ClusterIP   10.104.144.183   <none>        9200/TCP                 5m2s   app=es
    elasticsearch-discovery   ClusterIP   10.109.137.36    <none>        9300/TCP                 5m2s   app=es
    
    [root@master-kubeadm-k8s log]# kubectl get statefulset -n kube-system
    NAME            READY   AGE
    elasticsearch   3/3     6m14s
    
1.1.2.2.3 部署Kibana

kibana主要是对外提供访问的,所以这边需要配置Service和Ingress
前提:要有Ingress Controller的支持,比如Nginx Controller

  • 准备YAML文件

    • kibana.yaml

      # Deployment
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: kibana
        namespace: kube-system
        labels:
          component: kibana
      spec:
        replicas: 1
        selector:
          matchLabels:
           component: kibana
        template:
          metadata:
            labels:
              component: kibana
          spec:
            containers:
            - name: kibana
              image: registry.cn-hangzhou.aliyuncs.com/log-monitor/kibana:v5.5.1
              env:
              - name: CLUSTER_NAME
                value: docker-cluster
              - name: ELASTICSEARCH_URL   # elasticsearch 地址
                value: http://elasticsearch-api:9200/
              resources:
                limits:
                  cpu: 1000m
                requests:
                  cpu: 100m
              ports:
              - containerPort: 5601
                name: http
      ---
      # Service
      apiVersion: v1
      kind: Service
      metadata:
        name: kibana
        namespace: kube-system
        labels:
          component: kibana
      spec:
        selector:
          component: kibana
        ports:
        - name: http
          port: 80
          targetPort: http
      ---
      # Ingress
      apiVersion: extensions/v1beta1
      kind: Ingress
      metadata:
        name: kibana
        namespace: kube-system
      spec:
        rules:
        - host: log.k8s.sunny.com     # 本地hosts配置域名
          http:
            paths:
            - path: /
              backend:
                serviceName: kibana
                servicePort: 80
      
  • 创建资源

    [root@master-kubeadm-k8s log]# kubectl apply -f kibana.yaml
    deployment.apps/kibana created
    service/kibana created
    ingress.extensions/kibana created
    
  • 查看资源

    [root@master-kubeadm-k8s log]# kubectl get pods -n kube-system | grep kibana
    kibana-8747dff7d-l627g                       1/1     Running   0          2m2s
    
    [root@master-kubeadm-k8s log]# kubectl get svc  -n kube-system | grep kibana
    kibana                    ClusterIP   10.109.177.214   <none>        80/TCP                   2m40s
    
    [root@master-kubeadm-k8s log]# kubectl get ingress  -n kube-system | grep kibana
    kibana   <none>   log.k8s.sunny.com             80      2m43s
    
  • 测试

image.png

2、监控

2.1 Prometheus简介

这里的监控是指监控K8S集群的健康状态,包括节点、K8S组件以及Pod都会监控。

Prometheus 是一个开源的监控和警报系统,它直接从目标主机上运行的代理程序中抓取指标,并将收集的样本集中存储在其服务器上。

2016 年 Prometheus 成为继 Kubernetes 之后,成为 CNCF (Cloud Native Computing Foundation)中的第二个项目成员。

2.1.2 主要功能

  • 多维 数据模型(时序由 metric 名字和 k/v 的 labels 构成)。
  • 灵活的查询语句(PromQL)。
  • 无依赖存储,支持 local 和 remote 不同模型。
  • 采用 http 协议,使用 pull 模式,拉取数据,简单易懂。
  • 监控目标,可以采用服务发现或静态配置的方式。
  • 支持多种统计数据模型,图形化友好。

2.1.3 Prometheus架构

image.png

从这个架构图,也可以看出 Prometheus 的主要模块包含:Server、Exporters、Pushgateway、PromQL、Alertmanager、WebUI 等。

它大致使用逻辑是这样:

  • Prometheus server 定期从静态配置的 target 或者服务发现的 target 拉取数据。

  • 当新拉取的数据大于配置内存缓存区的时候,Prometheus 会将数据持久化到磁盘(如果使用 remote storage 将持久化到云端)。

  • Prometheus 可以配置 rule,然后定时查询数据,当条件触发的时候,会将 alert 推送到配置的 Alertmanager。

  • Alertmanager 收到警告的时候,可以根据配置,聚合、去重、降噪,最后发送警告。

  • 可以使用 API、Prometheus Console 或者 Grafana 查询和聚合数据。

2.1.4 Prometheus知识普及

  • 支持pull、push数据添加方式
  • 支持k8s服务发现
  • 提供查询语言PromQL
  • 时序(time series)是由名字(Metric)以及一组key/value标签定义的数据类型

2.2 数据采集

  • 服务器数据
    • 在每一个节点中部署工具:Node-Exporter
  • K8S组件数据
    • 访问集群中不同端口下的metrics接口即可拿到
    • 比如
      • IP:2379/metrics 拿到ETCD数据
      • IP:6443/metrics 拿到apiServer数据
      • IP:10252/metrics 拿到controller manager数据
      • IP:10251/metrics 拿到scheduler数据
  • 容器数据
    • 在kubelet中有个cAdvisor组件,默认就可以拿到容器的数据

2.2.1 服务器数据采集

Prometheus可以从Kubernetes集群的各个组件中采集数据,比如kubelet中自带的cAdvisor,api-server等,而node-export就是其中一种来源。

Exporter是Prometheus的一类数据采集组件的总称。它负责从目标处搜集数据,并将其转化为Prometheus支持的格式。

服务器的指标采集是通过 Node-Exporter 进行采集,比如服务器CPU、内存、磁盘、I/O等信息

image.png

2.2.1.1 部署NodeExporter

  • 准备YAML文件

    • namespace.yaml

      我们将监控的资源都放到这个命名空间下,方便管理

      apiVersion: v1
      kind: Namespace
      metadata: 
        name: ns-monitor
        labels:
          name: ns-monitor
      
    • node-exporter.yaml

      kind: DaemonSet
      apiVersion: apps/v1
      metadata: 
        labels:
          app: node-exporter
        name: node-exporter
        namespace: ns-monitor
      spec:
        revisionHistoryLimit: 10
        selector:
          matchLabels:
            app: node-exporter
        template:
          metadata:
            labels:
              app: node-exporter
          spec:
            containers:
              - name: node-exporter
                image: prom/node-exporter:v0.16.0
                ports:
                  - containerPort: 9100
                    protocol: TCP
                    name: http
            hostNetwork: true
            hostPID: true
            tolerations:
              - effect: NoSchedule
                operator: Exists
      ---
      # 本来NodeExporter是不需要对外提供访问的,但我们这里一步一步来,先保证每一步都正确再往后进行
      kind: Service
      apiVersion: v1
      metadata:
        labels:
          app: node-exporter
        name: node-exporter-service
        namespace: ns-monitor
      spec:
        ports:
          - name: http
            port: 9100
            nodePort: 31672
            protocol: TCP
        type: NodePort
        selector:
          app: node-exporter
      
  • 创建资源

    [root@master-kubeadm-k8s prometheus]# kubectl apply -f node-exporter.yaml
    daemonset.apps/node-exporter created
    service/node-exporter-service created
    
  • 查看资源

    [root@master-kubeadm-k8s prometheus]# kubectl get pods -n ns-monitor
    NAME                  READY   STATUS    RESTARTS   AGE
    node-exporter-dsjbq   1/1     Running   0          2m32s
    node-exporter-mdnrj   1/1     Running   0          2m32s
    node-exporter-sxwxx   1/1     Running   0          2m32s
    
    [root@master-kubeadm-k8s prometheus]# kubectl get svc -n ns-monitor
    NAME                    TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
    node-exporter-service   NodePort   10.109.226.6   <none>        9100:31672/TCP   2m46s
    
  • 测试

    image.png

2.2.1 部署Prometheus

  • 准备YAML文件

    • prometheus.yaml

      prometheus的yaml文件很值得学习,很多常用的资源类型这里都用到了

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: prometheus
      rules:
        - apiGroups: [""] # "" indicates the core API group
          resources:
            - nodes
            - nodes/proxy
            - services
            - endpoints
            - pods
          verbs:
            - get
            - watch
            - list
        - apiGroups:
            - extensions
          resources:
            - ingresses
          verbs:
            - get
            - watch
            - list
        - nonResourceURLs: ["/metrics"]
          verbs:
            - get
      ---
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: prometheus
        namespace: ns-monitor
        labels:
          app: prometheus
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: prometheus
      subjects:
        - kind: ServiceAccount
          name: prometheus
          namespace: ns-monitor
      roleRef:
        kind: ClusterRole
        name: prometheus
        apiGroup: rbac.authorization.k8s.io
      ---
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: prometheus-conf
        namespace: ns-monitor
        labels:
          app: prometheus
      data:
        prometheus.yml: |-
          # my global config
          global:
            scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
            evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
            # scrape_timeout is set to the global default (10s).
      
          # Alertmanager configuration
          alerting:
            alertmanagers:
            - static_configs:
              - targets:
                # - alertmanager:9093
      
          # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
          rule_files:
            # - "first_rules.yml"
            # - "second_rules.yml"
      
          # A scrape configuration containing exactly one endpoint to scrape:
          # Here it's Prometheus itself.
          scrape_configs:
            # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
            - job_name: 'prometheus'
      
              # metrics_path defaults to '/metrics'
              # scheme defaults to 'http'.
      
              static_configs:
                - targets: ['localhost:9090']
            - job_name: 'grafana'
              static_configs:
                - targets:
                    - 'grafana-service.ns-monitor:3000'
      
            - job_name: 'kubernetes-apiservers'
      
              kubernetes_sd_configs:
              - role: endpoints
      
              # Default to scraping over https. If required, just disable this or change to
              # `http`.
              scheme: https
      
              # This TLS & bearer token file config is used to connect to the actual scrape
              # endpoints for cluster components. This is separate to discovery auth
              # configuration because discovery & scraping are two separate concerns in
              # Prometheus. The discovery auth config is automatic if Prometheus runs inside
              # the cluster. Otherwise, more config options have to be provided within the
              # <kubernetes_sd_config>.
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                # If your node certificates are self-signed or use a different CA to the
                # master CA, then disable certificate verification below. Note that
                # certificate verification is an integral part of a secure infrastructure
                # so this should only be disabled in a controlled environment. You can
                # disable certificate verification by uncommenting the line below.
                #
                # insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      
              # Keep only the default/kubernetes service endpoints for the https port. This
              # will add targets for each API server which Kubernetes adds an endpoint to
              # the default/kubernetes service.
              relabel_configs:
              - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
                action: keep
                regex: default;kubernetes;https
      
            # Scrape config for nodes (kubelet).
            #
            # Rather than connecting directly to the node, the scrape is proxied though the
            # Kubernetes apiserver.  This means it will work if Prometheus is running out of
            # cluster, or can't connect to nodes for some other reason (e.g. because of
            # firewalling).
            - job_name: 'kubernetes-nodes'
      
              # Default to scraping over https. If required, just disable this or change to
              # `http`.
              scheme: https
      
              # This TLS & bearer token file config is used to connect to the actual scrape
              # endpoints for cluster components. This is separate to discovery auth
              # configuration because discovery & scraping are two separate concerns in
              # Prometheus. The discovery auth config is automatic if Prometheus runs inside
              # the cluster. Otherwise, more config options have to be provided within the
              # <kubernetes_sd_config>.
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      
              kubernetes_sd_configs:
              - role: node
      
              relabel_configs:
              - action: labelmap
                regex: __meta_kubernetes_node_label_(.+)
              - target_label: __address__
                replacement: kubernetes.default.svc:443
              - source_labels: [__meta_kubernetes_node_name]
                regex: (.+)
                target_label: __metrics_path__
                replacement: /api/v1/nodes/${1}/proxy/metrics
      
            # Scrape config for Kubelet cAdvisor.
            #
            # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
            # (those whose names begin with 'container_') have been removed from the
            # Kubelet metrics endpoint.  This job scrapes the cAdvisor endpoint to
            # retrieve those metrics.
            #
            # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
            # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
            # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
            # the --cadvisor-port=0 Kubelet flag).
            #
            # This job is not necessary and should be removed in Kubernetes 1.6 and
            # earlier versions, or it will cause the metrics to be scraped twice.
            - job_name: 'kubernetes-cadvisor'
      
              # Default to scraping over https. If required, just disable this or change to
              # `http`.
              scheme: https
      
              # This TLS & bearer token file config is used to connect to the actual scrape
              # endpoints for cluster components. This is separate to discovery auth
              # configuration because discovery & scraping are two separate concerns in
              # Prometheus. The discovery auth config is automatic if Prometheus runs inside
              # the cluster. Otherwise, more config options have to be provided within the
              # <kubernetes_sd_config>.
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      
              kubernetes_sd_configs:
              - role: node
      
              relabel_configs:
              - action: labelmap
                regex: __meta_kubernetes_node_label_(.+)
              - target_label: __address__
                replacement: kubernetes.default.svc:443
              - source_labels: [__meta_kubernetes_node_name]
                regex: (.+)
                target_label: __metrics_path__
                replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      
            # Scrape config for service endpoints.
            #
            # The relabeling allows the actual service scrape endpoint to be configured
            # via the following annotations:
            #
            # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
            # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
            # to set this to `https` & most likely set the `tls_config` of the scrape config.
            # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
            # * `prometheus.io/port`: If the metrics are exposed on a different port to the
            # service then set this appropriately.
            - job_name: 'kubernetes-service-endpoints'
      
              kubernetes_sd_configs:
              - role: endpoints
      
              relabel_configs:
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
                action: keep
                regex: true
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
                action: replace
                target_label: __scheme__
                regex: (https?)
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
                action: replace
                target_label: __metrics_path__
                regex: (.+)
              - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
                action: replace
                target_label: __address__
                regex: ([^:]+)(?::\d+)?;(\d+)
                replacement: $1:$2
              - action: labelmap
                regex: __meta_kubernetes_service_label_(.+)
              - source_labels: [__meta_kubernetes_namespace]
                action: replace
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_service_name]
                action: replace
                target_label: kubernetes_name
      
            # Example scrape config for probing services via the Blackbox Exporter.
            #
            # The relabeling allows the actual service scrape endpoint to be configured
            # via the following annotations:
            #
            # * `prometheus.io/probe`: Only probe services that have a value of `true`
            - job_name: 'kubernetes-services'
      
              metrics_path: /probe
              params:
                module: [http_2xx]
      
              kubernetes_sd_configs:
              - role: service
      
              relabel_configs:
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
                action: keep
                regex: true
              - source_labels: [__address__]
                target_label: __param_target
              - target_label: __address__
                replacement: blackbox-exporter.example.com:9115
              - source_labels: [__param_target]
                target_label: instance
              - action: labelmap
                regex: __meta_kubernetes_service_label_(.+)
              - source_labels: [__meta_kubernetes_namespace]
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_service_name]
                target_label: kubernetes_name
      
            # Example scrape config for probing ingresses via the Blackbox Exporter.
            #
            # The relabeling allows the actual ingress scrape endpoint to be configured
            # via the following annotations:
            #
            # * `prometheus.io/probe`: Only probe services that have a value of `true`
            - job_name: 'kubernetes-ingresses'
      
              metrics_path: /probe
              params:
                module: [http_2xx]
      
              kubernetes_sd_configs:
                - role: ingress
      
              relabel_configs:
                - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
                  action: keep
                  regex: true
                - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
                  regex: (.+);(.+);(.+)
                  replacement: ${1}://${2}${3}
                  target_label: __param_target
                - target_label: __address__
                  replacement: blackbox-exporter.example.com:9115
                - source_labels: [__param_target]
                  target_label: instance
                - action: labelmap
                  regex: __meta_kubernetes_ingress_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  target_label: kubernetes_namespace
                - source_labels: [__meta_kubernetes_ingress_name]
                  target_label: kubernetes_name
      
            # Example scrape config for pods
            #
            # The relabeling allows the actual pod scrape endpoint to be configured via the
            # following annotations:
            #
            # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
            # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
            # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
            # pod's declared ports (default is a port-free target if none are declared).
            - job_name: 'kubernetes-pods'
      
              kubernetes_sd_configs:
              - role: pod
      
              relabel_configs:
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                action: keep
                regex: true
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                action: replace
                target_label: __metrics_path__
                regex: (.+)
              - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                action: replace
                regex: ([^:]+)(?::\d+)?;(\d+)
                replacement: $1:$2
                target_label: __address__
              - action: labelmap
                regex: __meta_kubernetes_pod_label_(.+)
              - source_labels: [__meta_kubernetes_namespace]
                action: replace
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_pod_name]
                action: replace
                target_label: kubernetes_pod_name
      ---
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: prometheus-rules
        namespace: ns-monitor
        labels:
          app: prometheus
      data:
        cpu-usage.rule: |
          groups:
            - name: NodeCPUUsage
              rules:
                - alert: NodeCPUUsage
                  expr: (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
                  for: 2m
                  labels:
                    severity: "page"
                  annotations:
                    summary: "{{$labels.instance}}: High CPU usage detected"
                    description: "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
      ---
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: "prometheus-data-pv"
        labels:
          name: prometheus-data-pv
          release: stable
      spec:
        capacity:
          storage: 5Gi
        accessModes:
          - ReadWriteOnce
        persistentVolumeReclaimPolicy: Recycle
        nfs:
          path: /nfs/data/prometheus  # 定义持久化存储目录
          server: 192.168.50.111      # NFS服务器
      ---
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: prometheus-data-pvc
        namespace: ns-monitor
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        selector:
          matchLabels:
            name: prometheus-data-pv
            release: stable
      ---
      kind: Deployment
      apiVersion: apps/v1
      metadata:
        labels:
          app: prometheus
        name: prometheus
        namespace: ns-monitor
      spec:
        replicas: 1
        revisionHistoryLimit: 10
        selector:
          matchLabels:
            app: prometheus
        template:
          metadata:
            labels:
              app: prometheus
          spec:
            serviceAccountName: prometheus
            securityContext:
              runAsUser: 0
            containers:
              - name: prometheus
                image: prom/prometheus:latest
                imagePullPolicy: IfNotPresent
                volumeMounts:
                  - mountPath: /prometheus
                    name: prometheus-data-volume
                  - mountPath: /etc/prometheus/prometheus.yml
                    name: prometheus-conf-volume
                    subPath: prometheus.yml
                  - mountPath: /etc/prometheus/rules
                    name: prometheus-rules-volume
                ports:
                  - containerPort: 9090
                    protocol: TCP
            volumes:
              - name: prometheus-data-volume
                persistentVolumeClaim:
                  claimName: prometheus-data-pvc
              - name: prometheus-conf-volume
                configMap:
                  name: prometheus-conf
              - name: prometheus-rules-volume
                configMap:
                  name: prometheus-rules
            tolerations:
              - key: node-role.kubernetes.io/master
                effect: NoSchedule
      ---
      kind: Service
      apiVersion: v1
      metadata:
        annotations:
          prometheus.io/scrape: 'true'
        labels:
          app: prometheus
        name: prometheus-service
        namespace: ns-monitor
      spec:
        ports:
          - port: 9090
            targetPort: 9090
        selector:
          app: prometheus
        type: NodePort
      
    • 创建资源

      [root@master-kubeadm-k8s prometheus]# kubectl apply -f prometheus.yaml
      clusterrole.rbac.authorization.k8s.io/prometheus created
      serviceaccount/prometheus created
      clusterrolebinding.rbac.authorization.k8s.io/prometheus created
      configmap/prometheus-conf created
      configmap/prometheus-rules created
      persistentvolume/prometheus-data-pv created
      persistentvolumeclaim/prometheus-data-pvc created
      deployment.apps/prometheus created
      service/prometheus-service created
      
    • 查看资源

      [root@master-kubeadm-k8s prometheus]# kubectl get pods -n ns-monitor
      NAME                          READY   STATUS              RESTARTS   AGE
      node-exporter-dsjbq           1/1     Running             1          26m
      node-exporter-mdnrj           1/1     Running             1          26m
      node-exporter-sxwxx           1/1     Running             2          26m
      prometheus-5f7cb6d955-mm8d2   1/1     Running             1          28s
      
      [root@master-kubeadm-k8s prometheus]# kubectl get pv -n ns-monitor
      NAME                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS   REASON   AGE
      prometheus-data-pv   5Gi        RWO            Recycle          Bound    ns-monitor/prometheus-data-pvc                           53s
      
      [root@master-kubeadm-k8s prometheus]# kubectl get pvc -n ns-monitor
      NAME                  STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      prometheus-data-pvc   Bound    prometheus-data-pv   5Gi        RWO                           60s
      
      [root@master-kubeadm-k8s prometheus]# kubectl get svc -n ns-monitor
      NAME                    TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
      node-exporter-service   NodePort   10.109.226.6     <none>        9100:31672/TCP   27m
      prometheus-service      NodePort   10.101.128.125   <none>        9090:31615/TCP   64s
      
    • 测试

      image.png

2.2.4 部署Grafana监控UI

  • 准备YAML文件

    • grafana.yaml

      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: "grafana-data-pv"
        labels:
          name: grafana-data-pv
          release: stable
      spec:
        capacity:
          storage: 5Gi
        accessModes:
          - ReadWriteOnce
        persistentVolumeReclaimPolicy: Recycle
        nfs:
          path: /nfs/data/grafana
          server: 192.168.50.111
      ---
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: grafana-data-pvc
        namespace: ns-monitor
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        selector:
          matchLabels:
            name: grafana-data-pv
            release: stable
      ---
      kind: Deployment
      apiVersion: apps/v1
      metadata:
        labels:
          app: grafana
        name: grafana
        namespace: ns-monitor
      spec:
        replicas: 1
        revisionHistoryLimit: 10
        selector:
          matchLabels:
            app: grafana
        template:
          metadata:
            labels:
              app: grafana
          spec:
            securityContext:
              runAsUser: 0
            containers:
              - name: grafana
                image: grafana/grafana:latest
                imagePullPolicy: IfNotPresent
                env:
                  - name: GF_AUTH_BASIC_ENABLED
                    value: "true"
                  - name: GF_AUTH_ANONYMOUS_ENABLED
                    value: "false"
                readinessProbe:
                  httpGet:
                    path: /login
                    port: 3000
                volumeMounts:
                  - mountPath: /var/lib/grafana
                    name: grafana-data-volume
                ports:
                  - containerPort: 3000
                    protocol: TCP
            volumes:
              - name: grafana-data-volume
                persistentVolumeClaim:
                  claimName: grafana-data-pvc
      ---
      kind: Service
      apiVersion: v1
      metadata:
        labels:
          app: grafana
        name: grafana-service
        namespace: ns-monitor
      spec:
        ports:
          - port: 3000
            targetPort: 3000
        selector:
          app: grafana
        type: NodePort
      
    • grafana-ingress.yaml

      #ingress
      apiVersion: extensions/v1beta1
      kind: Ingress
      metadata:
        name: grafana-ingress
        namespace: ns-monitor
      spec:
        rules:
        - host: monitor.k8s.sunny.com
          http:
            paths:
            - path: /
              backend:
                serviceName: grafana-service
                servicePort: 3000
      
  • 创建资源

    [root@master-kubeadm-k8s prometheus]# kubectl apply -f grafana.yaml
    persistentvolume/grafana-data-pv created
    persistentvolumeclaim/grafana-data-pvc created
    deployment.apps/grafana created
    service/grafana-service created
    
    [root@master-kubeadm-k8s prometheus]# kubectl apply -f grafana-ingress.yaml
    ingress.extensions/grafana-ingress created
    
  • 查看资源

    [root@master-kubeadm-k8s prometheus]# kubectl get deploy -n ns-monitor
    NAME         READY   UP-TO-DATE   AVAILABLE   AGE
    grafana      1/1     1            1           2m52s
    prometheus   1/1     1            1           6m41s
    
    [root@master-kubeadm-k8s prometheus]# kubectl get pv -n ns-monitor
    NAME                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS   REASON   AGE
    grafana-data-pv      5Gi        RWO            Recycle          Bound    ns-monitor/grafana-data-pvc                              3m10s
    prometheus-data-pv   5Gi        RWO            Recycle          Bound    ns-monitor/prometheus-data-pvc                           7m
    
    [root@master-kubeadm-k8s prometheus]# kubectl get pvc -n ns-monitor
    NAME                  STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    grafana-data-pvc      Bound    grafana-data-pv      5Gi        RWO                           3m14s
    prometheus-data-pvc   Bound    prometheus-data-pv   5Gi        RWO                           7m4s
    
    [root@master-kubeadm-k8s prometheus]# kubectl get svc -n ns-monitor
    NAME                    TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    grafana-service         NodePort   10.111.192.206   <none>        3000:31828/TCP   3m5s
    node-exporter-service   NodePort   10.109.226.6     <none>        9100:31672/TCP   33m
    prometheus-service      NodePort   10.101.128.125   <none>        9090:31615/TCP   7m4s
    
    [root@master-kubeadm-k8s prometheus]# kubectl get ingress -n ns-monitor
    NAME              CLASS    HOSTS                   ADDRESS   PORTS   AGE
    grafana-ingress   <none>   monitor.k8s.sunny.com             80      2m15s
    
  • 测试

    账号密码都是 admin

    image.png
image.png
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 199,711评论 5 468
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 83,932评论 2 376
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 146,770评论 0 330
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 53,799评论 1 271
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 62,697评论 5 359
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,069评论 1 276
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,535评论 3 390
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,200评论 0 254
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,353评论 1 294
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,290评论 2 317
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,331评论 1 329
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,020评论 3 315
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,610评论 3 303
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,694评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 30,927评论 1 255
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 42,330评论 2 346
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 41,904评论 2 341