2022-03-20 day 107 kubernetes prometheus

强制删除monitoring命名空间

kubectl get namespace  monitoring -o json \
            | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
            | kubectl replace --raw /api/v1/namespaces/monitoring/finalize -f -

报错1

[root@master ~]# kubectl get nodes
The connection to the server 10.0.0.10:6443 was refused - did you specify the right host or port?
解决办法

[root@master ~]# kubectl get nodes
The connection to the server 10.0.0.10:6443 was refused - did you specify the right host or port?
[root@master ~]# kubeadm init
I0320 21:32:41.509087    5017 version.go:252] remote version is much newer: v1.23.5; falling back to: stable-1.19
W0320 21:32:42.543039    5017 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.16
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
    [ERROR Port-6443]: Port 6443 is in use
    [ERROR Port-10259]: Port 10259 is in use
    [ERROR Port-10257]: Port 10257 is in use
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR Port-2379]: Port 2379 is in use
    [ERROR Port-2380]: Port 2380 is in use
    [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[root@master ~]# kubectl get node
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   9d    v1.19.3
node1    Ready    node     9d    v1.19.3
node2    Ready    node     9d    v1.19.3

image.png

prometheus安装部署
1.comfigmap 存储配置文件
2.RBAC 权限
3.pv/pvc 存储位置
4.deployment 安装
5.service 访问它
6.ingress 网页打开它

首先是创建一个prom的命名空间
01prom-na.yaml

[root@master ~/k8s_yml/prom/prom]# cat 01prom-na.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: prom

1.先配置configmap资源
02prom-cm.yml

[root@master ~/k8s_yml/prom/prom]# cat 02prom-cm.yml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: prom
data:
  prometheus.yml: |                             
    global:                         #全局配置
      scrape_interval: 15s  #抓取数据间隔
      scrape_timeout: 15s  #抓取超时时间

    alerting:
      alertmanagers:
      - static_configs:     
        - targets: ["alertmanager:9093"]  
    rule_files:
    - /etc/prometheus/rules.yml


    scrape_configs:        #拉取配置
    - job_name: 'prometheus'   #任务名称
      static_configs:             #静态配置,写死的
      - targets: ['localhost:9090']  #抓取数据节点的IP端口,prometheus自己运行各项情况
    - job_name: 'coredns'
      static_configs:
      - targets: ['10.2.0.4:9153','10.2.0.5:9153']
    - job_name: 'mysql'
      static_configs:
      - targets: ['mysql-svc.default:9104']

    - job_name: 'nodes'
      kubernetes_sd_configs:                #k8s自动服务发现
      - role: node                                  #自动发现类型为node
      relabel_configs:
      - action: replace     
        source_labels: ['__address__']      #需要修改的源标签 
        regex: '(.*):10250'                         #正则表达式(10.0.0.10):10250
        replacement: '${1}:9100'            #替换后的内容10.0.0.10:9100
        target_label: __address__           #将替换后的内容覆盖原来的标签
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)


    - job_name: 'kubelet'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)


    - job_name: 'cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)   
        replacement: $1
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.*)
        replacement: /metrics/cadvisor
        target_label: __metrics_path__


    - job_name: 'apiservers-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_component]
        regex: apiserver
        action: keep

    - job_name: 'pods'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name


  rules.yml: |
    groups:
    - name: test-node-mem
      rules:
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 20
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 20% (current value is: {{ $value }}"
 
image.png

创建RBAC,因为Prometheus有可能要跨命名空间
03prom-rbac.yml

apiVersion: v1
kind: ServiceAccount    #给pod使用
metadata:
  name: prometheus
  namespace: prom
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole #给跨命名空间使用
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: prom

04创建pv/pvc
04prom-pv-pvc.yml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-local
  labels:
    app: prometheus
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  storageClassName: local-storage
  local:
    path: /data/k8s/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node2
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: prom
spec:
  selector:
    matchLabels:
      app: prometheus
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-storage

创建deployment
05prom-dp.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: prom
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus #引用RBAC创建的serviceAccount
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-data
      - name: config-volume  
        configMap:
          name: prometheus-config  
      initContainers:        #初始化容器
      - name: fix-permissions
        image: busybox
        volumeMounts:
        - name: data
          mountPath: /prometheus
        command: [chown, -R, "nobody:nobody", /prometheus]  
      containers:
      - name: prometheus
        image: prom/prometheus:v2.24.1
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 200m
            memory: 512Mi
        ports:
        - name: http
          containerPort: 9090            
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"  #指定配置文件
        - "--storage.tsdb.path=/prometheus" #tsdb数据库保留路径
        - "--storage.tsdb.retention.time=24h" #数据保留时间,默认15天
        - "--web.enable-admin-api"  #控制对admin http api的访问
        - "--web.enable-lifecycle" #支持热更新,直接执行localhost:9090/-/reload立即生效
        volumeMounts:
        - name: config-volume
          mountPath: "/etc/prometheus"
        - name: data
          mountPath: "/prometheus"

配置service资源
06prom-svc.yml

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: prom
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - name: web
      port: 9090
      targetPort: http

配置ingress资源
07prom-ingress.yml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: prom
  labels:
    app: prometheus
spec:
  rules:
  - host: prom.k8s.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: prometheus
            port:
              number: 9090

注意要安装ingress


image.png

电脑端配置host解析


image.png

依次按顺序执行yml


image.png

记得再node1和node2创建/prometheus目录


image.png

image.png

1.查看coredns的指标端口
kubectl -n kube-system describe cm coredns

2.查看coredns的POD地址
kubectl -n kube-system get pod -o wide

3.访问coredns pod的指标接口
curl 10.2.0.2:9153/metrics

4.编辑prometheus配置文件,添加静态配置

  • job_name: 'coredns'
    static_configs:
    • targets: ['10.2.0.2:9153', '10.2.1.61:9153']

5.更新cm
kubectl apply -f 02prom-cm.yml

6.prometheus热更新配置
kubectl -n pron get pod -o wide
curl -X POST "http://10.2.4.74:9090/-/reload"

7.在web页面观察是否出现了coredns的数据
status --> targets --> coredns

新建exporter

cat >prom-node-exporter.yaml<<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: prom
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.1.1
        args:
        - --web.listen-address=$(HOSTIP):9100
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)           #忽略监控的磁盘信息
        - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$        #忽略监控的文件系统
        ports:
        - containerPort: 9100
        env:
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        resources:
          requests:
            cpu: 150m
            memory: 180Mi
          limits:
            cpu: 150m
            memory: 180Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        volumeMounts:
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: root
          mountPath: /host/root
          mountPropagation: HostToContainer
          readOnly: true
      tolerations:
      - operator: "Exists"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /
EOF

新建granfa

cat >grafana.yml<<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: prom
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      volumes:
      - name: storage
        hostPath:
          path: /data/k8s/grafana/
      nodeSelector:
        kubernetes.io/hostname: node2
      securityContext:
        runAsUser: 0
      containers:
      - name: grafana
        image: grafana/grafana:7.4.3
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          limits:
            cpu: 150m
            memory: 512Mi
          requests:
            cpu: 150m
            memory: 512Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: storage
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: prom
spec:
  ports:
    - port: 3000
  selector:
    app: grafana
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: prom
  labels:
    app: grafana
spec:
  rules:
  - host: grafana.k8s.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: grafana
            port:
              number: 3000
EOF

新建prom.sh

[root@master ~/k8s_yml/prom/prom]# cat prom.sh 
#! /bin/bash
curl -X POST "http://10.1.9.237:9090/-/reload"

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 199,711评论 5 468
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 83,932评论 2 376
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 146,770评论 0 330
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 53,799评论 1 271
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 62,697评论 5 359
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,069评论 1 276
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,535评论 3 390
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,200评论 0 254
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,353评论 1 294
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,290评论 2 317
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,331评论 1 329
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,020评论 3 315
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,610评论 3 303
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,694评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 30,927评论 1 255
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 42,330评论 2 346
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 41,904评论 2 341

推荐阅读更多精彩内容