1. Prometheus
1.1 Exporter
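Once Prometheus is running, it exposes its own metrics on its web port (9090 by default, the same port used in the scrape target below). Use the following command to check that the endpoint responds:
curl -I http://22.22.3.244:9090/metrics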
1.2 Targets
Add a job with job_name 'prometheus' under scrape_configs in the Prometheus configuration:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
Reload Prometheus so it picks up the new job (kill -1 sends SIGHUP, which makes it re-read its configuration):
ps -ef|grep prometheus|grep -v grep|awk -F' ' '{print $2}'|xargs kill -1
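The pipeline above signals every process whose command line contains "prometheus", which can include exporters or an editor that has the configuration file open. A narrower variant, sketched here under the assumption that the server binary is named prometheus and that pgrep is installed, matches the exact process name instead. The function name reload_by_name is made up for this note.

```shell
#!/usr/bin/env bash
# reload_by_name [NAME]
# Send SIGHUP to every process whose name is exactly NAME
# (default: prometheus). Returns 1 if no such process is running.
reload_by_name() {
    local name="${1:-prometheus}"
    local pids
    pids="$(pgrep -x "$name")" || {
        echo "no running process named $name" >&2
        return 1
    }
    # The PID list is intentionally left unquoted so each PID is one argument.
    kill -HUP $pids
}
```

Call reload_by_name after editing the configuration. Running promtool check config on the edited file first catches syntax errors before the reload.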
1.3 Grafana
A dashboard for Prometheus already exists; no import is needed.
2. Linux
2.1 Exporter
Install and start node_exporter. The default port is 9100; use --web.listen-address to change the listen address and port:
/root/node_exporter-0.18.1.linux-amd64/node_exporter --web.listen-address=:9200
Use the following command to check that the exporter responds:
curl -I http://node01:9200/metrics
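curl -I only confirms that the endpoint answers; it does not show whether the expected metrics are present. As a rough sketch, the helper below (has_metric is a name invented for this note) reads the metrics text from standard input and succeeds if at least one sample of the given metric appears. The metric name node_cpu_seconds_total used in the usage line is an assumption based on node_exporter 0.16 and later; substitute whichever metric matters.

```shell
#!/usr/bin/env bash
# has_metric NAME
# Succeed if stdin (Prometheus text exposition format) contains at least
# one sample line for metric NAME. Sample lines look like
# `name{labels} value` or `name value`; HELP/TYPE lines start with '#',
# so anchoring the pattern at the start of the line skips them.
has_metric() {
    grep -Eq "^$1(\{| )"
}
```

Usage: curl -s http://node01:9200/metrics | has_metric node_cpu_seconds_total && echo "metric present"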
2.2 Targets
Add a job with job_name 'nodes' under scrape_configs in the Prometheus configuration:
  - job_name: 'nodes'
    static_configs:
      # - targets: ['mgmt01:9200']
      - targets: ['etcd01:9200']
      - targets: ['etcd02:9200']
      - targets: ['etcd03:9200']
      - targets: ['master01:9200']
      - targets: ['master02:9200']
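When the host list grows, typing each target by hand invites typos. One possible shortcut, assuming every node exporter listens on the same port, is to generate the entry from a list of host names. render_targets is a hypothetical helper; it emits a single targets list, which Prometheus treats the same as the one-entry-per-host form above as long as no per-target labels are set.

```shell
#!/usr/bin/env bash
# render_targets PORT HOST...
# Print one static_configs entry listing HOST:PORT for every host,
# indented for placement under static_configs.
render_targets() {
    local port="$1"
    shift
    local list="" host
    for host in "$@"; do
        # Append ", " only when the list already has an element.
        list="${list:+$list, }'$host:$port'"
    done
    printf "      - targets: [%s]\n" "$list"
}
```

Usage: render_targets 9200 etcd01 etcd02 etcd03 master01 master02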
Reload Prometheus (SIGHUP) to apply the change:
ps -ef|grep prometheus|grep -v grep|awk -F' ' '{print $2}'|xargs kill -1
2.3 Grafana
Download and import the dashboard template:
https://grafana.com/grafana/dashboards/8919
3. Etcd
3.1 Exporter
Edit /etc/etcd/etcd.conf, uncomment the line ETCD_METRICS="basic", and restart etcd.
Check that the etcd metrics endpoint responds. Because etcd serves HTTPS, the request must present a client certificate:
curl -I --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key -k https://etcd01:2379/metrics
3.2 Targets
Add a job with job_name 'etcd' under scrape_configs in the Prometheus configuration:
  - job_name: 'etcd'
    static_configs:
      - targets: ["22.22.3.231:2379","22.22.3.232:2379","22.22.3.233:2379"]
    scheme: https
    tls_config:
      ca_file: /etc/kubernetes/pki/etcd/ca.crt
      key_file: /etc/kubernetes/pki/etcd/healthcheck-client.key
      cert_file: /etc/kubernetes/pki/etcd/healthcheck-client.crt
Reload Prometheus (SIGHUP) to apply the change:
ps -ef|grep prometheus|grep -v grep|awk -F' ' '{print $2}'|xargs kill -1
3.3 Grafana
Download and import the dashboard template:
https://grafana.com/api/dashboards/3070/revisions/3/download
4. MySQL
4.1 Exporter
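Go to the project home page
https://github.com/prometheus/mysqld_exporter
and download mysqld_exporter-0.12.1.linux-amd64.tar.gz.
Create a dedicated MySQL user for the exporter with the privileges it needs: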
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'exporter123' WITH MAX_USER_CONNECTIONS 10;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
Unpack and start the exporter; --web.listen-address=":9105" changes the listen address and port:
tar -zxvf mysqld_exporter-0.12.1.linux-amd64.tar.gz
export DATA_SOURCE_NAME='exporter:exporter123@(localhost:3306)/'
cd mysqld_exporter-0.12.1.linux-amd64
nohup ./mysqld_exporter --web.listen-address=":9105" 2>&1 &
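Exporting DATA_SOURCE_NAME leaves the password in the shell environment and history. mysqld_exporter can instead read credentials from a my.cnf-style file through its --config.my-cnf flag. The sketch below writes such a file with owner-only permissions; write_exporter_cnf and the path in the usage line are illustrative and not part of the exporter.

```shell
#!/usr/bin/env bash
# write_exporter_cnf PATH USER PASSWORD
# Write a [client] section that mysqld_exporter can read. The file holds
# the password in clear text, so it is made readable by the owner only.
write_exporter_cnf() {
    local path="$1" user="$2" password="$3"
    (
        umask 077   # a newly created file gets mode 600
        printf '[client]\nuser=%s\npassword=%s\nhost=localhost\nport=3306\n' \
            "$user" "$password" > "$path"
    )
    chmod 600 "$path"   # also tighten a file that already existed
}
```

Usage: write_exporter_cnf /root/.mysqld_exporter.cnf exporter exporter123, then unset DATA_SOURCE_NAME (the environment variable is expected to take precedence when set) and start the exporter with ./mysqld_exporter --config.my-cnf=/root/.mysqld_exporter.cnf --web.listen-address=":9105".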
Check that the mysqld exporter responds (mgmt01 is the machine running the exporter):
curl -I http://mgmt01:9105/metrics
4.2 Targets
Add a job with job_name 'mysqld' under scrape_configs in the Prometheus configuration:
  - job_name: 'mysqld'
    static_configs:
      - targets: ["22.22.3.244:9105"]
Reload Prometheus (SIGHUP) to apply the change:
ps -ef|grep prometheus|grep -v grep|awk -F' ' '{print $2}'|xargs kill -1
4.3 Grafana
Download and import the dashboard template:
https://grafana.com/api/dashboards/6239/revisions/1/download
5. Ceph
5.1 Exporter
Copy the configuration and key files from a Ceph host to the monitoring machine, then verify that the cluster is reachable:
scp ceph01:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
scp ceph01:/etc/ceph/ceph.conf /etc/ceph/
ceph -s
  cluster:
    id:     0cd78d03-771a-4c45-99eb-49b200ae7338
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
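Pull the exporter image digitalocean/ceph_exporter and run the Ceph exporter with Docker. With --net=host the container shares the host network stack, so the exporter is reachable on host port 9128 without a -p mapping (Docker discards published ports in host network mode):
docker pull digitalocean/ceph_exporter
docker run -v /etc/ceph:/etc/ceph --net=host -d digitalocean/ceph_exporter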
Check that the Ceph exporter responds (mgmt01 is the machine running the exporter):
curl -I http://mgmt01:9128/metrics
5.2 Targets
Add a job with job_name 'ceph' under scrape_configs in the Prometheus configuration:
  - job_name: 'ceph'
    static_configs:
      - targets: ["22.22.3.244:9128"]
Reload Prometheus (SIGHUP) to apply the change:
ps -ef|grep prometheus|grep -v grep|awk -F' ' '{print $2}'|xargs kill -1
5.3 Grafana
Download and import the dashboard template:
https://grafana.com/api/dashboards/917/revisions/1/download
6. SNMP
6.1 Exporter
Download and unpack snmp_exporter:
https://github.com/prometheus/snmp_exporter/releases
wget https://github.com/prometheus/snmp_exporter/releases/download/v0.15.0/snmp_exporter-0.15.0.linux-amd64.tar.gz
tar -zxvf snmp_exporter-0.15.0.linux-amd64.tar.gz
cd snmp_exporter-0.15.0.linux-amd64
SNMP authentication is configured in snmp_exporter's configuration file, snmp.yml. Add the following under the if_mib module:
if_mib:
  version: 2
  auth:
    community: TEST_RO
Start snmp_exporter:
nohup ./snmp_exporter 2>&1 &
Check that the SNMP exporter responds (mgmt01 is the machine running the exporter):
curl "http://mgmt01:9116/snmp?target=30.3.1.86" -I
6.2 Targets
Add a job with job_name 'snmp' under scrape_configs in the Prometheus configuration. The target is the SNMP device; the relabel rules redirect the actual scrape to the exporter at 22.22.3.244:9116:
  - job_name: 'snmp'
    static_configs:
      - targets:
        - 30.3.1.86
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 22.22.3.244:9116
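The relabel rules are easier to follow with the resulting request written out: the device address becomes the target query parameter, instance keeps the device address as a label, and __address__ is rewritten so that Prometheus contacts the exporter rather than the device. The sketch below prints the URL Prometheus would end up requesting; snmp_scrape_url is a hypothetical helper and does no URL escaping, which is acceptable for plain IP addresses.

```shell
#!/usr/bin/env bash
# snmp_scrape_url EXPORTER DEVICE MODULE
# Print the scrape URL that results from the relabeling: the exporter's
# address, the /snmp metrics path, and the module and target parameters.
snmp_scrape_url() {
    local exporter="$1" device="$2" module="$3"
    printf 'http://%s/snmp?module=%s&target=%s\n' "$exporter" "$module" "$device"
}
```

Usage: curl -s "$(snmp_scrape_url 22.22.3.244:9116 30.3.1.86 if_mib)" fetches the same metrics Prometheus would scrape for that device.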
Reload Prometheus (SIGHUP) to apply the change:
ps -ef|grep prometheus|grep -v grep|awk -F' ' '{print $2}'|xargs kill -1
6.3 Grafana
Download and import the dashboard template:
https://grafana.com/grafana/dashboards/1124