- Reference solution: kubeadm-ha
- The cluster-info section of this article is original; preventing application deployment on the masters also differs from the reference.
- Pre-installation preparation
8 machines running CentOS Linux release 7.4.1708 (Core): 3 masters (master1, master2, master3) and 5 nodes (node1~node5).
- If possible, also prepare one VIP for the master cluster.

| Host    | IP            |
| ------- | ------------- |
| master1 | 172.25.16.120 |
| master2 | 172.25.16.121 |
| master3 | 172.25.16.122 |
| node1   | 172.25.16.167 |
| node2   | 172.25.16.168 |
| node3   | 172.25.16.169 |
| node4   | 172.25.16.170 |
| node5   | 172.25.16.171 |
| VIP     | 172.25.16.228 |
- Install docker-ce:17.09.0-ce, kubeadm:1.7.5 and kubelet:1.7.5 on all machines.
- Note: the recommended docker version is 1.12. If your version is higher than 1.12, after installing docker run the following on every machine:
iptables -P FORWARD ACCEPT
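To apply this on all eight machines in one go, a small ssh loop can help (a minimal sketch, not from the original; it assumes the hostnames from the table above and passwordless root ssh):
#!/bin/bash
# Set the FORWARD chain policy to ACCEPT on every machine and print it back for verification.
for host in master1 master2 master3 node1 node2 node3 node4 node5; do
  ssh root@"$host" 'iptables -P FORWARD ACCEPT && iptables -S FORWARD | head -1'
done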
- It seems that any docker version above 1.12 also needs the following change, otherwise the kubelet status check reports an error:
$ vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
#Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
$ systemctl daemon-reload && systemctl restart kubelet
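The cgroup driver passed to kubelet has to match the one docker actually uses; a quick way to confirm (a sketch, not part of the original text):
# Show docker's cgroup driver; kubelet's --cgroup-driver must be set to the same value.
$ docker info 2>/dev/null | grep -i "cgroup driver"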
Install kubectl:1.7.5 on master1, master2 and master3.
Set up a proxy on every machine, for both yum and docker. For yum, configure it in /etc/yum.conf:
proxy=http://SERVER:PORT
For docker, add the following under the [Service] section of /usr/lib/systemd/system/docker.service:
Environment="NO_PROXY=localhost,127.0.0.0/8,172.0.0.0/24"
Environment="HTTP_PROXY=http://SERVER:PORT/"
Environment="HTTPS_PROXY=http://SERVER:PORT/"
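After editing the unit file, docker has to reload it before the proxy settings take effect (not spelled out in the original, but required):
# Reload systemd unit files and restart docker so the proxy environment is picked up.
$ systemctl daemon-reload && systemctl restart docker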
- etcd cluster
- On master1, start etcd with docker
#!/bin/bash
docker stop etcd && docker rm etcd
rm -rf /var/lib/etcd-cluster
mkdir -p /var/lib/etcd-cluster
docker run -d \
--restart always \
-v /etc/ssl/certs:/etc/ssl/certs \
-v /var/lib/etcd-cluster:/var/lib/etcd \
-p 4001:4001 \
-p 2380:2380 \
-p 2379:2379 \
--name etcd \
gcr.io/google_containers/etcd-amd64:3.0.17 \
etcd --name=etcd0 \
--advertise-client-urls=http://172.25.16.120:2379,http://172.25.16.120:4001 \
--listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001 \
--initial-advertise-peer-urls=http://172.25.16.120:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
--initial-cluster=etcd0=http://172.25.16.120:2380,etcd1=http://172.25.16.121:2380,etcd2=http://172.25.16.122:2380 \
--initial-cluster-state=new \
--auto-tls \
--peer-auto-tls \
--data-dir=/var/lib/etcd
- On master2, start etcd with docker
#!/bin/bash
docker stop etcd && docker rm etcd
rm -rf /var/lib/etcd-cluster
mkdir -p /var/lib/etcd-cluster
docker run -d \
--restart always \
-v /etc/ssl/certs:/etc/ssl/certs \
-v /var/lib/etcd-cluster:/var/lib/etcd \
-p 4001:4001 \
-p 2380:2380 \
-p 2379:2379 \
--name etcd \
gcr.io/google_containers/etcd-amd64:3.0.17 \
etcd --name=etcd1 \
--advertise-client-urls=http://172.25.16.121:2379,http://172.25.16.121:4001 \
--listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001 \
--initial-advertise-peer-urls=http://172.25.16.121:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
--initial-cluster=etcd0=http://172.25.16.120:2380,etcd1=http://172.25.16.121:2380,etcd2=http://172.25.16.122:2380 \
--initial-cluster-state=new \
--auto-tls \
--peer-auto-tls \
--data-dir=/var/lib/etcd
- On master3, start etcd with docker
#!/bin/bash
docker stop etcd && docker rm etcd
rm -rf /var/lib/etcd-cluster
mkdir -p /var/lib/etcd-cluster
docker run -d \
--restart always \
-v /etc/ssl/certs:/etc/ssl/certs \
-v /var/lib/etcd-cluster:/var/lib/etcd \
-p 4001:4001 \
-p 2380:2380 \
-p 2379:2379 \
--name etcd \
gcr.io/google_containers/etcd-amd64:3.0.17 \
etcd --name=etcd2 \
--advertise-client-urls=http://172.25.16.122:2379,http://172.25.16.122:4001 \
--listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001 \
--initial-advertise-peer-urls=http://172.25.16.122:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
--initial-cluster=etcd0=http://172.25.16.120:2380,etcd1=http://172.25.16.121:2380,etcd2=http://172.25.16.122:2380 \
--initial-cluster-state=new \
--auto-tls \
--peer-auto-tls \
--data-dir=/var/lib/etcd
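The three scripts above differ only in the member name and the advertised IP, so they can be collapsed into one parameterized script (a sketch based on the commands above; set ETCD_NAME and HOST_IP to this node's values before running):
#!/bin/bash
# Start one etcd member of the three-node cluster; values below are for master1/etcd0.
ETCD_NAME=etcd0
HOST_IP=172.25.16.120
docker stop etcd && docker rm etcd
rm -rf /var/lib/etcd-cluster
mkdir -p /var/lib/etcd-cluster
docker run -d \
--restart always \
-v /etc/ssl/certs:/etc/ssl/certs \
-v /var/lib/etcd-cluster:/var/lib/etcd \
-p 4001:4001 -p 2380:2380 -p 2379:2379 \
--name etcd \
gcr.io/google_containers/etcd-amd64:3.0.17 \
etcd --name=${ETCD_NAME} \
--advertise-client-urls=http://${HOST_IP}:2379,http://${HOST_IP}:4001 \
--listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001 \
--initial-advertise-peer-urls=http://${HOST_IP}:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
--initial-cluster=etcd0=http://172.25.16.120:2380,etcd1=http://172.25.16.121:2380,etcd2=http://172.25.16.122:2380 \
--initial-cluster-state=new \
--auto-tls --peer-auto-tls \
--data-dir=/var/lib/etcd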
- Check the etcd cluster status on master1, master2 and master3
$ docker exec -ti etcd ash
$ etcdctl member list
19dcd68c1a5b8d7d: name=etcd2 peerURLs=http://172.25.16.122:2380 clientURLs=http://172.25.16.122:2379,http://172.25.16.122:4001 isLeader=true
688e88a7e1b4e844: name=etcd0 peerURLs=http://172.25.16.120:2380 clientURLs=http://172.25.16.120:2379,http://172.25.16.120:4001 isLeader=false
692a555d87ac214c: name=etcd1 peerURLs=http://172.25.16.121:2380 clientURLs=http://172.25.16.121:2379,http://172.25.16.121:4001 isLeader=false
$ etcdctl cluster-health
member 19dcd68c1a5b8d7d is healthy: got healthy result from http://172.25.16.122:2379
member 688e88a7e1b4e844 is healthy: got healthy result from http://172.25.16.120:2379
member 692a555d87ac214c is healthy: got healthy result from http://172.25.16.121:2379
cluster is healthy
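Health can also be checked from the host, without entering the container, via etcd's HTTP health endpoint (a quick sketch):
# Each member should report {"health": "true"}.
$ curl http://172.25.16.120:2379/health
$ curl http://172.25.16.121:2379/health
$ curl http://172.25.16.122:2379/health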
- Install with kubeadm on master1
- Content of the configuration file kubeadm-init-v1.7.5.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.7.5
networking:
  podSubnet: 10.244.0.0/16
apiServerCertSANs:
- centos-master-1
- centos-master-2
- centos-master-3
- 172.25.16.120
- 172.25.16.121
- 172.25.16.122
- 172.25.16.228
etcd:
  endpoints:
  - http://172.25.16.120:2379
  - http://172.25.16.121:2379
  - http://172.25.16.122:2379
- Run
kubeadm init --config=kubeadm-init-v1.7.5.yaml
- Edit /etc/kubernetes/manifests/kube-apiserver.yaml
# - --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota
- --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds
- Restart the services
systemctl restart docker kubelet
- Set the kubectl environment variable KUBECONFIG
$ vi ~/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
$ source ~/.bashrc
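At this point kubectl should be able to reach the apiserver; a quick sanity check (not in the original, but harmless):
# The master should be listed, and all component statuses should be Healthy.
$ kubectl get nodes
$ kubectl get componentstatuses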
- Install the flannel components
- It is recommended to fetch the configuration files from the web
- kubectl create -f flannel-rbac.yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
- kubectl create -f flannel.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "host-gw"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.8.0-amd64
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr"]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: quay.io/coreos/flannel:v0.8.0-amd64
        command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
- Run
kubectl get pods --all-namespaces -o wide
and wait until all pods are Running.
- At this point the single-master kubernetes setup is complete.
- Master HA configuration
- Copy /etc/kubernetes/ from master1 to master2 and master3
scp -r /etc/kubernetes/ master2:/etc/
scp -r /etc/kubernetes/ master3:/etc/
Restart the kubelet service on master2 and master3, and check that its status is active (running)
systemctl daemon-reload && systemctl restart kubelet
Configure the kubectl environment variable KUBECONFIG on master2 and master3.
Check the node status from master2 and master3; the new nodes should already have joined (pulling images takes a while, so wait until the status is Ready).
- Modify the master configuration
- On master2 and master3, edit kube-apiserver.yaml and change ${HOST_IP} to the local machine's IP
$ vi /etc/kubernetes/manifests/kube-apiserver.yaml
- --advertise-address=${HOST_IP}
- On master2 and master3, edit kubelet.conf and change ${HOST_IP} to the local machine's IP
$ vi /etc/kubernetes/kubelet.conf
server: https://${HOST_IP}:6443
- On master2 and master3, edit admin.conf and change ${HOST_IP} to the local machine's IP
$ vi /etc/kubernetes/admin.conf
server: https://${HOST_IP}:6443
- On master2 and master3, edit controller-manager.conf and change ${HOST_IP} to the local machine's IP
$ vi /etc/kubernetes/controller-manager.conf
server: https://${HOST_IP}:6443
- On master2 and master3, edit scheduler.conf and change ${HOST_IP} to the local machine's IP
$ vi /etc/kubernetes/scheduler.conf
server: https://${HOST_IP}:6443
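The edits above can also be scripted; a sketch (run on master2 and master3, with HOST_IP set to that machine's own address, here shown with master2's IP as an assumed example):
#!/bin/bash
# Point the apiserver advertise address and every kubeconfig file at the local IP.
HOST_IP=172.25.16.121   # change to this machine's IP
sed -i "s#--advertise-address=.*#--advertise-address=${HOST_IP}#" /etc/kubernetes/manifests/kube-apiserver.yaml
for f in kubelet.conf admin.conf controller-manager.conf scheduler.conf; do
  sed -i "s#server: https://.*:6443#server: https://${HOST_IP}:6443#" /etc/kubernetes/${f}
done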
- Restart all services on master1, master2 and master3
$ systemctl daemon-reload && systemctl restart docker kubelet
- Install keepalived on master1, master2 and master3
- Installation
yum install -y keepalived
systemctl enable keepalived && systemctl restart keepalived
- On master1, master2 and master3, set up an apiserver monitoring script that stops the keepalived service when the apiserver check fails, so the virtual IP fails over
$ vi /etc/keepalived/check_apiserver.sh
#!/bin/bash
err=0
for k in $( seq 1 10 )
do
check_code=$(ps -ef|grep kube-apiserver | wc -l)
if [ "$check_code" = "1" ]; then
err=$(expr $err + 1)
sleep 5
continue
else
err=0
break
fi
done
if [ "$err" != "0" ]; then
echo "systemctl stop keepalived"
/usr/bin/systemctl stop keepalived
exit 1
else
exit 0
fi
chmod a+x /etc/keepalived/check_apiserver.sh
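The script can be tried by hand before wiring it into keepalived (a simple check, not in the original; note that if the apiserver really is down, running it will stop keepalived):
# Exit code 0 means the apiserver looks healthy; 1 means keepalived would be stopped.
$ /etc/keepalived/check_apiserver.sh; echo $?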
- Look up the interface name on master1, master2 and master3
$ ip a | grep 172.25.16
- Configure keepalived on master1, master2 and master3. The parameters are:
state ${STATE}: MASTER or BACKUP; only one node may be MASTER
interface ${INTERFACE_NAME}: the network interface to bind on this machine (found with the ip a command above)
mcast_src_ip ${HOST_IP}: this machine's IP address
priority ${PRIORITY}: the priority, e.g. 102, 101, 100; the higher the priority, the more likely the node is elected MASTER; priorities must not be equal
${VIRTUAL_IP}: the VIP address, set here to 172.25.16.228.
$ vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 2
    weight -5
    fall 3
    rise 2
}
vrrp_instance VI_1 {
    state ${STATE}
    interface ${INTERFACE_NAME}
    mcast_src_ip ${HOST_IP}
    virtual_router_id 51
    priority ${PRIORITY}
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass 4be37dc3b4c90194d1600c483e10ad1d
    }
    virtual_ipaddress {
        ${VIRTUAL_IP}
    }
    track_script {
        chk_apiserver
    }
}
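For example, the vrrp_instance section on master1 might be filled in like this (a sketch: state MASTER with the highest priority; the interface name eth0 is an assumption, use the one found with ip a above):
! vrrp_instance on master1 (eth0 assumed)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    mcast_src_ip 172.25.16.120
    virtual_router_id 51
    priority 102
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass 4be37dc3b4c90194d1600c483e10ad1d
    }
    virtual_ipaddress {
        172.25.16.228
    }
    track_script {
        chk_apiserver
    }
}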
- Restart the keepalived service on master1, master2 and master3, and check that the virtual IP address works
$ systemctl restart keepalived
$ ping 172.25.16.228
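On whichever master currently holds the VIP, the address should also show up on the bound interface (a quick check, not in the original):
$ ip a | grep 172.25.16.228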
- kube-proxy configuration
- On master1, change the server field in configmap/kube-proxy to point to the keepalived virtual IP address
$ kubectl edit -n kube-system configmap/kube-proxy
server: https://172.25.16.228:6443
- On master1, delete all kube-proxy pods so they are recreated with the new configuration
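A sketch of that deletion, assuming the default kubeadm label k8s-app=kube-proxy on the kube-proxy pods:
# Delete the kube-proxy pods; the DaemonSet recreates them with the updated configmap.
$ kubectl delete pod -n kube-system -l k8s-app=kube-proxy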
- Restart the docker, kubelet and keepalived services on master1, master2 and master3
systemctl restart docker kubelet keepalived
- Change ${HOST_IP} in cluster-info to the VIP.
kubectl edit configmaps cluster-info -n kube-public
server: https://${HOST_IP}:6443
- Master HA is now complete
- Join the nodes
- List the tokens on master1
kubeadm token list
- Run the following on node1~node5
kubeadm join --token ${TOKEN} 172.25.16.228:6443
- Check the nodes on master1 with
kubectl get node
; a status of Ready means everything is OK.
- Prevent applications from being scheduled on master2 and master3
kubectl taint nodes master2 node-role.kubernetes.io/master=true:NoSchedule
kubectl taint nodes master3 node-role.kubernetes.io/master=true:NoSchedule
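To confirm the taints took effect (a quick check, not in the original):
# Each tainted master should list node-role.kubernetes.io/master=true:NoSchedule.
$ kubectl describe node master2 | grep -i taint
$ kubectl describe node master3 | grep -i taint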