1. Check the application Pod's DNS configuration
# Find the application Pods
$kubectl get pod |grep flink
deployment-flink-jobmanager-c695cf9d-rgtbh 1/1 Running 0 2d23h
deployment-flink-taskmanager-7c7bbcd4db-5qv9k 1/1 Running 0 2d23h
deployment-flink-taskmanager-7c7bbcd4db-hnpm5 1/1 Running 0 2d23h
# Check the DNS configuration inside the application Pod
$kubectl exec -it deployment-flink-jobmanager-c695cf9d-rgtbh -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
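The search and options lines in this output decide which fully qualified names the resolver actually sends to the DNS server. The sketch below approximates the ordering used by the glibc stub resolver; other libc implementations such as musl behave differently, and the function name candidates is hypothetical, introduced only for illustration.

```shell
# candidates NAME NDOTS SEARCH_DOMAIN...
# Print, in order, the names a glibc-style stub resolver would try.
candidates() {
    name=$1; ndots=$2; shift 2
    # A trailing dot marks the name as fully qualified: the search list is skipped.
    case $name in
        *.) printf '%s\n' "$name"; return ;;
    esac
    # Count the dots in the name.
    dots_only=$(printf '%s' "$name" | tr -cd '.')
    dots=${#dots_only}
    # With at least ndots dots, the literal name is tried first.
    if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
    # Then each search domain is appended in turn.
    for domain in "$@"; do printf '%s.%s.\n' "$name" "$domain"; done
    # With fewer than ndots dots, the literal name is tried last.
    if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
}

candidates datatest 5 default.svc.cluster.local svc.cluster.local cluster.local
# prints:
#   datatest.default.svc.cluster.local.
#   datatest.svc.cluster.local.
#   datatest.cluster.local.
#   datatest.
```

With ndots:5, a short Service name is therefore resolved through the search list first, and the resolver stops at the first candidate that returns an answer.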
# Verify that the nameserver address is the ClusterIP of the cluster DNS Service
$kubectl get svc -nkube-system |grep 10.96.0.10
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 32d
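The manual comparison in this step can also be scripted. The following is a minimal sketch, assuming the DNS Service is named kube-dns in kube-system as shown above; the helper names resolv_nameservers and check_nameserver are hypothetical. Clusters that run a node-local DNS cache may legitimately show a different nameserver.

```shell
# resolv_nameservers: print every nameserver address found in the
# resolv.conf text read from stdin.
resolv_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# check_nameserver EXPECTED_IP: succeed if the resolv.conf text on stdin
# lists EXPECTED_IP as one of its nameservers.
check_nameserver() {
    resolv_nameservers | grep -qxF -- "$1"
}

# Usage against a live cluster (Pod name taken from the output above):
#   dns_ip=$(kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}')
#   kubectl exec deployment-flink-jobmanager-c695cf9d-rgtbh -- cat /etc/resolv.conf \
#     | check_nameserver "$dns_ip" && echo "nameserver OK" || echo "nameserver mismatch"
```

Matching the whole line with grep -x avoids the false positive that a plain substring grep would give for addresses sharing a prefix, for example 10.96.0.1 and 10.96.0.10.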
2. Check that the DNS Service exists and that its backend Pods are running
# Find the DNS Service
$kubectl get svc -nkube-system -owide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 32d k8s-app=kube-dns
# View the Service details
$kubectl describe svc -nkube-system kube-dns
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
        kubernetes.io/cluster-service=true
        kubernetes.io/name=KubeDNS
Annotations: prometheus.io/port: 9153
             prometheus.io/scrape: true
Selector: k8s-app=kube-dns
Type: ClusterIP
IP Families: <none>
IP: 10.96.0.10
IPs: 10.96.0.10
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 10.244.0.34:53,10.244.0.66:53,10.244.0.8:53
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 10.244.0.34:53,10.244.0.66:53,10.244.0.8:53
Port: metrics 9153/TCP
TargetPort: 9153/TCP
Endpoints: 10.244.0.34:9153,10.244.0.66:9153,10.244.0.8:9153
Session Affinity: None
Events: <none>
# Check the status of the Pod behind each endpoint
$kubectl get pod -nkube-system -owide |grep 10.244.0.34
coredns-659f5bbffd-w5vzw 1/1 Running 0 2d 10.244.0.34 master-0002 <none> <none>
$kubectl get pod -nkube-system -owide |grep 10.244.0.66
coredns-659f5bbffd-qrzl8 1/1 Running 0 2d 10.244.0.66 master-0003 <none> <none>
$kubectl get pod -nkube-system -owide |grep 10.244.0.8
coredns-659f5bbffd-rfr79 1/1 Running 0 2d 10.244.0.8 master-0001 <none> <none>
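Grepping for each endpoint IP works, but it becomes tedious with many replicas, and a substring match such as 10.244.0.8 would also match 10.244.0.80. As an alternative sketch, the Pods can be selected by the Service selector k8s-app=kube-dns shown above and filtered for anything that is not fully ready. The function name unhealthy_pods is hypothetical.

```shell
# unhealthy_pods: read `kubectl get pod --no-headers` output on stdin and
# print the name of every Pod that is not Running or not fully ready.
unhealthy_pods() {
    awk '{
        split($2, ready, "/")    # READY column, e.g. "1/1"
        if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}

# Usage against a live cluster:
#   kubectl get pod -n kube-system -l k8s-app=kube-dns --no-headers | unhealthy_pods
```

Empty output means every CoreDNS Pod selected by the Service is Running with all containers ready.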
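3. Add query logging to the CoreDNS configuration
The CoreDNS configuration file is the Corefile, which is stored in a ConfigMap. Adding the log plugin to the Corefile makes CoreDNS print every query it handles.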
$kubectl edit configmap -nkube-system coredns
Edit the ConfigMap as shown below, adding the log line:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        log
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
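Where an interactive kubectl edit is inconvenient, for example in a script, the same change can be made with a filter over the Corefile text. This is only a sketch under stated assumptions: it expects the conventional layout in which server blocks such as .:53 { start at column 0 and plugins are indented, and the function name add_log_plugin is hypothetical.

```shell
# add_log_plugin: read a Corefile on stdin and print it with `log` inserted
# as the first plugin of every server block. If the Corefile already
# contains a bare `log` line, it is printed unchanged.
add_log_plugin() {
    awk '
        { lines[NR] = $0 }
        /^[ \t]*log[ \t]*$/ { has_log = 1 }
        END {
            for (i = 1; i <= NR; i++) {
                print lines[i]
                # A server block opens with an unindented line ending in "{".
                if (!has_log && lines[i] ~ /^[^ \t].*[{][ \t]*$/)
                    print "    log"
            }
        }
    '
}

# Usage against a live cluster (review Corefile.new before applying it):
#   kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' \
#     | add_log_plugin > Corefile.new
#   kubectl create configmap coredns -n kube-system \
#     --from-file=Corefile=Corefile.new --dry-run=client -o yaml | kubectl apply -f -
```

The apply step rebuilds the ConfigMap from the single Corefile key, so it is only appropriate when the ConfigMap holds no other keys.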
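After the ConfigMap is saved, the change takes about 1 to 2 minutes to reach the CoreDNS Pods, where the reload plugin picks it up. Once the new configuration has been loaded, the CoreDNS log shows: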
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
[INFO] Reloading complete
Test connectivity to the target address from the application Pod:
# Look up the target Service
$kubectl get svc |grep datatest
datatest ClusterIP 10.96.0.37 <none> 8080/TCP 29d
# Test connectivity
$kubectl exec -it deployment-flink-jobmanager-c695cf9d-rgtbh -- curl -vi datatest:8080
* Trying 10.96.0.37:8080...
* Connected to datatest (10.96.0.37) port 8080 (#0)
> GET / HTTP/1.1
> Host: datatest:8080
> User-Agent: curl/7.69.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 400
HTTP/1.1 400
< Content-Type: text/plain;charset=UTF-8
Content-Type: text/plain;charset=UTF-8
< Connection: close
Connection: close
<
Bad Request
This combination of host and port requires TLS.
* Closing connection 0
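The HTTP 400 is returned by the application itself (the port expects TLS), so DNS resolution and the TCP connection both succeeded.
# Key CoreDNS log entries
$kubectl logs -nkube-system coredns-659f5bbffd-rfr79 |grep datatest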
[INFO] 10.244.2.252:36293 - 17111 "AAAA IN datatest.default.svc.cluster.local. udp 51 false 512" NOERROR qr,aa,rd 144 0.000184943s
[INFO] 10.244.2.252:36293 - 32461 "A IN datatest.default.svc.cluster.local. udp 51 false 512" NOERROR qr,aa,rd 100 0.000144745s
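Each query line above carries the client address, the query type and name, the response code, and the time taken. A small filter can reduce the log to those fields, which makes failures such as NXDOMAIN or SERVFAIL easier to spot. This sketch assumes the default log plugin format seen in the two lines above; the function name summarize_queries is hypothetical.

```shell
# summarize_queries: read CoreDNS log lines on stdin and print
# "client type name rcode duration" for each query line in the default format.
summarize_queries() {
    awk '$1 == "[INFO]" && NF == 15 {
        client = $2; sub(/:[0-9]+$/, "", client)   # drop the source port
        qtype = $5;  sub(/^"/, "", qtype)          # strip the opening quote
        print client, qtype, $7, $12, $15
    }'
}

# Usage against a live cluster, keeping only queries that did not succeed:
#   kubectl logs -n kube-system coredns-659f5bbffd-rfr79 \
#     | summarize_queries | awk '$4 != "NOERROR"'
```

Lines that are not query records, such as the reload messages, have a different field count and are skipped.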