1. Background
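Recently, several long-suffering RS stewards of the HBASE family at Xcar were murdered one mysterious night by an Unknown Cause. As soon as the alert fired, Inspector Li of the systems department went to Officer Wang and Buxian in the R&D center and demanded a full-scale investigation: catch Unknown Cause as quickly as possible, protect Xcar's vital interests, and uphold the honor of the HBASE family.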
2. The Manhunt
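Once they had the assignment, Officer Wang and Buxian launched the manhunt at once and reached Unknown Cause's lair faster than thunder leaves time to cover your ears. Just as Unknown Cause was about to flee, who would have guessed that Buxian would thump her on the chest with his little fists? Hahaha.
Just as Unknown Cause was being dragged to the Meridian Gate for execution, the December sky unexpectedly opened up in a downpour. She cried out that she had been wronged, offered to confess everything, and promised that she could bring the murdered RS stewards back to life. Could she be one of those old Chinese-medicine doctors with a miracle touch?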
3. Extracting the Confession
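Unknown Cause handed over the Log diary she had been about to burn. Reading it, we learned that she had her reasons for striking so cruelly. The diary records the whole sequence of events in detail, as follows: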
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs/yq-hadoop184137,60020,1519700714129
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:456)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:484)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1476)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1398)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1280)
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:187)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:309)
at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:180)
at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:172)
at org.apache.hadoop.hbase.regionserver.HRegionServer.stopServiceThreads(HRegionServer.java:2162)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1092)
at java.lang.Thread.run(Thread.java:745)
java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2018-02-28 03:52:57,351 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server yq-hadoop184137,60020,1519700714129: regionserver:60020-0x15e93108337127c, quorum=yq-hadoop19:2181, baseZNode=/hbase regionserver:60020-0x15e93108337127c received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:700)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:611)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
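The FATAL line in the diary above is the one worth grepping for when an RS steward goes missing. Here is a minimal sketch, assuming the log lines follow exactly the format shown above; the function name `parse_session_expiry_abort` and the regular expression are made up for illustration and are not part of HBase.

```python
import re
from typing import Optional

# Hypothetical pattern, written against the FATAL line format seen in the log above.
ABORT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) FATAL "
    r"\S+: ABORTING region server (?P<server>[^:\s]+):"
)


def parse_session_expiry_abort(line: str) -> Optional[dict]:
    """Return the timestamp and server name if the line is an RS abort
    caused by an expired ZooKeeper session, otherwise None."""
    match = ABORT_RE.match(line)
    if match is None or "received expired from ZooKeeper" not in line:
        return None
    return {"timestamp": match.group("ts"), "server": match.group("server")}


# The FATAL line from the log above, joined into one string.
sample = (
    "2018-02-28 03:52:57,351 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "ABORTING region server yq-hadoop184137,60020,1519700714129: "
    "regionserver:60020-0x15e93108337127c, quorum=yq-hadoop19:2181, baseZNode=/hbase "
    "regionserver:60020-0x15e93108337127c received expired from ZooKeeper, aborting"
)

print(parse_session_expiry_abort(sample))
```

Running it over a RegionServer log line by line would list which stewards fell to an expired session and when.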
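4. The Truth Surfaces
The diary explains in detail why the long-suffering RS stewards of the HBASE family met such an untimely end. Buxian lays it out as follows:
Long ago, the Zookeeper family entrusted its life-preserving Session to several stewards of the HBASE family, with a warning: while the Session lives, the steward lives, and when the Session dies, the steward dies. Unknown Cause exploited exactly this and used some means to make the Session expire. When the Zookeeper family found out, it killed the HBASE stewards.
( ps : the zookeeper session expired, the hbase connection timed out and was dropped, and the RS went down )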
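The "Session lives, steward lives" pact can be sketched as a toy model. It is a simplified illustration only, not ZooKeeper's or HBase's real code: the classes `ToySession` and `ToyRegionServer`, the 40-second timeout, and the tick timings are all assumptions invented for the example. It does mirror one real behavior: the client only learns that its session expired once it manages to reach ZooKeeper again.

```python
class ToySession:
    """Toy session: expires if no heartbeat arrives within timeout_ms."""

    def __init__(self, timeout_ms: int, now_ms: int = 0) -> None:
        self.timeout_ms = timeout_ms
        self.last_heartbeat_ms = now_ms
        self.expired = False

    def heartbeat(self, now_ms: int) -> bool:
        """Record a heartbeat. Returns False if the session has expired."""
        if now_ms - self.last_heartbeat_ms > self.timeout_ms:
            self.expired = True  # an expired session never comes back
        if self.expired:
            return False
        self.last_heartbeat_ms = now_ms
        return True


class ToyRegionServer:
    """Toy steward: aborts as soon as it learns its session has expired."""

    def __init__(self, session: ToySession) -> None:
        self.session = session
        self.alive = True

    def tick(self, now_ms: int, can_reach_zk: bool) -> None:
        if not self.alive or not can_reach_zk:
            return  # while disconnected, the steward cannot hear the bad news
        if not self.session.heartbeat(now_ms):
            self.alive = False  # "received expired from ZooKeeper, aborting"


def simulate(outage_ms: int, timeout_ms: int = 40_000) -> bool:
    """Heartbeat every 10 s, lose contact for outage_ms, then reconnect.
    Returns True if the steward survives."""
    rs = ToyRegionServer(ToySession(timeout_ms))
    rs.tick(10_000, can_reach_zk=True)
    rs.tick(10_000 + outage_ms // 2, can_reach_zk=False)
    rs.tick(10_000 + outage_ms, can_reach_zk=True)
    return rs.alive


print(simulate(outage_ms=20_000))  # outage shorter than the timeout
print(simulate(outage_ms=60_000))  # outage longer than the timeout
```

An outage shorter than the timeout leaves the steward alive, and a longer one kills it on reconnect, which matches the order of events in the diary: first the connection times out, then the expired notice arrives and the RS aborts.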
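Washed by the torrential rain, the Zookeeper family realized it had killed innocents and was filled with regret. Fortunately, the family still had a life-restoring straw, though it only works with Unknown Cause herself as the catalyst. Overcome with remorse, Unknown Cause took her own life in atonement, and the catalyst was ready.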
5. Back from the Dead
The RS stewards drank the straw juice and came back to life.
( ps : in this situation, restarting the RS is enough )
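6. Dissecting the Open Questions
- By what means exactly did Unknown Cause make the life-preserving Session expire?
- The life-preserving Session died far too easily. How can its defenses be strengthened?
- What exactly is the life-preserving Session?
With these three questions in hand, Buxian continues the investigation.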