1. HBase HRegionServer node fails to start
2015-10-23 17:24:33,147 WARN [regionserver60020] zookeeper.RecoverableZooKeeper: Node /hbase/rs/SlaveServer,60020,1413095376898 already deleted, retry=false
2015-10-23 17:24:33,147 WARN [regionserver60020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/rs/SlaveServer,60020,1413095376898
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:156)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1273)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1262)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1298)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1012)
at java.lang.Thread.run(Thread.java:662)
2015-10-23 17:24:33,158 INFO [regionserver60020] zookeeper.ZooKeeper: Session: 0x249020a2cfd0014 closed
2015-10-23 17:24:33,158 INFO [regionserver60020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-10-23 17:24:33,158 INFO [regionserver60020] regionserver.HRegionServer: stopping server null; zookeeper connection closed.
2015-10-23 17:24:33,158 INFO [regionserver60020] regionserver.HRegionServer: regionserver60020 exiting
2015-10-23 17:24:33,158 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:66)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2422)
2015-10-23 17:24:33,160 INFO [Thread-9] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@8d5aad
2015-10-23 17:24:33,160 INFO [Thread-9] regionserver.ShutdownHook: Starting fs shutdown hook thread.
2015-10-23 17:24:33,160 INFO [Thread-9] regionserver.ShutdownHook: Shutdown hook finished.
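This usually happens because the clocks on the cluster nodes have drifted too far apart, i.e. time is not synchronized across the cluster. Solution: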
# yum -y install ntpdate && chkconfig ntpdate off
# crontab -e #add sync time cron scripts
*/2 * * * * ntpdate asia.pool.ntp.org
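Before (or after) setting up the cron job, it can help to confirm that clock skew really is the problem. The following is only a rough sketch: abs_skew and check_nodes are helper names made up for this example, and it assumes passwordless ssh to each node. As far as I know, the skew HBase tolerates is controlled by the hbase.master.maxclockskew property, so pick a limit that matches your configuration.

```shell
#!/usr/bin/env bash
# Sketch only: the host names passed to check_nodes are whatever your nodes are called.

# Absolute difference, in seconds, between two epoch timestamps.
abs_skew() {
  local d=$(( $1 - $2 ))
  echo $(( d < 0 ? -d : d ))
}

# Compare the local clock with each remote node over ssh.
# $1 = allowed skew in seconds, remaining arguments = host names.
check_nodes() {
  local limit=$1 host remote skew
  shift
  for host in "$@"; do
    if ! remote=$(ssh -n "$host" date +%s); then
      echo "$host: unreachable"
      continue
    fi
    skew=$(abs_skew "$(date +%s)" "$remote")
    if [ "$skew" -gt "$limit" ]; then
      echo "$host: clock differs by ${skew}s (limit ${limit}s)"
    else
      echo "$host: ok (${skew}s)"
    fi
  done
}
```

For example, after sourcing the file, check_nodes 30 SlaveServer compares the local clock with the node named in the log above. The ssh round trip adds a little delay, so treat differences of a second or two as noise.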
If you ran into this error for a different reason, please share it in the replies below!
2. Hostname configuration problems
failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
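Following the page linked in the message, http://wiki.apache.org/hadoop/ConnectionRefused, I removed the "127.0.0.1 hbase1" entry from /etc/hosts (and the corresponding entry on the slave nodes), after which the program ran normally. I then started HBase without any problems, and creating tables worked as well!
I already knew that the 127.0.1.1 entry had to be removed from the hosts file, but I had not expected that the entry mapping 127.0.0.1 to the hostname had to go too.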
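To find such entries quickly, something like the sketch below can be used. The function name loopback_entries is made up for this example; it prints every line in a hosts file that maps the given hostname to a 127.x.x.x address, and only reads the file without changing it.

```shell
#!/usr/bin/env bash
# Sketch only: prints the lines of a hosts file that map a hostname
# to a loopback address (127.x.x.x). Nothing is modified.
# $1 = path to the hosts file, $2 = hostname to look for.
loopback_entries() {
  local hosts_file=$1 name=$2
  awk -v name="$name" '
    /^[[:space:]]*#/ { next }            # skip comment lines
    $1 ~ /^127\./ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break             # stop at a trailing comment
        if ($i == name) { print; break }
      }
    }
  ' "$hosts_file"
}
```

Running loopback_entries /etc/hosts hbase1 on each node should print nothing once the offending lines are gone. The localhost line itself is not reported unless you ask for the name localhost.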
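Another thing that can keep the cluster from starting is a non-standard hostname. Hostnames used in a Hadoop cluster should avoid _ and -; a name like hbase_host1, for example, is not supported!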
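A quick way to catch this before starting the cluster is a check along these lines. It is only a sketch of the rule described above: check_hostname is a made-up helper name, and it flags nothing beyond the two characters mentioned.

```shell
#!/usr/bin/env bash
# Sketch only: rejects hostnames containing "_" or "-", per the rule above.
# Returns 0 if the name looks fine, 1 otherwise.
check_hostname() {
  case $1 in
    *[_-]*)
      echo "$1: contains _ or -, consider renaming"
      return 1
      ;;
    *)
      echo "$1: ok"
      return 0
      ;;
  esac
}
```

For example, check_hostname "$(hostname)" checks the current machine, and the same call can be looped over the names listed in your regionservers file.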