Hadoop NameNode with no disaster-recovery backup fails to start after running out of memory

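A question about the Hadoop NameNode. The situation is this: our production environment runs Hadoop 2.5 with only the NameNode started; we never started a SecondaryNameNode or set up HA. Yesterday the master node ran out of memory and crashed, and when we started it again it reported this error: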

2017-03-29 08:07:10,392 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: There appears to be a gap in the edit log.  We expected txid 1, but got txid 162204250.
	at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:209)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:994)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1476)
2017-03-29 08:07:10,498 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: There appears to be a gap in the edit log.  We expected txid 1, but got txid 162204250.
	at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:209)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:994)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1476)

The solutions you find on Baidu for this error all say the same thing:

Cause: the NameNode metadata is corrupted and needs to be repaired.
Fix: run NameNode recovery:
hadoop namenode -recover

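But I tried it and the NameNode still won't start, with the same error. I looked at what is actually in the NameNode directory and at what had actually been done to the cluster. The real problem is that on January 8 we formatted the NameNode (it held data before that):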

The fsimage is: fsimage_0000000000000000000 (January 8)

whereas the lowest edit log that Hadoop generated automatically does not start at txid 1, but is

edits_0000000000162204250-0000000000164244310 (the first segment)

The cluster has never lost any NameNode data. Here is the listing of the NameNode directory:

-rw-rw-r-- 1 dc dc 449870376 Jan 11 09:31 edits_0000000000162204250-0000000000164244310
-rw-rw-r-- 1 dc dc 455993242 Jan 11 13:36 edits_0000000000164244311-0000000000166310268
-rw-rw-r-- 1 dc dc 431516862 Jan 14 23:06 edits_0000000000166310269-0000000000168353141
-rw-rw-r-- 1 dc dc 442553236 Jan 15 05:11 edits_0000000000168353142-0000000000170362625
-rw-rw-r-- 1 dc dc 442716989 Jan 15 09:01 edits_0000000000170362626-0000000000172368984
-rw-rw-r-- 1 dc dc 446921507 Jan 15 12:31 edits_0000000000172368985-0000000000174393760
-rw-rw-r-- 1 dc dc 443368815 Jan 15 16:21 edits_0000000000174393761-0000000000176399727
-rw-rw-r-- 1 dc dc 450792325 Jan 15 20:26 edits_0000000000176399728-0000000000178440654
-rw-rw-r-- 1 dc dc 456930167 Jan 16 02:16 edits_0000000000178440655-0000000000180511581
-rw-rw-r-- 1 dc dc 453644963 Jan 16 06:46 edits_0000000000180511582-0000000000182575682
-rw-rw-r-- 1 dc dc 448695514 Jan 16 10:51 edits_0000000000182575683-0000000000184614234
-rw-rw-r-- 1 dc dc 440278940 Jan 16 14:46 edits_0000000000184614235-0000000000186615837
-rw-rw-r-- 1 dc dc 447295881 Jan 16 18:51 edits_0000000000186615838-0000000000188645150
-rw-rw-r-- 1 dc dc 442591426 Jan 16 23:01 edits_0000000000188645151-0000000000190650764
-rw-rw-r-- 1 dc dc 450636722 Jan 17 05:26 edits_0000000000190650765-0000000000192697254
-rw-rw-r-- 1 dc dc 455166691 Jan 17 09:26 edits_0000000000192697255-0000000000194762434
-rw-rw-r-- 1 dc dc 446226866 Jan 17 13:11 edits_0000000000194762435-0000000000196787110
-rw-rw-r-- 1 dc dc 455379286 Jan 17 17:11 edits_0000000000196787111-0000000000198851160
-rw-rw-r-- 1 dc dc 441326239 Jan 17 21:16 edits_0000000000198851161-0000000000200853938
-rw-rw-r-- 1 dc dc 441960567 Jan 18 03:36 edits_0000000000200853939-0000000000202863493
-rw-rw-r-- 1 dc dc 445629865 Jan 18 07:26 edits_0000000000202863494-0000000000204883435
-rw-rw-r-- 1 dc dc 455087158 Jan 18 11:26 edits_0000000000204883436-0000000000206947149
-rw-rw-r-- 1 dc dc 440255432 Jan 18 15:31 edits_0000000000206947150-0000000000208949969
-rw-rw-r-- 1 dc dc 438984246 Jan 18 20:06 edits_0000000000208949970-0000000000210966178
-rw-rw-r-- 1 dc dc 444680471 Jan 19 00:11 edits_0000000000210966179-0000000000212989323
-rw-rw-r-- 1 dc dc 451446983 Jan 19 06:06 edits_0000000000212989324-0000000000215048745
-rw-rw-r-- 1 dc dc 453881743 Jan 19 10:06 edits_0000000000215048746-0000000000217112648
-rw-rw-r-- 1 dc dc 438766537 Jan 19 14:06 edits_0000000000217112649-0000000000219113769
-rw-rw-r-- 1 dc dc 439776222 Jan 19 18:06 edits_0000000000219113770-0000000000221117542
-rw-rw-r-- 1 dc dc 443975771 Jan 19 22:11 edits_0000000000221117543-0000000000223140279
-rw-rw-r-- 1 dc dc 442748245 Jan 20 04:01 edits_0000000000223140280-0000000000225161207
-rw-rw-r-- 1 dc dc 438830540 Jan 20 08:01 edits_0000000000225161208-0000000000227165924
-rw-rw-r-- 1 dc dc 441143096 Jan 20 11:56 edits_0000000000227165925-0000000000229183182
-rw-rw-r-- 1 dc dc 440125627 Jan 20 15:51 edits_0000000000229183183-0000000000231193746
-rw-rw-r-- 1 dc dc 450676256 Jan 20 19:56 edits_0000000000231193747-0000000000233253866
-rw-rw-r-- 1 dc dc 445533425 Jan 21 00:01 edits_0000000000233253867-0000000000235290088
-rw-rw-r-- 1 dc dc 447986915 Jan 21 05:46 edits_0000000000235290089-0000000000237327710
-rw-rw-r-- 1 dc dc 447447142 Jan 21 09:26 edits_0000000000237327711-0000000000239337475
-rw-rw-r-- 1 dc dc 446661949 Jan 21 13:11 edits_0000000000239337476-0000000000241346072
-rw-rw-r-- 1 dc dc 444816287 Jan 21 17:01 edits_0000000000241346073-0000000000243346264
-rw-rw-r-- 1 dc dc 450740109 Jan 21 20:56 edits_0000000000243346265-0000000000245375757
-rw-rw-r-- 1 dc dc 447906463 Jan 22 02:16 edits_0000000000245375758-0000000000247394334
-rw-rw-r-- 1 dc dc 450638656 Jan 22 06:36 edits_0000000000247394335-0000000000249436277
-rw-rw-r-- 1 dc dc 443798521 Jan 22 10:46 edits_0000000000249436278-0000000000251441784
-rw-rw-r-- 1 dc dc 442892253 Jan 22 15:01 edits_0000000000251441785-0000000000253450117
-rw-rw-r-- 1 dc dc 441307529 Jan 22 19:26 edits_0000000000253450118-0000000000255450204
-rw-rw-r-- 1 dc dc 441302807 Jan 22 23:51 edits_0000000000255450205-0000000000257450393
-rw-rw-r-- 1 dc dc 439592227 Jan 23 06:11 edits_0000000000257450394-0000000000259453841
-rw-rw-r-- 1 dc dc 445046140 Jan 23 10:36 edits_0000000000259453842-0000000000261472460
-rw-rw-r-- 1 dc dc 449611708 Jan 23 14:51 edits_0000000000261472461-0000000000263503102
-rw-rw-r-- 1 dc dc 447040133 Jan 23 19:06 edits_0000000000263503103-0000000000265515255
-rw-rw-r-- 1 dc dc 447526878 Jan 23 23:11 edits_0000000000265515256-0000000000267526571
-rw-rw-r-- 1 dc dc 454693067 Jan 24 05:26 edits_0000000000267526572-0000000000269580965
-rw-rw-r-- 1 dc dc 452209474 Jan 24 09:31 edits_0000000000269580966-0000000000271612731
-rw-rw-r-- 1 dc dc 454565779 Jan 24 13:41 edits_0000000000271612732-0000000000273653835
-rw-rw-r-- 1 dc dc 447891347 Jan 24 17:41 edits_0000000000273653836-0000000000275667549
-rw-rw-r-- 1 dc dc 447086213 Jan 24 21:36 edits_0000000000275667550-0000000000277682114
-rw-rw-r-- 1 dc dc 443852209 Jan 25 03:11 edits_0000000000277682115-0000000000279694079
-rw-rw-r-- 1 dc dc 444064231 Jan 25 06:46 edits_0000000000279694080-0000000000281695147
-rw-rw-r-- 1 dc dc 450088440 Jan 25 10:21 edits_0000000000281695148-0000000000283714275
-rw-rw-r-- 1 dc dc 442611806 Jan 28 02:16 edits_0000000000283714276-0000000000285725024
-rw-rw-r-- 1 dc dc 428961410 Feb  2 02:36 edits_0000000000285725025-0000000000287728013
-rw-rw-r-- 1 dc dc 427145299 Feb  8 02:11 edits_0000000000287728014-0000000000289729163
-rw-rw-r-- 1 dc dc 430521386 Feb 13 02:26 edits_0000000000289729164-0000000000291740727
-rw-rw-r-- 1 dc dc 441607787 Feb 19 02:11 edits_0000000000291740728-0000000000293813443
-rw-rw-r-- 1 dc dc 432544029 Feb 24 02:26 edits_0000000000293813444-0000000000295836731
-rw-rw-r-- 1 dc dc 439154130 Mar  2 02:11 edits_0000000000295836732-0000000000297894267
-rw-rw-r-- 1 dc dc 434329480 Mar  7 02:26 edits_0000000000297894268-0000000000299933463
-rw-rw-r-- 1 dc dc 440107121 Mar 13 02:11 edits_0000000000299933464-0000000000302007247
-rw-rw-r-- 1 dc dc 425999256 Mar 18 02:21 edits_0000000000302007248-0000000000304010365
-rw-rw-r-- 1 dc dc 425661857 Mar 23 02:41 edits_0000000000304010366-0000000000306014918
-rw-rw-r-- 1 dc dc 441503826 Mar 29 02:16 edits_0000000000306014919-0000000000308087977
-rw-rw-r-- 1 dc dc  20971520 Mar 29 07:33 edits_0000000000308087978-0000000000308184833
-rw-rw-r-- 1 dc dc       349 Jan  8 01:54 fsimage_0000000000000000000
-rw-rw-r-- 1 dc dc        62 Jan  8 01:54 fsimage_0000000000000000000.md5
-rw-rw-r-- 1 dc dc         2 Mar 30 15:23 seen_txid
-rw-rw-r-- 1 dc dc         2 Mar 29 11:24 seen_txid.bak
-rw-rw-r-- 1 dc dc       204 Jan  8 01:54 VERSION

The current error is: We expected txid 1, but got txid 162204250

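In other words, the NameNode's fsimage (fsimage_0000000000000000000, i.e. it contains everything up to txid 0) means the edit log must continue from txid 1, but the first edit log segment actually starts at txid 162204250, so the NameNode cannot match the two up. Recovery also starts from txid 1, so -recover has no effect. To the experts on OSC: how should we handle this without losing data? Not losing data is what matters most.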
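The mismatch can be reproduced from the file names alone. Below is a minimal Python sketch (not part of Hadoop; the function names are hypothetical) that takes the newest fsimage_<txid>, computes the txid the edit log must continue from (<txid> + 1), and looks for a finalized edits_<first>-<last> segment covering it. It is a simplification that assumes only the naming shown in the listing above: the real NameNode also handles in-progress segments, multiple storage directories and JournalNodes, and it reads file contents rather than just names.

```python
import re

# Naming conventions as seen in the name-directory listing (assumption: finalized files only):
#   fsimage_<txid>                 checkpoint containing all transactions <= <txid>
#   edits_<firstTxid>-<lastTxid>   finalized edit log segment
# Companion files such as fsimage_<txid>.md5 are skipped by the anchored patterns.
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")


def latest_fsimage_txid(names):
    """Highest txid among fsimage_<txid> files, or None if there is none."""
    txids = [int(m.group(1)) for n in names if (m := FSIMAGE_RE.match(n))]
    return max(txids) if txids else None


def edit_segments(names):
    """Finalized edit segments as (first, last) txid pairs, sorted by first txid."""
    return sorted(
        (int(m.group(1)), int(m.group(2)))
        for n in names
        if (m := EDITS_RE.match(n))
    )


def check_image_vs_edits(names):
    """Report whether some edit segment continues right after the newest fsimage."""
    image_txid = latest_fsimage_txid(names)
    segs = edit_segments(names)
    if image_txid is None or not segs:
        return "missing fsimage or edit segments"
    expected = image_txid + 1  # first transaction not already in the image
    if any(first <= expected <= last for first, last in segs):
        return "ok"
    later = [first for first, _ in segs if first > expected]
    if later:
        return f"gap: expected txid {expected}, but got txid {later[0]}"
    return f"no finalized edits after txid {image_txid}"


if __name__ == "__main__":
    # File names copied from the listing in the question (abridged).
    names = [
        "fsimage_0000000000000000000",
        "fsimage_0000000000000000000.md5",
        "edits_0000000000162204250-0000000000164244310",
        "edits_0000000000164244311-0000000000166310268",
        "seen_txid",
        "VERSION",
    ]
    print(check_image_vs_edits(names))
```

Run against the names from this directory, it reports the same gap as the NameNode log: the image ends at txid 0, while the oldest surviving segment starts at 162204250, so transactions 1 to 162204249 exist in neither.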

 

苍松
Posted 9 months ago · 1 reply / 224 views
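You've lost the edit logs from the 8th to the 11th. Since you formatted on the 8th, the fsimage has never been merged (checkpointed). Look in the journal directory to see whether those edit logs are there, and also check whether the sequence numbers in the edit log file names join up head to tail. If they aren't there, then you've lost those files. I don't know whether the edit log is cumulative or incremental; I'd have to ask a colleague. If it's incremental, you could consider backing up the data on all nodes, then changing seen_txid to that 1622 value and trying again. I'm not sure whether that will work.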
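The check suggested above (whether the segment numbers join up head to tail) can be scripted. A minimal sketch, assuming finalized segments follow the edits_<first>-<last> naming from the listing; find_edit_gaps is a hypothetical helper, and the directory passed on the command line (for example the current/ subdirectory of dfs.namenode.name.dir, or a JournalNode's edits directory) is an assumption about your layout.

```python
import os
import re
import sys

# Finalized edit log segments are named edits_<firstTxid>-<lastTxid>.
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")


def find_edit_gaps(names):
    """Return (prev_last, next_first) pairs where consecutive segments don't join.

    Segments are contiguous when each one starts at the previous segment's
    last txid + 1; any other value means missing (or overlapping) transactions.
    """
    segs = sorted(
        (int(m.group(1)), int(m.group(2)))
        for n in names
        if (m := EDITS_RE.match(n))
    )
    gaps = []
    for (_, prev_last), (next_first, _) in zip(segs, segs[1:]):
        if next_first != prev_last + 1:
            gaps.append((prev_last, next_first))
    return gaps


if __name__ == "__main__":
    # Hypothetical usage: python3 find_edit_gaps.py <name-dir>/current
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    gaps = find_edit_gaps(os.listdir(directory))
    for prev_last, next_first in gaps:
        print(f"gap after txid {prev_last}: next segment starts at txid {next_first}")
    if not gaps:
        print("edit segments are contiguous")
```

An empty result only means the surviving segments are contiguous with each other; it says nothing about transactions missing before the oldest segment, which is exactly the problem in this cluster.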
--- 1 comment ---
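苍松: I already tried changing the value; it then reports that it can't find txid 1. The main problem is that one merged fsimage is missing, the one covering 1-16xxx, and those transactions include critical operations. With a backup it would be easy to recover; without one the data is simply gone, and the only option is to reformat and re-import the data. I've already re-imported the data. Thanks for the reply. (8 months ago)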