Hadoop cluster deployment error, please help!

nubo posted on 2012/03/08 10:34
I'd like to ask for help. This is a demo of a fully distributed setup with hadoop-0.20.2. The systems are AIX 5.3 and HP-UX rx4640: 172.168.1.240 (AIX, datanode) and 172.168.1.243 (HP-UX, namenode).
<!--core-site.xml -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://172.168.1.243:10000</value>
</property>
</configuration>

<!--hdfs-site.xml -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/public/interf/hadoop/data/dfs.name.dir</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/public/interf/hadoop/data/dfs.data.dir</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/public/interf/hadoop/data/fs.checkpoint.dir</value>
</property>
</configuration>

<!--mapred-site.xml-->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>172.168.1.243:10005</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/public/interf/hadoop/mapred/mapred.system.dir</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/public/interf/hadoop/mapred/mapred.local.dir</value>
</property>
</configuration>

slaves: 172.168.1.240
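
To double-check what each node actually has configured, the files can be read back programmatically. A minimal sketch, assuming the usual `<configuration>/<property>/<name>,<value>` layout shown above; `load_site_properties` is a hypothetical helper name, not part of Hadoop:

```python
import xml.etree.ElementTree as ET

def load_site_properties(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props

# core-site.xml content copied from this post.
core_site = """<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://172.168.1.243:10000</value>
</property>
</configuration>"""

print(load_site_properties(core_site)["fs.default.name"])
# hdfs://172.168.1.243:10000
```

Running the same function over the files on 240 and 243 and comparing the resulting dicts would show whether both nodes point at the same namenode and JobTracker addresses.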

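Current situation: I have no root access. SSH from 243 to 240 is set up, start-dfs.sh starts normally, the 243:50070 page shows Live Nodes: 1, and 240:50075 displays normally, but start-mapred.sh produces errors.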
jobtracker.log:
2012-03-08 07:49:06,955 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /public/interf/hadoop/mapred/mapred.system.dir/ jobtracker.info could only be replicated to 0 nodes, instead of 1

2012-03-08 07:49:06,956 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
2012-03-08 07:49:06,956 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/public/interf/hadoop/mapred/mapred.system.dir/jobtracker.info" - Aborting...
2012-03-08 07:49:06,957 WARN org.apache.hadoop.mapred.JobTracker: Writing to file hdfs://rx4640:10000/public/interf/hadoop/mapred/mapred.system.dir/jobtracker.info failed!
2012-03-08 07:49:06,957 WARN org.apache.hadoop.mapred.JobTracker: FileSystem is not ready yet!
2012-03-08 07:49:06,967 WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager. 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /public/interf/hadoop/mapred/mapred.system.dir/ jobtracker.info could only be replicated to 0 nodes, instead of 1

2012-03-08 07:49:16,982 WARN org.apache.hadoop.mapred.JobTracker: Retrying...
2012-03-08 07:49:17,019 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /public/interf/hadoop/mapred/mapred.system.dir/ jobtracker.info could only be replicated to 0 nodes, instead of 1

namenode.log:
2012-03-08 07:49:06,949 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2012-03-08 07:49:06,951 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 10000, call addBlock(/public/interf/hadoop/mapred/mapred.system.dir/jobtracker.info, DFSClient_-739369049) from 172.168.1.243:52154: error: java.io.IOException: File /public/interf/hadoop/mapred/mapred.system.dir/ jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /public/interf/hadoop/mapred/mapred.system.dir/ jobtracker.info could only be replicated to 0 nodes, instead of 1

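I looked up some material online, re-ran the format, and changed the directory permissions, but the problem persists. 240 reports no errors after startup, while 243 keeps logging errors. After 243 is shut down, 240 reports that it cannot connect: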

2012-03-08 10:21:50,974 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Call to /172.168.1.243:10005 failed on local exception: java.io.IOException: A connection with a remote socket was reset by that socket.
Caused by: java.io.IOException: A connection with a remote socket was reset by that socket.

Posts online suggest it could be a firewall or /etc/hosts problem, but start-dfs.sh starts normally.
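
A successful start-dfs.sh only shows that the HDFS ports are reachable, so the firewall question can be tested per port. A minimal sketch of a plain TCP connect test; `can_connect` is a hypothetical helper name, and the addresses in the usage comment are the ones from the configuration in this post:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, resets, timeouts, and unreachable hosts.
        return False

# Example usage, run from 240 while the daemons on 243 are up:
#   can_connect("172.168.1.243", 10000)   # namenode RPC port (fs.default.name)
#   can_connect("172.168.1.243", 10005)   # JobTracker port (mapred.job.tracker)
```

If the first call succeeds and the second fails while the JobTracker is running, that would point at something between the two machines blocking port 10005 rather than at Hadoop itself.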
243:50070/

Cluster Summary

6 files and directories, 0 blocks = 6 total. Heap Size is 17 MB / 888.94 MB (1%) 

Configured Capacity : 180 GB
DFS Used : 8 KB
Non DFS Used : 180 GB
DFS Remaining : 67 KB
DFS Used% : 0 %
DFS Remaining% : 0 %
Live Nodes : 1
Dead Nodes : 0
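
One thing worth checking in the summary above: the datanode is live, but DFS Remaining is only 67 KB because Non DFS Used fills the whole 180 GB. That may be enough by itself to explain "could only be replicated to 0 nodes". A rough sketch of the arithmetic, assuming the default dfs.block.size of 64 MB (the posted hdfs-site.xml does not override it) and assuming the namenode will not place a block on a node with less free space than one block; the actual placement rule in 0.20.2 may be stricter. `can_hold_block` is a hypothetical name:

```python
KB = 1024
MB = 1024 * KB

def can_hold_block(remaining_bytes, block_size_bytes=64 * MB):
    """Return True if the reported free DFS space can hold one full block."""
    return remaining_bytes >= block_size_bytes

dfs_remaining = 67 * KB   # "DFS Remaining : 67 KB" from the summary
print(can_hold_block(dfs_remaining))
# False
```

Checking free space on the file system that holds dfs.data.dir on the datanode would confirm or rule this out without needing root access.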

Live Datanodes : 1

Node | Last Contact | Admin State | Configured Capacity (GB) | Used (GB) | Non DFS Used (GB) | Remaining (GB) | Used (%) | Used (%) | Remaining (%) | Blocks
ltbss | 1 | In Service | 180 | | 180 | | | | |


nubo
Frustrating. I wanted to try a different version, but that one turned out to be incompatible with the JDK...
nubo

I switched to another machine and the problem is solved. The hadoop-0.20.2-test.jar TestDFSIO and hadoop-0.20.2-examples.jar sort tests pass. Along the way I ran into 2 more problems.

Hostname could not be resolved: edit /etc/hosts (requires root access)

Name node is in safe mode: hadoop dfsadmin -safemode leave
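
Both fixes can be verified from a short script. A minimal sketch: `resolve`, `safemode_command`, and `run_safemode` are hypothetical helper names. The hostnames in the usage comment (rx4640 from the jobtracker log, ltbss from the datanode table) are only assumed to be the names that failed to resolve. The safemode actions listed are the ones `hadoop dfsadmin -safemode` accepts.

```python
import socket
import subprocess

VALID_ACTIONS = ("get", "wait", "leave", "enter")

def resolve(hostname):
    """Return the IPv4 address a hostname resolves to, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None

def safemode_command(action):
    """Build the argument list for `hadoop dfsadmin -safemode <action>`."""
    if action not in VALID_ACTIONS:
        raise ValueError("unknown safemode action: %r" % (action,))
    return ["hadoop", "dfsadmin", "-safemode", action]

def run_safemode(action):
    """Run the command and return its output (needs `hadoop` on the PATH)."""
    result = subprocess.run(safemode_command(action),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout

# Example usage on a cluster node:
#   resolve("rx4640"), resolve("ltbss")   # None means /etc/hosts still needs an entry
#   run_safemode("get")                   # report the state before forcing "leave"
```

`-safemode get` only reports the current state, so it is a harmless first step before forcing `leave`.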

It finally works. I suspect the problem on the 240 machine is a firewall, but I have no permissions on it, which is frustrating.
