Problem crawling a website with Nutch in local mode

Posted by 经廷波 on 2013/04/22 13:41
2013-04-22 12:58:41,620 WARN  crawl.Crawl - solrUrl is not set, indexing will be skipped...
2013-04-22 12:58:41,929 INFO  crawl.Crawl - crawl started in: data
2013-04-22 12:58:41,929 INFO  crawl.Crawl - rootUrlDir = urls
2013-04-22 12:58:41,929 INFO  crawl.Crawl - threads = 100
2013-04-22 12:58:41,929 INFO  crawl.Crawl - depth = 5
2013-04-22 12:58:41,929 INFO  crawl.Crawl - solrUrl=null
2013-04-22 12:58:41,948 INFO  crawl.Injector - Injector: starting at 2013-04-22 12:58:41
2013-04-22 12:58:41,948 INFO  crawl.Injector - Injector: crawlDb: data/crawldb
2013-04-22 12:58:41,948 INFO  crawl.Injector - Injector: urlDir: urls
2013-04-22 12:58:42,005 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2013-04-22 12:58:42,125 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-04-22 12:58:42,191 WARN  snappy.LoadSnappy - Snappy native library not loaded
2013-04-22 12:58:42,198 ERROR security.UserGroupInformation - PriviledgedActionException as:root cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/data/nutch/nutch/release-1.6/runtime/local/urls
2013-04-22 13:08:09,915 WARN  crawl.Crawl - solrUrl is not set, indexing will be skipped...
2013-04-22 13:08:10,165 INFO  crawl.Crawl - crawl started in: data
2013-04-22 13:08:10,174 INFO  crawl.Crawl - rootUrlDir = urls
2013-04-22 13:08:10,194 INFO  crawl.Crawl - threads = 30
2013-04-22 13:08:10,202 INFO  crawl.Crawl - depth = 3
2013-04-22 13:08:10,203 INFO  crawl.Crawl - solrUrl=null
2013-04-22 13:08:10,203 INFO  crawl.Crawl - topN = 50
2013-04-22 13:08:10,282 INFO  crawl.Injector - Injector: starting at 2013-04-22 13:08:10
2013-04-22 13:08:10,282 INFO  crawl.Injector - Injector: crawlDb: data/crawldb
2013-04-22 13:08:10,282 INFO  crawl.Injector - Injector: urlDir: urls
2013-04-22 13:08:10,330 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2013-04-22 13:08:10,457 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-04-22 13:08:10,542 WARN  snappy.LoadSnappy - Snappy native library not loaded
2013-04-22 13:08:10,551 ERROR security.UserGroupInformation - PriviledgedActionException as:root cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/data/nutch/nutch/release-1.6/runtime/local/urls




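The problem above is now solved. Nutch's urls directory did not exist because I had accidentally deleted it earlier, which is why the Injector reported "Input path does not exist". After recreating the directory and adding a file inside it that lists the sites to crawl, the crawl works.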
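
For anyone hitting the same error, the fix can be sketched roughly as follows. This is only a minimal sketch under a few assumptions: the commands are run from Nutch's runtime/local directory (the log resolves the relative path `urls` against it), the seed file name `seed.txt` is a hypothetical choice, and `http://example.com/` is a placeholder for the site you actually want to crawl.

```shell
# Assumption: the current directory is Nutch's runtime/local directory,
# because the crawl refers to the seed directory by the relative path "urls".

# Recreate the missing seed directory (the "urlDir: urls" in the log).
mkdir -p urls

# Put one URL per line into a seed file inside it.
# "seed.txt" is a hypothetical name; http://example.com/ is a placeholder.
printf '%s\n' 'http://example.com/' > urls/seed.txt

# Then rerun the crawl. The line below is left commented out; its values
# mirror the second run in the log (threads = 30, depth = 3, topN = 50).
# bin/nutch crawl urls -dir data -threads 30 -depth 3 -topN 50
```

The Injector reads its input from the urls directory, so the directory has to exist and contain at least one seed URL before the crawl starts.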
