org.apache.nutch.protocol.http.api.HttpRobotRulesParser: Couldn't get robots.txt The server xxx.cn failed to respond

长河青川 posted on 2014/09/02 13:06
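When I run the Nutch crawler on Hadoop, it keeps logging the INFO messages below. curl can fetch the page without problems, and the timeout is set to 20000. I'm out of ideas, so could someone please take a look?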

2014-09-02 09:31:07,964 INFO org.apache.nutch.protocol.http.api.HttpRobotRulesParser: Couldn't get robots.txt for http://xxx.cn/1-2-4-100-26-6-15-1-0-3-1.html: org.apache.commons.httpclient.NoHttpResponseException: The server xxx.cn failed to respond
2014-09-02 09:31:08,693 INFO org.apache.nutch.fetcher.Fetcher: -activeThreads=80, spinWaiting=79, fetchQueues.totalSize=4000
2014-09-02 09:31:08,701 INFO org.apache.nutch.fetcher.Fetcher: fetching http://xxx.cn/1-2-4-100-16-9-24-0-0-3-1.html (queue crawl delay=0ms)
2014-09-02 09:31:08,739 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:08,739 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:08,777 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:08,777 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:08,815 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:08,815 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:08,852 INFO org.apache.nutch.protocol.http.api.HttpRobotRulesParser: Couldn't get robots.txt for http://xxx.cn/1-2-4-100-16-9-24-0-0-3-1.html: org.apache.commons.httpclient.NoHttpResponseException: The server xxx.cn failed to respond
2014-09-02 09:31:08,997 INFO org.apache.nutch.fetcher.Fetcher: fetching http://xxx.cn/1-2-4-100-22-5-15-3-0-1-1.html (queue crawl delay=0ms)
2014-09-02 09:31:09,035 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:09,035 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:09,073 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:09,073 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:09,111 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:09,111 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:09,148 INFO org.apache.nutch.protocol.http.api.HttpRobotRulesParser: Couldn't get robots.txt for http://xxx.cn/1-2-4-100-22-5-15-3-0-1-1.html: org.apache.commons.httpclient.NoHttpResponseException: The server xxx.cn failed to respond
2014-09-02 09:31:09,694 INFO org.apache.nutch.fetcher.Fetcher: -activeThreads=80, spinWaiting=79, fetchQueues.totalSize=4000
2014-09-02 09:31:09,911 INFO org.apache.nutch.fetcher.Fetcher: fetching http://xxx.cn/1-2-4-100-21-15-1-9-0-1-1.html (queue crawl delay=0ms)
2014-09-02 09:31:09,949 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:09,949 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:09,986 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:09,986 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:10,024 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:10,024 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:10,061 INFO org.apache.nutch.protocol.http.api.HttpRobotRulesParser: Couldn't get robots.txt for http://xxx.cn/1-2-4-100-21-15-1-9-0-1-1.html: org.apache.commons.httpclient.NoHttpResponseException: The server xxx.cn failed to respond
2014-09-02 09:31:10,193 INFO org.apache.nutch.fetcher.Fetcher: fetching http://xxx.cn/1-2-4-100-21-23-17-1-0-3-1.html (queue crawl delay=0ms)
2014-09-02 09:31:10,231 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:10,231 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:10,268 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.cn failed to respond
2014-09-02 09:31:10,268 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-09-02 09:31:10,306 INFO org.apache.commons.httpclient.HttpMethodDirector
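
Since curl works from the shell but the crawler does not get a response, one way to narrow this down might be to request the robots.txt file directly from Java on one of the Hadoop nodes, with the same 20000 ms timeout. The following is only a minimal sketch under assumptions: the class name `RobotsCheck`, its helper methods, and the agent string `my-nutch-agent` are hypothetical, and it uses the JDK's `HttpURLConnection` rather than the Commons HttpClient that appears in the log, so it will not reproduce the crawler's behavior exactly. By convention robots.txt sits at the root of the host, which is why every failing line in the log points at the same server regardless of the page being fetched.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RobotsCheck {
    // Resolve the robots.txt location at the root of the page's host.
    static URI robotsUri(String pageUrl) {
        return URI.create(pageUrl).resolve("/robots.txt");
    }

    // Issue one GET with the given timeout (ms) and return the HTTP status code.
    static int fetchStatus(URI uri, String userAgent, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        // Hypothetical agent name; substitute the one the crawler is configured with.
        conn.setRequestProperty("User-Agent", userAgent);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // xxx.cn is the placeholder host from the log; pass a real page URL as the first argument.
        String pageUrl = args.length > 0 ? args[0] : "http://xxx.cn/1-2-4-100-26-6-15-1-0-3-1.html";
        URI robots = robotsUri(pageUrl);
        System.out.println("robots.txt URL: " + robots);
        System.out.println("HTTP status: " + fetchStatus(robots, "my-nutch-agent", 20000));
    }
}
```

If a single request like this succeeds while the crawl still fails, the difference probably lies in how the crawler talks to the server (the log shows 80 active threads and a queue crawl delay of 0ms against one host) rather than in the timeout value, though that is a guess and not something the log alone confirms.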