webmagic stops after crawling only one page?

黄成祥 posted on 2016/09/26 15:20

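@黄亿华, hello, I'd like to ask a question. I am using your open-source framework to crawl used-car listings from Renrenche (renrenche.com), and the TargetUrl and HelpUrl on my model are defined like this: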

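import us.codecraft.webmagic.model.AfterExtractor;
import us.codecraft.webmagic.model.annotation.ExtractBy;
import us.codecraft.webmagic.model.annotation.HelpUrl;
import us.codecraft.webmagic.model.annotation.TargetUrl;

// Detail pages to extract data from
@TargetUrl(value = {"https://www.renrenche.com/[a-z]+/car/[0-9a-z]{16}\\?plog_id=[0-9a-z]{32}"})
// List pages that are only followed to discover more links
@HelpUrl(value = {"https://www.renrenche.com/[a-z]+/ershouche/(p[0-9]+)?\\?plog_id=[0-9a-z]{32}"})
public class RawUsedCar implements AfterExtractor {

    @ExtractBy(value = "//div[@class='title']/h1/text()", notNull = true)
    private String carInfo;

    @ExtractBy(value = "//div[@class='detail-box']/p[2]/text()", notNull = true)
    private String tradePrice;

    @ExtractBy(value = "//div[@class='price']/span[2]/s/text()", notNull = true)
    private String newCarPrice;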

.....

}

The Spider is written like this:

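import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.stereotype.Component;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.model.OOSpider;

@Component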
public class RawUsedCarCrawler {
    
    @Qualifier("RawUsedCarDaoPipeline")
    @Autowired
    private RawUsedCarDaoPipeline rawUsedCarDaoPipeline;

    public void crawl() {
        OOSpider.create(Site.me().setUserAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0")
                                 .setDomain("renrenche.com"),
                        rawUsedCarDaoPipeline,
                        RawUsedCar.class)
                .addUrl("https://www.renrenche.com/bj/car/8a99d405ef8fbdf4?plog_id=392bb3dfd5ebe84d84e66b42ef48b3b7")
                .thread(5)
                .run();
    }

    public static void main(String[] args) {
        ApplicationContext applicationContext = new ClassPathXmlApplicationContext("classpath:/spring/applicationContext*.xml");
        final RawUsedCarCrawler rawUsedCarCrawler = applicationContext.getBean(RawUsedCarCrawler.class);
        rawUsedCarCrawler.crawl();
    }
}

Why does it crawl only one page of data and then stop?
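One thing that might help narrow this down is checking, outside of webmagic, which URLs the two patterns above would accept. The following is only a rough sketch using plain `java.util.regex`; it does not reproduce webmagic's own link discovery, which may preprocess the patterns differently. The class name `UrlPatternCheck`, the `classify` helper, and every sample URL except the start URL are hypothetical and made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class UrlPatternCheck {

    // Same regex strings as in the @TargetUrl and @HelpUrl annotations above.
    static final Pattern TARGET = Pattern.compile(
            "https://www.renrenche.com/[a-z]+/car/[0-9a-z]{16}\\?plog_id=[0-9a-z]{32}");
    static final Pattern HELP = Pattern.compile(
            "https://www.renrenche.com/[a-z]+/ershouche/(p[0-9]+)?\\?plog_id=[0-9a-z]{32}");

    // Hypothetical helper: reports which pattern, if any, a URL satisfies.
    // find() is used so that a URL is reported as "ignored" only when
    // no part of it matches either pattern.
    static String classify(String url) {
        if (TARGET.matcher(url).find()) {
            return "target";
        }
        if (HELP.matcher(url).find()) {
            return "help";
        }
        return "ignored";
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList(
                // The start URL passed to addUrl(...) in the crawler above.
                "https://www.renrenche.com/bj/car/8a99d405ef8fbdf4?plog_id=392bb3dfd5ebe84d84e66b42ef48b3b7",
                // Hypothetical list-page link that carries a plog_id.
                "https://www.renrenche.com/bj/ershouche/p2?plog_id=392bb3dfd5ebe84d84e66b42ef48b3b7",
                // Hypothetical list-page link without a plog_id query string.
                "https://www.renrenche.com/bj/ershouche/p2",
                // Hypothetical detail-page link without a plog_id query string.
                "https://www.renrenche.com/bj/car/8a99d405ef8fbdf4");
        for (String url : samples) {
            System.out.println(classify(url) + "  " + url);
        }
    }
}
```

If the links that actually appear in the fetched page source fall into the "ignored" group (for example because they carry no `plog_id` parameter), that would be one possible reason no further pages are queued. I have not confirmed that this is what happens here.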
