HtmlUnit throws "Exception invoking doScroll" when fetching page content

慕思萧萧 posted on 2014/05/09 11:04

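The full error message is shown below. If I disable JavaScript in the program with webClient.getOptions().setJavaScriptEnabled(false); everything runs without problems. So is this caused by HtmlUnit's incomplete JavaScript support? Is there any way to fix it? I need the page data that is generated dynamically by Ajax and JavaScript, so disabling JavaScript is not an option.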

If anyone here has used HtmlUnit, please take a look. Any guidance is appreciated!

Exception in thread "main" ======= EXCEPTION START ========

Exception class=[java.lang.RuntimeException]
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking doScroll
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:689)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:575)
at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1074)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:391)
at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:266)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:286)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:702)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:662)
at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:926)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:245)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:191)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:455)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:329)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:394)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:379)
at src.Test.getHtml(Test.java:23)
at src.Test.main(Test.java:15)
光明凤凰

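This is clearly a problem with the JavaScript itself: it is either incorrect or not standard JavaScript.

A browser's JavaScript engine is fault-tolerant, so sloppy JS code sometimes still runs fine. The JS engine in HtmlUnit is evidently stricter and bails out as soon as it hits something it considers wrong.

HtmlUnit will only let you through if the JS is written to the standard.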
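When the page's scripts cannot be changed, the strictness described above can usually be relaxed on the client side. The sketch below is only an illustration, assuming a 2.x HtmlUnit release recent enough that WebClient is AutoCloseable (older releases use closeAllWindows(), newer 3.x releases use the org.htmlunit package). The class name LenientFetch and the placeholder URL are made up for this example. Note that a script which fails still stops running, so content produced by that script may be missing from the result.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Hypothetical helper class, not part of HtmlUnit.
class LenientFetch {

    // Keeps JavaScript enabled, but reports script errors through logging
    // instead of throwing a ScriptException out of getPage().
    static WebClient newClient() {
        WebClient webClient = new WebClient();
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        return webClient;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; pass the real page as the first argument.
        String url = args.length > 0 ? args[0] : "http://example.com/";
        try (WebClient webClient = newClient()) {
            HtmlPage page = webClient.getPage(url);
            // Give Ajax calls started by the page up to 5 seconds to finish.
            webClient.waitForBackgroundJavaScript(5000);
            System.out.println(page.asXml());
        }
    }
}
```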

慕思萧萧
Thanks for taking a look!
慕思萧萧

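Ow, answering my own question. I use HtmlUnit to scrape data from web pages; I did not write those pages... That said, one fix for this problem is to specify which browser the client emulates when fetching the page: WebClient webClient = new WebClient(BrowserVersion.CHROME); With this there is no need to disable JavaScript. But I keep running into plenty of other problems while using HtmlUnit. Has anyone studied HtmlUnit in depth? Any guidance would be appreciated %>_<%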
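For reference, the one-line fix above might sit in a complete fetch routine like the following. This is a minimal sketch rather than the original Test.java: the class name ChromeFetch, the 5-second wait, and the placeholder URL are assumptions, and it presumes a 2.x HtmlUnit release where WebClient is AutoCloseable. A plausible reason the fix works is that doScroll is an Internet Explorer specific method, so scripts only call it when the emulated browser looks like IE.

```java
import java.io.IOException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Hypothetical helper class, not part of HtmlUnit.
class ChromeFetch {

    // Emulates Chrome instead of the library's default browser version.
    static WebClient newClient() {
        return new WebClient(BrowserVersion.CHROME);
    }

    // Loads the page with JavaScript enabled and returns the resulting DOM as XML.
    static String getHtml(String url) throws IOException {
        try (WebClient webClient = newClient()) {
            HtmlPage page = webClient.getPage(url);
            // Let Ajax requests triggered by the page finish (up to 5 seconds).
            webClient.waitForBackgroundJavaScript(5000);
            return page.asXml();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real page as the first argument.
        String url = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println(getHtml(url));
    }
}
```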

aiyayaa
Sorry to dig up an old thread, but how did you solve this in the end?