Class not found when Tika extracts a docx file

小丑_ posted on 2016/09/02 12:03

Tika throws the following error when extracting text from a docx file:

java.lang.NoClassDefFoundError: org/openxmlformats/schemas/wordprocessingml/x2006/main/CTCustomXmlBlock

at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2436)
at java.lang.Class.getDeclaredMethods(Class.java:1793)
at com.zeroturnaround.javarebel.eG.initClass(JRebel:271)
at com.zeroturnaround.javarebel.eG.getDeclaredConstructor(JRebel:281)
at java.lang.Class.getDeclaredConstructor(Class.java:1987)
at com.zeroturnaround.javarebel.eG.getConstructor(JRebel:357)
at java.lang.Class.getConstructor(Class.java)
at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor(SchemaTypeImpl.java:1797)
at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1921)
at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
at org.apache.xmlbeans.impl.store.Xobj.find_element_user(Xobj.java:2080)
at org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTFootnotesImpl.getFootnoteArray(Unknown Source)
at org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTFootnotesImpl$1FootnoteList.get(Unknown Source)
at org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTFootnotesImpl$1FootnoteList.get(Unknown Source)
at java.util.AbstractList$Itr.next(AbstractList.java:345)
at org.apache.poi.xwpf.usermodel.XWPFFootnotes.onDocumentRead(XWPFFootnotes.java:82)
at org.apache.poi.xwpf.usermodel.XWPFDocument.initFootnotes(XWPFDocument.java:239)
at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:138)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:117)
at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:57)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at com.sgcc.uds.fileserver4zk.core.openfile.service.impl.OpenFileServiceImpl$$M$b537d3ed.parseFile(OpenFileServiceImpl.java:161)
at com.sgcc.uds.fileserver4zk.core.openfile.service.impl.OpenFileServiceImpl$$M$b537d3ed.wbcq(OpenFileServiceImpl.java:125)
at com.sgcc.uds.fileserver4zk.core.openfile.service.impl.OpenFileServiceImpl$$M$b537d3ed.textExtraction(OpenFileServiceImpl.java:85)
at com.sgcc.uds.fileserver4zk.core.openfile.service.impl.OpenFileServiceImpl$$A$b537d3ed.textExtraction(<generated>)
at com.sgcc.uds.fileserver4zk.core.openfile.service.impl.OpenFileServiceImpl.textExtraction(OpenFileServiceImpl.java:111)
at com.sgcc.uds.fileserver4zk.core.job.FileOpenTask.execute(FileOpenTask.java:86)
at com.sgcc.uds.fileserver4zk.core.job.FileOpenTask.execute(FileOpenTask.java:1)
at com.taobao.pamirs.schedule.taskmanager.TBScheduleProcessorSleep.run(TBScheduleProcessorSleep.java:221)
at java.lang.Thread.run(Thread.java:662)

小丑_
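This problem has been solved. The cause is that the poi-ooxml-schemas.jar that Tika depends on for text extraction is a slimmed-down jar: to save space it contains only the commonly used schema classes. Anything outside that subset (here, CTCustomXmlBlock) requires the full ooxml-schemas.jar. Download ooxml-schemas.jar, add it to the project, and remove poi-ooxml-schemas.jar from the project.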
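To confirm which jar (if any) supplies the missing class before and after swapping the jars, a small classpath check can help. This is only a sketch: the class name `SchemaClassCheck` and the method `locate` are made up for illustration, and the only name taken from the post is the schema class from the stack trace. Run it with the same classpath as the application.

```java
import java.security.CodeSource;

public class SchemaClassCheck {

    // Returns the location (usually a jar URL) the class was loaded from,
    // "(bootstrap/JDK)" for classes that have no code source,
    // or null if the class cannot be loaded from the current classpath.
    static String locate(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class<?> c = Class.forName(className, false, SchemaClassCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/JDK)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here as well
            return null;
        }
    }

    public static void main(String[] args) {
        // The class named in the NoClassDefFoundError above
        String name = "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTCustomXmlBlock";
        String where = locate(name);
        System.out.println(where == null
                ? "MISSING: " + name
                : "FOUND: " + name + " in " + where);
    }
}
```

If the explanation above holds, this should report the class as missing while only poi-ooxml-schemas.jar is on the classpath, and report a location inside the full ooxml-schemas.jar once that jar has replaced it.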