Apache UIMA Java SDK 2.4.1 Released
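UIMA (Unstructured Information Management Architecture) is a framework for building Unstructured Information Management (UIM) applications: software systems that analyze large volumes of unstructured information to discover knowledge that is useful to the end user. A typical UIM application extracts useful information from text files, such as people, addresses, and organizations.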
The following diagram shows the UIMA architecture:
Apache UIMA Java SDK 2.4.1 has been released. Improvements include:
* Documentation of binary serialization.
* New kinds of compressed binary serialization that compress the data significantly, including one form that supports unequal source/target type systems.
* A new facility called External Parameter Overrides for specifying annotator parameter settings; it uses properties files and is independent of the annotator hierarchy.
* CasCopier enhancements to allow copying one view to a different view.
* Additional options to restrict JCasGen to generating only the types defined in a project, excluding types imported from other projects.
* A new Maven plugin that runs JCasGen (see the tools documentation for how to configure and use it).
* A new ability to preserve white space (indentation) when parsing XML descriptors; the Component Descriptor Editor (CDE) now uses this to preserve indentation when editing an existing descriptor.
* Performance and space improvements, including some bulk methods for efficiently removing Feature Structures from indexes.