Jieba (结巴分词) 0.26 released, a Python component for Chinese word segmentation

fxsjy
Published on April 7, 2013

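Main changes in this release:

1) Improved punctuation handling; earlier versions filtered out all punctuation marks.

2) Users can now specify part-of-speech tags in a custom dictionary.

3) Improved the keyword extraction function, jieba.analyse.extract_tags.

4) Fixed a bug that occurred when running under the PyPy interpreter.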
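
To illustrate item 2, here is a minimal sketch of reading custom-dictionary lines that carry an optional part-of-speech tag. The line layout (`word frequency [tag]`, separated by whitespace) and the helper name `parse_userdict_line` are assumptions made for this example, not taken from the jieba source; check the project's README for the format it actually accepts.

```python
# -*- coding: utf-8 -*-

def parse_userdict_line(line):
    """Parse one hypothetical dictionary line of the form 'word freq [tag]'."""
    parts = line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected at least a word and a frequency: %r" % line)
    word, freq = parts[0], int(parts[1])
    # The part-of-speech tag is optional; None means "no tag given".
    tag = parts[2] if len(parts) > 2 else None
    return word, freq, tag


# Example entries (made up for illustration).
sample_lines = [u"云计算 5 n", u"创新办 3 i", u"凱特琳 2"]
entries = [parse_userdict_line(line) for line in sample_lines]
```

Returning `None` for a missing tag keeps dictionaries written without tags loadable alongside tagged ones.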

Online demo: http://jiebademo.ap01.aws.af.cm/

Reposts are welcome; please credit the source: OSChina (开源中国社区) [http://www.oschina.net]

Latest comments (8)

fanlu

Quoting sunjunyi's comment:

Quoting fanlu's comment:

>>> import jieba
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/__init__.py", line 5, in <module>
import finalseg
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 13, in <module>
prob_start = load_model("prob_start.py")
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 10, in load_model
tab = eval(open(prob_p_path,"rb").read())
File "<string>", line 1
{'B': -0.26268660809250016,
^
SyntaxError: invalid syntax
It raises an error.

Run pip install -U jieba to upgrade.

OK, it works now.
fxsjy

Quoting fanlu's comment:

>>> import jieba
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/__init__.py", line 5, in <module>
import finalseg
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 13, in <module>
prob_start = load_model("prob_start.py")
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 10, in load_model
tab = eval(open(prob_p_path,"rb").read())
File "<string>", line 1
{'B': -0.26268660809250016,
^
SyntaxError: invalid syntax
It raises an error.

Run pip install -U jieba to upgrade.
fxsjy

Quoting fanlu's comment:

>>> import jieba
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/__init__.py", line 5, in <module>
import finalseg
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 13, in <module>
prob_start = load_model("prob_start.py")
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 10, in load_model
tab = eval(open(prob_p_path,"rb").read())
File "<string>", line 1
{'B': -0.26268660809250016,
^
SyntaxError: invalid syntax
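It raises an error.

Thanks for the feedback. Strangely, this works fine in Python 2.7. In Python 2.6, eval does not accept \r\n line endings inside the string; they must be \n.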
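
One way to sidestep the line-ending problem described above is to normalize the text before evaluating it. The sketch below is not the fix that shipped in jieba; the helper name `load_model_text` and the `'E'` value are made up for illustration, and `ast.literal_eval` is swapped in for `eval` because the file only needs to hold a dict literal.

```python
import ast


def load_model_text(text):
    """Parse a dict literal after converting all line endings to '\\n'."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # literal_eval only accepts Python literals, so it cannot run arbitrary code.
    return ast.literal_eval(normalized)


# Text as it might look when the model file was saved with Windows line endings.
# The 'B' value comes from the traceback; the 'E' value is illustrative only.
windows_text = "{'B': -0.26268660809250016,\r\n 'E': -3.14e+100}\r\n"
prob_start = load_model_text(windows_text)
print(prob_start["B"])
```

Normalizing in one place means the same model files load regardless of which platform last saved them.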
fanlu
>>> import jieba
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/__init__.py", line 5, in <module>
import finalseg
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 13, in <module>
prob_start = load_model("prob_start.py")
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 10, in load_model
tab = eval(open(prob_p_path,"rb").read())
File "<string>", line 1
{'B': -0.26268660809250016,
^
SyntaxError: invalid syntax
It raises an error.
墨仝mt
Is there a way to use this together with PHP? Could someone share how to do it? Thanks.
zx.lyn
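It would be great if word segmentation could be integrated into autocomplete for Emacs and Vim, and into Chinese spell checking in office software.
fxsjy
Many users have asked for this. For example, if the commas, periods, and double quotation marks in an article are filtered out, it can interfere with their analysis of sentence structure.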
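
As a rough illustration of why keeping punctuation helps, the sketch below splits text at punctuation marks while preserving each mark as its own item, so clause boundaries stay visible to later processing. It is not jieba's implementation; the pattern `PUNCT` and the helper `split_keep_punct` are assumptions for this example and cover only a handful of marks.

```python
# -*- coding: utf-8 -*-
import re

# The capturing group makes re.split keep the matched marks in its result.
PUNCT = re.compile(u"([,。!?;:“”、,.!?;:\"])")


def split_keep_punct(text):
    """Split text at punctuation marks, keeping each mark as its own item."""
    return [piece for piece in PUNCT.split(text) if piece]


pieces = split_keep_punct(u"他说:“你好,世界。”")
```

A segmenter that discards the marks would return only the three word runs here, and the quotation and clause boundaries would be lost.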
谢中辉
Why is punctuation no longer filtered out?