jieba (结巴分词) 0.26 released: a Python Chinese word segmentation component - OSChina (开源中国社区)
fxsjy, April 7, 2013


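Main changes in this release:

1) Improved the handling of punctuation; earlier versions filtered out all punctuation marks.

2) Users can now specify a part-of-speech tag for entries in a custom dictionary.

3) Improved the keyword extraction function jieba.analyse.extract_tags.

4) Fixed a bug that occurred when running under the PyPy interpreter.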
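
A minimal sketch of how items 1) to 3) might be exercised, assuming a current jieba installed with pip install jieba. The helper name demo and the dictionary entry are illustrative assumptions, not part of the release notes. The dictionary line uses the space-separated "word frequency part-of-speech" layout described in jieba's documentation.

```python
import os
import tempfile


def demo(text):
    """Segment `text`, tag parts of speech, and extract keywords with jieba."""
    # Third-party dependency (pip install jieba), imported at the point of use.
    import jieba
    import jieba.analyse
    import jieba.posseg as pseg

    # Custom dictionary: one entry per line, "word frequency part-of-speech".
    # The entry below is a made-up example for illustration only.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write("结巴分词 10 n\n")
        path = f.name
    try:
        jieba.load_userdict(path)
    finally:
        os.remove(path)

    # Item 1: punctuation marks come back as tokens instead of being dropped.
    tokens = list(jieba.cut(text))
    # Item 2: each segmented word carries a part-of-speech flag.
    tagged = [(p.word, p.flag) for p in pseg.cut(text)]
    # Item 3: keyword extraction, limited here to the top 3 keywords.
    keywords = jieba.analyse.extract_tags(text, topK=3)
    return tokens, tagged, keywords
```

Calling demo("我来到北京,清华大学。") returns three lists: the tokens (including the comma and the full stop), the (word, tag) pairs, and up to three keywords. The exact segmentation depends on the dictionary shipped with the installed version.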

Online demo: http://jiebademo.ap01.aws.af.cm/

 

 

Comments (8)
Latest comments

Quoting "sunjunyi":

Quoting "fanlu":

>>> import jieba
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/__init__.py", line 5, in <module>
import finalseg
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 13, in <module>
prob_start = load_model("prob_start.py")
File "/usr/local/lib/python2.6/site-packages/jieba-0.26-py2.6.egg/jieba/finalseg/__init__.py", line 10, in load_model
tab = eval(open(prob_p_path,"rb").read())
File "<string>", line 1
{'B': -0.26268660809250016,
^
SyntaxError: invalid syntax
It raises an error.

Run pip install -U jieba to update.

OK, that fixed it.

Quoting "fanlu":

[Traceback identical to the one quoted above, ending in SyntaxError: invalid syntax]
It raises an error.

Run pip install -U jieba to update.

Quoting "fanlu":

[Traceback identical to the one quoted above, ending in SyntaxError: invalid syntax]
It raises an error.

Thanks for the feedback. Strangely, this works fine under Python 2.7. Python 2.6 does not accept \r\n line endings in a string passed to eval; they must be \n.
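
Given that diagnosis, one defensive way to load such a file is to normalize line endings before parsing. The sketch below assumes the file holds a single Python dict literal. parse_model_literal is a hypothetical helper, not jieba's API and not necessarily how the bug was fixed, and the 'S' entry is made up. It also uses ast.literal_eval in place of eval, so the file contents are parsed as data and never executed.

```python
import ast


def parse_model_literal(raw):
    """Parse a Python dict literal from raw file bytes, tolerating CRLF."""
    text = raw.decode("utf-8")
    # Normalize Windows (\r\n) and bare \r line endings to \n before parsing.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # literal_eval accepts only literals, so nothing in the file is executed.
    return ast.literal_eval(text)


# The 'B' value appears in the traceback; the 'S' entry is a made-up example.
crlf_bytes = b"{'B': -0.26268660809250016,\r\n 'S': -1.5}\r\n"
table = parse_model_literal(crlf_bytes)
print(table["B"])
```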

[Traceback identical to the one quoted above, ending in SyntaxError: invalid syntax]
It raises an error.

Is there a way to use this together with PHP? Could someone share how to do it? Thanks.

It would be great if the segmenter could be integrated into autocomplete for Emacs and Vim, and into Chinese spell checking in office software.

Many users have asked for this. For example, if the commas, periods, and quotation marks in a text are filtered out, it can interfere with their analysis of sentence structure.

Why is filtering out punctuation no longer needed?