Segmentation output contains words that are not in the custom dictionary #987

Open
wwfcnu opened this issue Mar 9, 2023 · 2 comments
Comments

wwfcnu commented Mar 9, 2023

jieba.load_userdict("user.txt")
jieba.cut(text,HMM=False)
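user.txt contains 200,000 entries, yet segmentation produces more than 200,000 distinct words, including some that are not in the dictionary. For example, "胡如雷" is not in my dictionary, but it still comes out as a single word. Shouldn't it be split into the three single characters 胡 / 如 / 雷?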

Author

wwfcnu commented Mar 9, 2023

If a word is not in the custom dictionary, does jieba fall back to the default dictionary for segmentation?

brynne8 commented Apr 25, 2023

Yes. If a word is not in the custom dictionary, jieba segments it according to the default dictionary.
