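Dear Dr. Qin, I ran into trouble with part-of-speech tagging and would like to ask for your help. The details are as follows:
First, environment information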
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] jiebaR_0.9.99 jiebaRD_0.1 chinese.misc_0.1.9
loaded via a namespace (and not attached):
[1] compiler_3.5.1 magrittr_1.5 parallel_3.5.1 tools_3.5.1 NLP_0.1-11
[6] yaml_2.2.0 Rcpp_0.12.18 slam_0.1-43 xml2_1.2.0 stringi_1.1.7
[11] tm_0.7-5 Ruchardet_0.0-3 rlang_0.2.2 purrr_0.2.5
Second, the complete error messages
Warning messages:
1: In segment(itext, analyzer, mod = "mix") :
In file mode, only the first element will be processed.
2: In readLines(input.r, n = lines, encoding = encoding) : incomplete final line found on 'E:/201803D/0910ontosim/texttest/鍩轰簬鏂囩尞璁¢噺瀛︾殑鍥介檯鐏北鐢熸€佸鐮旂┒鎬佸娍鍒嗘瀽_榄忔檽闆?segment.2018-09-15_23_44_30.txt'
Error in file_coding(code[1]) : Cannot open file
Third, minimal reproducible code and source data files, and the step at which the code fails
# Text Processing and Analysis
ifolder <- "E:/201803D/0910ontosim/texttest"
itext <- list.files(ifolder, pattern = ".txt", all.files = FALSE,
                    recursive = TRUE, include.dirs = FALSE, full.names = TRUE)
# tagging
library(jiebaR)
analyzer <- worker(type = "mix", dict = DICTPATH, hmm = HMMPATH,
                   user = "E:/2017DN/data/custom.dict",
                   stop_word = "E:/2017DN/data/stopwords.txt",
                   write = TRUE, qmax = 20, topn = 5, encoding = "UTF-8",
                   detect = TRUE, symbol = FALSE, lines = 1e+05,
                   output = NULL, bylines = TRUE, user_weight = "max")
textseg <- segment(itext, analyzer, mod = "mix")
tokenizer <- worker("tag")
pos_tag <- tagging(textseg, tokenizer)
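Fourth, what I have tried so far and the possible root cause
I tested passing a character string to the segmentation and tagging workers; this ran without errors.
I tested one-step part-of-speech tagging; it also ran without errors. (However, I do not know whether a third-party dictionary can be used with it, and tagging specialist documents relies heavily on domain vocabulary.)
After switching back to text files, the tagging step that follows segmentation fails: it still reports that the file cannot be read, and the second (tagged) output file is not generated.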
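A possible explanation, offered as an assumption and not confirmed in this report: in file mode with write = TRUE, segment() writes its result to a new file and returns that file's path, so tagging() receives a path containing Chinese characters and not the tokens themselves. The sketch below avoids file mode altogether by reading each file into a string in R and tagging it in one step with a "tag" worker that loads the custom dictionary through the `user` argument. The helper names `read_text_file` and `tag_folder` are hypothetical and exist only for illustration.

```r
# Hypothetical helper: read one UTF-8 text file into a single string,
# so that jiebaR sees text content and not a file path.
read_text_file <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  paste(lines, collapse = " ")
}

# Hypothetical helper: tag every .txt file under `folder` in one step.
# `user_dict` is an optional path to a custom dictionary.
tag_folder <- function(folder, user_dict = NULL) {
  files <- list.files(folder, pattern = "\\.txt$",
                      recursive = TRUE, full.names = TRUE)
  tagger <- if (is.null(user_dict)) {
    jiebaR::worker("tag")
  } else {
    jiebaR::worker("tag", user = user_dict)
  }
  # Process the files one by one; each element is a tagged token vector.
  tagged <- lapply(files, function(f) jiebaR::tagging(read_text_file(f), tagger))
  names(tagged) <- files
  tagged
}

# Example call with the paths from this report:
# pos_tag <- tag_folder("E:/201803D/0910ontosim/texttest",
#                       user_dict = "E:/2017DN/data/custom.dict")
```

Looping over the files in R also sidesteps the first warning above, since file mode only processes the first element of a vector of paths.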