This repository has been archived by the owner on Oct 30, 2018. It is now read-only.
Goose depends heavily on the English stop-word list to extract "relevant" text, and it currently supports no language other than English. I needed German, so I changed it to support other languages. Supporting German takes three changes: detecting the charset the web page is encoded in (UTF-8, ISO-xxxxx, etc.) so that special characters like umlauts survive; detecting the content language (whether the page is in English, German, or something else); and providing a German stop-word list to use when the article is in German.
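To make those three steps concrete, here is a minimal Python sketch. It is not Goose's code and not the changes described above; it only illustrates the idea under some assumptions. The names `decode_page`, `detect_language`, and `stop_word_count` are hypothetical, the stop-word lists are tiny placeholders, the charset is read from a declaration near the top of the page with UTF-8 as a fallback, and the language is guessed by counting stop-word hits.

```python
import re

# Tiny illustrative stop-word lists (assumption: a real list has hundreds of entries).
STOP_WORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "a", "was"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu", "für"},
}

# Matches a charset declaration such as <meta charset="ISO-8859-1">.
META_CHARSET = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)


def decode_page(raw: bytes, default: str = "utf-8") -> str:
    """Step 1: decode the page using its declared charset, else the default."""
    match = META_CHARSET.search(raw[:2048])
    charset = match.group(1).decode("ascii") if match else default
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong declaration: fall back rather than fail.
        return raw.decode(default, errors="replace")


def tokenize(text: str) -> list:
    """Split into lowercase runs of letters, keeping umlauts and other non-ASCII letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stop_word_count(text: str, lang: str) -> int:
    """Step 3: count how many tokens are stop words of the given language."""
    stop_words = STOP_WORDS[lang]
    return sum(word in stop_words for word in tokenize(text))


def detect_language(text: str):
    """Step 2: pick the language whose stop words occur most often, or None."""
    scores = {lang: stop_word_count(text, lang) for lang in STOP_WORDS}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    raw = '<meta charset="ISO-8859-1"><p>Das ist nicht für die Katz</p>'.encode("latin-1")
    text = decode_page(raw)
    lang = detect_language(text)
    print(lang, stop_word_count(text, lang))
```

Counting stop words is a crude language detector and needs a reasonable amount of text to be reliable. It also relies on splitting text into words, which is exactly the part that may not carry over to languages written without spaces between words.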
I am unfortunately not (yet) allowed to provide my changes due to legal reasons.
In your case you would need to do something similar, but I don't know much about the structure of the Chinese language or whether something like stop words exists in Chinese.
Hi, I was just trying Goose out on some Chinese-language news sites, and it doesn't appear to be able to pull any article text. Examples:
http://news.xhby.net/system/2011/10/03/011788372.shtml
http://news.iqilu.com/shehui/huahuashijie/20111003/565892.html
Will your algorithm work on Chinese with a minor fix, or does it need to be a Latin language?
Thanks,
Joel