Does anybody else have this problem when using lncrawl in 69xinshu #2269
Comments
this is what appears |
I fixed 69xinshu in #2256, which has not been released yet. That branch fixes the first error; I'm not sure about the 403 you got. |
I used the dev branch to download from this website and found that there was a problem with downloading more than 650 chapters at once. |
Darn, that's not good. I'll see if there's anything that can be done there. Maybe combine that with --auto-proxy to get new source IPs for each download batch. Let me know if that works. @ncuxie |
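Since downloads of more than about 650 chapters at once ran into problems, one option is to split a novel into smaller batches and fetch them separately. Below is a minimal sketch of the splitting step only. The function name `chapter_batches` and the default batch size of 300 are illustrative assumptions, not part of lncrawl; how each range is passed to lncrawl's chapter selection is left to its help output.

```python
def chapter_batches(total_chapters, batch_size=300):
    """Yield inclusive, 1-based (first, last) chapter index pairs.

    Hypothetical helper: batch_size=300 is an assumed value chosen to stay
    well below the roughly 650-chapter point where problems were reported.
    """
    if total_chapters < 0:
        raise ValueError("total_chapters must not be negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_chapters + 1, batch_size):
        # The last batch may be shorter than batch_size.
        last = min(first + batch_size - 1, total_chapters)
        yield first, last
```

For example, `list(chapter_batches(650, 300))` gives three ranges: 1 to 300, 301 to 600, and 601 to 650. Each range can then be downloaded in its own run.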
I can get chapters without --auto-proxy:
$ lncrawl -s https://www.69xinshu.com/book/40107.htm
===================================================
---------------------------------------------------------------------------------------
-> Press Ctrl + C to exit
Retrieving novel info...
[#] 从时间停止开始纵横诸天
? Enter output directory: C:\Users\XIE\Lightnovels\www-69xinshu-com\C
$ lncrawl -s https://www.69xinshu.com/book/40107.htm --auto-proxy
===================================================
---------------------------------------------------------------------------------------
-> Press Ctrl + C to exit
Retrieving novel info...
! Error: No chapters found
----------------------------------------------------------------------
|
It looks like some of the proxies are likely already on a blacklist or have a very bad IP reputation. The other fairly simple way forward would be to find working proxies for 69xinshu and test them. Once you have a few suitable ones, you could make a custom proxies file and use it as described in the lncrawl help section. Otherwise, you can slowly download part by part with your own IP; that might work given enough time, selecting only a few hundred chapters per day at most. I suggest this way if you're fine with waiting a bit and downloading in parts. The EPUBs can always be concatenated into one big file with some tool later if you prefer it that way. To make |
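Testing candidate proxies against 69xinshu before putting them in a custom file could look roughly like the sketch below. It uses only the Python standard library. The helper names (`proxy_works`, `filter_proxies`, `write_proxy_file`) are hypothetical, and the one-proxy-per-line file layout is an assumption; check the lncrawl help section for the format it actually expects.

```python
import http.client
import urllib.request

# Novel page taken from the logs in this thread; any page on the site would do.
TEST_URL = "https://www.69xinshu.com/book/40107.htm"


def proxy_works(proxy_url, test_url=TEST_URL, timeout=10):
    """Return True if test_url answers with HTTP 200 through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    request = urllib.request.Request(test_url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers HTTP errors such as 403, refused connections, and timeouts.
        return False


def filter_proxies(candidates, check=proxy_works):
    """Keep only the candidates for which check(proxy) is true."""
    return [proxy for proxy in candidates if check(proxy)]


def write_proxy_file(proxies, path):
    """Write one proxy per line (assumed layout, not confirmed for lncrawl)."""
    with open(path, "w", encoding="utf-8") as handle:
        for proxy in proxies:
            handle.write(proxy + "\n")
```

A run would pass your own candidate list to `filter_proxies` and hand the result to `write_proxy_file`. A proxy that passes once can still be blocked later, so re-testing before each download session is probably worthwhile.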
This raws site does not rate-limit downloads: https://www.ddxsss.com/ |
I checked, and lncrawl doesn't support this source yet. If it really has no rate limiting like 69xinshu does, it would be a viable alternative. The site structure looks relatively similar as well, so adding it shouldn't be too big of an issue. I even found a novel with the same title as the one in the logs above, https://www.ddxsss.com/book/46000/, so the two sites seem to overlap there as well. If someone wants to create an issue to add this source, I'll look into doing that later this week. |
I actually went ahead and added the crawler already. It's currently a pull request (#2287), so once it's merged into dev you can test it by installing the newest dev version locally. I was able to download 1.3k chapters at once without any significant issues. The chapters reported with HTTP 503 did have their content available, so in those instances it seems to have failed once out of the few retries it has per chapter, but there was no blocking from Cloudflare, captchas, or the like.
Let us know
Novel URL: https://www.69xinshu.com/book/9969673.htm
App Location: PIP | EXE | Discord | Telegram
App Version: x.y.z
Describe this issue