Allow for self-solved anti-captcha #172

Open
eharris opened this issue Nov 19, 2019 · 16 comments

Comments

@eharris

eharris commented Nov 19, 2019

With the changes they appear to have made recently, which now require solving a captcha even to log in (see info in #169), the script doesn't even allow downloading books you already own without setting up anti-captcha.

I have no interest in setting up or subscribing to an anti-captcha service, but I'm happy to solve the captcha manually. I just need a way to do so and pass the result to the script so I can download my owned content (packt-cli -da).

@mjenczmyk
Collaborator

If so, why not just use their website? And what do you expect from the script: just a CLI option that lets you pass a ReCaptcha solution and then download all books, or also a way to handle solving the ReCaptcha?

@eharris
Author

eharris commented Nov 19, 2019

The website doesn't offer an easy way to download only what I haven't already downloaded, or to do so with consistent naming. This script does. Without it, I would have to work out by hand, at great cost in time, what I'm missing (on an ongoing basis), then download each missing item individually and make sure the names are correct. And I would have to repeat all that work every single time I want to make sure my downloaded library is complete.

@eharris
Author

eharris commented Nov 19, 2019

Yes, I would expect some way to have the script let me solve the captcha and then continue to do its work with the results. For example, one way that comes to mind would be to invoke a browser window and pass the captcha to it, and take the results of the submit back. The anti-captcha service it's already using obviously has a way to do this, so it seems like there should be some reasonable way to do it locally.

@eharris
Author

eharris commented Nov 19, 2019

Maybe another way would be to have the user log in to the site using a normal browser, then use a little JavaScript or some other means to expose and grab the JWT token from that login, and just pass it on the command line to the script?

@mjenczmyk
Collaborator

I'll think about that, but it may not be straightforward (and is almost surely beyond the scope of the CLI script). I'd go for the second solution to keep the script simple, but we'd first need to know how to obtain a ReCAPTCHA solution.

@fref

fref commented Nov 21, 2019

I second that request.
I know it must be possible; this was done for Pokemon Go mapping at some point in the past.

@fref

fref commented Nov 21, 2019

I searched for this: it was done in RocketMap and seems to be a non-trivial feature (they went through a bookmarklet, which in the end sent information back to the server script).

@supachris28

supachris28 commented Nov 26, 2019

The Recaptcha on login is only enabled sometimes.

There are other endpoints that can be used with the access token and refresh token that a successful login grants, to get new valid tokens, following the OAuth2 model. I.e. once you have successfully authenticated, you can reauthenticate using your tokens at least every 30 days without using the username and password.

This would mean the Recaptcha does not affect your ability to download titles.


Please note this is not an endorsement of any activity.
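
The refresh flow described in this comment could look roughly like the sketch below. It is an illustration under assumptions, not the actual API: the endpoint URL is a placeholder (the real one is not given in this thread), and the request and response field names follow the generic OAuth 2.0 refresh grant (RFC 6749, section 6), which the service may or may not use as-is.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: the real refresh URL is not given in this thread.
REFRESH_URL = "https://example.invalid/auth/token"


def build_refresh_request(refresh_token, url=REFRESH_URL):
    """Build a generic OAuth 2.0 refresh-grant request (form-encoded POST)."""
    body = urllib.parse.urlencode(
        {"grant_type": "refresh_token", "refresh_token": refresh_token}
    ).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def refresh_tokens(refresh_token, opener=urllib.request.urlopen):
    """Trade a refresh token for a new token pair; no password involved."""
    with opener(build_refresh_request(refresh_token)) as response:
        payload = json.load(response)
    # A server may or may not rotate the refresh token; keep the old one if not.
    return payload["access_token"], payload.get("refresh_token", refresh_token)
```

The `opener` parameter exists only so the network call can be swapped out in tests.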

@mjenczmyk
Collaborator

@supachris28 Where did you get this information? I'm especially interested in how you know that "you can reauthenticate using your tokens at least every 30 days without using the username and password".

I can see access_token_live and refresh_token_live in my cookies; they expire after 24 hours. How do you know that this refresh token will be valid for 30 days? Is it part of OAuth 2.0?
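
One way to check a token's lifetime independently of the cookie expiry shown by the browser is to read the `exp` claim from the token itself. The sketch below assumes the access token is a standard three-part JWT, as the thread suggests; the refresh token may well be opaque, in which case this does not apply. The helper names are made up for illustration.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) of a JWT, or None if it has none.

    The signature is NOT verified; this only inspects the payload.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def needs_refresh(token, margin_seconds=3600, now=None):
    """True if the token expires within `margin_seconds` or has no `exp`."""
    exp = jwt_expiry(token)
    if exp is None:
        return True
    now = time.time() if now is None else now
    return exp - now < margin_seconds
```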

@supachris28

The cookies should be valid for longer; they last 30 days if you log in to the subscription site rather than the store.


Please note this is not an endorsement of any activity.

@mjenczmyk
Collaborator

Can we assume that the user can find the JWT token in the browser's cookies after logging in? If so, it would be easy to provide an additional CLI parameter (the JWT token) that overrides user authentication and makes the script use the provided token.
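
A minimal sketch of such an override, using `argparse` purely for illustration. The `--jwt` flag and the `login` callable are hypothetical names, not part of the existing script.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; `--jwt` is a hypothetical new option."""
    parser = argparse.ArgumentParser(prog="packt-cli")
    parser.add_argument(
        "--jwt",
        default=None,
        help="JWT copied from a browser session; skips the normal login",
    )
    return parser.parse_args(argv)


def resolve_token(args, login):
    """Prefer a user-supplied JWT; otherwise fall back to the normal login."""
    if args.jwt:
        return args.jwt
    return login()
```

With this shape, the captcha-protected login is only attempted when no token was supplied.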

@eharris
Author

eharris commented Dec 10, 2019

If this is the solution used (having the user find and supply the JWT token from a browser login), then the script should probably also take on the additional functionality @supachris28 noted and automatically "refresh" (extend the expiration of) the token(s). That way the user only has to supply it once: the script keeps it "active" as long as it is run often enough to keep the token from expiring, and takes care of storing the updated token values.

I can confirm that when I log in to the subscription site, the two tokens do have a 30 day expiration. In my session, the access_token_live is 671 chars long, and the refresh_token_live is 82 chars long. Given the length, and allowing for the possibility that more than one cookie/token is needed for this to work, it would seem that storing the credential information in the config file would be better than passing it on the command line.

For the first proof-of-concept pass, I think it would be ok to have the user be responsible for knowing how to view and copy the access token(s), as long as there is documentation as to how to find the correct cookie values (what domain, what cookie names, etc).
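
Storing both tokens in a config file could be sketched as below, using the standard-library `configparser`. The section name and the reuse of the cookie names as keys are assumptions for illustration; the script's real config layout is not shown in this thread.

```python
import configparser

SECTION = "TOKENS"  # hypothetical section name


def save_tokens(path, access_token, refresh_token):
    """Write both tokens to the config file, keeping any other sections."""
    # interpolation=None so a '%' inside a token is stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is silently ignored
    config[SECTION] = {
        "access_token_live": access_token,
        "refresh_token_live": refresh_token,
    }
    with open(path, "w") as handle:
        config.write(handle)


def load_tokens(path):
    """Return (access_token, refresh_token), or None if nothing is stored."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)
    if not config.has_section(SECTION):
        return None
    section = config[SECTION]
    return section.get("access_token_live"), section.get("refresh_token_live")
```

After each successful refresh the script would call `save_tokens` again, so the user supplies the values only once.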

@RuthlessRuler

Any plans to implement cookie-based logins?
I'm trying to download my entire library, but I can't log in because of the anti-captcha token requirement.

@mjenczmyk
Collaborator

mjenczmyk commented Mar 30, 2020

It'd be possible to change the code so that a JWT can be passed in (not very hard; I'd say it's very easy). Then you could log in via the browser, extract the JWT token from the cookies (do you know how to do that?) and pass it to the script (although we'd be unable to fetch another token after the passed one expires).

We (I?) could also look one day at how to refresh the token properly, as specified in the JWT specification; then I guess the issue above would no longer be an issue.

Would that be suitable for you? I'm not sure we should merge it before we do it properly, but we could maintain a branch with such functionality.

Oh, I see it's not a new issue, but everything I've written still holds.

@RuthlessRuler

Can't we just ask the user to export the browser's cookies and let the program search for the token automatically, the way it's done in aria2c?
I'm comfortable with any option, though. I just want the token to last long enough for 300+ books to be downloaded at <1 MBps.
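
Reading an exported cookie file could be sketched with the standard-library `http.cookiejar.MozillaCookieJar`, which parses the Netscape `cookies.txt` format that browser export extensions typically produce. The cookie names come from earlier in this thread; the `packtpub.com` domain filter is an assumption.

```python
import http.cookiejar

TOKEN_NAMES = ("access_token_live", "refresh_token_live")


def tokens_from_cookie_file(path, domain="packtpub.com"):
    """Return {cookie name: value} for the token cookies found in the file."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session and expired cookies too, so the caller can decide.
    jar.load(ignore_discard=True, ignore_expires=True)
    found = {}
    for cookie in jar:
        host = cookie.domain.lstrip(".")
        same_site = host == domain or host.endswith("." + domain)
        if same_site and cookie.name in TOKEN_NAMES:
            found[cookie.name] = cookie.value
    return found
```

The file must start with the `# Netscape HTTP Cookie File` header line, otherwise `load` raises `http.cookiejar.LoadError`.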

@mjenczmyk
Collaborator

I see it may be quite easy using browser-cookie3. I'll try to do it in my spare time (I can't promise any particular deadline, though; I just promise to keep it in mind).
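
A rough sketch of what the browser-cookie3 route might look like. `browser_cookie3.load(domain_name=...)` returns a cookie jar gathered from the locally installed browsers; the `packtpub.com` domain and the helper name are assumptions, and the import is deferred so the function can be exercised without the package installed.

```python
TOKEN_NAMES = ("access_token_live", "refresh_token_live")


def tokens_from_browser(domain="packtpub.com", loader=None):
    """Collect the token cookies straight from the local browser profiles.

    `loader` is any callable taking `domain_name` and returning an iterable
    of cookie objects; it defaults to browser_cookie3.load.
    """
    if loader is None:
        import browser_cookie3  # third-party: pip install browser-cookie3

        loader = browser_cookie3.load
    jar = loader(domain_name=domain)
    return {c.name: c.value for c in jar if c.name in TOKEN_NAMES}
```

This only works on the machine where the user is logged in, and the user still has to solve the captcha once in the browser.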
