Allow read_pdf to accept a file-like object #103
Comments
In this other repository (https://github.com/atlanhq/camelot), which I assume is the original one, there are already two merge requests pending and waiting to be accepted for this issue:
Maybe we can do this quickly with those ;). I think this is really a feature that a lot of people would like to have ...
Thanks for pointing that out! Right now #13 is taking up a lot of my time, but I will try to get to this over the weekend.
For people whose main problem is keeping the file in memory, for example as a spooled temporary file, a short workaround could be the following: use the library at https://github.com/mbello/memory-tempfile to create a file on a tmpfs in memory. This solution only works on Linux, though ... Additionally, it's difficult to do this in Docker images or on Kubernetes.
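A rough standard-library sketch of the same tmpfs idea, for anyone who wants to try it without an extra dependency. It assumes /dev/shm is a tmpfs mount (true on most Linux systems, but not checked here) and falls back to the default temp directory otherwise. The helper names memory_backed_dir and write_to_memory_tempfile are made up for illustration; they are not part of camelot or memory-tempfile.

```python
import os
import tempfile


def memory_backed_dir(candidate="/dev/shm"):
    """Return candidate if it is a writable directory, else None.

    Assumption: on Linux, /dev/shm is usually tmpfs. This only checks that
    the directory exists and is writable, not that it is really in memory.
    """
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    return None  # None makes tempfile use the system default directory


def write_to_memory_tempfile(data, suffix=".pdf"):
    """Write bytes to a named temp file and return its path.

    The file is placed on /dev/shm when available. The caller is
    responsible for deleting it with os.remove() when done.
    """
    with tempfile.NamedTemporaryFile(
        dir=memory_backed_dir(), suffix=suffix, delete=False
    ) as fh:
        fh.write(data)
        return fh.name
```

The returned path can then be passed to anything that expects a filename, and removed with os.remove() afterwards.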
@vinayak-mehta just saw your comment. Looking forward to this! If you need any help (testing, review...) just contact me ;) although I am not that deep into the library ...
Thanks for the suggestion, and for offering your help! I will try to get to the PRs by the weekend and will definitely comment here if I need help :)
I mentioned another use case for this in atlanhq/camelot#189, where reading from a file-like object would come in handy when more advanced authentication is required for websites (e.g. SharePoint), which means pulling the object using a library like requests.
Hey @vinayak-mehta, just checking in to see whether you got around to doing this?
Would love this feature to be implemented. The use case is an AWS Lambda function that has read a PDF from S3 and processed it with regex to find the relevant pages; we then wish to pass the relevant pages as bytes to a table extraction package, ideally without having to write to and read from a file again in the Lambda.
I want to add to the comments that this would be a very useful feature to have. Writing to and reading from disk can be quite expensive.
This would be a very useful feature. I would greatly appreciate any update.
I was working on a forward port over here:
In our use case we have PDF data streamed in memory from an external service; in order for us to process it using camelot we need to save that data out to a file and then pass the filename over. It would be great to be able to just send a file-like object through the interface instead, as this would save us from needing to write temporary files only to read them back in. I do not think there is a workaround for this at the moment, but if there is one, any information would be greatly appreciated. I do not know if I will have time soon to work on a PR, but does this sound like a reasonable feature to add?
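For reference, the save-then-pass-the-filename step we currently do looks roughly like the sketch below. The helper name read_pdf_from_bytes and its reader parameter are hypothetical, not part of camelot; reader is any callable that takes a file path as its first argument, for example camelot.read_pdf.

```python
import os
import tempfile


def read_pdf_from_bytes(pdf_bytes, reader, suffix=".pdf", **kwargs):
    """Write pdf_bytes to a temporary file, call reader(path, **kwargs),
    and delete the file afterwards, even if reader raises.

    Assumption: reader finishes all file access before it returns, so the
    temporary file is no longer needed once the call completes.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Close the file before handing the path over, so the reader can
        # reopen it by name on any platform.
        with os.fdopen(fd, "wb") as fh:
            fh.write(pdf_bytes)
        return reader(path, **kwargs)
    finally:
        os.remove(path)
```

With camelot installed this would be called as read_pdf_from_bytes(data, camelot.read_pdf, pages="1"). It still touches the disk, which is exactly the round trip a file-like interface would avoid.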