Three errors from ./blog2epubcli.py #17

Closed
meedstrom opened this issue Jul 4, 2024 · 9 comments

meedstrom commented Jul 4, 2024

Command ./blog2epubcli.py https://eukaryotewritesblog.com failed on or after post 37: Nemesis club (next post would be Biodiversity for heretics)

File "/src/blog2epub/crawlers/wordpress.py", line 121, in get_images_with_captions
  img_caption = img_tr.xpath('//p[@class="wp-caption-text"]/text()').pop()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: pop from empty list
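
The traceback above shows `.pop()` being called on the empty list that the XPath query returns when an image has no caption. A minimal sketch of a guarded lookup follows. It uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs on its own, and `caption_for` is a hypothetical helper, not the project's actual code or fix. Note also that an XPath starting with `//` searches the whole document rather than the current element; the sketch searches relative to the element with `.//`.

```python
import xml.etree.ElementTree as ET


def caption_for(figure):
    """Return the caption text found under `figure`, or "" if there is none."""
    # find() returns None when nothing matches, so there is no list to pop
    # and no IndexError for images that lack a caption.
    node = figure.find(".//p[@class='wp-caption-text']")
    if node is None or node.text is None:
        return ""
    return node.text.strip()


# Two hypothetical fragments: one image with a caption, one without.
with_caption = ET.fromstring(
    "<div><img src='a.png'/><p class='wp-caption-text'>A duck</p></div>"
)
without_caption = ET.fromstring("<div><img src='b.png'/></div>")

print(repr(caption_for(with_caption)))
print(repr(caption_for(without_caption)))
```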

Command ./blog2epubcli.py https://agentyduck.blogspot.com/ failed on or after post 91: Mental Postures (next post would probably be Simulating Confusion)

File "src/lxml/apihelpers.pxi", line 1736, in lxml.etree._htmlTagValidOrRaise
ValueError: Invalid HTML tag name 'li"'
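
The error above appears to come from lxml refusing an element name that carries a stray double quote (`li"`). The traceback does not show where that name originated, so the following is only a sketch of one defensive option: normalising a tag name before it is handed to the parser library. `clean_tag_name` and its pattern are hypothetical and deliberately conservative; they are not lxml's own validation rule.

```python
import re

# Conservative pattern for a plausible HTML tag name: a letter followed by
# letters, digits, or hyphens. This is an assumption, not lxml's rule.
_TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def clean_tag_name(raw):
    """Strip stray quotes and whitespace; return None if still implausible."""
    candidate = raw.strip().strip("\"'")
    return candidate if _TAG_NAME.fullmatch(candidate) else None


print(clean_tag_name('li"'))
print(clean_tag_name("<<"))
```

Returning `None` for names that cannot be repaired lets the caller decide whether to skip the element or fall back to a neutral tag.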

Command ./blog2epubcli.py https://kajsotala.fi/ failed at the start. Just an HTTP error, so it may be on my end, but I wonder if you get the same.

File "/usr/local/lib/python3.12/urllib/request.py", line 639, in http_error_default
  raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
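
Some servers answer 403 to requests that carry urllib's default `Python-urllib/x.y` User-Agent. Whether that is the cause here is an assumption; the thread does not say. A minimal sketch of building a request with an explicit User-Agent, where `build_request` and the agent string are hypothetical:

```python
from urllib.request import Request


def build_request(url, user_agent="Mozilla/5.0 (compatible; example-crawler)"):
    """Build a urllib Request that sends an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://kajsotala.fi/")
# Request stores header names capitalised as "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then send that header instead of the default one.
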
@bohdanbobrowski
Owner

This actually shows three separate bugs. Expect to see a fix for them in version 1.3.0, which I will try to deliver by the end of next week.

Thanks @meedstrom for creating this issue. It's always very exciting for me when someone uses my software :-)

@meedstrom
Author

meedstrom commented Jul 18, 2024

Ya, a tool like this fills a distinct niche. I haven't found anything else, so I'm reduced to using the EpubPress web extension and manually clicking each article to download. On top of that, it has a max size per book, you don't know exactly when you'll hit it, and if the download fails for that reason you have to click through everything all over again... Surprisingly, not many people seem to want to read a blog whole?

Thanks for creating this project :)

@bohdanbobrowski
Owner

The release of the new version is taking a bit longer than I estimated above, but it should be available soon (maybe this weekend). I'm polishing the latest changes and fixing build errors (the build broke a bit after switching from venv to poetry, but it's okay now, at least on Windows). Progress can be tracked on branch 1.3.0. Meanwhile, you can see the newly added functionality: the ability to select which chapters (articles) to add from all the downloaded ones. Stay tuned!

[image]

@bohdanbobrowski
Owner

@meedstrom it looks like both bugs are finally fully resolved. I've been working on a deep code refactor for a while now... it's definitely not over yet... but it works much better now. It's currently available on the dev branch and will be in the next release, 1.5.0.

blog2epub https://eukaryotewritesblog.com -l=10
blog2epub https://agentyduck.blogspot.com -l=10
blog2epub https://kajsotala.fi -l=10

All these commands (note the slightly different syntax after the CLI refactor) produce shiny epubs, for example:

[image]
[image]

@meedstrom
Author

meedstrom commented Nov 15, 2024 via email

@bohdanbobrowski
Owner

@meedstrom new issues will be very welcome! :-)

@meedstrom
Author

Would it be possible to reverse the order of posts so that it's oldest-first? :-)

@bohdanbobrowski
Owner

Hmmm, it should already be sorted that way. In which example does it start from the newest?

@meedstrom
Author

You seem to be right. I assumed it was newest-first because https://eukaryotewritesblog.com/ has the newest post as its first page, but then the second page has some old post, so it's not consistently ordered.

But we can take this discussion to #33.
