As I mentioned in #47 (comment), the newsletter parser gets the book title from the URL behind the cover image.
packtpub-crawler/script/packtpub.py
Line 101 in e604cc1
This works fine if the link on the landing page points to the main book page, as was the case here: https://www.packtpub.com/packt/free-ebook/amazon-web-services-free
but it yields unexpected results when this href points to, for example, a cover image, like here: https://www.packtpub.com/packt/free-ebook/what-you-need-know-about-angular-2
The latter will result in the title set in
packtpub-crawler/script/packtpub.py
Line 102 in e604cc1
becoming '5612_Wyntkangular_Ebook_500X617.Jpg' instead of the correct title. A wrong title also messes up the filename under which the book is written to disk, making it '5612_Wyntkangular_Ebook_500X617.Jpg.{pdf,mobi,epub}'.
An alternative would be to use the string inside the h1 tag of the title-bar-title div, like here: mkarpiarz@c583d37.
But this doesn't seem to be always reliable either, e.g.:
<div id="title-bar-title"><h1>Free Amazon Web Services eBook</h1></div>
I would suggest going with the h1 tag and, if it is missing for some reason, using the URL-based title as a fallback, perhaps stripping the numbers with a regexp. The output probably won't be nice, but at least it should work.
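The idea of preferring the h1 tag inside the title-bar-title div, and falling back to a cleaned-up name taken from the cover link, could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in packtpub.py: it uses the Python 3 standard library, the names `TitleBarParser`, `title_from_url` and `extract_title` are hypothetical, and the cover image URL in the usage note is made up to resemble the filename reported above.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class TitleBarParser(HTMLParser):
    """Collects the text of the first <h1> inside <div id="title-bar-title">."""

    def __init__(self):
        super().__init__()
        self.in_div = False
        self.in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "title-bar-title":
            self.in_div = True
        elif tag == "h1" and self.in_div and self.title is None:
            self.in_h1 = True
            self.title = ""

    def handle_endtag(self, tag):
        # Simplification: any closing </div> ends the title bar (no nesting support).
        if tag == "h1":
            self.in_h1 = False
        elif tag == "div":
            self.in_div = False

    def handle_data(self, data):
        if self.in_h1:
            self.title += data


def title_from_url(url):
    """Fallback: derive a rough title from the last path segment of the URL."""
    slug = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    # Drop an image extension, a leading numeric id and trailing WxH dimensions.
    slug = re.sub(r"\.(jpe?g|png|gif)$", "", slug, flags=re.IGNORECASE)
    slug = re.sub(r"^\d+[_-]*", "", slug)
    slug = re.sub(r"[_-]*\d+x\d+$", "", slug, flags=re.IGNORECASE)
    return re.sub(r"[_-]+", " ", slug).strip().title()


def extract_title(html, cover_href):
    """Prefer the h1 text; use the URL-derived title only if the h1 is missing or empty."""
    parser = TitleBarParser()
    parser.feed(html)
    h1_text = (parser.title or "").strip()
    return h1_text if h1_text else title_from_url(cover_href)
```

With the markup quoted above, `extract_title` returns the h1 text ('Free Amazon Web Services eBook'), which shows why the h1 alone is not always the real book title. For a page without that div and a hypothetical cover link such as `https://example.com/5612_WYNTKAngular_Ebook_500x617.jpg`, the fallback gives 'Wyntkangular Ebook': not pretty, but usable as a filename.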