I tried to do the same thing in PyMuPDF with this code:
import fitz

doc = fitz.Document('C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\test.epub')
page = doc[331]  # the page index is somehow different for the same page I want
html = page.get_text("html")
with open("C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\py-test.html", "w") as file:
    file.write(html)
I got this result:
The image is included in base64 format.
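For anyone comparing outputs between tools, a quick way to check whether a generated HTML string actually carries inline images is to count `<img>` tags whose `src` is a base64 data URI. This is only a rough sketch: the helper name `count_embedded_images` is made up for illustration, and the regular expression assumes the images are emitted as `<img ... src="data:image/...;base64,...">`, which is how the PyMuPDF output described above appears to embed them.

```python
import re

# Hypothetical helper, not part of PyMuPDF or the mupdf crate.
# Matches <img> tags whose src attribute is a base64 data URI.
_EMBEDDED_IMG = re.compile(
    r'<img\b[^>]*\bsrc\s*=\s*["\']data:image/[\w.+-]+;base64,',
    re.IGNORECASE,
)


def count_embedded_images(html: str) -> int:
    """Return how many <img> tags in `html` embed their data as base64."""
    return len(_EMBEDDED_IMG.findall(html))
```

Running this on the HTML from each tool for the same page should make the difference obvious: a non-zero count for the output that keeps images, zero for the one that drops them.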
I also tried doing the same thing via the mutool convert CLI and can get the same result, but there is an option that needs to be enabled. I can't find any way to set this option in the to_html method of this crate. The options in mutool look like this:
Text output options:
inhibit-spaces: don't add spaces between gaps in the text
preserve-images: keep images in output
preserve-ligatures: do not expand ligatures into constituent characters
preserve-whitespace: do not convert all whitespace into space characters
preserve-spans: do not merge spans on the same line
dehyphenate: attempt to join up hyphenated words
mediabox-clip=no: include characters outside mediabox
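For reference, the mutool invocation can be scripted from Python. This is a minimal sketch, assuming `mutool` is on PATH and that `mutool convert` takes `-F` for the output format, `-O` for a comma-separated option list, and `-o` for the output file. The function names, file names, and page number are placeholders for illustration, not anything provided by MuPDF or this crate.

```python
import shutil
import subprocess


def build_convert_command(src, dest, page, fmt="html", options=("preserve-images",)):
    """Build the argument list for `mutool convert` (hypothetical helper)."""
    cmd = ["mutool", "convert", "-F", fmt]
    if options:
        # Text output options are passed as one comma-separated -O value.
        cmd += ["-O", ",".join(options)]
    cmd += ["-o", dest, src, str(page)]
    return cmd


def run_convert(src, dest, page, **kwargs):
    """Run the conversion; raises if mutool is missing or exits non-zero."""
    if shutil.which("mutool") is None:
        raise FileNotFoundError("mutool was not found on PATH")
    subprocess.run(build_convert_command(src, dest, page, **kwargs), check=True)
```

For example, `build_convert_command("test.epub", "out.html", 332)` produces the equivalent of `mutool convert -F html -O preserve-images -o out.html test.epub 332`. The page number here is a guess; as noted above, the index for the same page can differ between tools.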
When I try converting a page that has an image to html or xhtml, the image is not included. With this code:
I got this result:
There should be an image above the Figure 10.3 text.