Hello!
I would like to use the `-layout` option of pdftotext. For that, I guess I have to change `body, err := exec.Command("pdftotext", "-q", "-nopgbrk", "-enc", "UTF-8", "-eol", "unix", f.Name(), "-").Output()` to add `-layout`. Am I correct?
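Here is a minimal sketch of that change. It assumes the call lives in Go code like the snippet above. The helper names `pdftotextArgs` and `convertPDF` are hypothetical. The flags are the ones from the original call plus `-layout`, placed with the other options before the input file name and the trailing `-` (which makes pdftotext write to stdout).

```go
package main

import (
	"fmt"
	"os/exec"
)

// pdftotextArgs builds the pdftotext argument list used in the original call.
// When layout is true, "-layout" is appended to the options so pdftotext tries
// to preserve the physical layout of the page. (Hypothetical helper.)
func pdftotextArgs(path string, layout bool) []string {
	args := []string{"-q", "-nopgbrk", "-enc", "UTF-8", "-eol", "unix"}
	if layout {
		args = append(args, "-layout")
	}
	// Input file, then "-" so the extracted text goes to stdout.
	return append(args, path, "-")
}

// convertPDF runs pdftotext and returns whatever it wrote to stdout.
// (Hypothetical helper; requires pdftotext on PATH.)
func convertPDF(path string, layout bool) ([]byte, error) {
	return exec.Command("pdftotext", pdftotextArgs(path, layout)...).Output()
}

func main() {
	fmt.Println(pdftotextArgs("example.pdf", true))
}
```

Keeping the flag behind a boolean leaves the existing behaviour unchanged when `layout` is false.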
@MikhailKlemin Yes, you would. However, it may be worth making that the default for all conversions. Do you have any text examples showing the difference with and without the `-layout` option?
Hi
For me it makes a lot of sense, since I usually apply a lot of regexes after converting to TXT, and `-layout` really helps to tame the mess. I attached an example with screenshots.
Here are the source PDF and its TXT conversions with and without the layout option: https://transfer.sh/WJzz/examples.zip
@MikhailKlemin we normally have to clean up the whitespace, so we'd need to test this internally to see what happens. I think it's worth adding as an option. I would look at adding some ENV options to control this. What's your timeframe?