forked from wkhtmltopdf/wkhtmltopdf
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README_WKHTMLTOPDF
510 lines (422 loc) · 26.1 KB
/
README_WKHTMLTOPDF
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
=======================> wkhtmltopdf 0.10.0 rc2 Manual <========================
This file documents wkhtmltopdf, a program capable of converting html documents
into PDF documents.
==================================> Contact <===================================
If you experience bugs or want to request new features please visit
<http://code.google.com/p/wkhtmltopdf/issues/list>, if you have any problems or
comments please feel free to contact me: see
<http://www.madalgo.au.dk/~jakobt/#about>
===========================> Reduced Functionality <============================
Some versions of wkhtmltopdf are compiled against a version of QT without the
wkhtmltopdf patches. These versions are missing some features, you can find out
if your version of wkhtmltopdf is one of these by running wkhtmltopdf --version
if your version is against an unpatched QT, you can use the static version to
get all functionality.
Currently the list of features only supported with patch QT includes:
* Printing more then one HTML document into a PDF file.
* Running without an X11 server.
* Adding a document outline to the PDF file.
* Adding headers and footers to the PDF file.
* Generating a table of contents.
* Adding links in the generated PDF file.
* Printing using the screen media-type.
* Disabling the smart shrink feature of webkit.
==================================> License <===================================
Copyright (C) 2010 wkhtmltopdf/wkhtmltoimage Authors.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law.
==================================> Authors <===================================
Written by Jan Habermann, Christian Sciberras and Jakob Truelsen. Patches by
Mehdi Abbad, Lyes Amazouz, Pascal Bach, Emmanuel Bouthenot, Benoit Garret and
Mário Silva.
==================================> Synopsis <==================================
wkhtmltopdf [GLOBAL OPTION]... [OBJECT]... <output file>
==============================> Document objects <==============================
wkhtmltopdf is able to put several objects into the output file, an object is
either a single webpage, a cover webpage or a table of content. The objects are
put into the output document in the order they are specified on the command
line, options can be specified on a per object basis or in the global options
area. Options from the Global Options section can only be placed in the global
options area
A page objects puts the content of a singe webpage into the output document.
(page)? <input url/file name> [PAGE OPTION]...
Options for the page object can be placed in the global options and the page
options areas. The applicable options can be found in the Page Options and
Headers And Footer Options sections.
A cover objects puts the content of a singe webpage into the output document,
the page does not appear in the table of content, and does not have headers and
footers.
cover <input url/file name> [PAGE OPTION]...
All options that can be specified for a page object can also be specified for a
cover.
A table of content object inserts a table of content into the output document.
toc [TOC OPTION]...
All options that can be specified for a page object can also be specified for a
toc, further more the options from the TOC Options section can also be applied.
The table of content is generated via XSLT which means that it can be styled to
look however you want it to look. To get an aide of how to do this you can dump
the default xslt document by supplying the --dump-default-toc-xsl, and the
outline it works on by supplying --dump-outline, see the Outline Options
section.
===============================> Global Options <===============================
--collate Collate when printing multiple copies
(default)
--no-collate Do not collate when printing multiple
copies
--cookie-jar <path> Read and write cookies from and to the
supplied cookie jar file
--copies <number> Number of copies to print into the pdf
file (default 1)
-d, --dpi <dpi> Change the dpi explicitly (this has no
effect on X11 based systems)
-H, --extended-help Display more extensive help, detailing
less common command switches
-g, --grayscale PDF will be generated in grayscale
-h, --help Display help
--htmldoc Output program html help
--image-dpi * <integer> When embedding images scale them down to
this dpi (default 600)
--image-quality * <integer> When jpeg compressing images use this
quality (default 94)
-l, --lowquality Generates lower quality pdf/ps. Useful to
shrink the result document space
--manpage Output program man page
-B, --margin-bottom <unitreal> Set the page bottom margin (default 10mm)
-L, --margin-left <unitreal> Set the page left margin (default 10mm)
-R, --margin-right <unitreal> Set the page right margin (default 10mm)
-T, --margin-top <unitreal> Set the page top margin (default 10mm)
-O, --orientation <orientation> Set orientation to Landscape or Portrait
(default Portrait)
--output-format <format> Specify an output format to use pdf or ps,
instead of looking at the extention of the
output filename
--page-height <unitreal> Page height
-s, --page-size <Size> Set paper size to: A4, Letter, etc.
(default A4)
--page-width <unitreal> Page width
--no-pdf-compression * Do not use lossless compression on pdf
objects
-q, --quiet Be less verbose
--read-args-from-stdin Read command line arguments from stdin
--readme Output program readme
--title <text> The title of the generated pdf file (The
title of the first document is used if not
specified)
--use-xserver * Use the X server (some plugins and other
stuff might not work without X11)
-V, --version Output version information an exit
Items marked * are only available using patched QT.
==============================> Outline Options <===============================
--dump-default-toc-xsl * Dump the default TOC xsl style sheet to
stdout
--dump-outline * <file> Dump the outline to a file
--outline * Put an outline into the pdf (default)
--no-outline * Do not put an outline into the pdf
--outline-depth * <level> Set the depth of the outline (default 4)
Items marked * are only available using patched QT.
================================> Page Options <================================
--allow <path> Allow the file or files from the specified
folder to be loaded (repeatable)
--background Do print background (default)
--no-background Do not print background
--checkbox-checked-svg <path> Use this SVG file when rendering checked
checkboxes
--checkbox-svg <path> Use this SVG file when rendering unchecked
checkboxes
--cookie <name> <value> Set an additional cookie (repeatable)
--custom-header <name> <value> Set an additional HTTP header (repeatable)
--custom-header-propagation Add HTTP headers specified by
--custom-header for each resource request.
--no-custom-header-propagation Do not add HTTP headers specified by
--custom-header for each resource request.
--debug-javascript Show javascript debugging output
--no-debug-javascript Do not show javascript debugging output
(default)
--default-header * Add a default header, with the name of the
page to the left, and the page number to
the right, this is short for:
--header-left='[webpage]'
--header-right='[page]/[toPage]' --top 2cm
--header-line
--encoding <encoding> Set the default text encoding, for input
--disable-external-links * Do not make links to remote web pages
--enable-external-links * Make links to remote web pages (default)
--disable-forms * Do not turn HTML form fields into pdf form
fields (default)
--enable-forms * Turn HTML form fields into pdf form fields
--images Do load or print images (default)
--no-images Do not load or print images
--disable-internal-links * Do not make local links
--enable-internal-links * Make local links (default)
-n, --disable-javascript Do not allow web pages to run javascript
--enable-javascript Do allow web pages to run javascript
(default)
--javascript-delay <msec> Wait some milliseconds for javascript
finish (default 200)
--load-error-handling <handler> Specify how to handle pages that fail to
load: abort, ignore or skip (default
abort)
--disable-local-file-access Do not allowed conversion of a local file
to read in other local files, unless
explecitily allowed with --allow
--enable-local-file-access Allowed conversion of a local file to read
in other local files. (default)
--minimum-font-size <int> Minimum font size
--exclude-from-outline * Do not include the page in the table of
contents and outlines
--include-in-outline * Include the page in the table of contents
and outlines (default)
--page-offset <offset> Set the starting page number (default 0)
--password <password> HTTP Authentication password
--disable-plugins Disable installed plugins (default)
--enable-plugins Enable installed plugins (plugins will
likely not work)
--post <name> <value> Add an additional post field (repeatable)
--post-file <name> <path> Post an additional file (repeatable)
--print-media-type * Use print media-type instead of screen
--no-print-media-type * Do not use print media-type instead of
screen (default)
-p, --proxy <proxy> Use a proxy
--radiobutton-checked-svg <path> Use this SVG file when rendering checked
radiobuttons
--radiobutton-svg <path> Use this SVG file when rendering unchecked
radiobuttons
--run-script <js> Run this additional javascript after the
page is done loading (repeatable)
--disable-smart-shrinking * Disable the intelligent shrinking strategy
used by WebKit that makes the pixel/dpi
ratio none constant
--enable-smart-shrinking * Enable the intelligent shrinking strategy
used by WebKit that makes the pixel/dpi
ratio none constant (default)
--stop-slow-scripts Stop slow running javascripts (default)
--no-stop-slow-scripts Do not Stop slow running javascripts
(default)
--disable-toc-back-links * Do not link from section header to toc
(default)
--enable-toc-back-links * Link from section header to toc
--user-style-sheet <url> Specify a user style sheet, to load with
every page
--username <username> HTTP Authentication username
--window-status <windowStatus> Wait until window.status is equal to this
string before rendering page
--zoom <float> Use this zoom factor (default 1)
Items marked * are only available using patched QT.
=========================> Headers And Footer Options <=========================
--footer-center * <text> Centered footer text
--footer-font-name * <name> Set footer font name (default Arial)
--footer-font-size * <size> Set footer font size (default 12)
--footer-html * <url> Adds a html footer
--footer-left * <text> Left aligned footer text
--footer-line * Display line above the footer
--no-footer-line * Do not display line above the footer
(default)
--footer-right * <text> Right aligned footer text
--footer-spacing * <real> Spacing between footer and content in mm
(default 0)
--header-center * <text> Centered header text
--header-font-name * <name> Set header font name (default Arial)
--header-font-size * <size> Set header font size (default 12)
--header-html * <url> Adds a html header
--header-left * <text> Left aligned header text
--header-line * Display line below the header
--no-header-line * Do not display line below the header
(default)
--header-right * <text> Right aligned header text
--header-spacing * <real> Spacing between header and content in mm
(default 0)
--replace * <name> <value> Replace [name] with value in header and
footer (repeatable)
Items marked * are only available using patched QT.
================================> TOC Options <=================================
--disable-dotted-lines * Do not use dotted lines in the toc
--toc-header-text * <text> The header text of the toc (default Table
of Content)
--toc-level-indentation * <width> For each level of headings in the toc
indent by this length (default 1em)
--disable-toc-links * Do not link from toc to sections
--toc-text-size-shrink * <real> For each level of headings in the toc the
font is scaled by this facter (default
0.8)
--xsl-style-sheet * <file> Use the supplied xsl style sheet for
printing the table of content
Items marked * are only available using patched QT.
=============================> Specifying A Proxy <=============================
By default proxy information will be read from the environment variables: proxy,
all_proxy and http_proxy, proxy options can also by specified with the -p switch
<type> := "http://" | "socks5://"
<serif> := <username> (":" <password>)? "@"
<proxy> := "None" | <type>? <sering>? <host> (":" <port>)?
Here are some examples (In case you are unfamiliar with the BNF):
http://user:password@myproxyserver:8080
socks5://myproxyserver
None
============================> Footers And Headers <=============================
Headers and footers can be added to the document by the --header-* and --footer*
arguments respectfully. In header and footer text string supplied to e.g.
--header-left, the following variables will be substituted.
* [page] Replaced by the number of the pages currently being printed
* [frompage] Replaced by the number of the first page to be printed
* [topage] Replaced by the number of the last page to be printed
* [webpage] Replaced by the URL of the page being printed
* [section] Replaced by the name of the current section
* [subsection] Replaced by the name of the current subsection
* [date] Replaced by the current date in system local format
* [time] Replaced by the current time in system local format
* [title] Replaced by the title of the of the current page object
* [doctitle] Replaced by the title of the output document
As an example specifying --header-right "Page [page] of [toPage]", will result
in the text "Page x of y" where x is the number of the current page and y is the
number of the last page, to appear in the upper left corner in the document.
Headers and footers can also be supplied with HTML documents. As an example one
could specify --header-html header.html, and use the following content in
header.html:
<html><head><script>
function subst() {
var vars={};
var x=document.location.search.substring(1).split('&');
for (var i in x) {var z=x[i].split('=',2);vars[z[0]] = unescape(z[1]);}
var x=['frompage','topage','page','webpage','section','subsection','subsubsection'];
for (var i in x) {
var y = document.getElementsByClassName(x[i]);
for (var j=0; j<y.length; ++j) y[j].textContent = vars[x[i]];
}
}
</script></head><body style="border:0; margin: 0;" onload="subst()">
<table style="border-bottom: 1px solid black; width: 100%">
<tr>
<td class="section"></td>
<td style="text-align:right">
Page <span class="page"></span> of <span class="topage"></span>
</td>
</tr>
</table>
</body></html>
As can be seen from the example, the arguments are sent to the header/footer
html documents in get fashion.
==================================> Outlines <==================================
Wkhtmltopdf with patched qt has support for PDF outlines also known as book
marks, this can be enabled by specifying the --outline switch. The outlines are
generated based on the <h?> tags, for a in-depth description of how this is done
see the Table Of Contest section.
The outline tree can sometimes be very deep, if the <h?> tags where spread to
generous in the HTML document. The --outline-depth switch can be used to bound
this.
==============================> Table Of Content <==============================
A table of content can be added to the document by adding a toc objectto the
command line. For example:
wkhtmltopdf toc http://doc.trolltech.com/4.6/qstring.html qstring.pdf
The table of content is generated based on the H tags in the input documents.
First a XML document is generated, then it is converted to HTML using XSLT.
The generated XML document can be viewed by dumping it to a file using the
--dump-outline switch. For example:
wkhtmltopdf --dump-outline toc.xml http://doc.trolltech.com/4.6/qstring.html qstring.pdf
The XSLT document can be specified using the --xsl-style-sheet switch. For
example:
wkhtmltopdf toc --xsl-style-sheet my.xsl http://doc.trolltech.com/4.6/qstring.html qstring.pdf
The --dump-default-toc-xsl switch can be used to dump the default XSLT style
sheet to stdout. This is a good start for writing your own style sheet
wkhtmltopdf --dump-default-toc-xsl
The XML document is in the namespace
"http://code.google.com/p/wkhtmltopdf/outline" it has a root node called
"outline" which contains a number of "item" nodes. An item can contain any
number of item. These are the outline subsections to the section the item
represents. A item node has the following attributes:
* "title" the name of the section.
* "page" the page number the section occurs on.
* "link" a URL that links to the section.
* "backLink" the name of the anchor the the section will link back to.
The remaining TOC options only affect the default style sheet so they will not
work when specifying a custom style sheet.
===============================> Page Breaking <================================
The current page breaking algorithm of WebKit leaves much to be desired.
Basically webkit will render everything into one long page, and then cut it up
into pages. This means that if you have two columns of text where one is
vertically shifted by half a line. Then webkit will cut a line into to pieces
display the top half on one page. And the bottom half on another page. It will
also break image in two and so on. If you are using the patched version of QT
you can use the CSS page-break-inside property to remedy this somewhat. There is
no easy solution to this problem, until this is solved try organising your HTML
documents such that it contains many lines on which pages can be cut cleanly.
See also: <http://code.google.com/p/wkhtmltopdf/issues/detail?id=9>,
<http://code.google.com/p/wkhtmltopdf/issues/detail?id=33> and
<http://code.google.com/p/wkhtmltopdf/issues/detail?id=57>.
=================================> Page sizes <=================================
The default page size of the rendered document is A4, but using this --page-size
optionthis can be changed to almost anything else, such as: A3, Letter and
Legal. For a full list of supported pages sizes please see
<http://doc.trolltech.com/4.6/qprinter.html#PageSize-enum>.
For a more fine grained control over the page size the --page-height and
--page-width options may be used
========================> Reading arguments from stdin <========================
If you need to convert a lot of pages in a batch, and you feel that wkhtmltopdf
is a bit to slow to start up, then you should try --read-args-from-stdin,
When --read-args-from-stdin each line of input sent to wkhtmltopdf on stdin will
act as a separate invocation of wkhtmltopdf, with the arguments specified on the
given line combined with the arguments given to wkhtmltopdf
For example one could do the following:
echo "http://doc.trolltech.com/4.5/qapplication.html qapplication.pdf" >> cmds
echo "cover google.com http://en.wikipedia.org/wiki/Qt_(toolkit) qt.pdf" >> cmds
wkhtmltopdf --read-args-from-stdin --book < cmds
===============================> Static version <===============================
On the wkhtmltopdf website you can download a static version of wkhtmltopdf
<http://code.google.com/p/wkhtmltopdf/downloads/list>. This static binary will
work on most systems and comes with a build in patched QT.
Unfortunately the static binary is not particularly static, on Linux it depends
on both glibc and openssl, furthermore you will need to have an xserver
installed but not necessary running. You will need to have different fonts
install including xfonts-scalable (Type1), and msttcorefonts. See
<http://code.google.com/p/wkhtmltopdf/wiki/static> for trouble shouting.
================================> Compilation <=================================
It can happen that the static binary does not work for your system for one
reason or the other, in that case you might need to compile wkhtmltopdf
yourself.
*GNU/Linux:*
Before compilation you will need to install dependencies: X11, gcc, git and
openssl. On Debian/Ubuntu this can be done as follows:
sudo apt-get build-dep libqt4-gui libqt4-network libqt4-webkit
sudo apt-get install openssl build-essential xorg git-core git-doc libssl-dev
On other systems you must use your own package manager, the packages might be
named differently.
First you must check out the modified version of QT
git clone git://gitorious.org/~antialize/qt/antializes-qt.git wkhtmltopdf-qt
Next you must configure, compile and install QT, note this will take quite some
time, depending on what arguments you use to configure qt
cd wkhtmltopdf-qt
./configure -nomake tools,examples,demos,docs,translations -opensource -prefix ../wkqt
git checkout 4.8.4
make -j3
make install
cd ..
All that is needed now is, to compile wkhtmltopdf.
git clone git://github.com/antialize/wkhtmltopdf.git wkhtmltopdf
cd wkhtmltopdf
../wkqt/bin/qmake
make -j3
You show now have a binary called wkhtmltopdf in the currently folder that you
can use, you can optionally install it by running
make install
*Other operative systems and advanced features*
If you want more details or want to compile under other operative systemsother
then GNU/Linux, please see
<http://code.google.com/p/wkhtmltopdf/wiki/compilation>.
================================> Installation <================================
There are several ways to install wkhtmltopdf. You can download a already
compiled binary, or you can compile wkhtmltopdf yourself. On windows the easiest
way to install wkhtmltopdf is to download the latest installer. On linux you can
download the latest static binary, however you still need to install some other
pieces of software, to learn more about this read the static version section of
the manual.
==================================> Examples <==================================
This section presents a number of examples of how to invoke wkhtmltopdf.
To convert a remote HTML file to PDF:
wkhtmltopdf http://www.google.com google.pdf
To convert a local HTML file to PDF:
wkhtmltopdf my.html my.pdf
You can also convert to PS files if you like:
wkhtmltopdf my.html my.ps
Produce the eler2.pdf sample file:
wkhtmltopdf -H http://geekz.co.uk/lovesraymond/archive/eler-highlights-2008 eler2.pdf
Printing a book with a table of content:
wkhtmltopdf -H cover cover.html toc chapter1.html chapter2.html chapter3.html book.pdf