<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>unzipit</title>
<meta name="description" content="unzipit: Random access unzip library for browser and node based JavaScript" />
<meta name="keywords" content="zip unzip javascript node" />
<meta name="thumbnail" content="https://greggman.github.io/unzipit/unzipit-no-anim.png" />
<meta property="og:title" content="unzipit" />
<meta property="og:type" content="website" />
<meta property="og:image" content="https://greggman.github.io/unzipit/unzipit-no-anim.png" />
<meta property="og:description" content="unzipit: Random access unzip library for browser and node based JavaScript" />
<meta property="og:url" content="https://greggman.github.io/unzipit/">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:site" content="@greggman">
<meta name="twitter:creator" content="@greggman">
<meta name="twitter:domain" content="greggman.github.io">
<meta name="twitter:title" content="unzipit">
<meta name="twitter:url" content="https://greggman.github.io/unzipit/">
<meta name="twitter:description" content="unzipit: Random access unzip library for browser and node based JavaScript" />
<meta name="twitter:image:src" content="https://greggman.github.io/unzipit/unzipit-no-anim.png">
<script src="https://cdnjs.cloudflare.com/ajax/libs/showdown/1.9.0/showdown.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.15.10/highlight.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.15.10/languages/javascript.min.js"></script>
<style>
body {
margin: 0;
background: #222;
font-size: large;
font-family: sans-serif;
}
.logo img {
display: block;
}
.logo,
.content {
max-width: 1000px;
margin: 0 auto;
background: white;
}
.content>div {
padding: 2em;
background: linear-gradient(#53be9a 0, #444 150px);
color: white;
}
a {
color: yellow;
text-decoration: none;
}
pre {
padding: 1em;
font-size: medium;
}
p {
line-height: 1.3;
}
</style>
<link href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.15.10/styles/monokai-sublime.min.css" rel="stylesheet">
</head>
<body>
<div class="logo">
<img src="./unzipit.svg" style="width: 100%;">
</div>
<div class="content"><div><pre>
Random access unzip library for browser and node based JavaScript
[![Build Status](https://travis-ci.org/greggman/unzipit.svg?branch=master)](https://travis-ci.org/greggman/unzipit)
[[Live Tests](https://greggman.github.io/unzipit/test/)]
* Less than 8k gzipped without workers, less than 13k with.
* [6x to 25x faster than JSZip](https://jsperf.com/jszip-vs-unzipit/4) without workers and even faster with
* Uses far less memory.
* Can [avoid downloading the entire zip file](#Streaming) if the server supports http range requests.
# How to use
Live Example: [https://jsfiddle.net/greggman/awez4sd7/](https://jsfiddle.net/greggman/awez4sd7/)
## without workers
```js
import {unzip} from 'unzipit';
async function readFiles(url) {
const {entries} = await unzip(url);
// print all entries and their sizes
for (const [name, entry] of Object.entries(entries)) {
console.log(name, entry.size);
}
// read an entry as an ArrayBuffer
const arrayBuffer = await entries['path/to/file'].arrayBuffer();
// read an entry as a blob and tag it with mime type 'image/png'
const blob = await entries['path/to/otherFile'].blob('image/png');
}
```
## with workers
```js
import {unzip, setOptions} from 'unzipit';
setOptions({workerURL: 'path/to/unzipit-worker.module.js'});
async function readFiles(url) {
const {entries} = await unzip(url);
...
}
```
or if you prefer
```js
import * as unzipit from 'unzipit';
unzipit.setOptions({workerURL: 'path/to/unzipit-worker.module.js'});
async function readFiles(url) {
const {entries} = await unzipit.unzip(url);
...
}
```
You can also pass a [`Blob`](https://developer.mozilla.org/en-US/docs/Web/API/Blob),
[`ArrayBuffer`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer),
[`SharedArrayBuffer`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer),
[`TypedArray`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray),
or your own `Reader`
## Node
For node you need to make your own `Reader` or pass in an
[`ArrayBuffer`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer),
[`SharedArrayBuffer`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer),
or [`TypedArray`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray).
### Load a file as an ArrayBuffer
```js
const unzipit = require('unzipit');
const fsPromises = require('fs').promises;
async function readFiles(filename) {
const buf = await fsPromises.readFile(filename);
const {zip, entries} = await unzipit.unzip(new Uint8Array(buf));
... (see code above)
}
```
You can also pass your own reader. Here are two examples. The first one
is stateless. That means there is never anything to clean up. But
it has the overhead of opening the source file once each time
you get the contents of an entry. I have no idea what that overhead
amounts to.
```js
const unzipit = require('unzipit');
const fsPromises = require('fs').promises;
class StatelessFileReader {
constructor(filename) {
this.filename = filename;
}
async getLength() {
if (this.length === undefined) {
const stat = await fsPromises.stat(this.filename);
this.length = stat.size;
}
return this.length;
}
async read(offset, length) {
const fh = await fsPromises.open(this.filename);
const data = new Uint8Array(length);
await fh.read(data, 0, length, offset);
await fh.close();
return data;
}
}
async function readFiles(filename) {
const reader = new StatelessFileReader(filename);
const {zip, entries} = await unzipit.unzip(reader);
... (see code above)
}
```
Here's an example of one that opens the file a single time,
but that means the file stays open until you manually call close.
```js
class FileReader {
constructor(filename) {
this.fhp = fsPromises.open(filename);
}
async close() {
const fh = await this.fhp;
await fh.close();
}
async getLength() {
if (this.length === undefined) {
const fh = await this.fhp;
const stat = await fh.stat();
this.length = stat.size;
}
return this.length;
}
async read(offset, length) {
const fh = await this.fhp;
const data = new Uint8Array(length);
await fh.read(data, 0, length, offset);
return data;
}
}
async function doStuff() {
// ...
const reader = new FileReader(filename);
const {zip, entries} = await unzipit.unzip(reader);
// ... do stuff with entries ...
// you must call reader.close for the file to close
await reader.close();
}
```
### Workers in Node
```js
const unzipit = require('unzipit');
unzipit.setOptions({workerURL: require.resolve('unzipit/dist/unzipit-worker.js')});
...
// You only need to shut down the workers if you want node to exit.
unzipit.cleanup();
```
## Why?
Most of the js libraries I looked at would decompress all files in the zip file.
That's probably the most common use case but it didn't fit my needs. I needed
to open a zip and read a specific file as fast as possible. The better libraries
only worked in node; I needed a browser-based solution for Electron.
Note that to reproduce the behavior of most unzip libs you would just do
```js
import {unzip} from 'unzipit';
async function readFiles(url) {
const {entries} = await unzip(url);
await Promise.all(Object.values(entries).map(async(entry) => {
entry.data = await entry.arrayBuffer();
}));
}
```
One other thing is that many libraries seem bloated. IMO the smaller the API the better.
I don't need a library to try to do 50 things via options and configuration. Rather I need
a library to handle the main task and make it possible to do the rest outside the library.
This makes a library far more flexible.
As an example, some libraries provide no raw data for filenames. Apparently some zip files
have non-utf8 filenames in them. This library's solution is to let you decode the names on your own.
Example:
```js
const {zip, entriesArray} = await unzipit.unzipRaw(url);
// decode names as big5 (chinese)
const decoder = new TextDecoder('big5');
entriesArray.forEach(entry => {
entry.name = decoder.decode(entry.nameBytes);
});
const entries = Object.fromEntries(entriesArray.map(v => [v.name, v]));
... // same as above beyond this point
```
Same idea with path separators. If you care about slashes or backslashes, handle that yourself outside the library:
```js
const {entries} = await unzip(url);
// change slashes and backslashes into '-'
Object.values(entries).forEach(entry => {
entry.name = entry.name.replace(/\\|\//g, '-');
});
```
Some libraries both zip and unzip.
IMO those should be separate libraries as there is little if any code to share between
both. Plenty of projects only need to do one or the other.
Similarly, inflate and deflate libraries should be separate from zip/unzip libraries.
You need one or the other, not both. See zlib as an example.
This library is ES6 based, using async/await and import, which makes the code
much simpler.
Advantages over other libraries:
* JSZip requires the entire compressed file in memory.
It also requires reading through all entries in order.
* UZIP requires the entire compressed file to be in memory and
the entire uncompressed contents of all the files to be in memory.
* Yauzl does not require all the files to be in memory but
they do have to be read in order and it has very peculiar API where
you still have to manually go through all the entries even if
you don't choose to read their contents. Further it's node only.
* fflate has 2 modes. In one, the entire contents of all
uncompressed files are provided, therefore using lots
of memory. The other is like Yauzl where you're required
to handle every file but you can choose to ignore
certain ones. Further, this mode (maybe both modes) is
not standards compliant. It scans for files, but that is not
a valid way to read a zip file. The only valid way to read a zip file
is to jump to the end of the file and find the table of
contents. So, fflate will fail on perfectly valid zip files.
Unzipit does not require all compressed content nor all uncompressed
content to be in memory. Only the entries you access use memory.
If you use a Blob as input the browser can effectively virtualize
access so it doesn't have to be in memory and unzipit will only
access the parts of the blob needed to read the content you request.
Further, if you use the `HTTPRangeReader` or similar, unzipit only
downloads/reads the parts of the zip file you actually use, saving you
bandwidth.
As well, if you only need the data for images or video or audio then you can do
things like
```js
const {entries} = await unzip(url);
const blob = await entries['/some/image.jpg'].blob('image/jpeg');
const url = URL.createObjectURL(blob);
const img = new Image();
img.src = url;
```
Notice your JavaScript never touches the data directly; it stays in Blobs, which the
browser manages. The contents don't count as part of the JavaScript heap.
In node, the file-reader examples read only the header and whatever entries' contents
you ask for, so similarly nothing ends up in memory except the things you read.
# API
```js
import { unzip, unzipRaw, setOptions, cleanup } from 'unzipit';
```
## unzip, unzipRaw
```js
async unzip(url: string): ZipInfo
async unzip(src: Blob): ZipInfo
async unzip(src: TypedArray): ZipInfo
async unzip(src: ArrayBuffer): ZipInfo
async unzip(src: Reader): ZipInfo
async unzipRaw(url: string): ZipInfoRaw
async unzipRaw(src: Blob): ZipInfoRaw
async unzipRaw(src: TypedArray): ZipInfoRaw
async unzipRaw(src: ArrayBuffer): ZipInfoRaw
async unzipRaw(src: Reader): ZipInfoRaw
```
`unzip` and `unzipRaw` are async functions that take a url, `Blob`, `TypedArray`, or `ArrayBuffer` or a `Reader`.
Both functions return an object with fields `zip` and `entries`.
The difference is that with `unzip` the `entries` is an object mapping filenames to `ZipEntry`s, whereas with `unzipRaw` it's
an array of `ZipEntry`s. The reason to use `unzipRaw` over `unzip` is that if the filenames are not utf8
the library can't make an object from the names. In that case you get an array of entries; use `entry.nameBytes`
and decode the names as you please.
```js
type ZipInfo = {
zip: Zip,
entries: {[key: string]: ZipEntry},
};
```
```js
type ZipInfoRaw = {
zip: Zip,
entries: ZipEntry[],
};
```
```js
class Zip {
comment: string, // the comment for the zip file
commentBytes: Uint8Array, // the raw data for comment, see nameBytes
}
```
```js
class ZipEntry {
async blob(type?: string): Blob, // returns a Blob for this entry
// (optional type as in 'image/jpeg')
async arrayBuffer(): ArrayBuffer, // returns an ArrayBuffer for this entry
async text(): string, // returns text, assumes the text is valid utf8.
// If you want more options decode arrayBuffer yourself
async json(): any, // returns text with JSON.parse called on it.
// If you want more options decode arrayBuffer yourself
name: string, // name of entry
nameBytes: Uint8Array, // raw name of entry (see notes)
size: number, // size in bytes
compressedSize: number, // size before decompressing
comment: string, // the comment for this entry
commentBytes: Uint8Array, // the raw comment for this entry
lastModDate: Date, // a Date
isDirectory: bool, // True if directory
encrypted: bool, // True if encrypted
}
```
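As a sketch of how these methods compose, here's a hypothetical helper that reads one JSON entry out of the `entries` map returned by `unzip` (the entry name is whatever path the file has inside your archive):

```js
// Read and parse one JSON entry from the `entries` map returned by unzip().
// Throws a descriptive error if the entry doesn't exist in the archive.
async function readJsonEntry(entries, name) {
  const entry = entries[name];
  if (!entry) {
    throw new Error(`no entry named: ${name}`);
  }
  return entry.json(); // equivalent to JSON.parse(await entry.text())
}
```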
```js
interface Reader {
async getLength(): number,
async read(offset, size): Uint8Array,
}
```
## setOptions
```js
setOptions(options: UnzipitOptions)
```
The options are
* `useWorkers`: true/false (default: false)
* `workerURL`: string
The URL to use to load the worker script. Note that setting this automatically sets `useWorkers` to true.
* `numWorkers`: number (default: 1)
How many workers to use. You can inflate more files in parallel with more workers.
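For instance, to inflate several entries in parallel (the worker path is a placeholder, as in the examples above):

```js
import {setOptions} from 'unzipit';

// workerURL implies useWorkers: true, so it need not be set explicitly.
setOptions({
  workerURL: 'path/to/unzipit-worker.module.js',
  numWorkers: 4,  // decompress up to 4 entries at once
});
```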
## cleanup
```js
cleanup()
```
Shuts down the workers. You only need to call this if you want node
to exit, since node will otherwise wait for the workers.
# Notes:
## Supporting old browsers
Use a transpiler like [Babel](https://babeljs.io).
## Caching
If you ask for the same entry twice it will be read twice and decompressed twice.
If you want to cache entries, implement that at a level above unzipit.
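A minimal memoizing layer might look like this sketch (the key scheme and eviction policy are up to you; this one never evicts):

```js
// Memoize decompressed data by entry name so each entry is read and
// inflated at most once. Note: no eviction, so the cache grows unbounded.
function makeEntryCache(entries) {
  const cache = new Map();
  return function get(name) {
    if (!cache.has(name)) {
      cache.set(name, entries[name].arrayBuffer()); // cache the promise
    }
    return cache.get(name);
  };
}
```

Caching the promise (rather than the resolved result) also deduplicates concurrent requests for the same entry.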
## Streaming
You can't stream zip files. The only valid way to read a zip file is to read the
central directory which is at the end of the zip file. Sure there are zip files
where you can cheat and read the local headers of each file but that is an invalid
way to read a zip file and it's trivial to create zip files that will fail when
read that way but are perfectly valid zip files.
If your server supports http range requests you can do this.
```js
import {unzip, HTTPRangeReader} from 'unzipit';
async function readFiles(url) {
const reader = new HTTPRangeReader(url);
const {zip, entries} = await unzip(reader);
// ... access the entries as normal
}
```
## Special headers and options for network requests
The library takes a URL, but there are no options for CORS, credentials, etc.
If you need those, pass in a Blob or ArrayBuffer you fetched yourself.
```js
import {unzip} from 'unzipit';
...
const req = await fetch(url, { mode: 'cors' });
const blob = await req.blob();
const {entries} = await unzip(blob);
```
## Non UTF-8 Filenames
The zip standard predates unicode, so it's possible and apparently not uncommon for files
to have non-unicode names. `entry.nameBytes` contains the raw bytes of the filename,
so you are free to decode the name using your own methods. See the example above.
## ArrayBuffer and SharedArrayBuffer caveats
If you pass in an `ArrayBuffer` or `SharedArrayBuffer` you need to keep the data unchanged
until you're finished using the data. The library doesn't make a copy, it uses the buffer directly.
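If you do need to reuse or mutate the buffer, copy it first; a sketch (`unzip` call shown as hypothetical usage):

```js
// ArrayBuffer.prototype.slice copies the underlying bytes, so unzipit
// gets a stable snapshot while you remain free to modify the original.
function copyForUnzip(arrayBuffer) {
  return arrayBuffer.slice(0);
}

// usage: const {entries} = await unzip(copyForUnzip(myBuffer));
```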
## Handling giant entries
There is no way for the library to know what "too large" means to you.
The simple way to handle entries that are too large is to check their
size before asking for their content.
```js
const kMaxSize = 1024*1024*1024*2; // 2gig
if (entry.size > kMaxSize) {
throw new Error('this entry is larger than your max supported size');
}
const data = await entry.arrayBuffer();
...
```
## Encrypted, Password protected Files
unzipit does not currently support encrypted zip files and will throw if you try to get the data for one.
Put it on the TODO list 😅
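Since reading an encrypted entry throws, you may want to filter them out up front using the `encrypted` flag; a sketch:

```js
// Drop encrypted entries up front so later reads can't throw on them.
function readableEntries(entries) {
  return Object.fromEntries(
    Object.entries(entries).filter(([, entry]) => !entry.encrypted)
  );
}
```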
# Testing
When writing tests, serve the folder with your favorite web server (I recommend [`servez`](https://www.npmjs.com/package/servez))
then go to `http://localhost:8080/test/` to easily re-run the tests. You can set a grep regular expression to only run certain tests
`http://localhost:8080/test/?grep=json`. It's up to you to encode the regular expression for a URL. For example
```js
encodeURIComponent('j(.*?)son')
// "j(.*%3F)son"
```
so `http://localhost:8080/test/?grep=j(.*%3F)son`. The regular expression will be marked as case insensitive.
Of course you can also `npm test` to run the tests from the command line.
## Debugging
Follow the instructions on testing but add `?timeout=0` to the URL, as in `http://localhost:8080/test/?timeout=0`
## Live Browser Tests
[https://greggman.github.io/unzipit/test/](https://greggman.github.io/unzipit/test/)
# Acknowledgements
* The code is **heavily** based on [yauzl](https://github.com/thejoshwolfe/yauzl)
* The code uses the es6 module version of [uzip.js](https://www.npmjs.com/package/uzip-module)
# Licence
MIT
</pre></div></div>
<style>
#forkongithub a {
background: #000;
color: #fff;
text-decoration: none;
font-family: arial,sans-serif;
text-align: center;
font-weight: bold;
padding: 5px 40px;
font-size: 12px;
line-height: 24px;
position: relative;
transition: 0.5s;
display: block;
}
#forkongithub a:hover {
background: #c11;
color: #fff;
}
#forkongithub a::before,#forkongithub a::after {
content: "";
width: 100%;
display: block;
position: absolute;
top: 1px;
left: 0;
height: 1px;
background: #fff;
}
#forkongithub a::after {
bottom: 1px;
top: auto;
}
@media screen and (min-width: 400px){
#forkongithub{
position: fixed;
display: block;
top: 0;
right: 0;
width: 200px;
overflow: hidden;
height: 200px;
z-index: 9999;
}
#forkongithub a{
width: 200px;
position: absolute;
top: 40px;
right: -70px;
transform: rotate(45deg);
-webkit-transform: rotate(45deg);
-ms-transform: rotate(45deg);
-moz-transform: rotate(45deg);
-o-transform: rotate(45deg);
box-shadow: 4px 4px 10px rgba(0,0,0,0.8);
}
}
</style>
<div id="forkongithub"><a href="https://github.com/greggman/unzipit">Fork me on GitHub</a></div>
</body>
<script>
/* global showdown, hljs */
const converter = new showdown.Converter();
const pre = document.querySelector('pre');
const text = pre.innerText;
const html = converter.makeHtml(text);
pre.parentElement.innerHTML = html;
hljs.initHighlightingOnLoad();
</script>
</html>