
Move to zip.js from JSZip? Or implement JSZip within webworker? #881

Open · jthrilly opened this issue Mar 28, 2019 · 10 comments

@jthrilly (Member)

Currently, unzipping large protocol files with JSZip hangs the render thread on Cordova (which is single threaded).

zip.js implements concurrent, threaded unzipping using web workers, which are supported on all of our platforms.

Additionally, unzipping large protocol files causes app crashes on Cordova (particularly iOS), believed to be due to memory constraints. Moving to zip.js may help alleviate that issue as well.

@jthrilly (Member Author)

Also of note: this would impact the writeStream filesystem method that we currently use to manage memory use. That method is called by extractZipFile to write the contents of a zip file as it is extracted.

zip.js doesn't seem to offer a stream for output.

Does this mean we should consider putting our existing JSZip code in a web worker instead?
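
If we kept JSZip, moving extraction off the render thread might look roughly like this (a minimal sketch, not our actual code: the worker file name, the message protocol, and the protocolBuffer variable are assumptions; writeStream is the filesystem method mentioned above):

```js
// unzip.worker.js — extracts entries off the main thread.
importScripts('jszip.min.js');

self.onmessage = async ({ data }) => {
  const zip = await JSZip.loadAsync(data.archive); // ArrayBuffer in
  for (const [path, entry] of Object.entries(zip.files)) {
    if (entry.dir) continue;
    const content = await entry.async('arraybuffer');
    // Transfer (not copy) the buffer back; the main thread only does I/O.
    self.postMessage({ path, content }, [content]);
  }
  self.postMessage({ done: true });
};

// main thread
const worker = new Worker('unzip.worker.js');
worker.onmessage = ({ data }) => {
  if (data.done) { worker.terminate(); return; }
  writeStream(data.path, data.content); // existing filesystem method
};
worker.postMessage({ archive: protocolBuffer }, [protocolBuffer]);
```

This would keep the render thread free, although the whole archive still has to be loaded into the worker's memory, so it addresses the jank but not the memory ceiling.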

@jthrilly changed the title from "Move to zip.js from JSZip?" to "Move to zip.js from JSZip? Or implement JSZip within webworker?" Mar 28, 2019
@wwqrd (Contributor) commented Mar 29, 2019

Looks like zip.js uses FileWriter, which supports streams.

Either way, using a worker seems like a good way to stop the render process locking?

Failing that, maybe we need to rethink how assets are transferred?
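
For reference, chunked output through cordova-plugin-file's FileWriter might look something like this (a sketch; writeChunks is a hypothetical helper, and error handling is minimal):

```js
// Append an array of Uint8Array chunks to a FileEntry one at a time,
// so only one chunk is held in memory per write.
function writeChunks(fileEntry, chunks) {
  return new Promise((resolve, reject) => {
    fileEntry.createWriter((writer) => {
      let index = 0;
      const writeNext = () => {
        if (index >= chunks.length) { resolve(); return; }
        writer.seek(writer.length); // append after what's written so far
        writer.write(new Blob([chunks[index++]]));
      };
      writer.onwriteend = writeNext;
      writer.onerror = reject;
      writeNext();
    }, reject);
  });
}
```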

@jthrilly (Member Author) commented Apr 2, 2019

> Failing that, maybe we need to rethink how assets are transferred?

Another option here is stronger support for remote URL assets, and/or embedding things like the YouTube video player. Video assets are likely to be the largest files we deal with.

@jthrilly jthrilly added this to the Beta-stretch milestone Apr 5, 2019
@wwqrd (Contributor) commented Jun 25, 2019

We are emulating FileWriter streams on Cordova, which I think is a compromise based on what we could achieve with the API. The solution might be to find a Cordova zip plugin and handle it that way; it means forking the code, but we already had to do that for the emulation.

@jthrilly (Member Author)

Some more research on this issue:

@jthrilly (Member Author) commented Jul 14, 2020

A further option is native compression/decompression, which may have already landed in Chrome: https://github.com/WICG/compression/blob/master/explainer.md (not yet landed in Safari).

WebAssembly: https://github.com/drbh/wasm-flate and https://github.com/nika-begiashvili/libarchivejs
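
For a sense of the native API: as proposed, it decompresses gzip/deflate streams rather than the zip container format, so a zip archive's entry headers would still need separate parsing. A minimal sketch, assuming DecompressionStream as shipped in Chrome:

```js
// Decompress a gzip response body using the native Compression Streams API.
async function gunzip(response) {
  const stream = response.body.pipeThrough(new DecompressionStream('gzip'));
  return new Response(stream).arrayBuffer();
}

// Usage (illustrative URL):
// const bytes = await gunzip(await fetch('/assets/protocol.json.gz'));
```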

@keean commented Jul 30, 2020

I am using zip.js for very big files. It is the only JS zip library I have found that can seek to the end of the zip to read the file directory without having to read the whole zip file; even stream-based implementations cannot do that. As a result, the maximum memory requirement is the size of the buffer, not the size of the zip archive or of the file you want to extract (unless you extract to memory).

It uses a configurable buffer size (you can edit this in the code), and you just need to implement the Reader and Writer interfaces for whatever you want to read from and write to. I wrote chunked ArrayBuffer Readers and Writers for IndexedDB, allowing me to unzip gigabytes of data to and from IndexedDB reliably.

It may be 'abandoned', or it may just 'work' and not need any updates. It has worked without any issues for me in production code for years, and allows me to handle files that are too big for JSZip.
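
To illustrate the kind of adapter described here, a custom Writer against zip.js's classic callback API might look roughly like this (a sketch, not keean's actual code: the object store name and keying scheme are assumptions, and the IndexedDB setup is elided):

```js
// A zip.js Writer that flushes each decompressed chunk straight to
// IndexedDB, so memory use stays at roughly one buffer per write.
function IndexedDBWriter(db, fileId) {
  this.db = db;
  this.fileId = fileId;
  this.chunkIndex = 0;
}
IndexedDBWriter.prototype = new zip.Writer();
IndexedDBWriter.prototype.init = function (callback) {
  callback();
};
IndexedDBWriter.prototype.writeUint8Array = function (array, callback, onerror) {
  const tx = this.db.transaction('chunks', 'readwrite');
  tx.objectStore('chunks').put(array, [this.fileId, this.chunkIndex++]);
  tx.oncomplete = () => callback();
  tx.onerror = onerror;
};
IndexedDBWriter.prototype.getData = function (callback) {
  callback(); // chunks are already persisted; nothing to return in memory
};
```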

@jthrilly (Member Author) commented Jul 30, 2020

@keean - thanks for weighing in on this! I really appreciate your input.

The issue we are trying to address here is both memory use (which, as you say, can partially be managed by manipulating the buffer, using streams for backpressure, and creating adapters for the input and output 'write' mechanisms) and the UI jank caused by unzipping in a single-threaded context. The latter is probably the higher priority for us. Do you have any insight into unzipping in a different process, via web worker?

We are struggling with buffered reading specifically on Cordova. As you can see from Stuk/jszip#555, JSZip can inadvertently load large amounts of data into memory in some circumstances, which can cause our app to be killed, by iOS in particular. We need to stream both zip reads and writes, and we only ever want to read/write the entire archive (never just the directory listing, for example). Any thoughts on this?

@keean commented Jul 30, 2020

Yes, zip.js runs the pako compressor/decompressor in a web worker, so you don't need to do anything special; this is automatic (it can also be disabled for single-threaded use). I find it works on iOS to decompress multi-gigabyte zip files. It's also pretty fast, as pako is written in asm.js and designed to be as fast as possible in JS.

I find stream backpressure very unreliable and "removed" from the actual control needed. With the buffering technique used in zip.js, memory control is very precise, and you can reduce memory requirements to the size of a single buffer if you implement your Reader and Writer correctly.

What zip.js does is quite simple: it reads a buffer from your 'Reader', processes it through the pako decompression web worker, and then passes the buffer to the 'Writer'. If your Reader and Writer can operate on the same buffer size as zip.js, it never uses more than one buffer-size of memory to process the whole file.

The only issue is that to decompress a file in the zip, you must read the directory at the end first; there is no way around this, because it contains all the metadata about the files.

What you might be able to do is this:

  • read a block of the zipfile, decompress the block (using pako), and write it to a file;
  • keep going until you hit the End of Central Directory record;
  • from there, you should be able to find the directory, from which you can slice up the big decompressed file using the decompressed file sizes listed in the directory.

But this all seems a bit backwards to me. I would need to understand what you are actually trying to do, but instead of streaming I would recommend:

  • write an 'HttpReader' that uses the 'Range' header to read byte ranges directly from the server;
  • use an HTTP HEAD request to get the length of the archive for the Reader.

Now the client can get the directory first, and then download and decompress each file in the archive to a local file.
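
Concretely, zip.js ships an HttpRangeReader along these lines, so the approach might look roughly like this (a sketch against the classic callback API; the URL is illustrative, and writeStream is the filesystem method mentioned earlier in the thread):

```js
// Read only the central directory via HTTP Range requests, then pull
// and decompress each entry individually.
zip.createReader(
  new zip.HttpRangeReader('https://example.com/protocol.netcanvas'),
  (reader) => {
    reader.getEntries((entries) => {
      entries.forEach((entry) => {
        if (entry.directory) return;
        entry.getData(new zip.BlobWriter(), (blob) => {
          writeStream(entry.filename, blob); // write to local storage
        });
      });
    });
  },
  (error) => console.error(error),
);
```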

@jthrilly (Member Author)

@keean - Thanks again for providing your insight into this issue. Very much appreciated. It sounds like your approach will definitely work for us.

@jthrilly moved this to Maintainence in Network Canvas Aug 25, 2022