
Move to zip.js from JSZip? Or implement JSZip within webworker? #881

Open · jthrilly opened this issue Mar 28, 2019 · 10 comments

@jthrilly (Member)

Currently, unzipping large protocol files with JSZip hangs the render thread on Cordova (which is single threaded).

zip.js implements concurrent, threaded unzipping using web workers, which are supported on all of our platforms.

Additionally, unzipping large protocol files causes app crashes on Cordova (particularly iOS), believed to be due to memory constraints. Moving to zip.js may help alleviate that issue as well.

@jthrilly (Member Author)

Also of note: this would impact the writeStream filesystem method that we currently use to manage memory use. That method is called by extractZipFile to write the contents of a zip file as it is extracted.

zip.js doesn't seem to offer a stream for output.

Does this mean we should consider putting our existing JSZip code in a web worker instead?
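
If we kept JSZip, moving extraction off the render thread might look roughly like this (a minimal sketch, not our actual code: the worker file name, the message protocol, and the protocolBuffer variable are assumptions; writeStream is the filesystem method mentioned above):

```js
// unzip.worker.js — extracts entries off the main thread.
importScripts('jszip.min.js');

self.onmessage = async ({ data }) => {
  const zip = await JSZip.loadAsync(data.archive); // ArrayBuffer in
  for (const [path, entry] of Object.entries(zip.files)) {
    if (entry.dir) continue;
    const content = await entry.async('arraybuffer');
    // Transfer (not copy) the buffer back; the main thread only does I/O.
    self.postMessage({ path, content }, [content]);
  }
  self.postMessage({ done: true });
};

// main thread
const worker = new Worker('unzip.worker.js');
worker.onmessage = ({ data }) => {
  if (data.done) { worker.terminate(); return; }
  writeStream(data.path, data.content); // existing filesystem method
};
worker.postMessage({ archive: protocolBuffer }, [protocolBuffer]);
```

This would keep the render thread free, although the whole archive still has to be loaded into the worker's memory, so it addresses the jank but not the memory ceiling.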

@jthrilly changed the title from "Move to zip.js from JSZip?" to "Move to zip.js from JSZip? Or implement JSZip within webworker?" Mar 28, 2019
@wwqrd (Contributor) commented Mar 29, 2019

Looks like zip.js uses FileWriter, which supports streams.

Either way, using a worker seems like a good way to stop the render process locking?

Failing that, maybe we need to rethink how assets are transferred?
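
For reference, chunked output through cordova-plugin-file's FileWriter might look something like this (a sketch; writeChunks is a hypothetical helper, and error handling is minimal):

```js
// Append an array of Uint8Array chunks to a FileEntry one at a time,
// so only one chunk is held in memory per write.
function writeChunks(fileEntry, chunks) {
  return new Promise((resolve, reject) => {
    fileEntry.createWriter((writer) => {
      let index = 0;
      const writeNext = () => {
        if (index >= chunks.length) { resolve(); return; }
        writer.seek(writer.length); // append after what's written so far
        writer.write(new Blob([chunks[index++]]));
      };
      writer.onwriteend = writeNext;
      writer.onerror = reject;
      writeNext();
    }, reject);
  });
}
```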

@jthrilly (Member Author) commented Apr 2, 2019

> Failing that, maybe we need to rethink how assets are transferred?

Another option here is stronger support for remote URL assets, and/or embedding things like the YouTube video player. Video assets are likely to be the largest files we deal with.

@jthrilly jthrilly added this to the Beta-stretch milestone Apr 5, 2019
@wwqrd (Contributor) commented Jun 25, 2019

We are emulating FileWriter streams on Cordova, which I think is a compromise based on what we could achieve with the API. The solution might be to find a Cordova zip plugin and handle it that way; it means forking the code, but we already had to do that for the emulation.

@jthrilly (Member Author)

Some more research on this issue:

@jthrilly (Member Author) commented Jul 14, 2020

A further option is native compression/decompression, which may have already landed in Chrome: https://github.com/WICG/compression/blob/master/explainer.md (not yet landed in Safari).

WebAssembly: https://github.com/drbh/wasm-flate and https://github.com/nika-begiashvili/libarchivejs
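
For a sense of the native API: as proposed, it decompresses gzip/deflate streams rather than the zip container format, so a zip archive's entry headers would still need separate parsing. A minimal sketch, assuming DecompressionStream as shipped in Chrome:

```js
// Decompress a gzip response body using the native Compression Streams API.
async function gunzip(response) {
  const stream = response.body.pipeThrough(new DecompressionStream('gzip'));
  return new Response(stream).arrayBuffer();
}

// Usage (illustrative URL):
// const bytes = await gunzip(await fetch('/assets/protocol.json.gz'));
```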

@keean commented Jul 30, 2020

I am using zip.js for very big files. It is the only JS zip library I have found that can seek to the end of the zip to read the file directory without having to read the whole zip file; even stream-based implementations cannot do that. As a result, the maximum memory requirement is the size of the buffer, not the size of the zip archive or of the file you want to extract (unless you extract to memory).

It uses a configurable buffer size (you can edit this in the code), and you just need to implement the Reader and Writer interfaces for whatever you want to read from and write to. I wrote chunked ArrayBuffer Readers and Writers for IndexedDB, allowing me to unzip gigabytes of data to and from IndexedDB reliably.

It may be 'abandoned', or it may just 'work' and not need any updates. It has worked without any issues for me in production code for years, and allows me to handle files that are too big for JSZip.
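
To illustrate the kind of adapter described here, a custom Writer against zip.js's classic callback API might look roughly like this (a sketch, not keean's actual code: the object store name and keying scheme are assumptions, and the IndexedDB setup is elided):

```js
// A zip.js Writer that flushes each decompressed chunk straight to
// IndexedDB, so memory use stays at roughly one buffer per write.
function IndexedDBWriter(db, fileId) {
  this.db = db;
  this.fileId = fileId;
  this.chunkIndex = 0;
}
IndexedDBWriter.prototype = new zip.Writer();
IndexedDBWriter.prototype.init = function (callback) {
  callback();
};
IndexedDBWriter.prototype.writeUint8Array = function (array, callback, onerror) {
  const tx = this.db.transaction('chunks', 'readwrite');
  tx.objectStore('chunks').put(array, [this.fileId, this.chunkIndex++]);
  tx.oncomplete = () => callback();
  tx.onerror = onerror;
};
IndexedDBWriter.prototype.getData = function (callback) {
  callback(); // chunks are already persisted; nothing to return in memory
};
```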

@jthrilly (Member Author) commented Jul 30, 2020

@keean - thanks for weighing in on this! I really appreciate your input.

The issue we are trying to address here is both memory use (which, as you say, can partially be managed by manipulating the buffer, using streams for backpressure, and creating adapters for the input and output 'write' mechanisms) and the UI jank caused by unzipping in a single-threaded context. The latter is probably the higher priority for us. Do you have any insight into unzipping in a different process, via web worker?

We are struggling with buffered reading specifically on Cordova. As you can see from Stuk/jszip#555, JSZip can inadvertently load large amounts of data into memory in some circumstances, which can cause our app to be killed, by iOS in particular. We need to stream both zip reads and writes, and we only ever want to read/write the entire archive (never just the directory listing, for example). Any thoughts on this?

@keean commented Jul 30, 2020

Yes, zip.js runs the pako compressor/decompressor in a web worker, so you don't need to do anything special; this is automatic (it can also be disabled for single-threaded use). I find it works on iOS to decompress multi-gigabyte zip files. It's also pretty fast, as pako is written in asm.js and designed to be as fast as possible in JS.

I find stream backpressure very unreliable and "removed" from the actual control needed. With the buffering technique used in zip.js, memory control is very precise, and you can reduce memory requirements to the size of a single buffer if you implement your Reader and Writer correctly.

What zip.js does is quite simple: it reads a buffer from your 'Reader', processes it through the pako decompression web worker, and then passes the buffer to the 'Writer'. If your Reader and Writer can operate on the same buffer size as zip.js, it never uses more than one buffer-size of memory to process the whole file.

The only issue is that to decompress a file in the zip, you must read the directory at the end first; there is no way around this, because it contains all the metadata about the files.

What you might be able to do is this:

  • read a block of the zipfile, decompress the block (using pako), and write it to a file;
  • keep going until you hit the End of Central Directory record;
  • from there, you should be able to find the directory, from which you can slice up the big decompressed file using the decompressed file sizes listed in the directory.

But this all seems a bit backwards to me. I would need to understand what you are actually trying to do, but instead of streaming I would recommend:

  • write an 'HttpReader' that uses the 'Range' header to read byte ranges directly from the server;
  • use an HTTP HEAD request to get the length of the archive for the Reader.

Now the client can get the directory first, and then download and decompress each file in the archive to a local file.
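
Concretely, zip.js ships an HttpRangeReader along these lines, so the approach might look roughly like this (a sketch against the classic callback API; the URL is illustrative, and writeStream is the filesystem method mentioned earlier in the thread):

```js
// Read only the central directory via HTTP Range requests, then pull
// and decompress each entry individually.
zip.createReader(
  new zip.HttpRangeReader('https://example.com/protocol.netcanvas'),
  (reader) => {
    reader.getEntries((entries) => {
      entries.forEach((entry) => {
        if (entry.directory) return;
        entry.getData(new zip.BlobWriter(), (blob) => {
          writeStream(entry.filename, blob); // write to local storage
        });
      });
    });
  },
  (error) => console.error(error),
);
```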

@jthrilly (Member Author)

@keean - Thanks again for providing your insight into this issue. Very much appreciated. It sounds like your approach will definitely work for us.

@jthrilly moved this to Maintainence in Network Canvas Aug 25, 2022