Add Zip support #310

Closed
wants to merge 3 commits into from

@wb14123 wb14123 commented Sep 17, 2024

Resolves #161. Adds zip support.

`os.zip(path)` opens a zip file for reading and writing. The file is created if it does not exist.

`zipFile / subPath` yields a path inside the archive. It should support all the usual file operations, such as copy, move and delete.
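
For readers unfamiliar with the zip-as-filesystem idea, here is a minimal sketch of how it can be done with the JDK's built-in zip `FileSystem` provider. It is not the PR's implementation: `ZipFsSketch`, `withZip` and `demo` are hypothetical names, and it assumes Scala 2.13+ (for `scala.jdk.CollectionConverters`) on the JVM.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{FileSystems, Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of os-lib.
object ZipFsSketch {
  /** Opens the zip at `zipPath` (creating it if missing), runs `f` on the
    * archive root, and always closes the file system afterwards. */
  def withZip[A](zipPath: Path)(f: Path => A): A = {
    val uri = URI.create("jar:" + zipPath.toUri)
    val fs = FileSystems.newFileSystem(uri, Map("create" -> "true").asJava)
    try f(fs.getPath("/"))
    finally fs.close()
  }

  /** Writes, copies and deletes entries, then lists what is left. */
  def demo(): List[String] = {
    val zipPath = Files.createTempDirectory("zipfs-sketch").resolve("demo.zip")
    withZip(zipPath) { root =>
      Files.createDirectories(root.resolve("docs"))
      Files.write(root.resolve("docs/a.txt"), "hello".getBytes(UTF_8))
      Files.copy(root.resolve("docs/a.txt"), root.resolve("docs/b.txt"))
      Files.delete(root.resolve("docs/a.txt"))
    }
    // Reopen the archive and list its regular files.
    withZip(zipPath) { root =>
      val walk = Files.walk(root)
      try walk.iterator().asScala.filter(Files.isRegularFile(_)).map(_.toString).toList
      finally walk.close()
    }
  }
}
```

Because paths inside the archive are ordinary `java.nio.file.Path` values, the regular `Files` operations apply to them, which is what makes copy, move and delete come for free.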


wb14123 commented Sep 19, 2024

Hi @lihaoyi , this is ready for review


lihaoyi commented Sep 19, 2024

@wb14123 looks like a good start. Need a few more things according to the original ticket:

  1. Conveniently creating zip files from folders
  2. Unzipping zip files into folders
  3. Support for the various flags that zip and unzip support, where reasonable:

Below are the various flags on my OS-X laptop; I assume Linux zip has similar ones:

lihaoyi mill$ zip --help
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which
  can include the special name - to compress standard input.
  If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files  -u   update: only changed or new files
  -d   delete entries in zipfile    -m   move into zipfile (delete OS files)
  -r   recurse into directories     -j   junk (don't record) directory names
  -0   store only                   -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster              -9   compress better
  -q   quiet operation              -v   verbose operation/print version info
  -c   add one-line comments        -z   add zipfile comment
  -@   read names from stdin        -o   make zipfile as old as latest entry
  -x   exclude the following names  -i   include only the following names
  -F   fix zipfile (-FF try harder) -D   do not add directory entries
  -A   adjust self-extracting exe   -J   junk zipfile prefix (unzipsfx)
  -T   test zipfile integrity       -X   eXclude eXtra file attributes
  -y   store symbolic links as the link instead of the referenced file
  -e   encrypt                      -n   don't compress these suffixes
  -h2  show more help

lihaoyi mill$ unzip --help
UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
  Default action is to extract files in list, except those in xlist, to exdir;
  file[.zip] may be a wildcard.  -Z => ZipInfo mode ("unzip -Z" for usage).

  -p  extract files to pipe, no messages     -l  list files (short format)
  -f  freshen existing files, create none    -t  test compressed archive data
  -u  update files, create if necessary      -z  display archive comment only
  -v  list verbosely/show version info       -T  timestamp archive to latest
  -x  exclude files that follow (in xlist)   -d  extract files into exdir
modifiers:
  -n  never overwrite existing files         -q  quiet mode (-qq => quieter)
  -o  overwrite files WITHOUT prompting      -a  auto-convert any text files
  -j  junk paths (do not make directories)   -aa treat ALL files as text
  -C  match filenames case-insensitively     -L  make (some) names lowercase
  -X  restore UID/GID info                   -V  retain VMS version numbers
  -K  keep setuid/setgid/tacky permissions   -M  pipe through "more" pager
See "unzip -hh" or unzip.txt for more help.  Examples:
  unzip data1 -x joe   => extract all files except joe from zipfile data1.zip
  unzip -p foo | more  => send contents of foo.zip via pipe into program more
  unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer

We don't need to support all of them verbatim, but on a case by case basis:

  1. At least we need to support the ones that do things that are impossible using the current PR API: e.g. zip -0 to -9, or zip -c
  2. Some of the flags may technically be possible using the common API, but we should probably support them directly for convenience: e.g. creating zips with zip -x and -i, or unpacking zips with unzip -x. Maybe our Scala API could take a Set[os.SubPath] or an os.SubPath => Boolean filter when creating or unpacking zips
  3. Some of the flags may be substitutable with usage of the existing zip-file-as-filesystem API you already implemented: e.g. zip -d, zip -m, or unzip -l, in which case we should document how to perform that workflow in a unit test using the os.zip primitives that you do provide
  4. Some of these flags are irrelevant (e.g. zip -h2, unzip -M), or may be difficult/impossible to implement on the JVM.
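
To make categories 1 and 2 concrete, here is a rough sketch of what the JDK already offers for them: `ZipOutputStream.setLevel` for `zip -0` to `-9`, `ZipEntry.setComment` for `zip -c`, and a predicate for `-i`/`-x`. It is an illustration only, not a proposed os-lib signature: `ZipFlagsSketch`, `zipDir`, `entryNames` and `demo` are made-up names, and the filter takes a plain relative-path `String` rather than `os.SubPath` to keep the example JDK-only (Scala 2.13+ assumed).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.file.{Files, Path}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of os-lib.
object ZipFlagsSketch {
  /** Zips the regular files under `root` whose relative path passes `include`.
    * `level` mirrors `zip -0`..`-9`; `entryComment` mirrors `zip -c`. */
  def zipDir(
      root: Path,
      level: Int = 6,
      entryComment: String => Option[String] = _ => None
  )(include: String => Boolean): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ZipOutputStream(bytes)
    out.setLevel(level)
    val walk = Files.walk(root)
    try {
      walk.iterator().asScala
        .filter(Files.isRegularFile(_))
        // Entry names always use '/' regardless of the host OS.
        .map(p => p -> root.relativize(p).iterator().asScala.mkString("/"))
        .filter { case (_, rel) => include(rel) }
        .toList.sortBy(_._2)
        .foreach { case (p, rel) =>
          val entry = new ZipEntry(rel)
          entryComment(rel).foreach(entry.setComment)
          out.putNextEntry(entry)
          Files.copy(p, out)
          out.closeEntry()
        }
    } finally { walk.close(); out.close() }
    bytes.toByteArray
  }

  /** Lists entry names, roughly what `unzip -l` reports. */
  def entryNames(zip: Array[Byte]): List[String] = {
    val in = new ZipInputStream(new ByteArrayInputStream(zip))
    try Iterator.continually(in.getNextEntry).takeWhile(_ != null).map(_.getName).toList
    finally in.close()
  }

  def demo(): (List[String], Boolean) = {
    val dir = Files.createTempDirectory("zip-flags")
    Files.write(dir.resolve("a.txt"), ("a" * 10000).getBytes("UTF-8"))
    Files.write(dir.resolve("b.log"), ("b" * 10000).getBytes("UTF-8"))
    val noLogs: String => Boolean = rel => !rel.endsWith(".log")
    val stored = zipDir(dir, level = 0)(noLogs)
    val best = zipDir(dir, level = 9)(noLogs)
    (entryNames(stored), best.length < stored.length)
  }
}
```

A predicate covers both `-i` and `-x` with one parameter; a `Set` of paths is just the special case `set.contains`.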

Exactly which flag goes into which category is a subjective judgement. In terms of next steps, how about you go through the list of zip flags (either the ones I provided or the ones on your computer, doesn't really matter) and categorize them into the 4 categories above, and list out your categorization as part of the PR description. Then we can ensure that the ones that deserve special support are supported, and the ones that can be emulated via other workflows have those workflows tested and exercised.

Lastly, we need to include this in the documentation. Once you have the categorization/implementation/tests finalized, it should be pretty quick to copy paste the necessary parts of each into the readme.adoc with a few words to explain how they are used.

import java.nio.file.{Path => _, _}
import collection.JavaConverters._

class zip(zipPath: Path) {


Since fs is created internally, shouldn't the class zip implement Closeable?
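
A minimal sketch of what this suggestion could look like, assuming the class keeps owning the `FileSystem` it creates. `ZipHandle` and `ZipHandleDemo` are hypothetical stand-ins, not the PR's code, and `scala.util.Using` requires Scala 2.13+.

```scala
import java.net.URI
import java.nio.file.{FileSystem, FileSystems, Files, Path}
import scala.jdk.CollectionConverters._
import scala.util.Using

// Hypothetical stand-in for the PR's `zip` class: it owns the FileSystem,
// so it exposes close() to let callers release it deterministically.
final class ZipHandle(zipPath: Path) extends AutoCloseable {
  private val fs: FileSystem = FileSystems.newFileSystem(
    URI.create("jar:" + zipPath.toUri),
    Map("create" -> "true").asJava
  )
  def root: Path = fs.getPath("/")
  def isOpen: Boolean = fs.isOpen
  override def close(): Unit = fs.close()
}

object ZipHandleDemo {
  /** Returns (is the handle still open afterwards?, does the zip exist on disk?). */
  def run(): (Boolean, Boolean) = {
    val zipPath = Files.createTempDirectory("zip-handle").resolve("demo.zip")
    val handle = new ZipHandle(zipPath)
    // Using.resource closes the handle even if the body throws.
    Using.resource(handle) { h =>
      Files.write(h.root.resolve("hello.txt"), "hi".getBytes("UTF-8"))
    }
    (handle.isOpen, Files.exists(zipPath))
  }
}
```

Closing matters here because the zip file system flushes pending changes to the archive when it is closed, and a second attempt to open the same URI fails while the first file system is still open.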


private val fs = FileSystems.newFileSystem(
URI.create("jar:file:" + zipPath.wrapped.toString),
Map("create" -> "true").asJava


Could a Map[String, String] be an optional parameter to the class constructor with this as the default value? That would make it configurable from the outside.
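
As a sketch of this suggestion, the environment map could become a constructor parameter whose default preserves the current behaviour. `ConfigurableZip` and `ConfigurableZipDemo` are hypothetical names used for illustration only (Scala 2.13+ assumed).

```scala
import java.net.URI
import java.nio.file.{FileSystems, Files, Path}
import scala.jdk.CollectionConverters._
import scala.util.Try

// Hypothetical variant of the PR's class: `env` is passed straight to the
// JDK zip file system provider; the default matches the hard-coded map.
final class ConfigurableZip(
    zipPath: Path,
    env: Map[String, String] = Map("create" -> "true")
) extends AutoCloseable {
  private val fs =
    FileSystems.newFileSystem(URI.create("jar:" + zipPath.toUri), env.asJava)
  def root: Path = fs.getPath("/")
  override def close(): Unit = fs.close()
}

object ConfigurableZipDemo {
  /** Returns (did opening a missing zip without "create" fail?, was the default one created?). */
  def run(): (Boolean, Boolean) = {
    val dir = Files.createTempDirectory("zip-env")
    // With create=false the provider refuses to open a file that does not exist.
    val strict = Try(new ConfigurableZip(dir.resolve("missing.zip"), Map("create" -> "false")))
    val zipPath = dir.resolve("new.zip")
    val lenient = new ConfigurableZip(zipPath)
    try Files.write(lenient.root.resolve("x.txt"), Array[Byte](1, 2, 3))
    finally lenient.close()
    (strict.isFailure, Files.exists(zipPath))
  }
}
```

The exact exception type thrown for a missing file differs between JDK versions, which is why the sketch only checks for failure.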

@lihaoyi lihaoyi mentioned this pull request Oct 7, 2024
@lihaoyi lihaoyi closed this Oct 7, 2024
lihaoyi added a commit that referenced this pull request Oct 7, 2024
Pulls in changes from #316 and
#310 and cleans it up

Mostly documented in the readme.adoc. 

Major APIs added:

- `os.zip`: create or append to an existing zip file on disk
- `os.zip.stream`: create a new zip file but write it to a
`java.io.OutputStream` rather than a file on disk
- `os.unzip`: unzip a zip file on disk into a folder on disk
- `os.unzip.stream`: unzip a zip file from a `java.io.InputStream` into
a folder on disk
- `os.unzip.list`: list the contents of a zip file
- `os.unzip.streamRaw`: low-level API used by `os.unzip.stream` and
`os.unzip.list`, exposed in case users need it
- `os.zip.open`: opens a zip file as a `java.nio.file.FileSystem` and
gives you an `os.Path` you can use to work with it

Hopefully these are APIs we can start using in Mill rather than `"zip"`
subprocesses or ad-hoc helpers like `IO.unpackZip`
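
To illustrate the stream-based operations listed above, here is a rough JDK-only sketch of zipping to an `OutputStream` and unzipping from an `InputStream`. It does not use or reproduce the os-lib API: `ZipStreamSketch`, `zipTo`, `unzipTo` and `roundTrip` are hypothetical names, and the code only assumes `java.util.zip` behaves as documented.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import java.nio.file.{Files, Path}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

// Hypothetical helper, not the os-lib implementation.
object ZipStreamSketch {
  /** Writes the given (entry name, bytes) pairs as a zip archive to `out`. */
  def zipTo(out: OutputStream, entries: Seq[(String, Array[Byte])]): Unit = {
    val zos = new ZipOutputStream(out)
    try entries.foreach { case (name, data) =>
      zos.putNextEntry(new ZipEntry(name))
      zos.write(data)
      zos.closeEntry()
    } finally zos.close()
  }

  /** Extracts every entry from `in` under `dest` and returns the entry names. */
  def unzipTo(in: InputStream, dest: Path): List[String] = {
    val zis = new ZipInputStream(in)
    val destRoot = dest.toAbsolutePath.normalize()
    try Iterator.continually(zis.getNextEntry).takeWhile(_ != null).map { entry =>
      val target = destRoot.resolve(entry.getName).normalize()
      // Reject names such as "../x" that would land outside the destination.
      require(target.startsWith(destRoot), s"entry escapes destination: ${entry.getName}")
      if (entry.isDirectory) Files.createDirectories(target)
      else {
        Files.createDirectories(target.getParent)
        Files.copy(zis, target) // reads up to the end of the current entry
      }
      entry.getName
    }.toList
    finally zis.close()
  }

  /** Zips two in-memory files, unzips them to a temp folder, and reads them back. */
  def roundTrip(): List[(String, String)] = {
    val buffer = new ByteArrayOutputStream()
    zipTo(buffer, Seq("a.txt" -> "hello".getBytes("UTF-8"), "dir/b.txt" -> "world".getBytes("UTF-8")))
    val dest = Files.createTempDirectory("unzip-sketch")
    unzipTo(new ByteArrayInputStream(buffer.toByteArray), dest).map { name =>
      name -> new String(Files.readAllBytes(dest.resolve(name)), "UTF-8")
    }
  }
}
```

Like the `java.util.zip` streams it is built on, this sketch ignores symlinks and file permissions.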

Limitations:

* Use of `java.nio.file.FileSystem` is only supported on JVM and not on
Scala-Native, and so using `os.zip` to append to existing jar files or
`os.zip.open` does not work on Scala-Native.
* Also `os.zip` doesn't support creating/unpacking symlinks or
preserving filesystem permissions in Zip files, because the underlying
`java.util.zip.Zip*Stream` doesn't support them. Apache Commons Compress
can work with them
(https://commons.apache.org/proper/commons-compress/zip.html), but if
we're sticking with std lib we don't have that
* Bumps the version requirement to Java 11 and above, matching the
direction of the rest of com-lihaoyi. Probably not strictly necessary,
but we have to do it eventually and now is as good a time as ever with
requests already bumped and Mill bumping soon in 0.12.0

---------

Co-authored-by: Chaitanya Waikar <[email protected]>
Successfully merging this pull request may close these issues.

[feature request] zip file handling (500USD Bounty)
3 participants