[api] fix issue in Tar/Zip Utils that resulted in incorrect artifact … #3544

siddvenk · 2024-11-26T04:30:29Z

…extraction

Description

This change fixes an issue with ZipUtils.unzip and TarUtils.untar. The issue was masked by the fact that our CI tests were not failing when they should have been. See #3543 for details.

Rather than stripping leading file separators and still extracting the archive, this change validates that the tar entry will extract into the expected output directory.

While this change means that some tars that previously worked (such as those with entries starting with / or \) will no longer work, I argue that the new behavior is correct and those tar entries are invalid/incorrect.

si2d · 2024-11-26T05:30:11Z

api/src/main/java/ai/djl/util/ZipUtils.java

        }
-        return name.substring(index);
+        return name;


Why return the input variable as is? Maybe this method should be getAbsolutePath and the method is expected to throw if the path is invalid.
We are using the output in vis.validate. Do we need that after this change?

Agree, this method can be void

I think we still need vis.validate, since that method does some deeper validation specific to zips. It validates that the entries in the header match what is in the archive.

frankfliu · 2024-11-26T06:51:56Z

For zip files, an entry name that starts with "/" is valid and is ignored by common zip utilities.
For tar files, an entry name that starts with "/" is a security vulnerability: the tar CLI will untar to the root folder.

I think it's OK to be stricter and disallow "/", but your change makes a/../b.txt valid, which can overwrite b.txt in the same archive file.
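To make the a/../b.txt concern concrete, here is a minimal sketch of a containment-only check. The class and method names (ContainmentCheckDemo, staysInside) are hypothetical and are not the DJL code; the sketch only assumes the resolve/normalize/startsWith approach discussed in this PR.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not the DJL implementation.
final class ContainmentCheckDemo {

    /** Returns true if the entry, resolved against destination, stays under destination. */
    static boolean staysInside(Path destination, String entryName) {
        Path base = destination.toAbsolutePath().normalize();
        Path target = base.resolve(entryName).normalize();
        return target.startsWith(base);
    }

    public static void main(String[] args) {
        Path dest = Paths.get("build", "output");
        // Normalizes to <dest>/b.txt, so a containment-only check accepts it
        System.out.println(staysInside(dest, "a/../b.txt"));
        // Normalizes to a path above <dest>, so it is rejected
        System.out.println(staysInside(dest, "../b.txt"));
    }
}
```

A check like this stops writes outside the destination, but it says nothing about two entries that normalize to the same file inside it.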

frankfliu · 2024-11-26T06:47:46Z

api/src/test/java/ai/djl/util/ZipUtilsTest.java

+    public void testLinuxCreatedWindowsUsedOffendingTar() throws IOException {
+        TestRequirements.windows();
+        Path tarPath = Paths.get("src/test/resources/linux_create_windows_use.tar");
+        Path output = Paths.get("C:/out");


Why choose "C:/out"? It would be better to output to the build folder. Although the test throws an exception before it accesses the file system, using "C:/out" is a bit confusing:

not all machines have a C: drive

not all environments have write access to the /out directory

Good catch, this was left over from my testing on a Windows machine. build/out suffices here.

frankfliu · 2024-11-26T06:57:37Z

api/src/main/java/ai/djl/util/ZipUtils.java

        }
-        return name.substring(index);
+        return name;


Agree, this method can be void

siddvenk · 2024-11-26T16:05:11Z

@si2d @frankfliu what do you think about this method? We would use sanitizeAndValidateArchiveEntry where we currently use validateArchiveEntry.

1. Sanitize the archive entry by converting the entry name to a system-aware path (convert a Linux archive entry path to the equivalent Windows path, or vice versa). We can then remove the leading file separator as before (it now works cross-platform), which handles the concern @frankfliu mentioned about / being a valid starting character for zip entries.
2. Validate that, with this sanitized path, the expected output path exists under the expected output directory.

static String sanitizeAndValidateArchiveEntry(String name, Path destination) throws IOException {
    // Strip leading separators so the entry resolves relative to the destination
    String sanitizedName = removeLeadingFileSeparator(name);
    // Resolve against the destination and collapse any "." or ".." elements
    Path expectedOutputPath = destination.resolve(sanitizedName).normalize();
    if (!expectedOutputPath.startsWith(destination.normalize())) {
        throw new IOException(
                "Bad archive entry "
                        + name
                        + ". Attempted write outside destination "
                        + destination);
    }
    return sanitizedName;
}

static String removeLeadingFileSeparator(String name) {
    // Convert separators to the current OS style (FilenameUtils comes from commons-io)
    String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);
    int index = 0;
    // Advance past every leading separator character
    for (; index < osAwareArchiveEntryName.length(); index++) {
        if (osAwareArchiveEntryName.charAt(index) != File.separatorChar) {
            break;
        }
    }
    return osAwareArchiveEntryName.substring(index);
}

frankfliu · 2024-11-26T16:20:29Z

@si2d

In an archive file, the path is always Linux style, so we don't really need to use an OS-specific Path to validate it. And we don't have to support special cases (even if they are valid). We can just check whether the entry name starts with "/" or contains "..", and treat those as invalid. This was the original algorithm. I'm curious why that causes a test failure on Windows.

siddvenk · 2024-11-26T16:22:00Z

File.separatorChar is OS specific. On macOS/Linux it is /, on Windows it is \. So on Windows, if you had an archive entry like /tmp/test.txt, removeLeadingFileSeparator would not actually remove / from the entry, which causes the write at the root.

siddvenk · 2024-11-26T16:36:28Z

@frankfliu with the existing logic (ignoring any of my changes here), this is what the testOffendingTar unit test produces on Windows (I added some print statements, but otherwise the logic is the same).

Gradle suite > Gradle test > ai.djl.util.ZipUtilsTest > testOffendingTar STANDARD_OUT
    Entry before removing leading separator: /tmp/empty.txt
    Entry after removing leading separator: /tmp/empty.txt
    Writing archive entry to D:\tmp\empty.txt

The output dir in that test is build/output, so we would expect to write to build/output/tmp/empty.txt, but that's not what happens because of the separatorChar issue I mentioned.
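The behavior in the log can be reproduced on any platform with a small sketch that takes the separator as a parameter instead of reading File.separatorChar. The class and method names (SeparatorDemo, stripLeading) are hypothetical; only the loop logic mirrors removeLeadingFileSeparator.

```java
// Hypothetical reproduction of the stripping loop with the separator passed in,
// so the Windows behavior can be observed on Linux or macOS as well.
final class SeparatorDemo {

    /** Removes every leading occurrence of the given separator character. */
    static String stripLeading(String name, char separator) {
        int index = 0;
        while (index < name.length() && name.charAt(index) == separator) {
            index++;
        }
        return name.substring(index);
    }

    public static void main(String[] args) {
        String entry = "/tmp/empty.txt";
        // '/' is File.separatorChar on Linux/macOS: the leading slash is removed
        System.out.println(stripLeading(entry, '/'));
        // '\\' is File.separatorChar on Windows: the entry comes back unchanged
        System.out.println(stripLeading(entry, '\\'));
    }
}
```

With the Windows separator the entry keeps its leading /, which matches the "before" and "after" lines in the log being identical.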

si2d · 2024-11-26T16:38:56Z

For Zip file, entry name starts with "/" is valid, and will be ignored by common zip utils. For Tar file, entry name starts with "/" is a security vulnerability, tar cli will untar to root folder.

I think it's OK we make it more strict that doesn't allows "/", but your changes makes a/../b.txt valid, which can overwrite b.txt in the same archive file.

I wonder: instead of trying to block this, should we only disallow overwrites? Even if the path is a/../b.txt, it would be allowed as long as nothing else writes to b.txt. We explicitly set REPLACE_EXISTING when unarchiving, but is overwriting files a valid use case? (Also, this change might break customers if they somehow rely on it.)
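One way to picture the "only disallow overwrites" idea is to track the normalized target of every entry and fail on the first repeat. This is a rough sketch under that assumption; DuplicateTargetCheck and rejectDuplicateTargets are hypothetical names, not something in this PR.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: allow ".." inside the destination, but refuse to let
// two entries of the same archive resolve to the same output file.
final class DuplicateTargetCheck {

    static void rejectDuplicateTargets(Path destination, List<String> entryNames)
            throws IOException {
        Path base = destination.toAbsolutePath().normalize();
        Set<Path> seen = new HashSet<>();
        for (String name : entryNames) {
            Path target = base.resolve(name).normalize();
            // Set.add returns false when the normalized target was already recorded
            if (!seen.add(target)) {
                throw new IOException(
                        "Archive entry " + name + " would overwrite an earlier entry at " + target);
            }
        }
    }

    public static void main(String[] args) {
        try {
            rejectDuplicateTargets(
                    Paths.get("build", "output"), Arrays.asList("b.txt", "a/../b.txt"));
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

This only covers collisions within one archive; files that already exist in the output directory would need a separate check.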

siddvenk · 2024-11-26T16:40:56Z

The simplest change that solves the issue is to keep everything the same, except for this modification to removeLeadingFileSeparator.

static String removeLeadingFileSeparator(String name) {
    // Below single line is the only change
    String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);
    int index = 0;
    for (; index < osAwareArchiveEntryName.length(); index++) {
        if (osAwareArchiveEntryName.charAt(index) != File.separatorChar) {
            break;
        }
    }
    return osAwareArchiveEntryName.substring(index);
}

frankfliu · 2024-11-26T16:55:37Z

Right, we should not use File.separatorChar. The archive file format is not OS specific.

frankfliu · 2024-11-26T16:59:39Z

vis.validate(set) is still required. There is a CVE. Basically, when a user opens the file with WinZip they see a.txt, but when they unzip it to the file system they get b.txt. We need to make sure the ZipEntry matches the ZipInputStream name.

siddvenk · 2024-11-26T17:23:28Z

The original algorithm was using File.separatorChar, not /. I think it's still possible to craft a tar with \ as the path separator. Converting the entry name with separatorsToSystem, and then using File.separatorChar will work. We could also modify the condition to something like

if (name.charAt(index) != '/' && name.charAt(index) != '\\') {
 ...
}
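Spelled out, the alternative condition could look like the following sketch, which strips both separator styles without consulting File.separatorChar or commons-io. The class and method names (LeadingSeparatorStripper, removeLeadingSeparators) are placeholders for illustration.

```java
// Hypothetical sketch of an OS-independent variant of removeLeadingFileSeparator.
final class LeadingSeparatorStripper {

    /** Removes every leading '/' or '\' from the entry name, regardless of platform. */
    static String removeLeadingSeparators(String name) {
        int index = 0;
        while (index < name.length()) {
            char c = name.charAt(index);
            if (c != '/' && c != '\\') {
                break;
            }
            index++;
        }
        return name.substring(index);
    }

    public static void main(String[] args) {
        System.out.println(removeLeadingSeparators("/tmp/empty.txt"));
        System.out.println(removeLeadingSeparators("\\tmp\\empty.txt"));
    }
}
```

Only leading characters are touched; separators inside the name are left as they are.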

si2d · 2024-11-26T17:29:51Z

To me, checking that the path resolves to a location within the destination is the more correct change, rather than modifying our custom method. What is the reason to keep this to the simplest change?

siddvenk · 2024-11-26T17:33:33Z

@si2d

I agree with you. I think we should be checking that the path resolves to be within the destination. I've kept that portion in and removed the check on "path contains .." since that would be redundant in my opinion.

The part that still seems open is whether we want to sanitize the archive entry at all. If it's possible and valid for zip archives to start with /, then we probably should be sanitizing each entry to remove leading file separators. Or, we can decide to be more strict and not allow that (which is handled implicitly by ensuring we're only writing under the expected output dir).

siddvenk · 2024-11-26T19:39:12Z

@si2d @frankfliu I have updated the PR based on the above discussions. The changes are now:

1. Convert the archive entry to the system-specific form (convert path separators to the ones used by the current system). We then remove leading file separators using File.separatorChar, which should work after the initial conversion.
2. Validate that the expected output path for the entry exists under the specified output directory.

This will fix the issue with the current code. Some additional things we may consider:

- Remove step 1 entirely. This would be stricter than what we have today, but I think that's OK.
- Don't replace existing files, or provide an option for the user to specify whether to overwrite.
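For the overwrite option, a minimal sketch could pass a flag down to the copy step. OverwriteOptionDemo and writeEntry are hypothetical names, not part of ZipUtils or TarUtils; the sketch relies only on the standard behavior of Files.copy, which throws FileAlreadyExistsException when the target exists and REPLACE_EXISTING is not given.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of an extraction step with a caller-controlled overwrite flag.
final class OverwriteOptionDemo {

    static void writeEntry(InputStream in, Path target, boolean overwrite) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        if (overwrite) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } else {
            // Without REPLACE_EXISTING an existing target causes FileAlreadyExistsException
            Files.copy(in, target);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempDirectory("overwrite-demo").resolve("b.txt");
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        writeEntry(new ByteArrayInputStream(data), target, false);
        try {
            writeEntry(new ByteArrayInputStream(data), target, false);
        } catch (IOException e) {
            System.out.println("Second write refused: " + e);
        }
    }
}
```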

frankfliu · 2024-11-26T21:13:53Z

api/src/main/java/ai/djl/util/ZipUtils.java

    static String removeLeadingFileSeparator(String name) {
+        String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);


Try to avoid the dependency on commons-io.

If we always run sanitizeAndValidateArchiveEntry(), we only need to remove the "/" character here; the "\" issue will be caught by sanitizeAndValidateArchiveEntry().

Why avoid commons-io? We're already using it in a few places in this code path.

In the api module, the only place that uses commons-io is TarUtils.java. commons-io is a transitive dependency from commons-compress, recently added in 1.27.x, which is not the intention of the api project.

There are customers who don't use tar files, and they can exclude commons-compress from their project. See: Make commons-compress an optional dependency #2949

siddvenk · 2024-11-26T21:58:33Z

Updated the PR. I've opted for the stricter approach, where entries that start with '/' are invalid.

We validate that each archive entry will be written to a location under the provided output directory. If it won't (either because it starts with /, or because it contains .. traversal elements that would go outside the expected directory), we fail. Users are expected to provide valid archives, and we don't do any special processing on them.

frankfliu · 2024-11-26T22:24:36Z

api/src/main/java/ai/djl/util/ZipUtils.java

-            }
+    static void validateArchiveEntry(String name, Path destination) throws IOException {
+        Path expectedOutputPath = destination.resolve(name).normalize();
+        if (!expectedOutputPath.startsWith(destination.normalize())) {


I think we'd better block ".." in this method as well. It prevents file overwrites inside the destination folder. In the original version we were already blocking "..", and nobody complained about it.

The default behavior when extracting a tar is to overwrite, isn't it? I'm not sure why we need to differ.

If an archive mytar.tar had (in order)

b.txt
a/../b.txt

then a/../b.txt would overwrite b.txt (e.g. using tar -xvf mytar.tar).

I've gone ahead and added it back to stay in line with what we had before, but I'm curious to know what you think about my point above.
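Putting both rules together, a validator along these lines would reject any ".." element and also require the resolved path to stay under the destination. This is a sketch, not the merged code: StrictEntryValidator is a hypothetical name, and matching ".." per path element (rather than as a substring of the whole name) is an assumption made here.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch combining a ".." block with the containment check.
final class StrictEntryValidator {

    static void validate(String name, Path destination) throws IOException {
        // Assumption: ".." is matched per path element, split on either separator
        for (String part : name.split("[/\\\\]")) {
            if ("..".equals(part)) {
                throw new IOException("Bad archive entry " + name + ". Contains '..'");
            }
        }
        Path base = destination.toAbsolutePath().normalize();
        Path target = base.resolve(name).normalize();
        // An absolute entry such as /tmp/x resolves outside base and fails here
        if (!target.startsWith(base)) {
            throw new IOException(
                    "Bad archive entry " + name + ". Attempted write outside destination " + destination);
        }
    }

    public static void main(String[] args) {
        Path dest = Paths.get("build", "output");
        for (String name : new String[] {"tmp/empty.txt", "a/../b.txt", "/tmp/empty.txt"}) {
            try {
                validate(name, dest);
                System.out.println("accepted: " + name);
            } catch (IOException e) {
                System.out.println("rejected: " + name);
            }
        }
    }
}
```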

siddvenk requested review from zachgk and a team as code owners November 26, 2024 04:30

si2d reviewed Nov 26, 2024

frankfliu reviewed Nov 26, 2024

siddvenk force-pushed the windows-test branch from 82d65a8 to 2ce8065 Compare November 26, 2024 17:29

siddvenk force-pushed the windows-test branch from 2ce8065 to ac401e7 Compare November 26, 2024 19:14

frankfliu reviewed Nov 26, 2024

siddvenk force-pushed the windows-test branch from ac401e7 to 836e1f0 Compare November 26, 2024 21:30

github-advanced-security bot found potential problems Nov 26, 2024

si2d reviewed Nov 26, 2024

si2d approved these changes Nov 26, 2024

frankfliu reviewed Nov 26, 2024

siddvenk force-pushed the windows-test branch from 836e1f0 to d543ffe Compare November 27, 2024 00:52

frankfliu approved these changes Nov 27, 2024

siddvenk force-pushed the windows-test branch 2 times, most recently from cd00a44 to 34d457a Compare November 27, 2024 01:46

[api] fix issue in Tar/Zip Utils that resulted in incorrect artifact …

971e251

…extraction

siddvenk force-pushed the windows-test branch from 34d457a to 971e251 Compare November 27, 2024 01:55

siddvenk merged commit 7d197ba into master Nov 27, 2024
8 checks passed

siddvenk deleted the windows-test branch November 27, 2024 02:26

siddvenk added a commit that referenced this pull request Dec 10, 2024

[api] fix issue in Tar/Zip Utils that resulted in incorrect artifact …

7415cc5

…extraction (#3544)
