Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api] fix issue in Tar/Zip Utils that resulted in incorrect artifact … #3544

Merged
merged 1 commit into from
Nov 27, 2024

Conversation

siddvenk
Copy link
Contributor

…extraction

Description

This change fixes an issue with ZipUtils.unzip, and TarUtils.untar. This issue was obfuscated by the fact that our ci tests were not failing when they should have been. See #3543 for details.

Rather than stripping leading file separators and still extracting the archive, this change validates that the tar entry will extract into the expected output directory.

While this change means that some tars that previously worked (such as those with entries starting with / or \) will no longer work, I argue that the new behavior is correct and those tar entries are invalid/incorrect.

@siddvenk siddvenk requested review from zachgk and a team as code owners November 26, 2024 04:30
}
return name.substring(index);
return name;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return the input variable as is? Maybe this method should be getAbsolutePath and the method is expected to throw if the path is invalid.
We are using the output in vis.validate. Do we need that after this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this method can be void

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need vis.validate since that method is doing some deeper validation specific to zips. It's validating the entries in the header match what is in the archive

@frankfliu
Copy link
Contributor

For Zip file, entry name starts with "/" is valid, and will be ignored by common zip utils.
For Tar file, entry name starts with "/" is a security vulnerability, tar cli will untar to root folder.

I think it's OK we make it more strict that doesn't allows "/", but your changes makes a/../b.txt valid, which can overwrite b.txt in the same archive file.

public void testLinuxCreatedWindowsUsedOffendingTar() throws IOException {
TestRequirements.windows();
Path tarPath = Paths.get("src/test/resources/linux_create_windows_use.tar");
Path output = Paths.get("C:/out");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why choose "C:/out"?, better output to build folder, although the test throws exception before access to file system but using "c:/out" is a bit confusion:

  1. not all machine has c driver
  2. not all environment has write access to /out directory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this was leftover from my testing on a windows machine - build/out suffices here

}
return name.substring(index);
return name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this method can be void

@siddvenk
Copy link
Contributor Author

siddvenk commented Nov 26, 2024

@si2d @frankfliu what do you think about this method? We would use sanitizeAndValidateArchiveEntry where we currently use validateArchiveEntry.

  1. We sanitize the archive entry by converting the entry name to a system aware path (convert a linux archive entry path to the equivalent windows path, or vis versa). Then we can remove the leading file separator as before (but it works now cross platform) and handles the concern @frankfliu mentioned about / being a valid starting char for zip
  2. Validate that with this santized path, the expected output path exists under the expected output dir
static String sanitizeAndValidateArchiveEntry(String name, Path destination) throws IOException {
        String sanitizedName = removeLeadingFileSeparator(name);
        Path expectedOutputPath = destination.resolve(sanitizedName).normalize();
        if (!expectedOutputPath.startsWith(destination.normalize())) {
            throw new IOException(
                    "Bad archive entry "
                            + name
                            + ". Attempted write outside destination "
                            + destination);
        }
        return sanitizedName;
    }

static String removeLeadingFileSeparator(String name) {
        String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);
        int index = 0;
        for (; index < osAwareArchiveEntryName.length(); index++) {
            if (osAwareArchiveEntryName.charAt(index) != File.separatorChar) {
                break;
            }
        }
        return osAwareArchiveEntryName.substring(index);
    }

@frankfliu
Copy link
Contributor

@si2d

in archive file, the path is always linux style, we don't really need use os specific Path to validate it. And we don't have to support special cases (even they are valid). We can just check if the entry name starts with "/" or contains "..", we treat them as invalid. This was the original algorithm. I'm curious why that cause test failure on windows.

@siddvenk
Copy link
Contributor Author

@si2d

in archive file, the path is always linux style, we don't really need use os specific Path to validate it. And we don't have to support special cases (even they are valid). We can just check if the entry name starts with "/" or contains "..", we treat them as invalid. This was the original algorithm. I'm curious why that cause test failure on windows.

File.separatorChar is os specific. On mac/linux it is /, on windows it is \. So on windows, if you had an archive entry like /tmp/test.txt, the removeLeadingFileSeparator would not actually remove / from the entry and cause the write at root

@siddvenk
Copy link
Contributor Author

@frankfliu with the existing logic (ignoring any of my changes here), this is what the testOffendingTar unit test produces on windows (i added some print statements, but otherwise logic is same).

Gradle suite > Gradle test > ai.djl.util.ZipUtilsTest > testOffendingTar STANDARD_OUT
    Entry before removing leading separator: /tmp/empty.txt
    Entry after removing leading separator: /tmp/empty.txt
    Writing archive entry to D:\tmp\empty.txt

The output dir in that test is build/output, and we would expect that we write to build/output/tmp/empty.txt, but that's not what happens because of the separatorChar issue i mentioned

@si2d
Copy link

si2d commented Nov 26, 2024

For Zip file, entry name starts with "/" is valid, and will be ignored by common zip utils. For Tar file, entry name starts with "/" is a security vulnerability, tar cli will untar to root folder.

I think it's OK we make it more strict that doesn't allows "/", but your changes makes a/../b.txt valid, which can overwrite b.txt in the same archive file.

I wonder - instead of trying to block this, should we only disallow overwrites? so even if the path is a/../b.txt, it will be allowed as long as nothing else writes to b.txt. We are explicitly setting REPLACE_EXISTING when unarchiving, but is overwriting files a valid use case? (Also, this change might break customers if they are somehow using it)

@siddvenk
Copy link
Contributor Author

The simplest change that solves the issue is to keep everything the same, except for this modification to removeLeadingFileSeparator.

static String removeLeadingFileSeparator(String name) {
        // Below single line is the only change
        String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);
        int index = 0;
        for (; index < osAwareArchiveEntryName.length(); index++) {
            if (osAwareArchiveEntryName.charAt(index) != File.separatorChar) {
                break;
            }
        }
        return osAwareArchiveEntryName.substring(index);
    }

@frankfliu
Copy link
Contributor

The simplest change that solves the issue is to keep everything the same, except for this modification to removeLeadingFileSeparator.

static String removeLeadingFileSeparator(String name) {
        // Below single line is the only change
        String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);
        int index = 0;
        for (; index < osAwareArchiveEntryName.length(); index++) {
            if (osAwareArchiveEntryName.charAt(index) != File.separatorChar) {
                break;
            }
        }
        return osAwareArchiveEntryName.substring(index);
    }

Right, we should not use File.separatorChar. Archive file format is not os specific

@frankfliu
Copy link
Contributor

vis.validate(set) is till required. There is a CVE. Basically, when user use winzip to open the file, the see: a.txt, but when they unzip to file system, they got b.txt. We need make sure the ZipEntry matches the ZipInputStream name.

@siddvenk
Copy link
Contributor Author

The simplest change that solves the issue is to keep everything the same, except for this modification to removeLeadingFileSeparator.

static String removeLeadingFileSeparator(String name) {
        // Below single line is the only change
        String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);
        int index = 0;
        for (; index < osAwareArchiveEntryName.length(); index++) {
            if (osAwareArchiveEntryName.charAt(index) != File.separatorChar) {
                break;
            }
        }
        return osAwareArchiveEntryName.substring(index);
    }

Right, we should not use File.separatorChar. Archive file format is not os specific

The original algorithm was using File.separatorChar, not /. I think it's still possible to craft a tar with \ as the path separator. Converting the entry name with separatorsToSystem, and then using File.separatorChar will work. We could also modify the condition to something like

if (name.charAt(index) != '/' && name.charAt(index) != '\\') {
 ...
}

@si2d
Copy link

si2d commented Nov 26, 2024

The simplest change that solves the issue is to keep everything the same, except for this modification to removeLeadingFileSeparator.

static String removeLeadingFileSeparator(String name) {
        // Below single line is the only change
        String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);
        int index = 0;
        for (; index < osAwareArchiveEntryName.length(); index++) {
            if (osAwareArchiveEntryName.charAt(index) != File.separatorChar) {
                break;
            }
        }
        return osAwareArchiveEntryName.substring(index);
    }

Right, we should not use File.separatorChar. Archive file format is not os specific

The original algorithm was using File.separatorChar, not /. I think it's still possible to craft a tar with \ as the path separator. Converting the entry name with separatorsToSystem, and then using File.separatorChar will work. We could also modify the condition to something like

if (name.charAt(index) != '/' && name.charAt(index) != '\\') {
 ...
}

To me, checking that the path resolves to be within the destination is the more correct change rather than modifying our custom method. What is a reason to keep this change to a simplest change?

@siddvenk
Copy link
Contributor Author

siddvenk commented Nov 26, 2024

@si2d

To me, checking that the path resolves to be within the destination is the more correct change rather than modifying our custom method. What is a reason to keep this change to a simplest change?

I agree with you. I think we should be checking that the path resolves to be within the destination. I've kept that portion in and removed the check on "path contains .." since that would be redundant in my opinion.

The part that still seems open is whether we want to sanitize the archive entry at all. If it's possible and valid for zip archives to start with /, then we probably should be sanitizing each entry to remove leading file separators. Or, we can decide to be more strict and not allow that (which is handled implicitly by ensuring we're only writing under the expected output dir).

@siddvenk
Copy link
Contributor Author

@si2d @frankfliu I have updated the PR based on the above discussions. The changes are now

  1. convert the archive entry to system specific (convert path separators to the one used by the current system). we remove leading file separators using File.separatorChar as this should work post the initial conversion.
  2. validate that the expected output path for the entry exists under the specified output directory

This will fix the issue with the current code. Some additional things we may consider:

  • remove step 1 entirely. This will be more strict than what we have today, but I think that's ok.
  • don't replace existing files, or provide an option for user to specify whether to overwrite or not

static String removeLeadingFileSeparator(String name) {
String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);
Copy link
Contributor

@frankfliu frankfliu Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. try to avoid dependency on commons.io
  2. If we always run sanitizeAndValidateArchiveEntry(), we only need to remove "/" char here, "\" issue will be caught by sanitizeAndValidateArchiveEntry()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why avoid commons.io? we're using it in a few places in this code path already

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. in api module, the only place use commons.io is in TarUtils.java, commons-io is transient dependency from commons-compression recently added in 1.27.x, which is not intention of api project
  2. there are customers they don't use tar files, they can exclude commons-compression from their project. see: Make commons-compress an optional dependency #2949

api/src/main/java/ai/djl/util/ZipUtils.java Dismissed Show dismissed Hide dismissed
@siddvenk
Copy link
Contributor Author

Updated the PR - i've opted for the more strict approach where entries that start with '/' are invalid.

We validate that each archive entry will be written to a location under the provided output directory. If it won't (either because it starts with /, or containers .. traversal elements that would go outside the expected dir), we fail. Users should be expected to provide valid archives, and we don't do any special processing on them.

}
static void validateArchiveEntry(String name, Path destination) throws IOException {
Path expectedOutputPath = destination.resolve(name).normalize();
if (!expectedOutputPath.startsWith(destination.normalize())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd better block ".." as well in this method. it prevent file overwrite inside the destination folder. In original version, we already blocking "..", and nobody complained about it.

Copy link
Contributor Author

@siddvenk siddvenk Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default behavior of extracting a tar is to overwrite isn't it? I'm not sure why we need to differ.

If an archive mytar.tar had (in order)

b.txt
a/../b.txt

a/../b.txt would overwrite b.txt (e.g. using tar -xvf mytar.tar)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone ahead and added it back to keep in line with what we had before, but curious to know what you think about my point above.

@siddvenk siddvenk force-pushed the windows-test branch 2 times, most recently from cd00a44 to 34d457a Compare November 27, 2024 01:46
@siddvenk siddvenk merged commit 7d197ba into master Nov 27, 2024
8 checks passed
@siddvenk siddvenk deleted the windows-test branch November 27, 2024 02:26
siddvenk added a commit that referenced this pull request Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants