Data error during extraction #57

Open
BiatuAutMiahn opened this issue Jan 1, 2019 · 20 comments

Comments

@BiatuAutMiahn

Error when running the example:

Traceback (most recent call last):
  File "E:\Python\7ztest.py", line 38, in <module>
    sevenZfile.extractall('.')
  File "E:\Python\7ztest.py", line 33, in extractall
    outfile.write(self.archive.getmember(name).read())
  File "E:\Python\py7zlib.py", line 632, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "E:\Python\py7zlib.py", line 717, in _read_lzma2
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "E:\Python\py7zlib.py", line 688, in _read_from_decompressor
    data = decompressor.decompress(input)
ValueError: data error during decompression

Source

import py7zlib
import os

class SevenZFile(object):
    @classmethod
    def is_7zfile(cls, filepath):
        '''Class method: determine if file path points to a valid 7z archive.'''
        is7z = False
        fp = None
        try:
            fp = open(filepath, 'rb')
            archive = py7zlib.Archive7z(fp)
            archive.getnames()
            is7z = True
        except Exception:
            pass  # not a readable 7z archive: report False instead of raising
        finally:
            if fp:
                fp.close()
        return is7z

    def __init__(self, filepath):
        fp = open(filepath, 'rb')
        self.archive = py7zlib.Archive7z(fp)

    def extractall(self, path):
        for name in self.archive.getnames():
            outfilename = os.path.join(path, name)
            outdir = os.path.dirname(outfilename)
            if not os.path.exists(outdir):
                os.makedirs(outdir)
            outfile = open(outfilename, 'wb')
            outfile.write(self.archive.getmember(name).read())
            outfile.close()
if SevenZFile.is_7zfile('DP_LAN_Realtek-XP_18000.7z'):
    sevenZfile = SevenZFile('DP_LAN_Realtek-XP_18000.7z')
    sevenZfile.extractall('.')

@fancycode
Owner

Could you please provide a (small) sample file that shows the error? Also, which version of pylzma are you running?

@Jimmy-Jon

Jimmy-Jon commented Mar 22, 2019

Hello fancycode, thank you for responding to the above comment; I am having the same problem as the original poster. Here is the extractall function I use on 7z archives:

import os
import py7zlib

def extractall(path):
    # `item` (the path of the .7z archive) comes from the surrounding script
    with open(item, 'rb') as fp:
        archive = py7zlib.Archive7z(fp)
        for name in archive.getnames():
            outfilename = os.path.join(path, name)
            outdir = os.path.dirname(outfilename)
            if not os.path.exists(outdir):
                os.makedirs(outdir)
            with open(outfilename, 'wb') as outfile:
                acv = archive.getmember(name)
                outfile.write(acv.read())


extractall(path=os.curdir)

I've narrowed down where the data errors come from: mostly executable files (.exe on Windows). When extracted, these get the correct file name but end up as zero-byte files. .dll files fail to extract entirely and raise the data error. However, the function works perfectly on a folder full of JPEG wallpapers or XML files; .dll and .exe files are the only ones I've noticed returning errors. I'd really like to use this function since it handles folders and subfolders recursively. If you need anything else to help narrow down this problem, just ask and I'll see what I can provide.
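In the function above, the output file is opened (and therefore created empty) before `acv.read()` decompresses the member, so when `read()` raises the data error, a zero-byte file is left behind; this may account for the zero-byte executables. A minimal sketch that decompresses first and writes only on success follows. `extract_members` is a hypothetical name, and it assumes only that the archive object has `getnames()` and `getmember(name).read()`, as `py7zlib.Archive7z` is used in the code above. Skipping members that raise `ValueError` instead of aborting is a design choice of this sketch, not py7zlib behavior.

```python
import os


def extract_members(archive, path):
    """Extract every member of `archive` under `path`; return names that failed.

    `archive` is assumed to expose getnames() and getmember(name).read(),
    e.g. a py7zlib.Archive7z instance. Each member is decompressed *before*
    its output file is created, so a failing member leaves no empty file.
    """
    failed = []
    for name in archive.getnames():
        try:
            data = archive.getmember(name).read()
        except ValueError:
            # e.g. "data error during decompression": record it, create nothing
            failed.append(name)
            continue
        outfilename = os.path.join(path, name)
        outdir = os.path.dirname(outfilename)
        if outdir:
            os.makedirs(outdir, exist_ok=True)
        with open(outfilename, 'wb') as outfile:
            outfile.write(data)
    return failed
```

Used inside the same `with open(item, 'rb') as fp:` block, `failed = extract_members(py7zlib.Archive7z(fp), os.curdir)` would extract everything that decodes and report the rest.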

@fancycode
Owner

Master now supports various BCJ filters. Could you please check whether this solves the issues you were having?
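A BCJ filter is a reversible preprocessing step for executable code: on x86 it converts the relative target addresses of CALL/JMP instructions into absolute ones, so repeated calls to the same function produce identical bytes that LZMA compresses better. The decoder has to undo the filter after decompression, which is likely why .exe/.dll members failed before filter support while other files decoded fine. As an independent illustration (this uses Python's standard-library `lzma` module, not py7zlib), a raw x86 BCJ + LZMA2 chain like the one 7-Zip can apply to executables round-trips as follows; the helper name is hypothetical:

```python
import lzma

# Filter chain in encoding order: the x86 BCJ filter runs first,
# then LZMA2 compresses its output. The last filter must be a compressor.
FILTERS = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def bcj_lzma2_roundtrip(data: bytes) -> bytes:
    """Compress `data` as a raw BCJ+LZMA2 stream and decode it again."""
    packed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS)
    # Raw streams carry no header, so the decoder must be given the same chain.
    return lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=FILTERS)


# Fake "machine code": many CALL rel32 (opcode 0xE8) instructions.
sample = b"".join(b"\xe8" + (i * 16).to_bytes(4, "little") for i in range(1000))
assert bcj_lzma2_roundtrip(sample) == sample
```

This only checks that the chain is lossless. It does not cover BCJ2 (the four-input-stream coder in the failing archive later in this thread), which the standard-library module does not provide.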

@Jimmy-Jon

This new addition works, but only up to 128 kilobytes: once a file exceeds that size, reading stops and zero-byte files are created again. I verified a 70-kilobyte file with Md5Checker and the hash matched exactly. The limit appears consistently and applies to both .dll files and executables.
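Md5Checker is a separate GUI tool; a rough Python equivalent for comparing an extracted file against a reference copy (for example, the same file extracted with 7-Zip itself) could look like the sketch below. `md5_of_file` and `same_contents` are hypothetical helpers, not part of py7zlib:

```python
import hashlib
import os


def md5_of_file(filename: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(extracted: str, reference: str) -> bool:
    """True if both files have the same size and the same MD5 digest."""
    # An empty or truncated extraction shows up as a size mismatch first.
    if os.path.getsize(extracted) != os.path.getsize(reference):
        return False
    return md5_of_file(extracted) == md5_of_file(reference)
```

The size check is there because the failure described here (files stopping at 128 KB or left empty) is visible before any hashing is needed.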

@fancycode
Owner

Should be fixed with the latest change, could you please test again?

@Jimmy-Jon

Works perfectly! A pythonic way to extract 7z archives is amazing. All .dll and .exe file hashes match. If I uncover anything else, I'll submit a new error report. Thank you!

@fancycode
Owner

Great, thanks for reporting & testing!

@embray

embray commented Apr 21, 2019

Incidentally I just had the same problem and this fixed it. Any chance of getting a new release on PyPI with this fix? Thanks!

@BiatuAutMiahn
Author

Thanks @fancycode for the fix, and sorry for being MIA; I'm catching up on subscriptions.

@BiatuAutMiahn
Author

BiatuAutMiahn commented Jun 14, 2019

Successfully installed pylzma-v0.5.0-17-gccb0dev

Traceback (most recent call last):
  File "C:\System\Users\Biatu\Dev\Python\DriverMgr\dev_drivermgr.py", line 38, in <module>
    sevenZfile.extractall('.')
  File "C:\System\Users\Biatu\Dev\Python\DriverMgr\dev_drivermgr.py", line 33, in extractall
    outfile.write(self.archive.getmember(name).read())
  File "C:\ProgramData\Miniconda3\lib\site-packages\py7zlib.py", line 650, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "C:\ProgramData\Miniconda3\lib\site-packages\py7zlib.py", line 750, in _read_lzma2
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "C:\ProgramData\Miniconda3\lib\site-packages\py7zlib.py", line 717, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+total_decompressed)
ValueError: data error during decompression

Process returned 1 (0x1)        execution time : 1.242 s
Press any key to continue . . .

Sources and .7z file: https://drive.google.com/file/d/1YUgE15Tt2OS07X6Yzc_afc98UjRsfbQs/view?usp=sharing

@fancycode
Owner

So this still happens on master for you 😞 Could you please provide a (small) file I can use for testing?

fancycode reopened this on Jun 14, 2019
@BiatuAutMiahn
Author

Check my edit; I included the sources.

@fancycode
Owner

Thanks, I would need a .7z file that fails to decompress, not the Python source you are using for extracting.

@BiatuAutMiahn
Author

It's in the archive I linked

@fancycode
Owner

Oops, sorry, that link didn't show up earlier. After an explicit refresh I can now see it. Thanks.

@BiatuAutMiahn
Author

np, and ty :)

@BiatuAutMiahn
Author

Any updates on this?

@BiatuAutMiahn
Author

BiatuAutMiahn commented Dec 25, 2019

The first file, compressed with LZMA2 only, extracts just fine, but the binary with multiple coders fails:


Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/NET8150.INF
{'digest': 824600073, '_start': 25372901, '_src_start': 32, '_folder': {'coders': [{'method': b'!', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'\x1d'}], 'digestdefined': False, 'totalout': 1, 'bindpairs': [], 'packed_indexes': [0], 'unpacksizes': [100331479], 'solid': True}, '_maxsize': 445821, 'emptystream': False, 'filename': 'Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/NET8150.INF', 'attributes': 32, 'compressed': 445821, '_uncompressed': [5726], 'size': 5726, 'uncompressed': 5726, 'pos': 0}


Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/RTL8150.SYS
{'digest': 572653165, '_start': 68116417, '_src_start': 445853, '_folder': {'coders': [{'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'!', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'\x1d'}, {'method': b'\x03\x03\x01\x1b', 'numinstreams': 4, 'numoutstreams': 1}], 'digestdefined': False, 'totalout': 4, 'bindpairs': [(5, 0), (4, 1), (3, 2)], 'packed_indexes': [2, 6, 1, 0], 'unpacksizes': [1450568, 5837200, 79220364, 86508132], 'solid': True}, '_maxsize': 16244592, 'emptystream': False, 'filename': 'Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/RTL8150.SYS', 'attributes': 32, 'compressed': 16244592, '_uncompressed': [21504, 21504, 21504, 21504], 'size': 21504, 'uncompressed': 21504, 'pos': 0}
[{'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'!', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'\x1d'}, {'method': b'\x03\x03\x01\x1b', 'numinstreams': 4, 'numoutstreams': 1}]
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 653, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 741, in _read_lzma
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 723, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+total_decompressed)
ValueError: data error during decompression

Traceback (most recent call last):
  File "...\idx.py", line 74, in <module>
    sevenZfile.extractall('.')
  File "...\idx.py", line 66, in extractall
    md=mn.read()
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 658, in read
    raise e
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 653, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 741, in _read_lzma
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 723, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+total_decompressed)
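The coder dumps above can be decoded by hand. The sketch below is based on the method IDs and property layouts in 7-Zip's format notes (7zFormat.txt and Methods.txt in the LZMA SDK): an LZMA property block is one byte encoding `(pb * 5 + lp) * 9 + lc` followed by a 4-byte little-endian dictionary size, and an LZMA2 property byte `p` encodes a dictionary of `(2 | (p & 1)) << (p // 2 + 11)` bytes. `METHODS` and `describe_coder` are hypothetical helpers, not py7zlib API, and the table lists only a few common methods.

```python
# A few 7z method IDs (from the LZMA SDK's Methods.txt); not exhaustive.
METHODS = {
    b"\x00": "Copy",
    b"\x21": "LZMA2",
    b"\x03\x01\x01": "LZMA",
    b"\x03\x03\x01\x03": "BCJ (x86)",
    b"\x03\x03\x01\x1b": "BCJ2 (x86)",
    b"\x04\x01\x08": "Deflate",
    b"\x04\x02\x02": "BZip2",
}


def describe_coder(coder: dict) -> str:
    """Label one entry of a py7zlib folder's 'coders' list."""
    method = coder["method"]
    name = METHODS.get(method, "unknown " + method.hex())
    props = coder.get("properties", b"")
    if name == "LZMA" and len(props) == 5:
        # First byte packs lc/lp/pb; next four bytes are the dictionary size.
        rest, lc = divmod(props[0], 9)
        pb, lp = divmod(rest, 5)
        dict_size = int.from_bytes(props[1:5], "little")
        name += f" lc={lc} lp={lp} pb={pb} dict={dict_size}"
    elif name == "LZMA2" and len(props) == 1:
        p = props[0]
        dict_size = 0xFFFFFFFF if p == 40 else (2 | (p & 1)) << (p // 2 + 11)
        name += f" dict={dict_size}"
    streams = coder.get("numinstreams", 1)
    if streams != 1:
        name += f" ({streams} input streams)"
    return name


# Coders of the failing RTL8150.SYS folder, copied from the dump above.
for c in [
    {'method': b'\x03\x01\x01', 'numinstreams': 1, 'properties': b'l\x00\x00\x10\x00'},
    {'method': b'!', 'numinstreams': 1, 'properties': b'\x1d'},
    {'method': b'\x03\x03\x01\x1b', 'numinstreams': 4},
]:
    print(describe_coder(c))
# LZMA lc=0 lp=2 pb=2 dict=1048576
# LZMA2 dict=100663296
# BCJ2 (x86) (4 input streams)
```

Read this way, RTL8150.SYS sits in a folder where a BCJ2 coder with four input streams is fed by two LZMA coders (lc=0, lp=2, 1 MiB dictionary) and one LZMA2 coder (96 MiB dictionary), while NET8150.INF uses a single LZMA2 coder. That is consistent with the observation that only the multi-coder file fails, and the traceback shows the error inside `_read_lzma`.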

@BiatuAutMiahn
Author

Are there any updates on this? This still fails.

@BiatuAutMiahn
Author

More info:
I took 7zdec.exe from lzma1900.7z and compressed it in 7zFM with Ultra compression. Decoding the Ultra-compressed archive fails completely.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants