Reader.shapes() fails on a particular shapefile #119

Open
anyeli opened this issue Oct 7, 2017 · 2 comments

anyeli commented Oct 7, 2017

I'm trying to read this shapefile:

http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_boundary_lines_land.zip

The shapefile has 461 polyline records; shapefile.Reader will tell you this correctly via len() and shapeType. However, Reader.shapes() gives me 114 records and the last record has a type of 1075695572 instead of 3. When I use Reader.shape() on each individual record, it seems to do the right thing.

I'm not sure if the shapefile is technically broken, but if I can use it by reading it with shape(), I think I should be able to read it with shapes().
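
A minimal sketch of the workaround described above, reading each record individually instead of calling shapes(). It assumes `reader` is a pyshp `shapefile.Reader` (or any object with the same `len()` and `shape(i)` behaviour); the helper name `shapes_via_index` is made up for illustration and is not part of pyshp.

```python
def shapes_via_index(reader):
    """Collect every shape by calling reader.shape(i) once per record.

    Assumption: len(reader) reports the true record count and
    reader.shape(i) returns record i, as observed in this issue.
    """
    return [reader.shape(i) for i in range(len(reader))]


# Usage with pyshp (not run here; requires the unzipped shapefile on disk):
#   import shapefile
#   sf = shapefile.Reader("ne_10m_admin_0_boundary_lines_land")
#   shapes = shapes_via_index(sf)
```

This trades one pass over the file for one lookup per record, so it may be slower on large files.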

@GeospatialPython
Owner

Interesting. Reader.shape() and Reader.shapes() work slightly differently. In this particular shapefile, it appears a shape was deleted between shapes 113 and 114 at some point. shape() asks the shx index file for the offset of the requested shape; if it can't find the shx file, it starts at the first shape record and loops through until it hits the end of the file. shapes() just loops through to the end of the file without checking the shx.

The shapefile spec allows "lazy deletes": you remove a shape by deleting its data and leaving a gap, which the shx index accounts for. That way you don't have to rewrite the entire shapefile. But a reader that loops through assuming a continuous series of shapes will misread the gap, which causes this problem. I think most software rewrites the whole shapefile after edits to avoid this kind of mess; whatever software edited this shapefile used a lazy delete. A more robust approach for pyshp would be to have shapes() call shape() for each record, so that it uses the shx file when available. In all this time nobody has presented a file like this so it's good to have something to test.
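
To illustrate why the shx index survives a gap, here is a rough standalone sketch that reads record offsets from the .shx and seeks to each one in the .shp. It is not pyshp's implementation; the function names are hypothetical, and it relies only on the published file layout: a 100-byte header in both files, 8-byte shx entries holding a big-endian offset and content length counted in 16-bit words, and an 8-byte header in front of each shp record.

```python
import io
import struct

HEADER_SIZE = 100  # both .shp and .shx begin with a 100-byte file header


def read_shx_entries(shx):
    """Return (byte_offset, byte_length) for each record listed in the .shx."""
    shx.seek(0, io.SEEK_END)
    count = (shx.tell() - HEADER_SIZE) // 8
    shx.seek(HEADER_SIZE)
    entries = []
    for _ in range(count):
        # Offset and content length are big-endian int32s in 16-bit words.
        offset_words, length_words = struct.unpack(">ii", shx.read(8))
        entries.append((offset_words * 2, length_words * 2))
    return entries


def iter_record_contents(shp, shx):
    """Yield (record_number, content_bytes), seeking to each indexed offset."""
    for offset, length in read_shx_entries(shx):
        shp.seek(offset)
        # Each record starts with its number and content length (big-endian).
        record_number, _ = struct.unpack(">ii", shp.read(8))
        yield record_number, shp.read(length)
```

Both arguments are binary file objects, for example the results of `open(path, "rb")` on the .shp and .shx. Because every read starts from an indexed offset, any unused bytes left between records are never interpreted as a record header.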

@karimbahgat karimbahgat added the investigate Needs to be looked at more closely label Jun 23, 2018
@visr

visr commented Sep 18, 2019

> In all this time nobody has presented a file like this so it's good to have something to test.

Ha, I came upon this issue while looking for such a test file, for a dbf reader in Julia. But since they are so hard to find, I decided not to fully support the deleted record marker yet.

It seems that because readers ignore deleted record markers, deleted records are more often packed away during writes. The linked Natural Earth shapefile no longer contains them. QGIS also repacks files after modifications. If you still want a test file, have a look at the attachments in https://issues.qgis.org/issues/11007#note-30.
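
For reference, honouring the deleted record marker in a dbf could look roughly like the sketch below. It assumes the usual dBASE layout (record count, header length and record length stored little-endian at byte 4 of the header, and a one-byte flag at the start of each record where `*` means deleted and a space means active). The function name is hypothetical, and field parsing is left out.

```python
import struct


def iter_active_records(dbf_bytes):
    """Yield (index, raw_record_bytes) for records not flagged as deleted."""
    # Header bytes 4-11: record count (uint32), header length (uint16),
    # record length (uint16), all little-endian.
    num_records, header_len, record_len = struct.unpack_from("<IHH", dbf_bytes, 4)
    for i in range(num_records):
        start = header_len + i * record_len
        record = dbf_bytes[start:start + record_len]
        if record[:1] == b"*":
            continue  # deletion flag set: skip this record
        yield i, record[1:]  # drop the flag byte, keep the field data
```

The yielded index is the record's position in the file, so it still lines up with the matching shape record even when earlier records were skipped.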

@karimbahgat karimbahgat added enhancement and removed investigate Needs to be looked at more closely labels Feb 2, 2022