Reader.shapes() fails on a particular shapefile #119

anyeli · 2017-10-07T14:35:34Z

I'm trying to read this shapefile:

http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_boundary_lines_land.zip

The shapefile has 461 polyline records; shapefile.Reader will tell you this correctly via len() and shapeType. However, Reader.shapes() gives me 114 records and the last record has a type of 1075695572 instead of 3. When I use Reader.shape() on each individual record, it seems to do the right thing.

I'm not sure if the shapefile is technically broken, but if I can use it by reading it with shape(), I think I should be able to read it with shapes().
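Until shapes() handles this, the per-record approach can be wrapped in a small helper. This sketch assumes only two things about pyshp's shapefile.Reader that this thread mentions: len(reader) gives the record count, and reader.shape(i) reads record i. The helper name read_shapes_one_by_one is hypothetical.

```python
from typing import Any, List, Protocol


class IndexedShapeReader(Protocol):
    """The two Reader features this helper relies on (assumed, per the thread)."""

    def __len__(self) -> int: ...

    def shape(self, i: int) -> Any: ...


def read_shapes_one_by_one(reader: IndexedShapeReader) -> List[Any]:
    """Read every record with reader.shape(i) instead of reader.shapes().

    Hypothetical helper: each call goes through shape(), which, according to
    the maintainer's reply, consults the .shx offsets when the index exists.
    """
    return [reader.shape(i) for i in range(len(reader))]
```

With pyshp it would be called as `read_shapes_one_by_one(shapefile.Reader(path))`, where `path` points at the extracted shapefile. It trades the single sequential pass of shapes() for one indexed lookup per record.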

GeospatialPython · 2017-11-05T04:14:48Z

Interesting. Reader.shape() reads records slightly differently from Reader.shapes(). In this particular shapefile, it appears that a shape was deleted between shapes 113 and 114 at some point. The shape() method asks the shx index file for the offset of the shape; if it can't find the shx file, it starts at the first shape record and loops through the records in order. The other methods just loop through to the end of the file without checking the shx.

The shapefile spec allows "lazy deletes": you remove a shape by deleting its data and leaving a gap, which is accounted for in the shx index. That way you don't have to rewrite the entire shapefile. But if you loop through assuming a continuous series of shapes, you get exactly this problem. I think most software rewrites the whole shapefile after edits to avoid this kind of mess, but whatever software edited this shapefile used a lazy delete.

A more robust approach for pyshp would be to have shapes() wrap shape(), so that it uses the shx file when one is available. In all this time nobody has presented a file like this, so it's good to have something to test.
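To make the difference concrete, here is a self-contained toy. It does not use pyshp, and all helper names are hypothetical. It builds a minimal .shp-style buffer in which a lazily deleted record has left 16 stray bytes behind, then reads it both ways. The record layout (8-byte big-endian record header with the content length in 16-bit words, then a little-endian shape type) and the .shx convention of storing offsets in 16-bit words follow the ESRI shapefile spec. The file header is left zeroed, and the polyline geometry is reduced to two filler doubles.

```python
import struct
from typing import List, Tuple

FILE_HEADER_LEN = 100  # .shp and .shx both start with a 100-byte file header
REC_HEADER_LEN = 8     # record number + content length, big-endian int32s
POLYLINE = 3


def build_toy_shp(n_records: int, gap_after: int, gap: bytes) -> Tuple[bytes, List[int]]:
    """Build a toy .shp buffer plus .shx-style offsets (in 16-bit words).

    `gap` is inserted after record `gap_after`, standing in for the bytes a
    lazy delete left behind; the offsets simply skip over it.
    """
    buf = bytearray(FILE_HEADER_LEN)  # header fields are not needed for this demo
    offsets: List[int] = []
    for i in range(n_records):
        offsets.append(len(buf) // 2)
        # Content: little-endian shape type, then two doubles as fake geometry.
        content = struct.pack("<idd", POLYLINE, float(i), float(i))
        # Record header: 1-based record number and content length in 16-bit words.
        buf += struct.pack(">ii", i + 1, len(content) // 2) + content
        if i == gap_after:
            buf += gap
    return bytes(buf), offsets


def read_sequential(shp: bytes) -> List[int]:
    """Walk records back to back, assuming no gaps (the shapes() strategy)."""
    types: List[int] = []
    pos = FILE_HEADER_LEN
    while pos + REC_HEADER_LEN + 4 <= len(shp):
        _rec_num, words = struct.unpack_from(">ii", shp, pos)
        types.append(struct.unpack_from("<i", shp, pos + REC_HEADER_LEN)[0])
        if words < 0:  # nonsense length read from gap bytes: stop instead of looping
            break
        pos += REC_HEADER_LEN + 2 * words
    return types


def read_indexed(shp: bytes, offsets: List[int]) -> List[int]:
    """Jump to each record via its index offset (the shape()-with-.shx strategy)."""
    return [struct.unpack_from("<i", shp, 2 * off + REC_HEADER_LEN)[0] for off in offsets]


shp, offsets = build_toy_shp(5, gap_after=2, gap=struct.pack("<dd", 7.45, 46.2))
print(read_indexed(shp, offsets))  # [3, 3, 3, 3, 3]
print(read_sequential(shp))        # stops early; the last "shape type" is garbage
```

The sequential reader mirrors the symptom in the report: correct records up to the gap, then one record with a nonsensical shape type and a premature stop. The indexed reader never looks at the gap bytes at all.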

visr · 2019-09-18T18:08:23Z

> In all this time nobody has presented a file like this so it's good to have something to test.

Ha, I came upon this issue while looking for such a test file for a dbf reader in Julia. Since they are so hard to find, I decided not to fully support the deleted-record marker yet.

It seems that because readers ignore deleted-record markers, deleted records are usually packed away during writes. The linked Natural Earth shapefile no longer contains them, and QGIS also repacks the files after modifications. If you still want a test file, have a look at the attachments in https://issues.qgis.org/issues/11007#note-30.
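For reference, in the dBase (.dbf) format each record starts with a one-byte deletion flag: a space (0x20) for a live record and an asterisk (0x2A) for a deleted one. Below is a minimal sketch of a reader that honours that flag. It assumes only the standard header fields: record count at byte 4, header length at byte 8, and record length at byte 10, all little-endian. Field parsing is omitted, and both function names are hypothetical. The toy builder writes a one-field file just so the example runs on its own.

```python
import struct
from typing import List, Set

ACTIVE = 0x20   # ' ' : record is live
DELETED = 0x2A  # '*' : record is marked deleted


def live_records(dbf: bytes) -> List[bytes]:
    """Return raw record bytes (flag stripped), skipping records flagged '*'."""
    n_records, header_len, record_len = struct.unpack_from("<IHH", dbf, 4)
    out: List[bytes] = []
    for i in range(n_records):
        start = header_len + i * record_len
        record = dbf[start:start + record_len]
        if record[0] != DELETED:
            out.append(record[1:])
    return out


def build_toy_dbf(values: List[bytes], deleted: Set[int]) -> bytes:
    """Toy dBase III file with one 5-byte character field (hypothetical helper)."""
    width = 5
    # 32-byte field descriptor: name(11) type(1) reserved(4) length(1) decimals(1) reserved(14)
    field = b"NAME".ljust(11, b"\x00") + b"C" + bytes(4) + bytes([width, 0]) + bytes(14)
    header_len = 32 + len(field) + 1  # file header + descriptor + 0x0D terminator
    record_len = 1 + width            # deletion flag + field data
    header = struct.pack("<B3xIHH", 0x03, len(values), header_len, record_len) + bytes(20)
    body = b"".join(
        bytes([DELETED if i in deleted else ACTIVE]) + v.ljust(width)
        for i, v in enumerate(values)
    )
    return header + field + b"\r" + body + b"\x1a"  # 0x1A marks end of file


dbf = build_toy_dbf([b"a", b"b", b"c"], deleted={1})
print([r.rstrip() for r in live_records(dbf)])  # [b'a', b'c']
```

A reader that ignores the flag would return all three records here, which is why writers that expect such readers tend to pack deleted records out instead.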

karimbahgat added the "investigate" label (Needs to be looked at more closely) on Jun 23, 2018

karimbahgat added the "enhancement" label and removed the "investigate" label on Feb 2, 2022

karimbahgat added this to the "Adding basic editing capabilities" milestone on Feb 2, 2022
