Getting an error while trying to calculate knn weights from dataframe #441

Open
revanthApricelabs opened this issue Nov 15, 2021 · 7 comments

@revanthApricelabs

revanthApricelabs commented Nov 15, 2021

Hey. I'm using Python 3.8.10, pysal 2.5, scipy 1.7.1, and numpy 1.20.0 on WSL2 in Windows 11.
I am getting the following error while trying to compute KNN weights:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_177/221878346.py in <module>
----> 1 w = weights.KNN.from_dataframe(geo_df, k=50,distance_metric='arc',radius = 6378.1)
      2 w.transform = 'R'

~/.local/lib/python3.8/site-packages/libpysal/weights/distance.py in from_dataframe(cls, df, geom_col, ids, *args, **kwargs)
    277         elif isinstance(ids, str):
    278             ids = df[ids].tolist()
--> 279         return cls(pts, *args, ids=ids, **kwargs)
    280 
    281     def reweight(self, k=None, p=None, new_data=None, new_ids=None, inplace=True):

~/.local/lib/python3.8/site-packages/libpysal/weights/distance.py in __init__(self, data, k, p, ids, radius, distance_metric, **kwargs)
    121         for i, row in enumerate(to_weight):
    122             row = row.tolist()
--> 123             row.remove(i)
    124             row = [ids[j] for j in row]
    125             focal = ids[i]

ValueError: list.remove(x): x not in list

It worked on other data files but keeps failing on this particular dataset.
Also, the tutorials that use geopandas dataframes should set distance_metric='arc' in the KNN.from_dataframe call.
sample_file.csv

@martinfleis
Member

@revanthApricelabs can you share the data? It would help a lot when debugging this!

@revanthApricelabs
Author

> @revanthApricelabs can you share the data? It would help a lot when debugging this!

Updated the file. Let me know if you have any issues with it.

@knaaptime
Member

knaaptime commented Nov 15, 2021

I think this is usually related to a mismatched index between the df and the weights constructor. Try setting either the ids or id_order argument (but not both!) and see if that gets you around the error.

If that's indeed the root cause, this has been a pain point for a while and should be addressed by some changes we have planned for the library's internals.

@revanthApricelabs
Author

Nope, still getting the same error even after setting either the ids or id_order argument.
A quick fix would be to change the following lines from

            row.remove(i)
            row = [ids[j] for j in row]

to

            # row.remove(i)
            row = [ids[j] for j in row if j != i]
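
For anyone following along, here is a minimal pure-Python sketch of how the two versions of that loop body behave. The helper names `relabel_original` and `relabel_patched` are made up for illustration and are not part of libpysal; the neighbor rows are hand-written rather than taken from a real query.

```python
def relabel_original(i, row, ids):
    """Mirror of the loop body shown in the traceback (hypothetical helper)."""
    row = list(row)
    row.remove(i)  # raises ValueError when i is not in its own neighbor row
    return [ids[j] for j in row]


def relabel_patched(i, row, ids):
    """The quick fix proposed above: filter i out instead of removing it."""
    return [ids[j] for j in row if j != i]


ids = ["a", "b", "c", "d"]

# Usual case: point 0 appears in its own query result, both versions agree.
assert relabel_original(0, [0, 1, 2], ids) == ["b", "c"]
assert relabel_patched(0, [0, 1, 2], ids) == ["b", "c"]

# Problem case: point 3 is missing from its own query result.
original_error = None
try:
    relabel_original(3, [0, 1, 2], ids)
except ValueError as err:
    original_error = str(err)

patched_result = relabel_patched(3, [0, 1, 2], ids)
print(original_error)
print(patched_result)
```

One thing this sketch makes visible: in the problem case the patched version keeps every entry of the row, so that point ends up with one more neighbor than the points that did find themselves. Whether that matters for the resulting weights is a call for the maintainers.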

@martinfleis
Member

I can confirm the issue, using the following snippet. It doesn't even matter which distance metric is used. I'll explore the cause.

import geopandas
import libpysal
import pandas
df = pandas.read_csv("sample_file.csv")
df = geopandas.GeoDataFrame(df, geometry=geopandas.GeoSeries.from_wkt(df.geo_polygon), crs=4326)
w = libpysal.weights.KNN.from_dataframe(df, k=50)

@ljwolf
Member

ljwolf commented Nov 16, 2021

I think this is related to coincident points! (e.g. #164, pysal#941 and I think five others). We're expecting each point x to appear in its own list of neighbors returned from the kdtree, but when points are coincident there is no guarantee that it will. The patch from @revanthApricelabs works.
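
To illustrate the coincident-points explanation, here is a rough sketch that uses a brute-force nearest-neighbor search as a stand-in for the kd-tree. This is an assumption made for illustration: a real kd-tree may break distance ties in a different order, but the effect is the same whenever more than k + 1 points share a location. The function name `knn_indices` and the sample coordinates are invented.

```python
import math


def knn_indices(points, k):
    """Brute-force stand-in for a kd-tree query (hypothetical helper).

    For each point, return the indices of its k + 1 closest points;
    the extra slot is meant to hold the point itself.
    """
    result = []
    for p in points:
        # sorted() is stable, so ties in distance resolve to the lowest
        # indices, no matter which point is doing the asking.
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        result.append(order[: k + 1])
    return result


# Four coincident points (indices 0-3) plus two distinct ones.
points = [(0.0, 0.0)] * 4 + [(1.0, 0.0), (5.0, 5.0)]
rows = knn_indices(points, k=2)

# Points whose own index is absent from their neighbor row.
missing_self = [i for i, row in enumerate(rows) if i not in row]
print(missing_self)
```

With k = 2 each row has three slots, but four points sit at distance zero from one another, so at least one of them cannot appear in its own row. That is the situation in which `row.remove(i)` raises.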

@ljwolf
Member

ljwolf commented Nov 16, 2021

I think the patch was applied in #285, but that stalled, and I don't know why. The second approver should merge, according to our workflow. @sjsrey also explored a second solution in #287, but that one has very different semantics.
