[WIP] Extend KNN neighbor search beyond coincident sites #287

sjsrey · 2020-05-03T18:47:34Z

If the approach makes sense, it can be extended to Kernel weights and DistanceBand weights. The latter will need slightly different handling for symmetry.

codecov · 2020-05-03T18:52:43Z

Codecov Report

Merging #287 into master will increase coverage by 0.19%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #287      +/-   ##
==========================================
+ Coverage   80.99%   81.19%   +0.19%     
==========================================
  Files         115      115              
  Lines       11580    11713     +133     
==========================================
+ Hits         9379     9510     +131     
- Misses       2201     2203       +2

| Impacted Files | Coverage Δ | |
|---|---|---|
| libpysal/weights/distance.py | `86.71% <100.00%> (+1.67%)` | ⬆️ |
| libpysal/weights/tests/test_distance.py | `97.18% <100.00%> (+0.09%)` | ⬆️ |
| libpysal/cg/alpha_shapes.py | `85.00% <0.00%> (-1.67%)` | ⬇️ |
| libpysal/weights/tests/test_weights.py | `99.65% <0.00%> (+<0.01%)` | ⬆️ |
| libpysal/weights/tests/test_util.py | `98.15% <0.00%> (+0.01%)` | ⬆️ |
| libpysal/weights/weights.py | `78.92% <0.00%> (+0.90%)` | ⬆️ |
| libpysal/cg/tests/test_ashapes.py | `96.49% <0.00%> (+1.03%)` | ⬆️ |
| libpysal/weights/util.py | `77.13% <0.00%> (+1.54%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14c7509...523b07f. Read the comment docs.

ljwolf · 2020-05-04T11:55:24Z

I would super prefer #285... is there something wrong with #285?

EDIT: Yes, I (a) forgot to deal with ids and (b) tested on a scenario where the number of coincident points was usually quite small relative to k, so you always got the self-index in the output. This is now resolved.

ljwolf · 2020-05-04T12:53:46Z

OK, to be clear, this does more than simply ensure that co-incident points are assigned valid (co-incident) neighbors, it forces the neighbors of co-incident points to be outside of the coincident set, correct?

ljwolf · 2020-05-04T12:55:10Z

libpysal/weights/distance.py

+        duplicates = duplicated(self.data)
+        coincident = duplicates[:,1].any()
+
+        self.duplicates = duplicates


Does this mean that we'd want all distance-style weights objects to gain a duplicates attribute?

The thinking was to let the user know, somehow, that they have coincident points. duplicates is a bad choice of name in this regard, as it might be taken to mean the records are identical in all attributes, whereas I think coincident implies spatial duplicates only.
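To make the idea of flagging coincident points concrete, here is a minimal sketch. The `duplicated` helper used in the diff is not shown in this thread, so `find_coincident` below is a hypothetical stand-in built on `numpy.unique`, not the PR's implementation.

```python
import numpy as np


def find_coincident(coords):
    """Hypothetical helper: for each row, return the index of the first row
    with the same coordinates, plus a flag marking later repeats."""
    coords = np.asarray(coords, dtype=float)
    # `first` holds the first row index of each unique location;
    # `inverse` maps every row to its unique-location group.
    _, first, inverse = np.unique(
        coords, axis=0, return_index=True, return_inverse=True
    )
    inverse = np.ravel(inverse)  # guard against shape differences across NumPy versions
    representative = first[inverse]
    is_repeat = representative != np.arange(len(coords))
    return representative, is_repeat


coords = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [2.0, 2.0], [1.0, 1.0]])
representative, is_repeat = find_coincident(coords)
has_coincident = bool(is_repeat.any())
```

Here rows 2 and 4 are flagged as repeats of rows 0 and 1. A boolean such as `has_coincident` is one way a weights object could tell the user that coincident sites exist, without implying the records are identical in every attribute.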

ljwolf · 2020-05-04T13:13:47Z

This method either ignores or fails on ids.

import uuid

import numpy
import libpysal

# five random sites, each repeated five times, so every site has coincident copies
coordinates = numpy.random.random(size=(5, 2))
coordinates = numpy.vstack([coordinates] * 5)

# one unique id per row (25 rows in total)
ids = [uuid.uuid4().hex for _ in range(len(coordinates))]

# fails, since we pass ids as id_order to resolve sorting issues from the dataframe
w = libpysal.weights.KNN(coordinates, id_order=ids)
# ignores the ids because of libpysal#284
w = libpysal.weights.KNN(coordinates, ids=ids)

tbf I think that's not really this issue's fault, but we'd need to address it.

ljwolf · 2020-05-04T13:37:52Z

libpysal/weights/distance.py

+            for row in duplicate_ids[0]:
+                neighbors[row] = neighbors[duplicates[row, 2]]
+            n = self.data.shape[0]
+            ids = list(range(n))


This forces ids to be 0, ..., n-1. The other part of this if statement, where we keep coincident points, does not do this. Regardless, it means that this fails when we use KNN.from_dataframe on something with an index.
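As a rough illustration of what handling ids could look like, the sketch below relabels positional neighbor indices with user-supplied ids after the search instead of overwriting the ids with `range(n)`. The function name `neighbors_by_id` is hypothetical and is not part of libpysal.

```python
import numpy as np


def neighbors_by_id(neighbor_positions, ids):
    """Hypothetical helper: translate a positional (n, k) neighbor array
    into a dict keyed by the caller's own ids."""
    ids = list(ids)
    return {
        ids[i]: [ids[j] for j in row]
        for i, row in enumerate(np.asarray(neighbor_positions))
    }


# positional neighbors as a KNN search might return them
positions = np.array([[1, 2], [0, 2], [1, 0]])
ids = ["a", "b", "c"]  # e.g. a dataframe index
neighbors = neighbors_by_id(positions, ids)
```

Keeping the search purely positional and relabeling once at the end means a non-default dataframe index never has to match `0, ..., n-1`.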

ljwolf · 2020-05-04T13:40:03Z

I wanted to add a remove_coincident=True argument that allows users to keep the coincident points but safely remove the self-neighbor. But, I can't really figure out how to get the ids/id_order stuff fixed without adding further changes.

pedrovma · 2020-05-04T13:43:44Z

Just so I can understand, what happens now if you have different apartment units in a building (same coordinates)? Would they be considered neighbors of each other or will they be assigned neighbors in other buildings?

sjsrey · 2020-05-04T13:44:50Z

> I would super prefer #285... is there something wrong with #285?
>
> EDIT: Yes, I (a) forgot to deal with ids and (b) tested on a scenario where the number of coincident points was usually quite small relative to k, so you always got the self-index in the output.

My bad: I didn't see #285 until after I did this.

sjsrey · 2020-05-04T13:52:46Z

> OK, to be clear, this does more than simply ensure that co-incident points are assigned valid (co-incident) neighbors, it forces the neighbors of co-incident points to be outside of the coincident set, correct?

Yes, that was the intention.
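A minimal sketch of that intended behavior, assuming a brute-force distance matrix rather than the KD-tree search the library uses; `knn_beyond_coincident` is a hypothetical name for illustration only.

```python
import numpy as np


def knn_beyond_coincident(coords, k):
    """Hypothetical sketch: k nearest neighbors at strictly positive distance."""
    coords = np.asarray(coords, dtype=float)
    # full pairwise Euclidean distances (fine for a small illustration)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # masking zero distances removes the point itself and all coincident sites
    dist[dist == 0] = np.inf
    # stable sort so ties are broken by position
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


# rows 0 and 1 are coincident; rows 2 and 3 are distinct sites
coords = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
neighbors = knn_beyond_coincident(coords, k=1)
```

Rows 0 and 1 both receive row 2 as their neighbor rather than each other. If fewer than k sites lie at a non-zero distance, this sketch would return masked entries, so a real implementation would need to guard against that case.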

sjsrey · 2020-05-04T13:57:40Z

> Just so I can understand, what happens now if you have different apartment units in a building (same coordinates)? Would they be considered neighbors of each other or will they be assigned neighbors in other buildings?

This would have them assigned neighbors with non-zero distances.

There are two (at least) cases where this happens - the one you point out where the issue is the data lacks sufficient spatial information to spatially disambiguate units in the same apartment building, and the other is with repeat sales data. In the latter, the data may have a temporal attribute that can be used to differentiate the records.

ljwolf · 2020-05-04T14:06:07Z

> the data may have a temporal attribute that can be used to differentiate the records

Ah, brilliant! never even thought that you could just send (n,3) arrays into the constructor!

That'd be very strongly sensitive to scaling, right? For data measured in UTM at a monthly timeframe, your inter-temporal neighbors are much "closer" than your spatial neighbors, right? UTM coordinates are typically in thousands of meters, whereas months will be 0...11.

ljwolf · 2020-05-04T14:17:20Z

I think I'd want to see

- an option to force neighbors back to being drawn from the coincident set
- ids to be handled
- the numpy-oriented fix in "remove non-coincident points correctly from knn weights" #285 used instead of the for loop

I also don't see how this needs to be ported to kernel/distance band weights, save for estimating the bandwidth of a kernel?
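For the first and third items in that list, a vectorized sketch of keeping coincident neighbors while dropping only the self-neighbor might look like the following. The code in #285 is not reproduced in this thread, so `drop_self` is a hypothetical illustration and not that fix.

```python
import numpy as np


def drop_self(candidates):
    """Hypothetical sketch: given an (n, k + 1) array of candidate neighbor
    positions, remove each row's own index without a Python loop."""
    candidates = np.asarray(candidates)
    n, m = candidates.shape
    is_self = candidates == np.arange(n)[:, None]
    # With many coincident points, a row's own index may be missing from its
    # candidates; in that case drop the last (farthest) candidate instead.
    missing = ~is_self.any(axis=1)
    is_self[missing, -1] = True
    # exactly one entry per row is masked, so the reshape is safe
    return candidates[~is_self].reshape(n, m - 1)


# rows 0-2 are coincident, so a k + 1 = 2 query can return [0, 1] for all three
candidates = np.array([[0, 1], [0, 1], [0, 1], [3, 0]])
neighbors = drop_self(candidates)
```

Row 2 never sees its own index among the candidates, which is the situation described earlier where the self-index is not guaranteed to appear in the output.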

sjsrey · 2020-05-04T14:19:35Z

> > the data may have a temporal attribute that can be used to differentiate the records
>
> Ah, brilliant! never even thought that you could just send (n,3) arrays into the constructor!
>
> That'd be very strongly sensitive to scaling, right? For data measured in UTM at a monthly timeframe, your inter-temporal neighbors are much "closer" than your spatial neighbors, right? UTM coordinates are typically in thousands of meters, whereas months will be 0...11.

Just to be clear, I wasn't thinking that they would be passing in (n,3) as the data for the knn search, with one of the three being the temporal coordinate. That would raise all kinds of scaling issues that you point out.
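The scaling concern can be seen in a tiny example. This is only an illustrative sketch with made-up coordinates and a hypothetical `nearest` helper; it is not something the PR does.

```python
import numpy as np


def nearest(points, i):
    """Hypothetical helper: position of the nearest other row to row i."""
    d = np.linalg.norm(points - points[i], axis=1)
    d[i] = np.inf  # exclude the point itself
    return int(np.argmin(d))


# columns: x (meters), y (meters), month
xyt = np.array([
    [0.0, 0.0, 0.0],    # sale in building A, month 0
    [0.0, 0.0, 11.0],   # repeat sale in building A, month 11
    [500.0, 0.0, 0.0],  # sale in a building 500 m away, month 0
])

raw = nearest(xyt, 0)
# stretch the time axis so that one month counts as 100 distance units
rescaled = nearest(xyt * np.array([1.0, 1.0, 100.0]), 0)
```

With raw units, the repeat sale eleven months later is the nearest neighbor; after stretching the time axis, the same-month sale 500 m away wins. The neighbor set depends entirely on an arbitrary choice of units.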

The motivation for the approach in the PR was that coincident spatial points violate the law (I think it was Keith Clarke's) that no two events can occupy the same point at the same time. In data sets where a temporal attribute is missing, the coincident points imply this happens.

Maybe the suggested approach should be an option, rather than the default. Another option would be to jitter the coordinates of the coincident points prior to the search - but that also comes with some possible side-effects.

ljwolf · 2020-05-04T14:36:12Z

> Maybe the suggested approach should be an option, rather than the default.

Yes, I think that's appropriate. I was trying to add a remove_coincident=False above and was running into the ids/id_order issues.

> Another option would be to jitter the coordinates of the coincident points prior to the search

Sure, that'd be reasonable. I think the solution implemented in #285 though is just fine, and better than reducing the precision of the data through jittering.
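For comparison, the jittering alternative could be sketched as below. The function name, the Gaussian noise, and the `scale` parameter are all assumptions made for illustration; nothing like this is part of the PR or of #285.

```python
import numpy as np


def jitter_coincident(coords, scale, seed=0):
    """Hypothetical sketch: add small Gaussian noise to coincident rows only."""
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    _, inverse, counts = np.unique(
        coords, axis=0, return_inverse=True, return_counts=True
    )
    inverse = np.ravel(inverse)  # guard against shape differences across NumPy versions
    coincident = counts[inverse] > 1
    out = coords.copy()
    # only rows that share a location with another row are perturbed
    out[coincident] += rng.normal(
        scale=scale, size=(int(coincident.sum()), coords.shape[1])
    )
    return out


coords = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
jittered = jitter_coincident(coords, scale=1e-6)
```

The side effect mentioned above is visible here: the perturbed rows no longer match the recorded coordinates, and the resulting neighbor order among formerly coincident points depends on the random seed.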

ljwolf · 2021-12-10T13:52:41Z

I'd like to merge #285 to fix the problem, and then return to this as an "ignore_coincident" option for the constructor...

knaaptime · 2024-07-18T21:45:57Z

i think this is resolved by graph and can be closed

sjsrey added 6 commits May 3, 2020 09:06

ENH: handle coincident points in the case of KNN.from_dataframe

d212f6f

Move coincident handling into constructor

9a4d5f0

numpydoc format

a5e0dc1

tests for coincident points

2552d9e

add coincident nb

c391ce2

remove pandas check

523b07f

sjsrey requested a review from ljwolf May 3, 2020 18:47

sjsrey requested review from darribas, jGaboardi and knaaptime May 3, 2020 18:59

jGaboardi approved these changes May 3, 2020

knaaptime approved these changes May 3, 2020

ljwolf reviewed May 4, 2020

ljwolf changed the title ~~[WIP] Handle coincident points in KNN weights~~ [WIP] Extend KNN neighbor search beyond coincident sites May 4, 2020

ljwolf reviewed May 4, 2020

ljwolf added the weights label Oct 19, 2021

ljwolf mentioned this pull request Nov 16, 2021

Getting an error while trying to calculate knn weights from dataframe #441

Open

martinfleis changed the base branch from master to main February 27, 2023 08:47
