-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix hash code collision for Vector2D and Vector3D #227
base: master
Are you sure you want to change the base?
Fix hash code collision for Vector2D and Vector3D #227
Conversation
This commit modifies the `hashCode()` method for `Vector2D` and the `Vector3D` classes in order to ensure unique values are generated for non-identical instances of vectors. The proposed approach for calculating the hash code value is the same that is used in JTS and HE_Mesh projects. And as quoted in JTS project, this approach was initially 'borrowed' from the Joshua Bloch book 'Effective Java' (1st ed, 2008). In the "Item 9: Always override hashCode when you override equals" he describes the recipe as follows: 1. Store some constant nonzero value, say 17, in an int variable called `result`. 2. For each significant field `f` in your object, do the following: a. Compute an int hash code `c` for the field: i ... ii ... iii If the field is a long, compute (int)(f^(f>>>32)) iv ... v If the field is a double, compute Double.doublToLongBits(f), and then hash the resulting long as in step 2.a.iii vi ... vii ... b. Combine the hash code `c` computed in step 2.a into `result` as follows: result = 31 * result + c
I would use public int hashCode() {
if (isNaN()) {
return 642;
}
int result = Double.hashCode(x);
result = 31 * result + Double.hashCode(y);
result = 31 * result + Double.hashCode(z);
return result;
} |
@aherbert |
This commit uses calls to Double.hashCode() method in order to make the code more readable while keeping the overall approach for calculating the hash code value.
@aherbert Well, not all apparently. Only the ones I've added. I'll try to find out what causes the |
This commit uses the same approach we've used in Vector2D and Vector3D to calculate hash code value for the GreatCircle class instances. Previously, GreatCircle.hashCode was using the Objects.hash(Object ...) wrapper method. In its turn, it has called the Arrays.hashCode(Object[]) method. Unfortunately, this does not prevent hash collisions calculated for the GreatCircle instances. Not investigated the cause yet. As a guess, this might be related to that the Precision.DoubleEquivalence's hashCode() method is not overridden.
Unfortunately, somewhere during the process, this fails to prevent hash collisions when it's calculated for the Anyways, the usage of the same approach as in @Override
public int hashCode() {
int result = 1;
long temp;
temp = pole.hashCode();
result = 31 * result + (int) (temp ^ (temp >>> 32));
temp = u.hashCode();
result = 31 * result + (int) (temp ^ (temp >>> 32));
temp = v.hashCode();
result = 31 * result + (int) (temp ^ (temp >>> 32));
temp = getPrecision().hashCode();
result = 31 * result + (int) (temp ^ (temp >>> 32));
return result;
} Additionally, I fixed my incorrect indentations and variable naming added previously to prevent check errors. All tests are passing now when I run them through Maven. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to change GreatCircle, just GreatCircleTest (which has poor input data and invalid assumptions about the hash code).
@@ -338,7 +338,17 @@ public boolean eq(final GreatCircle other, final Precision.DoubleEquivalence pre | |||
/** {@inheritDoc} */ | |||
@Override | |||
public int hashCode() { | |||
return Objects.hash(pole, u, v, getPrecision()); | |||
int result = 1; | |||
long temp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
temp
does not have to be a long
here since all the hashCode()
methods will return an int
. As such all the xor shift folding is effectively just randomly flipping the bits based on the sign since (temp >>> 32) == 0
for positive or -1
(32-bits set) for negative int
hash codes. The xor operation then flips the original int bits if they were negative.
The fact that this works over using Object.hashCode is that some of the hashCodes are negative. So your implementation uses these bit flipped before the addition and the hash is different, but still random.
I think the current implementation is fine. What should be changed is the test. This is assuming that non equal objects will have different hash codes. This is not true. The only contract is that equal objects have equal hash codes.
Since the hash code for the 3-D vectors use the xyx coordinates, using unit vectors results in some of the hash codes being the same. Perhaps try updating the vectors for objects b, c, d to use random xyz coords. The following lines should then pass (unless you are very unlucky with your random vectors).
// Probably true (unless we are unlucky with the hashes of the vectors)
Assertions.assertNotEquals(hash, b.hashCode());
Assertions.assertNotEquals(hash, c.hashCode());
Assertions.assertNotEquals(hash, d.hashCode());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current implementation is fine. What should be changed is the test.
At present, there is only one assertion causing the test to fail. For these two circles:
final GreatCircle c = GreatCircles.fromPoleAndU(Vector3D.Unit.PLUS_Z, Vector3D.Unit.MINUS_X, TEST_PRECISION);
final GreatCircle a = GreatCircles.fromPoleAndU(Vector3D.Unit.PLUS_Z, Vector3D.Unit.PLUS_X, TEST_PRECISION);
The test currently assumes the hashes must not be equal.
final int hash = a.hashCode();
Assertions.assertNotEquals(hash, c.hashCode());
I might be misunderstanding this point ( please help me clarify ), but it appears that these two circles are topologically identical meaning they occupy the same space.
If this interpretation is correct, we could simply modify the assumption and assert that the hashes of these two circles are expected to be the same.
Assertions.assertEquals(hash, c.hashCode());
With this small adjustment, the test should pass.
Modifid GreatCircleTest instead to allow all the test successfully pass. See this comment thread for details: apache#227 (review)
@@ -399,16 +399,24 @@ public boolean eq(final Vector3D vec, final Precision.DoubleEquivalence precisio | |||
|
|||
/** | |||
* Get a hashCode for the vector. | |||
* <p>All NaN values have the same hash code.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert formatting. It ad noise the the changes.
return 643 * (164 * Double.hashCode(x) + 3 * Double.hashCode(y) + Double.hashCode(z)); | ||
int result = 17; | ||
long temp; | ||
temp = Double.doubleToLongBits(x); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was fixed with a different linear combination of Double.hashCode
. This seems to have reverted to using doubleToLongBits
.
@@ -340,15 +340,14 @@ public boolean eq(final Vector2D vec, final Precision.DoubleEquivalence precisio | |||
* Get a hashCode for the 2D coordinates. | |||
* <p> | |||
* All NaN values have the same hash code.</p> | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert formatting
@@ -664,7 +664,7 @@ void testHashCode() { | |||
Assertions.assertEquals(hash, a.hashCode()); | |||
|
|||
Assertions.assertNotEquals(hash, b.hashCode()); | |||
Assertions.assertNotEquals(hash, c.hashCode()); | |||
Assertions.assertEquals(hash, c.hashCode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the two circles are the same then this should be tested using Assertions.assertEquals
. If they are the same then the hash code must be the same. If the objects are not equal then the hash code may or may not collide; it doesn't matter.
Any test that uses not equals on hash codes for objects that are not equal can potentially fail due to random hash collision.
- in Vector3D.hashCode(): - use Double.hashCode() instead of the Double.doubleToLongBits() - revert Javadoc formatting - in Vector2D.hashCode(), revert Javadoc formatting
Hi @aherbert Can I ask if you want me to fix anything to get this moving on? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I think the code changes to the has code functions are fine. I've suggested how to clean up the tests so we do not get sporadic failures using Math.random()
.
@@ -23,7 +23,6 @@ | |||
import java.util.Comparator; | |||
import java.util.List; | |||
import java.util.regex.Pattern; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undo this formatting.
@@ -339,6 +339,7 @@ public Vector3D reject(final Vector3D base) { | |||
* frame with one of the axes in a predefined direction. The | |||
* following example shows how to build a frame having the k axis | |||
* aligned with the known vector u : | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undo this formatting.
@@ -664,7 +664,7 @@ void testHashCode() { | |||
Assertions.assertEquals(hash, a.hashCode()); | |||
|
|||
Assertions.assertNotEquals(hash, b.hashCode()); | |||
Assertions.assertNotEquals(hash, c.hashCode()); | |||
Assertions.assertEquals(hash, c.hashCode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is testing the hash codes are equal (because the objects are equal) then we should test the objects are equal:
// Equal objects have equal hash codes
Assertions.assertEquals(a, c);
Assertions.assertEquals(hash, c.hashCode());
@@ -1146,6 +1145,65 @@ void testEqualsAndHashCode_signedZeroConsistency() { | |||
Assertions.assertEquals(b.hashCode(), d.hashCode()); | |||
} | |||
|
|||
@Test | |||
void testHashCodeCollisions_symmetricAboutZero() { | |||
final double any = Math.random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this could create sporadic failures. We can formalise this using some random numbers:
@ParameterizedTest
@ValueSource(doubles = {1.23, 4.56, 42, Math.PI})
void testHashCodeCollisions_symmetricAboutZero(double any) {
int zNegHash = negZ.hashCode(); | ||
int zPosHash = posZ.hashCode(); | ||
|
||
String xMessage = String.format("negXHash: %s, posXHash: %s (expected to be non-equal)\n", xNegHash, xPosHash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be passed a suppliers to Assertions. However the Assertions will already have a statement containing the two values and that they were expected to be not equal. So we can just use:
Assertions.assertNotEquals(zNegHash, zPosHash, "+/- z hash");
String yMessage = String.format("negYHash: %s, posYHash: %s (expected to be non-equal)\n", yNegHash, yPosHash); | ||
String zMessage = String.format("negZHash: %s, posZHash: %s (expected to be non-equal)\n", zNegHash, zPosHash); | ||
|
||
Assertions.assertNotEquals(zNegHash, zPosHash, zMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these in z, y, x order? Makes more sense to use x, y, z, order.
} | ||
|
||
@Test | ||
void testHashCodeCollisions_symmetricAboutArbitraryValue() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as testHashCodeCollisions_symmetricAboutZero. Use a ParameterizedTest and simplify the assertion message.
@ParameterizedTest
@CsvSource(
"1.23, 72.68",
// etc
)
void testHashCodeCollisions_symmetricAboutArbitraryValue(double any, double arb) {
@@ -912,6 +911,53 @@ void testHashCode() { | |||
Assertions.assertEquals(Vector2D.of(0, Double.NaN).hashCode(), Vector2D.of(Double.NaN, 0).hashCode()); | |||
} | |||
|
|||
@Test | |||
void testHashCodeCollisions_symmetricAboutZero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as previous. Use a ParameterizedTest and simplify the assertion message.
A little late to the party, but I think we should consider using https://github.com/jqno/equalsverifier to test |
Even later to the party here... :-) @singhbaljit, I would be on board with trying out EqualsVerifier. We already have GEOMETRY-107 for this. @aherbert, @ivanshuba, we are still waiting on the requested changes, correct? |
The changes I requested to the tests have not been implemented. Without them the tests could sporadically fail. |
This commit modifies the
hashCode()
method forVector2D
and theVector3D
classes in order to ensure unique values are generated for non-identical instances of vectors.The proposed approach for calculating the hash code value is the same that is used in JTS and HE_Mesh projects. And as quoted in JTS project, this approach was initially 'borrowed' from the Joshua Bloch book 'Effective Java, 1st ed, 2008 ( the 2nd edition doesn't contain this recipe ).
In the "Item 9: Always override hashCode when you override equals" he describes the recipe as follows:
result
.f
in your object, do the following: a. Compute anint
hash codec
for the field:i ...
ii ...
iii If the field is a
long
, compute(int)(f^(f>>>32))
iv ...
v If the field is a
double
, computeDouble.doublToLongBits(f)
,and then hash the resulting long as in step 2.a.iii
vi ...
vii ...
b. Combine the hash code
c
computed in step 2.a intoresult
asfollows:
result = 31 * result + c