Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hash code collision for Vector2D and Vector3D #227

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

ivanshuba
Copy link

@ivanshuba ivanshuba commented Jan 16, 2024

This commit modifies the hashCode() method for Vector2D and the Vector3D classes in order to ensure unique values are generated for non-identical instances of vectors.

The proposed approach for calculating the hash code value is the same that is used in JTS and HE_Mesh projects. And as quoted in JTS project, this approach was initially 'borrowed' from the Joshua Bloch book 'Effective Java, 1st ed, 2008 ( the 2nd edition doesn't contain this recipe ).

In the "Item 9: Always override hashCode when you override equals" he describes the recipe as follows:

  1. Store some constant nonzero value, say 17, in an int variable called result.
  2. For each significant field f in your object, do the following: a. Compute an int hash code c for the field:
    i ...
    ii ...
    iii If the field is a long, compute (int)(f^(f>>>32))
    iv ...
    v If the field is a double, compute Double.doublToLongBits(f),
    and then hash the resulting long as in step 2.a.iii
    vi ...
    vii ...
    b. Combine the hash code c computed in step 2.a into result as
    follows:
    result = 31 * result + c

This commit modifies the `hashCode()` method for `Vector2D` and the
`Vector3D` classes in order to ensure unique values are generated for
non-identical instances of vectors.

The proposed approach for calculating the hash code value is the same
that is used in JTS and HE_Mesh projects. And as quoted in JTS project,
this approach was initially 'borrowed' from the Joshua Bloch book
'Effective Java' (1st ed, 2008).

In the "Item 9: Always override hashCode when you override equals" he
describes the recipe as follows:

1. Store some constant nonzero value, say 17, in an int variable called
   `result`.
2. For each significant field `f` in your object, do the following:
   a. Compute an int hash code `c` for the field:
      i   ...
      ii  ...
      iii If the field is a long, compute (int)(f^(f>>>32))
      iv  ...
      v   If the field is a double, compute Double.doublToLongBits(f),
          and then hash the resulting long as in step 2.a.iii
      vi  ...
      vii ...
   b. Combine the hash code `c` computed in step 2.a into `result` as
      follows:
        result = 31 * result + c
@aherbert
Copy link
Contributor

I would use int Double.hashCode(double) which effectively does the same thing but does not require as many statements, e.g.

    public int hashCode() {
        if (isNaN()) {
            return 642;
        }
        int result = Double.hashCode(x);
        result = 31 * result + Double.hashCode(y);
        result = 31 * result + Double.hashCode(z);
        return result;
    }

@ivanshuba
Copy link
Author

@aherbert
You're correct this is more readable while keeping the overall approach the same.
I've checked, and all tests are passing.

This commit uses calls to Double.hashCode() method in order to make the
code more readable while keeping the overall approach for calculating
the hash code value.
@ivanshuba
Copy link
Author

I've checked, and all tests are passing.

@aherbert Well, not all apparently. Only the ones I've added.

I'll try to find out what causes the GreatCircleTest to fail.

This commit uses the same approach we've used in Vector2D and Vector3D
to calculate hash code value for the GreatCircle class instances.

Previously, GreatCircle.hashCode was using the Objects.hash(Object ...)
wrapper method. In its turn, it has called the Arrays.hashCode(Object[])
method.
Unfortunately, this does not prevent hash collisions calculated for the
GreatCircle instances. Not investigated the cause yet. As a guess, this
might be related to that the Precision.DoubleEquivalence's hashCode()
method is not overridden.
@ivanshuba
Copy link
Author

@aherbert

GreatCircle class was using the Objects.hash() wrapper method to calculate hash value. In its turn, it was calling the Arrays.hashCode() method.

Unfortunately, somewhere during the process, this fails to prevent hash collisions when it's calculated for the GreatCircle instances. I haven't investigated the cause yet. As a guess, this might be related to that the Precision.DoubleEquivalence's hashCode() method is not overridden, not sure.

Anyways, the usage of the same approach as in Vector2D and Vector3D classes resolves the issue for the GreatCircle also.

@Override
public int hashCode() {
    int result = 1;
    long temp;
    temp = pole.hashCode();
    result = 31 * result + (int) (temp ^ (temp >>> 32));
    temp = u.hashCode();
    result = 31 * result + (int) (temp ^ (temp >>> 32));
    temp = v.hashCode();
    result = 31 * result + (int) (temp ^ (temp >>> 32));
    temp = getPrecision().hashCode();
    result = 31 * result + (int) (temp ^ (temp >>> 32));
    return result;
}

Additionally, I fixed my incorrect indentations and variable naming added previously to prevent check errors.

All tests are passing now when I run them through Maven.

Copy link
Contributor

@aherbert aherbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to change GreatCircle, just GreatCircleTest (which has poor input data and invalid assumptions about the hash code).

@@ -338,7 +338,17 @@ public boolean eq(final GreatCircle other, final Precision.DoubleEquivalence pre
/** {@inheritDoc} */
@Override
public int hashCode() {
return Objects.hash(pole, u, v, getPrecision());
int result = 1;
long temp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temp does not have to be a long here since all the hashCode() methods will return an int. As such all the xor shift folding is effectively just randomly flipping the bits based on the sign since (temp >>> 32) == 0 for positive or -1 (32-bits set) for negative int hash codes. The xor operation then flips the original int bits if they were negative.

The fact that this works over using Object.hashCode is that some of the hashCodes are negative. So your implementation uses these bit flipped before the addition and the hash is different, but still random.

I think the current implementation is fine. What should be changed is the test. This is assuming that non equal objects will have different hash codes. This is not true. The only contract is that equal objects have equal hash codes.

Since the hash code for the 3-D vectors use the xyx coordinates, using unit vectors results in some of the hash codes being the same. Perhaps try updating the vectors for objects b, c, d to use random xyz coords. The following lines should then pass (unless you are very unlucky with your random vectors).

        // Probably true (unless we are unlucky with the hashes of the vectors)
        Assertions.assertNotEquals(hash, b.hashCode());
        Assertions.assertNotEquals(hash, c.hashCode());
        Assertions.assertNotEquals(hash, d.hashCode());

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current implementation is fine. What should be changed is the test.

At present, there is only one assertion causing the test to fail. For these two circles:

final GreatCircle c = GreatCircles.fromPoleAndU(Vector3D.Unit.PLUS_Z, Vector3D.Unit.MINUS_X, TEST_PRECISION);
final GreatCircle a = GreatCircles.fromPoleAndU(Vector3D.Unit.PLUS_Z, Vector3D.Unit.PLUS_X, TEST_PRECISION);

The test currently assumes the hashes must not be equal.

final int hash = a.hashCode();
Assertions.assertNotEquals(hash, c.hashCode());

I might be misunderstanding this point ( please help me clarify ), but it appears that these two circles are topologically identical meaning they occupy the same space.

If this interpretation is correct, we could simply modify the assumption and assert that the hashes of these two circles are expected to be the same.

Assertions.assertEquals(hash, c.hashCode());

With this small adjustment, the test should pass.

Modifid GreatCircleTest instead to allow all the test successfully pass.

See this comment thread for details:
apache#227 (review)
@ivanshuba ivanshuba requested a review from aherbert January 30, 2024 16:11
@@ -399,16 +399,24 @@ public boolean eq(final Vector3D vec, final Precision.DoubleEquivalence precisio

/**
* Get a hashCode for the vector.
* <p>All NaN values have the same hash code.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert formatting. It ad noise the the changes.

return 643 * (164 * Double.hashCode(x) + 3 * Double.hashCode(y) + Double.hashCode(z));
int result = 17;
long temp;
temp = Double.doubleToLongBits(x);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was fixed with a different linear combination of Double.hashCode. This seems to have reverted to using doubleToLongBits.

@@ -340,15 +340,14 @@ public boolean eq(final Vector2D vec, final Precision.DoubleEquivalence precisio
* Get a hashCode for the 2D coordinates.
* <p>
* All NaN values have the same hash code.</p>
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert formatting

@@ -664,7 +664,7 @@ void testHashCode() {
Assertions.assertEquals(hash, a.hashCode());

Assertions.assertNotEquals(hash, b.hashCode());
Assertions.assertNotEquals(hash, c.hashCode());
Assertions.assertEquals(hash, c.hashCode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the two circles are the same then this should be tested using Assertions.assertEquals. If they are the same then the hash code must be the same. If the objects are not equal then the hash code may or may not collide; it doesn't matter.

Any test that uses not equals on hash codes for objects that are not equal can potentially fail due to random hash collision.

- in Vector3D.hashCode():
  - use Double.hashCode() instead of the Double.doubleToLongBits()
  - revert Javadoc formatting

- in Vector2D.hashCode(), revert Javadoc formatting
@ivanshuba ivanshuba requested a review from aherbert January 31, 2024 12:10
@ivanshuba
Copy link
Author

Hi @aherbert

Can I ask if you want me to fix anything to get this moving on?

Copy link
Contributor

@aherbert aherbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I think the code changes to the has code functions are fine. I've suggested how to clean up the tests so we do not get sporadic failures using Math.random().

@@ -23,7 +23,6 @@
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo this formatting.

@@ -339,6 +339,7 @@ public Vector3D reject(final Vector3D base) {
* frame with one of the axes in a predefined direction. The
* following example shows how to build a frame having the k axis
* aligned with the known vector u :
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo this formatting.

@@ -664,7 +664,7 @@ void testHashCode() {
Assertions.assertEquals(hash, a.hashCode());

Assertions.assertNotEquals(hash, b.hashCode());
Assertions.assertNotEquals(hash, c.hashCode());
Assertions.assertEquals(hash, c.hashCode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is testing the hash codes are equal (because the objects are equal) then we should test the objects are equal:

// Equal objects have equal hash codes
Assertions.assertEquals(a, c);
Assertions.assertEquals(hash, c.hashCode());

@@ -1146,6 +1145,65 @@ void testEqualsAndHashCode_signedZeroConsistency() {
Assertions.assertEquals(b.hashCode(), d.hashCode());
}

@Test
void testHashCodeCollisions_symmetricAboutZero() {
final double any = Math.random();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this could create sporadic failures. We can formalise this using some random numbers:

@ParameterizedTest
@ValueSource(doubles = {1.23, 4.56, 42, Math.PI})
void testHashCodeCollisions_symmetricAboutZero(double any) {

int zNegHash = negZ.hashCode();
int zPosHash = posZ.hashCode();

String xMessage = String.format("negXHash: %s, posXHash: %s (expected to be non-equal)\n", xNegHash, xPosHash);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be passed a suppliers to Assertions. However the Assertions will already have a statement containing the two values and that they were expected to be not equal. So we can just use:

Assertions.assertNotEquals(zNegHash, zPosHash, "+/- z hash");

String yMessage = String.format("negYHash: %s, posYHash: %s (expected to be non-equal)\n", yNegHash, yPosHash);
String zMessage = String.format("negZHash: %s, posZHash: %s (expected to be non-equal)\n", zNegHash, zPosHash);

Assertions.assertNotEquals(zNegHash, zPosHash, zMessage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these in z, y, x order? Makes more sense to use x, y, z, order.

}

@Test
void testHashCodeCollisions_symmetricAboutArbitraryValue() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as testHashCodeCollisions_symmetricAboutZero. Use a ParameterizedTest and simplify the assertion message.

@ParameterizedTest
@CsvSource(
   "1.23, 72.68",
   // etc
)
void testHashCodeCollisions_symmetricAboutArbitraryValue(double any, double arb) {

@@ -912,6 +911,53 @@ void testHashCode() {
Assertions.assertEquals(Vector2D.of(0, Double.NaN).hashCode(), Vector2D.of(Double.NaN, 0).hashCode());
}

@Test
void testHashCodeCollisions_symmetricAboutZero() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous. Use a ParameterizedTest and simplify the assertion message.

@singhbaljit
Copy link
Contributor

A little late to the party, but I think we should consider using https://github.com/jqno/equalsverifier to test hashCode and equals in a standard way, instead of writing all different permutations in the tests.

@darkma773r
Copy link
Contributor

Even later to the party here... :-)

@singhbaljit, I would be on board with trying out EqualsVerifier. We already have GEOMETRY-107 for this.

@aherbert, @ivanshuba, we are still waiting on the requested changes, correct?

@aherbert
Copy link
Contributor

The changes I requested to the tests have not been implemented. Without them the tests could sporadically fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants