Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completed Grouping Implementation #193

Closed
wants to merge 35 commits into from

Conversation

NightOwl888
Copy link
Contributor

This adds the remaining Grouping classes, the Grouping tests, as well as the support classes (TreeSet and LinkedHashMap) required by Grouping. 12/12 test are passing.

There is also a sweeping bug fixed in the test framework - the TestUtil.RandomRealisticUnicodeString() method was returning only random series of numbers, not Unicode characters. Fortunately, the impact wasn't as bad as I thought it would be - there are ~6 additional tests solution-wide failing due to this fix.

I am posting this as a pull request rather than pushing it to master for 2 reasons:

  1. I am hoping it helps get Migrating Lucene.Net to .NET Core #191 merged to master sooner (@conniey, let me know if it would be more helpful or harmful to merge this before you finish the .NET core integration).
  2. There are a few things about this that need more work and further discussion.

API Refactoring

Due to the liberal use of Java wildcard generics, it was a challenge putting something together that would compile. One of the solutions I came up with was to use .NET covariance to fix some of the issues. Unfortunately, covariance in .NET is only supported on interfaces and the fact that the Lucene designers made everything into abstract classes didn't help. Basically, we now have abstract classes that are backed by another abstraction (interfaces).

Furthermore, since the Collector abstraction is also an abstract class rather than an interface, it means these Collector interfaces cannot also be Collectors (which means lots of casting). This is because interfaces cannot inherit abstract classes. So, there are 3 different ways to go from here to get rid of most of the casting:

  1. Change the Collector abstract class into an interface
  2. Create an interface to back the Collector, similar to what was done for Grouping
  3. Use a different solution than covariant interfaces to mimic Java wildcard generics

I am hoping someone can present another option than covariant interfaces that doesn't result in lots of casting, but that may not be a possibility. IMO, it would make more sense to change Collector into an interface (since it has no code and only public members) than to create a backing interface, which would help to avoid confusion over the double-abstraction.

What I mean is that pretty much every place where Collector abstract class is used now would need to change to the Collector interface to ensure that covariant Collector interfaces can inherit Collector, thus making them into a Collector. Keeping a Collector abstract class around after it is essentially no longer called out anywhere (except in subclasses) might be cause for confusion.

Also, it would be helpful if someone would work with Grouping and let us know of any other issues with the API that could be improved.

C5's TreeSet & TreeMap

I brought over the TreeSet and TreeDictionary from C5, along with the applicable tests. Unfortunately, this drags most of the C5 library along with it. We could alternatively reference it via NuGet, but there is not yet .NET core support (just an open pull request).

Also, despite being a "set", it does not implement ISet, which I have now done. It isn't a problem yet, but may eventually be needed.

LinkedHashMap

Unfortunately, the LurchTable that worked well for an LRU cache doesn't keep track of insertion order (even when you specify that option) and therefore doesn't replace LinkedHashMap as was advertised. So, I ended up modifying this solution (the second option) to be backed by HashMap rather than Dictionary so it can support null keys.

LinkedHashMap's main reason for being is that it keeps track of the insertion order of items. We have unwittingly replaced it with Dictionary in several places. The fact that the tests pass is happenstance - Dictionary doesn't guarantee insertion order, but if you add items only and don't delete and re-add, they will enumerate in insertion order. That said, there is nothing stopping someone from changing this behavior from one .NET version to another and we can't rely on it. We should go back and find all of the places where LinkedHashMap is used in Lucene where we are using Dictionary, replace, and test thoroughly.

… dictionary key based on values within the dictionary rather than by reference equality.
…sertion order enumeration of an IDictionary structure.
…lasses (with [Test] attributes) and commented the [Test] attributes in the abstract classes to keep the tests from running in the wrong context."

This reverts commit 4dbc359.
…sses (with [Test] attributes) and commented the [Test] attributes in the abstract classes to keep the tests from running in the wrong context.
…t.StringBuilderExtensions + added unit tests. Fixed buggy StringBuilder.Reverse() method to account for Unicode.
…() method that wasn't converting the code points into characters (so we were just getting random sequences of numbers instead of Unicode).
…variance so they act more like the wildcard generics that were used in Java.
… so it is similar to its original name, AbstractAllGroupHeadsCollector.GroupHead. Due to conflicts with other classes named GroupHead in derived classes of AbstractAllGroupHeadsCollector, the original name could not be used.
…ctDistinctValuesCollector class to nest IGroupCount and GroupCount so the syntax is similar to Lucene.
…ng algorithm. TimSort doesn't call the IComparer<T>.Compare() method unless the values differ. If it is called when they are the same, the TestGrouping.TestRandom() and AllGroupHeadsCollectorTest.TestRandom() tests fail. Added TODO about differences in sorting APIs.
@conniey
Copy link
Contributor

conniey commented Nov 9, 2016

@NightOwl888 Feel free to merge your PR into master whenever its ready. I am just cleaning up code and rerunning tests on my PR.

{
return false;
}
catch (NullReferenceException)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you catching NullReferenceException here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the same in the JDK - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/AbstractMap.java#AbstractMap.equals%28java.lang.Object%29

Basically, these overloads ensure that if HashMap is used as a dictionary key that they will be compared by their contained values rather than by reference (which is the default in .NET).

@NightOwl888
Copy link
Contributor Author

NightOwl888 commented Nov 10, 2016

I have now merged these changes to master.

Do note that there is a new commit where the test class override stubs can be reverted (if needed): a209d71

@NightOwl888
Copy link
Contributor Author

Oops, looks like the first attempt to push was rejected and I posted the wrong commit before. The new test class override rollback point is: a209d71

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants