-
Notifications
You must be signed in to change notification settings - Fork 641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Completed Grouping Implementation #193
Conversation
…ort.DataOutputStream tests
…the original C5 implementation
… dictionary key based on values within the dictionary rather than by reference equality.
…sertion order enumeration of an IDictionary structure.
…lasses (with [Test] attributes) and commented the [Test] attributes in the abstract classes to keep the tests from running in the wrong context." This reverts commit 4dbc359.
…sses (with [Test] attributes) and commented the [Test] attributes in the abstract classes to keep the tests from running in the wrong context.
… property was not being set to true.
…t.StringBuilderExtensions + added unit tests. Fixed buggy StringBuilder.Reverse() method to account for Unicode.
…() method that wasn't converting the code points into characters (so we were just getting random sequences of numbers instead of Unicode).
…using the TreeSet.
…variance so they act more like the wildcard generics that were used in Java.
…s in place of GroupDocs on TopGroups.
…e<T> does not automatically bring it along.
… so it is similar to its original name, AbstractAllGroupHeadsCollector.GroupHead. Due to conflicts with other classes named GroupHead in derived classes of AbstractAllGroupHeadsCollector, the original name could not be used.
…ctDistinctValuesCollector class to nest IGroupCount and GroupCount so the syntax is similar to Lucene.
…ng algorithm. TimSort doesn't call the IComparer<T>.Compare() method unless the values differ. If it is called when they are the same, the TestGrouping.TestRandom() and AllGroupHeadsCollectorTest.TestRandom() tests fail. Added TODO about differences in sorting APIs.
…ads() -> CollectedGroupHeads (property)
@NightOwl888 Feel free to merge your PR into master whenever its ready. I am just cleaning up code and rerunning tests on my PR. |
{ | ||
return false; | ||
} | ||
catch (NullReferenceException) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you catching NullReferenceException here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same in the JDK - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/AbstractMap.java#AbstractMap.equals%28java.lang.Object%29
Basically, these overloads ensure that if HashMap is used as a dictionary key that they will be compared by their contained values rather than by reference (which is the default in .NET).
I have now merged these changes to master. Do note that there is a new commit where the test class override stubs can be reverted (if needed): a209d71 |
Oops, looks like the first attempt to push was rejected and I posted the wrong commit before. The new test class override rollback point is: a209d71 |
This adds the remaining Grouping classes, the Grouping tests, as well as the support classes (
TreeSet
andLinkedHashMap
) required by Grouping. 12/12 test are passing.There is also a sweeping bug fixed in the test framework - the
TestUtil.RandomRealisticUnicodeString()
method was returning only random series of numbers, not Unicode characters. Fortunately, the impact wasn't as bad as I thought it would be - there are ~6 additional tests solution-wide failing due to this fix.I am posting this as a pull request rather than pushing it to master for 2 reasons:
API Refactoring
Due to the liberal use of Java wildcard generics, it was a challenge putting something together that would compile. One of the solutions I came up with was to use .NET covariance to fix some of the issues. Unfortunately, covariance in .NET is only supported on interfaces and the fact that the Lucene designers made everything into abstract classes didn't help. Basically, we now have abstract classes that are backed by another abstraction (interfaces).
Furthermore, since the Collector abstraction is also an abstract class rather than an interface, it means these Collector interfaces cannot also be Collectors (which means lots of casting). This is because interfaces cannot inherit abstract classes. So, there are 3 different ways to go from here to get rid of most of the casting:
I am hoping someone can present another option than covariant interfaces that doesn't result in lots of casting, but that may not be a possibility. IMO, it would make more sense to change Collector into an interface (since it has no code and only public members) than to create a backing interface, which would help to avoid confusion over the double-abstraction.
What I mean is that pretty much every place where Collector abstract class is used now would need to change to the Collector interface to ensure that covariant Collector interfaces can inherit Collector, thus making them into a Collector. Keeping a Collector abstract class around after it is essentially no longer called out anywhere (except in subclasses) might be cause for confusion.
Also, it would be helpful if someone would work with Grouping and let us know of any other issues with the API that could be improved.
C5's TreeSet & TreeMap
I brought over the TreeSet and TreeDictionary from C5, along with the applicable tests. Unfortunately, this drags most of the C5 library along with it. We could alternatively reference it via NuGet, but there is not yet .NET core support (just an open pull request).
Also, despite being a "set", it does not implement
ISet
, which I have now done. It isn't a problem yet, but may eventually be needed.LinkedHashMap
Unfortunately, the
LurchTable
that worked well for an LRU cache doesn't keep track of insertion order (even when you specify that option) and therefore doesn't replaceLinkedHashMap
as was advertised. So, I ended up modifying this solution (the second option) to be backed byHashMap
rather thanDictionary
so it can supportnull
keys.LinkedHashMap
's main reason for being is that it keeps track of the insertion order of items. We have unwittingly replaced it withDictionary
in several places. The fact that the tests pass is happenstance -Dictionary
doesn't guarantee insertion order, but if you add items only and don't delete and re-add, they will enumerate in insertion order. That said, there is nothing stopping someone from changing this behavior from one .NET version to another and we can't rely on it. We should go back and find all of the places whereLinkedHashMap
is used in Lucene where we are using Dictionary, replace, and test thoroughly.