-
Notifications
You must be signed in to change notification settings - Fork 639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for LineFileDocs Bottleneck/Performance Improvements #325
Conversation
…t is GZip, unzip to a temp file instead of using a MemoryStream to save RAM
…e: Unzip LineFileDocs at the class level instead of in each test
…ek to line break by reading the stream in chunks instead of 8 bytes at a time.
…is zipped at the class level, so we don't have so much overhead with selecting a random line in the file.
…esWriter::AddBinaryField(): Removed unnecessary array allocation
…w subclasses to specify whether to extract the LineDocsFile to a temp file at the class level. Changed LuceneTestFrameworkInitializer to automatically extract LineDocsFile to a temp file if it is overridden and the specified file is zipped. Also fixed all attributes in LuceneTestCase to correctly allow LuceneTestFrameworkInitializer to be called by NUnit before attempting to call LuceneTestFrameworkInitializer.EnsureInitialized().
… NUnit to unzip the LineDocsFile only 1 time per test run instead of on each class or each test.
… same way that Lucene does, except using a BufferedStream to improve performance.
…varying widely in completion time due to threading contention or deadlock
…ace() rather than System.Char.IsWhiteSpace(), which may return different results.
…nd added some optimizations to Core/TestFactories
…asses to using the Analyzer.NewAnonymous() method inline.
…stead of list for sorting tests
…h that from RavenDB, with permission from its maintainers (https://issues.apache.org/jira/browse/LUCENENET-640?focusedCommentId=17033146&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17033146) (closes apache#251)
I've run all tests locally twice with these changes. The first time was fine but the second time yielded one error:
Error
Full output
|
Thanks for the info. I haven't been running the tests in debug mode, and as I mentioned here, there are a suite of tests that are enabled in debug mode we have been skipping. Once we fix the ability to "turn on assert" in release mode, there will probably be some problems to address that we haven't been aware of because all of the assert "tests" are currently being skipped. Of course, it would be best to fix it that way. Currently, both the "Verbose" setting and the asserts are turned on when running in debug mode. Verbose makes it painfully slow. It would be better if we could turn each of them on and off independently. Not that this problem isn't concerning, but I suspect that it isn't a new problem that was introduced by this PR, which can be confirmed by running the master branch in debug mode. We should probably also check out each tag to determine which release this first appeared in, as I am sure that earlier betas did not throw this exception. When we released 4.8.0-beta00004, the tests were run in debug mode - we switched to running them in release mode at a later point (for sure by 4.8.0-beta00007). IIRC, the primary reason for the switch was because Microsoft changed the behavior of That being said, if we can confirm this PR isn't introducing any new problems (other than the threading contention and ICU test failures, which are also on our radar to fix), we can merge it. |
I confirmed that the error exists in the master branch, and I got it to occur on I also noticed that if the repeat attribute is changed to 10,000 times it will cause an I also saw this error occur from the same test:
I have opened #326 to put a higher priority on this task - we need to get the "asserts" working in Release mode to ensure that our test suite is testing everything that it is supposed to so more issues like this don't creep in without us being aware of them. |
… exception when using OpenBitSet.FastGet() instead of OpenBitSet.Get(), since the size of the bit set is unknown.
Yes of course I forgot about the release vs debug. I've just compiled and re-ran all tests locally in Release and I get 2 other failing tests. I'll keep re-running them to see what happens. TestThaiAnalyzerTestRandomHugeStrings
TestRandomStrings
|
Just ran it again and get a different failure. I'm going to close the solution, rebuild and re-run and see what happens, not sure if these are false positives or not. TestICUTokenizerCJK.csTestRandomStrings
|
…ler completely from random testing
…es outside of the loops that they were nested in so they can be reused.
…ch BytesRef() out of the loop (as it is in Lucene)
…de of the loops they were nested in so they can be reused (like in Lucene)
…ed to use ReaderWriterLockSlim to make reads more efficient
…der: Refactored to use ReaderWriterLockSlim to make reads more efficient
…ter: Refactored to use ReaderWriterLockSlim to make reads more efficient and used LazyInitializer for readerManger and taxoArrays
…proper dispose pattern
…ock(this), implemented proper dispose pattern
…sScorer from patch in Lucene 4.10.4 https://issues.apache.org/jira/browse/LUCENE-6001 (fixes apache#274)
…ed all members to be virtual to allow users to provide their own LRU cache.
…ds from public API, avoid lock (this)
…locking on Dispose() method and made it safe to call dispose multiple times
…se(bool) and implemented proper dispose pattern. Avoid lock (this).
…se LazyInitializer and avoid lock (this)
…d, inlined TryGetValue out variable declarations, cleaned up namespaces, added operator declarations (where recommended), marked fields readonly, removed unused fields, cleaned up line spacing
…ameIntCacheLru, NameHashInt32CacheLRU > NameHashInt32CacheLru. Refactored to utilize a generic type internally using composition to avoid boxing/unboxing without exposing the generic closing type publicly. Added public INameInt32CacheLru as a common interface between NameIntCacheLru and NameHashInt32CacheLru.
…drenIterator into ChildrenEnumerator
…gth to a property
… generic struct that can be used in both TopOrdAndSingleQueue and TopOrdAndInt32Queue. Added Insert method to PriorityQueue to allow adding value types without reading the previous value for reuse.
…ter: Reverted back to using lightweight locking instead of using ReaderWriterLockSlim and LazyInitializer, as performance is better in this configuration.
… out call to EnterUpgradeableReadLock() so reads can happen concurrently.
… BytesRef instantiation outside of outer loop, which significantly improves performance
…er has unfair locking on Monitor.TryEnter(), the code was restructured to disallow any thread that doesn't have a lock into InternalTryCheckoutForFlush(). See #325.
This fixes a bottleneck (see #261) caused by unzipping the line docs file in RAM (~15MB) and then selecting a random line in the file. The .NET
GZipStream
is not seekable, so this was done by copying the entire contents into aMemoryStream
first. This happened during a significant number of the tests (~20%), and happened in each one of those tests.The fix was to set up the test framework to unzip the file to a temp file on the test machine. This happens in 1 of 3 different ways:
LineFileDocs
is used directly in a class that does not specifyLuceneTestCase.UseTempLineDocsFile = true
,LineFileDocs
will unzip the file before it is used (per instance of the class) and deleted when it is disposed.LuceneTestCase.UseTempLineDocsFile = true
is specified in the test fixture, the file will be unzipped in theBeforeClass()
method and deleted in theAfterClass()
method.LuceneTestFrameworkInitializer
to the test project (outside of all namespaces) will cause the file to be unzipped only once for all of the tests in that project and deleted after the last test is finished.There are also several other patches in this PR:
LineFileDocs
was reverted back to Lucene's original implementation, which has revealed some (potential) false positives in some of the ICU tests. ABufferedStream
was added to improve performance.Nightly
,Weekly
,Slow
, andAwaitsFix
attributes so they will wait until NUnit runs the initialization code before running.DeadlockAttribute
to time out tests that we are now seeing threading contention issues with after improving raw speed. This is to ensure that they will fail in the CI environment if they actually deadlock and also can be used to filter out these tests during runs.ICUTokenizer
where it was callingSystem.Char.IsWhiteSpace()
when it should have been callingICU4N.UChar.IsWhiteSpace()
to ensure it is reading the correct version of ICU.DisposableThreadLocal
to that of RavenDB, with permission from its maintainers. (closes The design and implementation of a better ThreadLocal<T> #251)