=================== Release 4.8.0-beta00006 =====================
Bug
• Lucene.Net.Support.Collections::Equals<T>(): Fixed comparison to include a check whether Count
matches.
•[Pull Request #215] - Lucene.Net.Analysis.Common.Analysis.Core.UpperCaseFilter::UpperCaseFilter():
Removed redundant termAtt initialization in UpperCaseFilter constructor
•[LUCENENET-597] - Lucene.Net.Search.Spans.SpanNearQuery::ToString(): Fixed ordering problem when appending
statements to StringBuilder.
•[Pull Request #224] - GOOD_FAST_HASH_SEED thread safety - avoid static constructor
• Intermittent failures of Lucene.Net.Facet.Taxonomy.WriterCache.TestCharBlockArray.TestArray().
The test was not set up to handle encoders that fall back to '?' for unmapped/invalid characters.
Also, BinaryReader/BinaryWriter were too strict with regard to validating surrogate pairs
for this type of serialization, so custom extension methods over Stream that do
not use encoding were implemented.
•[LUCENENET-602] - Some platforms fail to load codecs seemingly because their types are discovered
by using reflection. Changed to using hard-coded codec lists rather than reflection to load the
internal codec types.
• Lucene.Net.Tests.Search.TestMultiTermConstantScore: Safely call Dispose by ensuring the
reference variable is not null
• Lucene.Net.Tests.Search.BaseTestRangeFilter: Safely call Dispose by ensuring the
reference variable is not null
• Lucene.Net.Support.IO.FileSupport::fileCanonPathCache changed to ConcurrentDictionary
to make it thread-safe
• Lucene.Net.Store.NativeFSLockFactory: Fixed locking/disposal of lock instances to be thread-safe
•[Pull Request #225] - Changed "Improve this doc" button to point to the GitHub repo
• Fixed constructor of Lucene.Net.Support.HashMap to use the passed in comparer rather than
EqualityComparer<TKey>.Default.
•[LUCENENET-603] - Fixed ConcurrentMergeScheduler and TaskMergeScheduler so they don't throw exceptions
on background threads and properly throw exceptions on the calling thread during merge failures.
• Lucene.Net.TestFramework: Removed dependency on local file path location of
europarl.lines.txt.gz and embedded the file.
• Lucene.Net.Suggest.Suggest.FileDictionary - Fixed conversion of string to number to be
culture insensitive (it caused the tests in FileDictionaryTest to fail randomly)
• Lucene.Net.Tests.Cli - Fixed issue with xplat root directory specification
(all platforms were trying to set the directory to C:\)
• Lucene.Net.Benchmark.ByTask.Utils.Config: Fixed FormatException caused by converting
number to string in ambient culture and parsing it back to a number in invariant culture
• Lucene.Net.Analysis.Common.Analysis.Util.AbstractAnalysisFactory: Fixed parsing issue
converting string to int in ambient culture
• Lucene.Net.Analysis.Common.Analysis.Miscellaneous.TruncateTokenFilterFactory - Fixed
issue converting string to sbyte in ambient culture
• Lucene.Net.Util.CommandLineUtil.AdjustDirectoryName - IndexOf comparison must be
StringComparison.Ordinal (or in this case, a single char) to be compatible with all
cultures/platforms.
• Lucene.Net.TestFramework.Util.LuceneTestCase.NewFSDirectory - When resolving a type,
we were expecting an exception if the type does not subclass FSDirectory; however, in .NET this
doesn't happen. We need to explicitly check whether the resolved type is assignable to FSDirectory
or whether the type name is invalid.
• Lucene.Net.Util.StringHelper: Fixed parsing issue converting string to int in ambient culture
• Lucene.Net.Index.CheckIndex - Fixed issue with converting int to string using the ambient culture
on VersionInfo comparison
• Lucene.Net.Expressions: Corrected casing on app.config to lower (xplat problem)
• Lucene.Net.Analysis.SmartCn: Corrected casing of folder paths on bigramdict.mem, coredict.mem,
and package.md (xplat problem)
• Lucene.Net.Tests.Support.TestTreeSet: Passing null instead of CultureInfo.InvariantCulture causes
the test to randomly fail depending on the culture of the current thread (which is
randomly selected by LuceneTestCase).
• Lucene.Net.TestFramework.Util.TestUtil.NextLong: The result of the method was always the value of
start when start == long.MinValue and end == long.MaxValue. As a result, many tests
were not actually random.
• Lucene.Net.TestFramework.Index.AlcoholicMergePolicy: The value chosen for Hour was supposed to be
random, but it was set up to be a constant by a mistranslation from Java to .NET.
• Lucene.Net.Tests.Index.TestTransactionRollback: Tests were failing because the data being populated
was not converted from int to string using the invariant culture. Switched back to the
original logic, using LastIndexOf(char) rather than LastIndexOf(string).
• Lucene.Net.Grouping.TopGroups - check collection equality if the generic type is a reference
type (as is the default behavior in Java)
• SWEEP: Added StringComparison.Ordinal to all of the string.StartsWith() and string.EndsWith()
methods where it was missing
• Lucene.Net.Tests.QueryParser.Flexible.Precedence.TestPrecedenceQueryParser: Specify short date
format by using DateTime.ParseExact instead of DateTime.Parse
• Lucene.Net.Support.CultureContext: Fixed minor issue with unused variable
• Lucene.Net.TestFramework.JavaCompatibility.SystemTypesHelpers: Overloads of append that take
numeric types need to be converted to the invariant culture. Removed the overloads for
decimal, double, and float, as those need to be dealt with on a case by case basis.
• Lucene.Net.Tests.Analysis.Common.Analysis.Pattern.TestPatternTokenizer.TestSplitting: int.Parse
must use the invariant culture to consistently recognize inputs
• SWEEP: Ensure all enumerators are disposed of properly (except in some cases where enumerators
are set to field variables, see LUCENENET-611)
• Lucene.Net.Highlighter.VectorHighlight.FieldQuery: List<T> replacement for LinkedHashSet<T> preserves insertion
order, but we need to explicitly check to ensure no duplicate values are added
• Lucene.Net.Tests.Search.TestFieldCacheRangeFilter.TestSparseIndex: formatting value must be done in invariant culture
• Lucene.Net.Util.StringHelper - Use Time.CurrentTimeMilliseconds() instead of DateTime.Now.Millisecond. The latter is
a mistranslation from Java and only returns numbers from 0 to 999; the former returns a long based on
Stopwatch.GetTimestamp() that has several orders of magnitude more possible values.
• Lucene.Net.TestFramework.Codecs.RAMOnly.RAMOnlyPostingsFormat - string comparison must be done using ordinal to match Java
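Several of the culture fixes above (Config, AbstractAnalysisFactory, StringHelper, CheckIndex) come down to the same failure: a number is formatted with the ambient culture and parsed back with invariant rules. The following is a minimal Java sketch of that failure mode, not Lucene.NET code. It assumes Locale.GERMANY as a stand-in for an arbitrary ambient culture that uses ',' as the decimal separator, with Locale.ROOT playing the role that CultureInfo.InvariantCulture plays in the .NET fixes. The class name CultureRoundTrip is hypothetical.

```java
import java.util.Locale;

class CultureRoundTrip {
    // Formats a double with the given locale, then parses it back with
    // Double.parseDouble, which always expects '.' as the decimal separator.
    static boolean roundTrips(Locale formatLocale, double value) {
        String text = String.format(formatLocale, "%.1f", value);
        try {
            return Double.parseDouble(text) == value;
        } catch (NumberFormatException e) {
            // e.g. "1,5" produced under a German locale cannot be parsed
            return false;
        }
    }

    public static void main(String[] args) {
        // Culture-sensitive formatting: German writes 1.5 as "1,5".
        System.out.println(roundTrips(Locale.GERMANY, 1.5)); // false
        // Culture-insensitive formatting round-trips correctly.
        System.out.println(roundTrips(Locale.ROOT, 1.5));    // true
    }
}
```

Using the same fixed culture on both the formatting and the parsing side is what makes the round trip independent of the thread's culture, which LuceneTestCase randomizes.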
Improvement
•[Pull Request #206] - Website & API Doc site generator using DocFx script
•[Pull Request #223] - Website updates - DOAP file and copy changes
•[LUCENENET-588] - Made lucene-cli into a dotnet tool NuGet package and updated the documentation on how to install and use it
• Fixed solution and project files so builds can be done cross-platform (in the IDE or via dotnet build)
• Switched to the .snupkg debug symbols format
• Changed build.ps1 script to install and use only version 2.2.300 of the .NET Core SDK to prevent
build failures due to version drift
•[Pull Request #227 & LUCENENET-608] - Added strong naming to Lucene.Net assemblies to comply with Microsoft guidelines
• Added missing guard clauses for Lucene.Net.Support.HashMap and Lucene.Net.Support.LinkedHashMap constructors
• Upgraded build script to latest dotnet-install.ps1
• Changed NuGet dependency from the unofficial SharpZipLib.NETStandard to the official SharpZipLib
and upgraded to version 1.1.0 from 0.86.0
• Removed hard-coded failure, since we are no longer getting crashes due to background threads
throwing exceptions
• SWEEP: Re-evaluated test times and decorated all tests 5 seconds or over with the LongRunningTestAttribute,
removing the attribute where it was no longer necessary
• SWEEP: Removed the TimeoutAttribute from all tests that are known to run in a short duration
• Upgraded test projects to use Microsoft.NET.Test.Sdk version 16.2.0
• Upgraded test projects to use NUnit3TestAdapter version 3.13.0
• Upgraded test projects to use NUnit version 3.9.0
• Lucene.Net.Analysis.Analyzer: Implemented dispose pattern
• Lucene.Net.Benchmark.ByTask.Tasks.PerfTask: added IDisposable so the
class can be used with a using block (it already had Dispose())
• Setup build.ps1 to run tests in parallel using background jobs
• build.ps1: Added function to summarize the test results on the console
• Removed Version.proj file and moved the version properties into the root Directory.Build.props file
• Renamed TestTargetFramework.proj to TestTargetFramework.props (Some editions of VS2019 don't seem
to like the .proj extension)
• Lucene.Net.TestFramework: Implemented dispose pattern where applicable
• Broke Lucene.Net.Tests project into Lucene.Net.Tests._A-I, Lucene.Net.Tests._J-U, and
Lucene.Net.Tests._U-Z to cut the time it takes to run the tests by about 2/3 when the projects
run in parallel
•[Pull Request #216] - Added .NET Standard 2.0 target to projects where it was missing
• Lucene.Net.TestFramework.Util.LuceneTestCase: Throw explicit exception if Directory type cannot be resolved
• Lucene.Net.Benchmark: Use AssemblyQualifiedName for StandardAnalyzer for better reliability with .NET Reflection
• build.ps1: Added option to specify maximum number of parallel jobs to use during testing
• Added .vscode/settings.json file to locate tests and ignore docs path in Visual Studio Code
• SWEEP: Added StringComparison.Ordinal to all string.Equals() calls, as per
https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#recommendations-for-string-usage
• Lucene.Net.TestFramework.Util.LuceneTestCase: Added try catch blocks to write stack traces to the
console if exceptions occur during OneTimeSetUp or OneTimeTearDown
•[LUCENENET-435] SWEEP: CA2200: Rethrow to preserve stack details
(https://docs.microsoft.com/en-us/visualstudio/code-quality/ca2200-rethrow-to-preserve-stack-details)
• Lucene.Net.Support.Search.ReferenceContext: Sealed the class, as none of its
members are virtual anyway.
• Moved TaskMergeScheduler/TestTaskMergeScheduler to the Support folders
• Upgraded NuGet package dependency of Spatial4n to version 0.4.1
New Feature
•[LUCENENET-566 & LUCENENET-573] - Lucene.Net.ICU: Added all missing functionality and tests (100% passing) and changed the NuGet
package dependency from icu.net to ICU4N.
• Created azure-pipelines.yml for Azure DevOps that anybody can use.
=================== Release 4.8.0-beta00005 =====================
Bug
• Added [Obsolete] attribute to Lucene.Net.Field extension methods that are only for
pre-4.0 backward compatibility.
• BREAKING: Lucene.Net.Search.Similarities.BasicStats: Changed m_field from protected to
private (to match Lucene).
• BREAKING: Lucene.Net.Util.MapOfSets: Changed Map property from IDictionary<TKey, HashSet<TValue>>
to IDictionary<TKey, ISet<TValue>> (to match Lucene).
• BREAKING: Lucene.Net.Util.OfflineSorter: Changed setter of BufferSize property to private
(to match Lucene).
• BREAKING: Lucene.Net.Util.RamUsageEstimator.IdentityHashSet<KType>: Changed accessibility
from public to internal (to match Lucene)
• BREAKING: Lucene.Net.Search.BitsFilteredDocIdSet: Changed constructor to throw ArgumentNullException
instead of NullReferenceException (.NET convention)
• Lucene.Net.Search.ConstantScoreQuery.ConstantWeight: Fixed initialization issue with commented code
in GetValueForNormalization() (that diverged from Lucene)
• Lucene.Net.Util.OfflineSorter: Added line to delete the output file (that existed in Lucene).
• BREAKING: Lucene.Net.Analysis.Common.Analysis.Util.AbstractAnalysisFactory: Changed return
type of GetSet from ICollection<T> to ISet<T>.
•[LUCENENET-544] - Turkish stemmer causes an IndexOutOfRange (Reported in prior version, added
regression test to show it is no longer an issue).
•[LUCENENET-544] - Replaced Javadoc notation with Microsoft-style XML documentation comments.
• Lucene.Net.Misc: Removed unnecessary dependency on Lucene.Net.Analysis.Common
• Lucene.Net.Highlighter: Removed unnecessary dependency on Lucene.Net.Analysis.Common
• BREAKING: Lucene.Net.Facet.Taxonomy.TaxonomyReader: Implemented IDisposable and proper dispose pattern.
Public Dispose(bool) method removed.
• Lucene.Net.Misc.IndexMergeTool: Added try-finally block to properly dispose of directories.
•[LUCENENET-521] - Concurrency bug with MMapDirectory (added regression test to confirm fix).
•[LUCENENET-530] - Create a truly Memory Mapped Directory type as MMapDirectory is not really memory mapped.
Lucene.Net.Store: Fixed MMapDirectory concurrency using the solution suggested by Vincent
Van Den Berghe (http://apache.markmail.org/message/hafnuhq2ydhfjmi2).
• Lucene.Net.Store.LockVerifyServer: Read/write 1 byte instead of 1 int (4 bytes).
Also, we don't need 2 streams in .NET for input/output (solution provided by Vincent Van Den Berghe).
• Lucene.Net.TestFramework.Store.MockDirectoryWrapper.IndexInputSlicerAnonymousClass: Fixed Dispose() method.
• Lucene.Net.Index.IndexWriter: Fixed string formatting of numeric values in InfoStream message.
• Lucene.Net.Tests.Querys.CommonTermsQueryTest: Added missing TestRandomIndex() test
• Lucene.Net.Analysis.Common.Analysis.CharFilter.MappingCharFilterFactory: fixed escaping problem in parsing regex
• Lucene.Net.Search.SearcherLifetimeManager: Corrected bug in Lifetime reference count
• Lucene.Net.Spatial: Removed unnecessary dependencies on GeoAPI and NetTopologySuite
•[LUCENENET-593] - Refactored Lucene.Net.Util.Constants so OS identification, processor architecture, and
framework identification are more reliable. Searching for environment variables rather than using well-known APIs
was causing a null reference exception on Linux.
• Lazily initialize codecs to ensure the correct type is loaded if overridden
• Lucene.Net.Search.FieldCacheRangeFilter.AnonymousStringFieldCacheRangeFilter: Fixed Debug.Assert condition that
was causing assert to fail.
• Lucene.Net.Tests.Spatial.SpatialArgsTest.CalcDistanceFromErrPct(): Test failed because the floating point asserts
did not specify a delta, and the implementation changed in .NET Core 2.0 so the result is no longer exact
(but still well within tolerance for floating point numbers).
• Lucene.Net.Tests.Index.TestConcurrentMergeScheduler: Fixed FailOnlyOnFlush class to match the original,
which was causing TestFlushExceptions() to fail. Also removed throw statement on a background thread that
was causing a crash.
• Lucene.Net.Tests.Search.TestMultiTermConstantScore: Made Small and Reader variables instance members, since
they are being set by instance methods. When they were static, tests running on different threads could share them.
• Lucene.Net.TestFramework.Util.LuceneTestCase: Added missing catch block for UnauthorizedAccessException,
which does not subclass IOException in .NET as was the case in Java.
• Lucene.Net.Tests.Index.TestTransactionRollback: Fixed issue where the value passed to substring could
potentially go beyond the length of the string.
• Lucene.Net.Benchmark.ByTask.Tasks.TestPerfTasksLogic.TestLocale(): Original test was using no-NO which
is not consistently supported across platforms on .NET. Changed the test (and the documentation) to use nb-NO instead.
• Lucene.Net.Tests.Benchmark.ByTask.TestPerfTasksLogic.TestCollator(): Changed culture from no-NO to nb-NO to ensure
it runs consistently between dev and the CI server.
• Lucene.Net.Support.DictionaryExtensions: Fixed Store() method to save the date using the InvariantCulture so the
format is unaffected by the ambient culture.
• Lucene.Net.TestFramework: Having sequential folder names creates situations where multiple threads are doing
operations on the same folder at the same time. Changed implementation to use GetRandomFileName() to append
a random string instead of an incremental number.
• Lucene.Net.Automaton (BasicAutomata + BasicOperations + MinimizationOperations + SpecialOperations): Corrected
accessibility from internal to public and removed InternalsVisibleTo attributes unneeded as a result of these changes.
• Lucene.Net.Tests.Expressions.TestExpressionSorts: Added missing Collections.Shuffle call
• Patched behavior of all implementations of String.Split() and Regex.Split() using the .TrimEnd() extension method.
In Java, only the empty entries at the end of the array are removed, not all empty entries.
• Lucene.Net.Tests.QueryParser.Flexible.Precedence.TestPrecedenceQueryParser: Fixed test to always use
GregorianCalendar and local time zone.
• Added TimeZoneInfo.ConvertTime() to corresponding locations where time zone had been set in Lucene.
• Lucene.Net.Tests.Search.TestControlledRealTimeReopenThread.DoAfterWriter(): Enabled Thread priority for
.NET Core 2.0 tests
• Lucene.Net.Tests.Search.TestMultiTermConstantScore: Added check to ensure a null instance variable
doesn't cause the AfterClass method to fail
• Lucene.Net.Index (ConcurrentMergeScheduler + TaskMergeScheduler): Fixed null reference exception
due to synchronization of list across threads.
•[LUCENENET-592] - Lucene.Net.QueryParser.Flexible.Core.Util.UnescapedCharSequence: Fixed loop condition that
was preventing the ToStringEscaped() method from returning any results.
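The String.Split()/Regex.Split() patch above compensates for a Java behavior that .NET does not share. The following minimal Java sketch shows the reference behavior: String.split(regex) keeps interior empty entries but drops trailing ones, while split(regex, -1) keeps all of them, much like .NET's string.Split() does by default. The trimEnd helper is a hypothetical illustration of what the TrimEnd() extension is described as doing, not the Lucene.Net.Support implementation.

```java
import java.util.Arrays;

class SplitTrailingEmpty {
    // Hypothetical helper mimicking the described TrimEnd() behavior:
    // remove null or empty entries from the end of the array only,
    // leaving any earlier empty entries intact.
    static String[] trimEnd(String[] parts) {
        int end = parts.length;
        while (end > 0 && (parts[end - 1] == null || parts[end - 1].isEmpty())) {
            end--;
        }
        return Arrays.copyOf(parts, end);
    }

    public static void main(String[] args) {
        // Java's split() keeps the interior empty entry but drops trailing ones.
        System.out.println(Arrays.toString("a,,b,,".split(","))); // [a, , b]

        // A negative limit keeps every entry, similar to .NET's string.Split().
        String[] keepAll = "a,,b,,".split(",", -1);
        System.out.println(keepAll.length); // 5

        // Trimming only the trailing empties reproduces Java's result.
        System.out.println(Arrays.toString(trimEnd(keepAll))); // [a, , b]
    }
}
```

Trimming from the end only, rather than passing StringSplitOptions.RemoveEmptyEntries, is the point of the patch: RemoveEmptyEntries would also discard the empty entry between "a" and "b", which Java keeps.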
Improvement
• BREAKING: Changed namespace of Collections class from Lucene.Net to Lucene.Net.Support.
• BREAKING: Changed namespace of IcuBreakIterator class from Lucene.Net to Lucene.Net.Support.
• Fixed several broken XML documentation comment issues.
• Lucene.Net.Codecs.MultiLevelSkipListReader: Implemented proper dispose pattern.
• BREAKING: Lucene.Net.Index.SegmentCommitInfo: Renamed Files() method to GetFiles().
• BREAKING: Lucene.Net.Index.SegmentInfos: Renamed Files() method to GetFiles().
• BREAKING: Lucene.Net.Search.Similarities.LMSimilarity.ICollectionModel: Changed Name
property to GetName() method (consistency).
• BREAKING: Lucene.Net.Util (PagedBytes + PagedBytes.PagedBytesDataInput + PagedBytes.PagedBytesDataOutput):
Changed Pointer > GetPointer(), Position > GetPosition()
• Lucene.Net.Util.PrintStreamInfoStream: Marked obsolete and replaced with class named TextWriterInfoStream.
• Lucene.Net.Util.RamUsageEstimator: Added SizeOf() overloads for ulong, uint, and ushort
• BREAKING: Lucene.Net.MultiTermQuery: Removed nested ConstantScoreAutoRewrite class, since it is exactly
the same as the non-nested ConstantScoreAutoRewrite class. Made public constructor for ConstantScoreAutoRewrite.
• BREAKING: Lucene.Net.Index.SegmentReader.ICoreClosedListener: Renamed ICoreDisposedListener, OnClose() > OnDispose()
• Lucene.Net.Search.ReferenceManager: Implemented proper dispose pattern.
• Lucene.Net.Util.IOUtils: Added Dispose() and DisposeWhileHandlingException() overloads
and marked Close() and CloseWhileHandlingException() overloads [Obsolete].
• Lucene.Net.Search.BooleanQuery: Added documentation to show .NET usage of collection initializer.
• Lucene.Net.Search.MultiPhraseQuery: Implemented IEnumerable<T> so collection initializer can be used and
added documentation to show usage of collection initializer.
• Lucene.Net.Search.PhraseQuery: Implemented IEnumerable<T> so collection initializer can be used and
added documentation to show usage of collection initializer.
• Lucene.Net.Search.NGramPhraseQuery: Added documentation to show usage of collection initializer.
• Lucene.Net.Queries.CommonTermsQuery: Implemented IEnumerable<T> so collection initializer can be used and
added documentation to show usage of collection initializer.
• Lucene.Net.Search.DisjunctionMaxQuery: Added documentation to show usage of collection initializer.
• Lucene.Net.Facet.Range.DoubleRangeFacetCounts: Added missing params keyword on ranges constructor argument.
• BREAKING: Lucene.Net.Support.MathExtension: Renamed to MathExtensions, added overloads of ToRadians()
for decimal and int, and added the ToDegrees() method overloads.
• Lucene.Net.Analysis.Stempel: Modified Egothor.Stemmer Compile and DiffIt programs to accept file
encoding on the command line and cleaned up implementation.
•[Pull Request #207] - Added ReferenceManager<G>.GetContext(), which is similar to ReferenceManager<G>.Acquire() but can be
used in a using block to implicitly dereference instead of having to do it explicitly in a finally block.
• Lucene.Net.Support.Document: Added extension methods to make casting to the correct IIndexableField-derived type simpler.
• BREAKING: Lucene.Net.Store.FSDirectory: Removed Fsync() method and m_staleFiles variable and all references to them.
• Lucene.Net.Store.NativeFSLockFactory: Refactored implementation to utilize locking/sharing features of FileStream in .NET
on Windows only, falling back to a different locking strategy on other platforms (solution mostly provided by Vincent Van Den Berghe).
• Ported StreamTokenizer from Apache Harmony.
• Moved SystemProperties class from Lucene.Net.TestFramework to Lucene.Net so the more Java System.properties-like
default value and security exception handling can be used globally
• Lucene.Net.Support.Character: Ported Digit(char, int) method from Apache Harmony for use in Lucene.Net.Benchmark.
• Lucene.Net.Support.DictionaryExtensions: Added Load and Store methods from Apache Harmony, so an
IDictionary<string, string> can be used the same way the Properties class is used in Java
(saving and loading the same file format).
• Lucene.Net.Support.StringTokenizer: Did a fresh port from Apache Harmony and ported tests.
• Lucene.Net.Support: Added a SystemConsole class as a stand-in for System.Console, but with the ability to
swap Out and Error to a different destination than Console.Out and Console.Error.
• Lucene.Net.Support.StringExtensions: Added a TrimEnd() method that can be used on string arrays.
This is to mimic Java's Split() method that removes only the null or empty elements from the end
of the array that is returned, but leaves any prior empty elements intact.
• Lucene.Net.Support.StringBuilderExtensions: Added IndexOf() extension methods and tests from Apache Harmony.
• Lucene.Net.Util.SPIClassIterator: Factored out code to get all non-Microsoft referenced assemblies into a
new class in Support named AssemblyUtils
• BREAKING: Lucene.Net.Index.IIndexableField: Renamed FieldType > IndexableFieldType and added additional FieldType
property on Lucene.Net.Documents.Field that returns FieldType rather than IIndexableFieldType so we can avoid casting.
• Lucene.Net.Documents.Field: Added similar Number value types as in Java so the numeric types can be stored as object
without boxing/unboxing. Also added overloads for numeric GetXXXValue() fields to IIndexableField so numeric values
can be retrieved without boxing/unboxing.
• Lucene.Net.Documents.Field: Added NumericType property that returns an enum so the Field type can be determined without boxing/unboxing.
• Lucene.Net.Documents.Field: Added extension methods GetXXXValueOrDefault() to easily retrieve the numeric value when
a null value is not a concern.
• BREAKING: Lucene.Net.Facet.Taxonomy.WriterCache.CharBlockArray: Refactored to use BinaryReader/BinaryWriter for serialization
and eliminated the 2 serialization support classes StreamUtils and CharBlockArrayConverter
• BREAKING: Removed indiscriminate use of [Serializable] and chose specific targets to make serializable,
namely types that maintain a single value, collection, or array.
• BREAKING: Added ICloneable to all places where it was used in Lucene, but made it a compilation option that can be used
in custom builds only, since Microsoft discourages use of this interface.
• Moved Intern() functionality to StringExtensions rather than using string.Intern() directly.
• Eliminated [Debuggable] attribute and added [MethodImpl(MethodImplOptions.NoInlining)] to each potential match
for the StackTraceHelper, which allows tests that use it to work in release mode. Solution provided by Vincent Van Den Berghe.
• Changed to new .csproj format and merged Lucene.Net.sln and Lucene.Net.Portable.sln files into one Lucene.Net.sln file.
• Require VS 2017+ to load solution
• Lucene.Net.Search.Suggest.Analyzing.FSTUtil.Path<T>.Output: Changed accessibility to public. This wasn't made public until
Lucene 5.1, but doing it for 4.8 since it is required by end users.
• BREAKING: Lucene.Net.MMapDirectory: Removed UnmapHack/UNMAP_SUPPORTED features since these are not needed in .NET.
• Lucene.Net.Suggest.Analyzing.FreeTextSuggester: Changed to use Path.GetRandomFileName() instead of using random
integers to make the file name. Changed to delete the folder using System.IO.Directory.Delete and rearranged
try catch statements so the Lucene Directory disposes before deleting the OS directory.
• Lucene.Net.TestFramework.Index.BasePostingsFormatTestCase + Lucene.Net.Suggest.Analyzing
(AnalyzingInfixSuggesterTest + TestFreeTextSuggester) + Lucene.Net.Tests.Index.TestCodecs:
Added using blocks to make the tests run more reliably.
• BREAKING: Lucene.Net.Support.IO.FileSupport: Removed unused GetFiles(), GetLuceneIndexFiles(), and Sync() methods
• Lucene.Net.Support.IO.FileSupport: Made class static
• Lucene.Net.Support.IO.FileSupport: Fixed several issues with CreateTempFile() implementation
• Lucene.Net.Support.IO.FileSupport: Added GetCanonicalPath() method + tests
• Swapped GetCanonicalPath() call into each of the locations where it was originally used in Lucene
• Lucene.Net + Lucene.Net.Facet + Lucene.Net.ICU: Added extension methods to Document class for adding
non-obsolete project-related Field types
• BREAKING: Lucene.Net.Facet: De-nested DrillSidewaysResult from DrillSideways class
• BREAKING: Lucene.Net.Support: Removed CharAt() method from StringCharSequenceWrapper
• Lucene.Net.Support: Added StringBuilderCharSequenceWrapper class and StringBuilder.ToCharSequence()
extension method.
•[LUCENENET-592] - Lucene.Net.QueryParser.Flexible: Added an overload of type StringBuilder for all
ICharSequence-based methods and constructors.
• BREAKING: Changed .NET Standard from 1.5 to 1.6 due to missing required API in Microsoft.Extensions.DependencyModel
New Feature
• Lucene.Net.Search.Filter: Added NewAnonymous() method for easy creation of anonymous classes via delegate methods.
• Lucene.Net.Search.DocIdSet: Added NewAnonymous() method for easy creation of anonymous classes via delegate methods.
• Lucene.Net.Search.Collector: Added Collector.NewAnonymous() method for easy creation of anonymous classes via delegate methods.
•[LUCENENET-514] - Ported Lucene.Net.Analysis.SmartCn (Smart Chinese Analyzer)
•[LUCENENET-569] - Ported Lucene.Net.Analysis.Phonetic
•[LUCENENET-563] - Ported Lucene.Net.Demo (part of lucene-cli utility)
•[LUCENENET-577] - Port Lock Stress Test CLI Utility (part of lucene-cli utility)
•[LUCENENET-576] - Port IndexUpgrader CLI Utility (part of lucene-cli utility)
•[LUCENENET-575] - Port CheckIndex CLI Utility (part of lucene-cli utility)
•[LUCENENET-582] - Port Index Splitter CLI Utility (part of lucene-cli utility)
•[LUCENENET-585] - Port High Freq Terms CLI Utility (part of lucene-cli utility)
•[LUCENENET-584] - Port Get Term Info CLI Utility (part of lucene-cli utility)
•[LUCENENET-581] - Port Compound File Extractor CLI Utility (part of lucene-cli utility)
•[LUCENENET-586] - Port Index Merge Tool CLI Utility (part of lucene-cli utility)
•[LUCENENET-583] - Port Multi-Pass Index Splitter CLI Utility (part of lucene-cli utility)
•[LUCENENET-579] - Port Print Taxonomy Stats CLI Utility (part of lucene-cli utility)
•[LUCENENET-578] - Port Lock Verify Server CLI Utility (part of lucene-cli utility)
•[LUCENENET-588] - Create unified CLI tool (lucene-cli) to wrap all Lucene maintenance tools and demos for .NET
•[LUCENENET-565 & Pull Request #209] - Port Lucene.Net.Replicator
•[LUCENENET-567] - Port Lucene.Net.Analysis.Kuromoji
• Added Collation features of Lucene.Net.Analysis.ICU to Lucene.Net.ICU (as linked files).
•[LUCENENET-564] - Port Lucene.Net.Benchmarks
• Added .NET Standard 2.0 support
• Added .NET Framework 4.5 support
• Created the JavaDocToMarkdownConverter utility to assist with converting Javadoc comments to Markdown docs for DocFX.
=================== Release 4.8.0-beta00004 =====================
Bug
• AssemblyVersion and other metadata not being set on .NET Standard assemblies
for Lucene.Net.QueryParser and Lucene.Net.Expressions.
• Attempting to load default codec in static Codec constructor causes failure
in environments that don't allow Reflection to be used that early in the lifecycle.
This also improves performance when a custom DefaultCodecFactory is used.
=================== Release 4.8.0-beta00003 =====================
Bug
• BREAKING CHANGE: Fixed a codec issue on x86 platforms where an incorrect calculation produced the wrong
result. As a result, this version and later versions may not be able to read indexes that a prior
4.8.x build generated on an x86 platform if those indexes contain binary doc values.
• Added a missing null check for values retrieved from the table of Lucene45DocValuesProducer.
• Fixed assertion failure in FstEnum due to a missing check for IsLast. This may have also produced
incorrect runtime behavior.
• Fixed assertion failure in UnicodeUtil due to an incorrect cast.
Improvement
• Added overloads of Analyzer.NewAnonymous() that accept a delegate method for InitReader.
• Added constructor overloads on MMapDirectory, NIOFSDirectory, and SimpleFSDirectory that accept
string rather than DirectoryInfo to specify the directory.
• Added test to verify index compatibility with Lucene when using binary doc values.
=================== Release 4.8.0-beta00002 =====================
Bug
•[Pull Request #205] - Made FSDirectory stale files set synchronized
Improvement
• Added InstallSDK and Restore tasks to the build script, and made them dependencies of the Test task,
which makes testing from the CLI/CI server more reliable.
• Added ability to test from CLI in release.
• Updated syntax of build script to use the standard - (for short options) and -- (for long options),
e.g. "build -t" or "build --test".
=================== Release 4.8.0-beta00001 =====================
Bug
•[LUCENENET-516] - ChainedFilter class not available in Lucene.net build 3.0.3
•[LUCENENET-571] - Lucene.Net.QueryParser.Flexible Not Fully Implemented in .NET Core
•[LUCENENET-558] - Some possible null reference exceptions in ListExtensions.cs
•[LUCENENET-542] - Snowball Analyser - stemming issue
Improvement
•[LUCENENET-572] - Lucene.Net.Expressions - removed dependency on .NET Core configuration packages
Task
•[LUCENENET-540] - Make changes to Contrib\Analyzers\Miscellaneous to sync with Java 4.3 version
May 6th 2017 - Completed most of v4.8
September 6th 2014 - Started work on revamping the project code and structure towards a v4.8 release
=================== 3.0.3 trunk (not yet released) =====================
Bug
•[LUCENENET-54] - ArgumentOutOfRangeException caused by SF.Snowball.Ext.DanishStemmer
•[LUCENENET-420] - String.StartsWith has culture in it.
•[LUCENENET-423] - QueryParser differences between Java and .NET when parsing range queries involving dates
•[LUCENENET-445] - Lucene.Net.Index.TestIndexWriter.TestFutureCommit() Fails
•[LUCENENET-464] - The Lucene.Net.FastVectorHighlighter.dll of the latest release 2.9.4 breaks any ASP.NET application
•[LUCENENET-472] - Operator == on Parameter does not check for null arguments
•[LUCENENET-473] - Fix linefeeds in more than 600 files
•[LUCENENET-474] - Missing License Headers in trunk after 3.0.3 merge
•[LUCENENET-475] - DanishStemmer doesn't work.
•[LUCENENET-476] - ScoreDocs in TopDocs is ambiguous when using Visual Basic .Net
•[LUCENENET-477] - NullReferenceException in ThreadLocal when Lucene.Net compiled for .Net 2.0
•[LUCENENET-478] - Parts of QueryParser are outdated or weren't previously ported correctly
•[LUCENENET-479] - QueryParser.SetEnablePositionIncrements(false) doesn't work
•[LUCENENET-483] - Spatial Search skipping records when one location is close to origin, another one is away and radius is wider
•[LUCENENET-484] - Some possibly major tests intermittently fail
•[LUCENENET-485] - IndexOutOfRangeException in FrenchStemmer
•[LUCENENET-490] - QueryParser is culture-sensitive
•[LUCENENET-493] - Make lucene.net culture insensitive (like the java version)
•[LUCENENET-494] - Port error in FieldCacheRangeFilter
•[LUCENENET-495] - Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations
•[LUCENENET-500] - Lucene fails to run in medium trust ASP.NET Application
Improvement
•[LUCENENET-179] - SnowballFilter speed improvement
•[LUCENENET-407] - Signing the assembly
•[LUCENENET-408] - Mark assembly as CLS compliant; make AlreadyClosedException serializable
•[LUCENENET-466] - optimisation for the GermanStemmer.vb
•[LUCENENET-504] - FastVectorHighlighter - support for prefix query
•[LUCENENET-506] - FastVectorHighlighter should use Query.ExtractTerms as fallback
New Feature
•[LUCENENET-463] - Would like to be able to use a SimpleSpanFragmenter for extracting whole sentences
•[LUCENENET-481] - Port Contrib.MemoryIndex
Task
•[LUCENENET-446] - Make Lucene.Net CLS Compliant
•[LUCENENET-471] - Remove Package.html and Overview.html artifacts
•[LUCENENET-480] - Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
•[LUCENENET-487] - Remove Obsolete Members, Fields that are marked as obsolete and to be removed in 3.0
•[LUCENENET-503] - Update binary names
Sub-task
•[LUCENENET-468] - Implement the Dispose pattern properly in classes with Close
•[LUCENENET-470] - Change Getxxx() and Setxxx() methods to .NET Properties
=================== 2.9.4 trunk =====================
Bug fixes
* LUCENENET-355 [LUCENE-2387]: Don't hang onto Fieldables from the last doc indexed,
in IndexWriter, nor the Reader in Tokenizer after close is
called. (digy) [Ruben Laguna, Uwe Schindler, Mike McCandless]
Change Log Copied from Lucene
======================= Release 2.9.2 2010-02-26 =======================
Bug fixes
* LUCENE-2045: Fix silly FileNotFoundException hit if you enable
infoStream on IndexWriter and then add an empty document and commit
(Shai Erera via Mike McCandless)
* LUCENE-2088: addAttribute() should only accept interfaces that
extend Attribute. (Shai Erera, Uwe Schindler)
* LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode
and equals methods, so BooleanQueries differing only in disableCoord
were treated as equal when cached. (Chris Hostetter, Mike McCandless)
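To illustrate the LUCENE-2092 failure mode, here is a minimal Java sketch of a cache key whose equals() and hashCode() both include a disableCoord flag. `QueryKey` is a hypothetical class, not Lucene's BooleanQuery, and clauses are simplified to strings; dropping the flag from either method would let two differently scored queries share one cache entry.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a query used as a cache key; not Lucene's BooleanQuery.
final class QueryKey {
    final List<String> clauses;   // simplified: clauses as plain strings
    final boolean disableCoord;   // changes scoring, so it must affect equality

    QueryKey(List<String> clauses, boolean disableCoord) {
        this.clauses = List.copyOf(clauses);
        this.disableCoord = disableCoord;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof QueryKey)) return false;
        QueryKey other = (QueryKey) o;
        return disableCoord == other.disableCoord && clauses.equals(other.clauses);
    }

    @Override
    public int hashCode() {
        // Include every field compared in equals(), including disableCoord.
        return Objects.hash(clauses, disableCoord);
    }

    public static void main(String[] args) {
        Map<QueryKey, String> cache = new HashMap<>();
        cache.put(new QueryKey(List.of("a", "b"), false), "coord enabled");
        cache.put(new QueryKey(List.of("a", "b"), true), "coord disabled");
        System.out.println(cache.size()); // 2: the two queries get separate cache entries
    }
}
```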
* LUCENE-2095: Fixed a race where, when two threads call
IndexWriter.commit() at the same time, commit could return control
to one of the threads before all changes are actually committed.
(Sanne Grinovero via Mike McCandless)
* LUCENE-2166: Don't incorrectly keep warning about the same immense
term, when IndexWriter.infoStream is on. (Mike McCandless)
* LUCENE-2158: At high indexing rates, NRT reader could temporarily
lose deletions. (Mike McCandless)
* LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load
implementation class when interface was loaded by a different
class loader. (Uwe Schindler, reported on java-user by Ahmed El-dawy)
* LUCENE-2257: Increase max number of unique terms in one segment to
termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
(Tom Burton-West via Mike McCandless)
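The LUCENE-2257 limit is plain arithmetic. The sketch below assumes the "~2.1 billion" factor is Integer.MAX_VALUE and also shows why the product has to be computed in 64-bit arithmetic; `TermLimit` is a hypothetical name used only for illustration.

```java
// Worked arithmetic for the LUCENE-2257 limit (illustrative only).
final class TermLimit {
    // Assumption: "~2.1 billion" is Integer.MAX_VALUE, multiplied by termIndexInterval.
    static long maxUniqueTerms(int termIndexInterval) {
        return (long) termIndexInterval * Integer.MAX_VALUE; // widen before multiplying
    }

    // The same product in 32-bit int arithmetic silently overflows.
    static int maxUniqueTermsOverflowing(int termIndexInterval) {
        return termIndexInterval * Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(maxUniqueTerms(128));            // 274877906816 (~274 billion)
        System.out.println(maxUniqueTermsOverflowing(128)); // -128
    }
}
```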
* LUCENE-2260: Fixed AttributeSource to not hold a strong
reference to the Attribute/AttributeImpl classes which prevents
unloading of custom attributes loaded by other classloaders
(e.g. in Solr plugins). (Uwe Schindler)
* LUCENE-1941: Fix Min/MaxPayloadFunction returning 0 when
only one payload is present. (Erik Hatcher, Mike McCandless
via Uwe Schindler)
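One way a min/max payload function can wrongly return 0 for a single payload is by seeding its running value with 0. The minimal sketch below seeds from the first payload instead, so a lone payload is returned unchanged. `PayloadAgg` is hypothetical and simplified to plain float arrays; Lucene's PayloadFunction API differs, and this is not its implementation.

```java
// Hypothetical, simplified payload aggregation; not Lucene's PayloadFunction API.
final class PayloadAgg {
    // Running minimum seeded from the first payload rather than from 0.
    static float min(float[] payloads) {
        float result = 0f; // returned only when there are no payloads
        for (int seen = 0; seen < payloads.length; seen++) {
            result = (seen == 0) ? payloads[seen] : Math.min(result, payloads[seen]);
        }
        return result;
    }

    // Running maximum, seeded the same way so negative payloads are not masked by 0.
    static float max(float[] payloads) {
        float result = 0f; // returned only when there are no payloads
        for (int seen = 0; seen < payloads.length; seen++) {
            result = (seen == 0) ? payloads[seen] : Math.max(result, payloads[seen]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(min(new float[] {3f}));         // 3.0 (seeding with 0 would give 0.0)
        System.out.println(max(new float[] {2f, 5f, 1f})); // 5.0
    }
}
```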
* LUCENE-2270: Queries consisting of all zero-boost clauses
(for example, text:foo^0) sorted incorrectly and produced
invalid docids. (yonik)
* LUCENE-2422: Don't reuse byte[] in IndexInput/Output -- it gains
little performance, and ties up possibly large amounts of memory
for apps that index large docs. (Ross Woolf via Mike McCandless)
API Changes
* LUCENE-2190: Added a new class CustomScoreProvider to function package
that can be subclassed to provide custom scoring to CustomScoreQuery.
The methods in CustomScoreQuery that did this before were deprecated
and replaced by a method getCustomScoreProvider(IndexReader) that
returns a custom score implementation using the above class. The change
is necessary with per-segment searching, as CustomScoreQuery is
a stateless class (like all other Queries) and does not know about
the currently searched segment. This API works similarly to Filter's
getDocIdSet(IndexReader). (Paul chez Jamespot via Mike McCandless,
Uwe Schindler)
* LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant
will cause backwards compatibility problems when upgrading Lucene. See
the Version javadocs for additional information.
(Robert Muir)
Optimizations
* LUCENE-2086: When resolving deleted terms, do so in term sort order
for better performance (Bogdan Ghidireac via Mike McCandless)
* LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum.
(Uwe Schindler, Robert Muir)
Test Cases
* LUCENE-2114: Change TestFilteredSearch to test on multi-segment
index as well. (Simon Willnauer via Mike McCandless)
* LUCENE-2211: Improves BaseTokenStreamTestCase to use a fake attribute
that checks if clearAttributes() was called correctly.
(Uwe Schindler, Robert Muir)
* LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if
end() is implemented correctly. (Koji Sekiguchi, Robert Muir)
Documentation
* LUCENE-2114: Improve javadocs of Filter to call out that the
provided reader is per-segment (Simon Willnauer via Mike
McCandless)
======================= Release 2.9.1 2009-11-06 =======================
Changes in backwards compatibility policy
* LUCENE-2002: Add a required Version matchVersion argument when
constructing QueryParser or MultiFieldQueryParser and, by default (as
of 2.9), set enablePositionIncrements to true to match
StandardAnalyzer's 2.9 default (Uwe Schindler, Mike McCandless)
Bug fixes
* LUCENE-1974: Fixed nasty bug in BooleanQuery (when it used
BooleanScorer for scoring), whereby some matching documents fail to
be collected. (Fulin Tang via Mike McCandless)
* LUCENE-1124: Make sure FuzzyQuery always matches the precise term.
([email protected] via Mike McCandless)
* LUCENE-1976: Fix IndexReader.isCurrent() to return the right thing
when the reader is a near real-time reader. (Jake Mannix via Mike
McCandless)
* LUCENE-1986: Fix NPE when scoring PayloadNearQuery (Peter Keegan,
Mark Miller via Mike McCandless)
* LUCENE-1992: Fix thread hazard if a merge is committing just as an
exception occurs during sync (Uwe Schindler, Mike McCandless)
* LUCENE-1995: Note in javadocs that IndexWriter.setRAMBufferSizeMB
cannot exceed 2048 MB, and throw IllegalArgumentException if it
does. (Aaron McKee, Yonik Seeley, Mike McCandless)
* LUCENE-2004: Fix Constants.LUCENE_MAIN_VERSION to not be inlined
by client code. (Uwe Schindler)
* LUCENE-2016: Replace illegal U+FFFF character with the replacement
char (U+FFFD) during indexing, to prevent silent index corruption.
(Peter Keegan, Mike McCandless)
API Changes
* Un-deprecate search(Weight weight, Filter filter, int n) from
Searchable interface (deprecated by accident). (Uwe Schindler)
* Un-deprecate o.a.l.util.Version constants. (Mike McCandless)
* LUCENE-1987: Un-deprecate some ctors of Token, as they will not
be removed in 3.0 and are still useful. Also add some missing
o.a.l.util.Version constants for enabling invalid acronym
settings in StandardAnalyzer to be compatible with the coming
Lucene 3.0. (Uwe Schindler)
* LUCENE-1973: Un-deprecate IndexSearcher.setDefaultFieldSortScoring,
to allow controlling per-IndexSearcher whether scores are computed
when sorting by field. (Uwe Schindler, Mike McCandless)
Documentation
* LUCENE-1955: Fix Hits deprecation notice to point users in right
direction. (Mike McCandless, Mark Miller)
* Fix javadoc about score tracking done by search methods in Searcher
and IndexSearcher. (Mike McCandless)
* LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
(Luke Nezda via Mike McCandless)
======================= Release 2.9.0 2009-09-23 =======================
Changes in backwards compatibility policy
* LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
longer computes a document score for each hit by default. If
document score tracking is still needed, you can call
IndexSearcher.setDefaultFieldSortScoring(true, true) to enable
both per-hit and maxScore tracking; however, this is deprecated
and will be removed in 3.0.
Alternatively, use Searchable.search(Weight, Filter, Collector)
and pass in a TopFieldCollector instance, using the following code
sample:
<code>
TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields,
true /* trackDocScores */,
true /* trackMaxScore */,
false /* docsInOrder */);
searcher.search(query, tfc);
TopDocs results = tfc.topDocs();
</code>
Note that your Sort object cannot use SortField.AUTO when you
directly instantiate TopFieldCollector.
Also, the method search(Weight, Filter, Collector) was added to
the Searchable interface and the Searcher abstract class to
replace the deprecated HitCollector versions. If you either
implement Searchable or extend Searcher, you should change your
code to implement this method. If you already extend
IndexSearcher, no further changes are needed to use Collector.
Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not
valid scores. Lucene uses these values internally in certain
places, so if you have hits with such scores, it will cause
problems. (Shai Erera via Mike McCandless)
* LUCENE-1687: All methods and parsers from the interface ExtendedFieldCache
have been moved into FieldCache. ExtendedFieldCache is now deprecated and
contains only a few declarations for binary backwards compatibility.
ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and
ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation.
The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of
ExtendedFieldCache and FieldCache, FieldCache can now additionally return
long[] and double[] arrays in addition to int[] and float[] and StringIndex.
The interface changes only matter to users implementing the interfaces,
which is unlikely, because Lucene's FieldCache implementation cannot be
replaced. (Grant Ingersoll, Uwe Schindler)
* LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
class. Some of the method signatures have changed, but it should be fairly
easy to see what adjustments must be made to existing code to sync up
with the new API. You can find more detail in the API Changes section.
Going forward Searchable will be kept for convenience only and may
be changed between minor releases without any deprecation
process. It is not recommended that you implement it, but rather extend
Searcher.
(Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
* LUCENE-1422, LUCENE-1693: The new Attribute based TokenStream API (see below)
has some backwards breaks in rare cases. We did our best to make the
transition as easy as possible and you are not likely to run into any problems.
If your tokenizers still implement next(Token) or next(), the calls are
automatically wrapped. The indexer and query parser use the new API
(i.e. they use incrementToken() calls). All core TokenStreams are implemented using
the new API. You can mix old and new API style TokenFilters/TokenStreams.
Problems only occur when you have done the following:
You have overridden next(Token) or next() in one of the non-abstract core
TokenStreams/-Filters. These classes should normally be final, but some
of them are not. In this case, next(Token)/next() would never be called.
To fail early with a hard compile/runtime error, the next(Token)/next()
methods in these TokenStreams/-Filters were made final in this release.
(Michael Busch, Uwe Schindler)
* LUCENE-1763: MergePolicy now requires an IndexWriter instance to
be passed upon instantiation. As a result, IndexWriter was removed
as a method argument from all MergePolicy methods. (Shai Erera via
Mike McCandless)
* LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
compat break and caused custom SpanQuery implementations to fail at runtime
in a variety of ways. This issue attempts to remedy things by causing
a compile time break on custom SpanQuery implementations and removing
the PayloadSpans class, with its functionality now moved to Spans. To
help in alleviating future back compat pain, Spans has been changed from
an interface to an abstract class.
(Hugh Cayless, Mark Miller)
* LUCENE-1808: Query.createWeight has been changed from protected to
public. This will be a back compat break if you have overridden this
method - but you are likely already affected by the LUCENE-1630 (make Weight
abstract rather than an interface) back compat break if you have overridden
Query.createWeight, so we have taken the opportunity to make this change.
(Tim Smith, Shai Erera via Mark Miller)
* LUCENE-1708 - IndexReader.document() no longer checks if the document is
deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
(Shai Erera via Mike McCandless)
Changes in runtime behavior
* LUCENE-1424: QueryParser now by default uses constant score auto
rewriting when it generates a WildcardQuery and PrefixQuery (it
already does so for TermRangeQuery, as well). Call
setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
to revert to slower BooleanQuery rewriting method. (Mark Miller via Mike
McCandless)
* LUCENE-1575: As of 2.9, the core collectors as well as
IndexSearcher's search methods that return top N results, no
longer filter documents with scores <= 0.0. If you rely on this
functionality you can use PositiveScoresOnlyCollector like this:
<code>
TopDocsCollector tdc = TopScoreDocCollector.create(10, true /* docsScoredInOrder */);
Collector c = new PositiveScoresOnlyCollector(tdc);
searcher.search(query, c);
TopDocs hits = tdc.topDocs();
...
</code>
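PositiveScoresOnlyCollector is a decorator around another Collector. The sketch below models that pattern with hypothetical, stripped-down `ScoreSource` and `SimpleCollector` interfaces (loosely following the setScorer/collect split, not Lucene's real signatures) so it runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal interfaces; not Lucene's Scorer/Collector.
interface ScoreSource { float score(); }

interface SimpleCollector {
    void setScorer(ScoreSource scorer);
    void collect(int doc);
}

// Decorator: forwards only hits whose current score is strictly positive.
final class PositiveOnly implements SimpleCollector {
    private final SimpleCollector delegate;
    private ScoreSource scorer;

    PositiveOnly(SimpleCollector delegate) { this.delegate = delegate; }

    @Override public void setScorer(ScoreSource scorer) {
        this.scorer = scorer;
        delegate.setScorer(scorer);
    }

    @Override public void collect(int doc) {
        if (scorer.score() > 0f) delegate.collect(doc);
    }
}

final class CollectDemo {
    // Feeds docs 0..n-1 with the given scores through the wrapper; returns the collected doc IDs.
    static List<Integer> run(float[] scores) {
        List<Integer> hits = new ArrayList<>();
        SimpleCollector sink = new SimpleCollector() {
            public void setScorer(ScoreSource s) { }
            public void collect(int doc) { hits.add(doc); }
        };
        SimpleCollector c = new PositiveOnly(sink);
        int[] current = {0};
        c.setScorer(() -> scores[current[0]]); // score of the doc currently being collected
        for (int doc = 0; doc < scores.length; doc++) {
            current[0] = doc;
            c.collect(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(run(new float[] {1.0f, 0.0f, -2.0f, 0.5f})); // [0, 3]
    }
}
```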
* LUCENE-1604: IndexReader.norms(String field) is now allowed to
return null if the field has no norms, as long as you've
previously called IndexReader.setDisableFakeNorms(true). This
setting now defaults to false (to preserve the fake norms back
compatible behavior) but in 3.0 will be hardwired to true. (Shon
Vella via Mike McCandless).
* LUCENE-1624: If you open IndexWriter with create=true and
autoCommit=false on an existing index, IndexWriter no longer
writes an empty commit when it's created. (Paul Taylor via Mike
McCandless)
* LUCENE-1593: When you call Sort() or Sort.setSort(String field,
boolean reverse), the resulting SortField array no longer ends
with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties
internally by docID). (Shai Erera via Michael McCandless)
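Because ties are already broken by ascending docID, an explicit trailing SortField.FIELD_DOC adds nothing. A minimal sketch of that ordering using java.util.Comparator; `TieBreak.Hit` is a hypothetical stand-in for a hit carrying a sort value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TieBreak {
    // Hypothetical hit: a doc ID plus the value of the sort field.
    static final class Hit {
        final int doc;
        final String value;
        Hit(int doc, String value) { this.doc = doc; this.value = value; }
        int doc() { return doc; }
        String value() { return value; }
        @Override public String toString() { return value + "#" + doc; }
    }

    // Primary key: field value; equal values fall back to ascending doc ID.
    static final Comparator<Hit> BY_VALUE_THEN_DOC =
        Comparator.comparing(Hit::value).thenComparingInt(Hit::doc);

    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_VALUE_THEN_DOC);
        return copy;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(3, "b"), new Hit(7, "a"), new Hit(2, "a"));
        System.out.println(sorted(hits)); // [a#2, a#7, b#3]
    }
}
```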
* LUCENE-1542: When the first token(s) have 0 position increment,
IndexWriter used to incorrectly record the position as -1, if no
payload is present, or Integer.MAX_VALUE if a payload is present.
This causes positional queries to fail to match. The bug is now
fixed, but if your app relies on the buggy behavior then you must
call IndexWriter.setAllowMinus1Position(). That API is deprecated
so you must fix your application, and rebuild your index, to not
rely on this behavior by the 3.0 release of Lucene. (Jonathan
Mamou, Mark Miller via Mike McCandless)
* LUCENE-1715: Finalizers have been removed from the 4 core classes
that still had them, since they will cause GC to take longer, thus
tying up memory for longer, and at best they mask buggy app code.
DirectoryReader (returned from IndexReader.open) & IndexWriter
previously released the write lock during finalize.
SimpleFSDirectory.FSIndexInput closed the descriptor in its
finalizer, and NativeFSLock released the lock. It's possible
applications will be affected by this, but only if the application
is failing to close reader/writers. (Brian Groose via Mike
McCandless)
* LUCENE-1717: Fixed IndexWriter to account for RAM usage of
buffered deletions. (Mike McCandless)
* LUCENE-1727: Ensure that fields are stored & retrieved in the
exact order in which they were added to the document. This was
true in all Lucene releases before 2.3, but was broken in 2.3 and
2.4, and is now fixed in 2.9. (Mike McCandless)
* LUCENE-1678: The addition of Analyzer.reusableTokenStream
accidentally broke back compatibility of external analyzers that
subclassed core analyzers that implemented tokenStream but not
reusableTokenStream. This is now fixed, such that if
reusableTokenStream is invoked on such a subclass, that method
will forcefully fall back to tokenStream. (Mike McCandless)
* LUCENE-1801: Token.clear() and Token.clearNoTermBuffer() now also clear
startOffset, endOffset and type. This is not likely to affect any
Tokenizer chains, as Tokenizers normally always set these three values.
This change was made so that Token conforms to the new AttributeImpl.clear() and
AttributeSource.clearAttributes() contract, behaving identically whether Token is used as a
single AttributeImpl for all attributes or the 6 separate AttributeImpls are used. (Uwe Schindler, Michael Busch)
* LUCENE-1483: When searching over multiple segments, a new Scorer is now created
for each segment. Searching has been telescoped out a level and IndexSearcher now
operates much like MultiSearcher does. The Weight is created only once for the top
level Searcher, but each Scorer is passed a per-segment IndexReader. This will
result in doc ids in the Scorer being internal to the per-segment IndexReader. It
has always been outside of the API to count on a given IndexReader to contain every
doc id in the index - and if you have been ignoring MultiSearcher in your custom code
and counting on this fact, you will find your code no longer works correctly. If a
custom Scorer implementation uses any caches/filters that rely on being based on the
top level IndexReader, it will need to be updated to correctly use contextless
caches/filters, e.g. you can't count on the IndexReader to contain any given doc id or
all of the doc ids. (Mark Miller, Mike McCandless)
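With per-segment searching, a Scorer sees segment-local doc ids, and the index-wide id is the segment's doc base (the total maxDoc of all earlier segments) plus the local id. A minimal sketch, assuming segments are described by an array of maxDoc values; `SegmentDocIds` is a hypothetical helper, not a Lucene class.

```java
import java.util.Arrays;

// Hypothetical helper (not a Lucene class) relating segment-local and index-wide doc IDs.
final class SegmentDocIds {
    private final int[] docBases; // docBases[i] = total maxDoc of segments 0..i-1

    SegmentDocIds(int[] segmentMaxDocs) {
        docBases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            docBases[i] = base;
            base += segmentMaxDocs[i];
        }
    }

    // Segment-local doc ID -> index-wide doc ID.
    int toGlobal(int segment, int localDoc) {
        return docBases[segment] + localDoc;
    }

    // Index-wide doc ID -> segment that contains it (binary search over the bases).
    int segmentOf(int globalDoc) {
        int idx = Arrays.binarySearch(docBases, globalDoc);
        if (idx < 0) {
            return -idx - 2; // insertion point - 1: last segment whose base is below globalDoc
        }
        // Exact hit on a base: skip empty segments that share the same base.
        while (idx + 1 < docBases.length && docBases[idx + 1] == globalDoc) {
            idx++;
        }
        return idx;
    }

    // Index-wide doc ID -> segment-local doc ID.
    int toLocal(int globalDoc) {
        return globalDoc - docBases[segmentOf(globalDoc)];
    }

    public static void main(String[] args) {
        SegmentDocIds ids = new SegmentDocIds(new int[] {5, 3, 4}); // bases 0, 5, 8
        System.out.println(ids.segmentOf(6) + " " + ids.toLocal(6)); // 1 1
        System.out.println(ids.toGlobal(2, 3));                      // 11
    }
}
```

A custom cache keyed by index-wide ids would need this kind of translation, which is why code that assumed one top-level reader breaks after this change.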
* LUCENE-1846: DateTools now uses the US locale to format the numbers in its
date/time strings instead of the default locale. For most locales there will
be no change in the index format, as DateFormatSymbols is using ASCII digits.
The usage of the US locale is important to guarantee correct ordering of
generated terms. (Uwe Schindler)
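The ordering argument in LUCENE-1846 relies on fixed-width terms made of ASCII digits, so that string order equals time order. A minimal sketch using java.text.SimpleDateFormat with Locale.US and GMT; the second-resolution pattern and the `DateTerms` name are assumptions for illustration, not DateTools' actual code.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch of DateTools-style date terms at second resolution (pattern is an assumption).
final class DateTerms {
    static String toTerm(long epochMillis) {
        // Locale.US fixes the number formatting instead of inheriting the JVM default
        // locale; GMT makes the term independent of the server's time zone.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(epochMillis)); // new instance per call: not thread-safe
    }

    public static void main(String[] args) {
        String a = toTerm(1_000L);  // 19700101000001
        String b = toTerm(60_000L); // 19700101000100
        // Fixed-width ASCII digits: lexicographic order equals chronological order.
        System.out.println(a.compareTo(b) < 0); // true
    }
}
```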
* LUCENE-1860: MultiTermQuery now defaults to
CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it
was SCORING_BOOLEAN_QUERY_REWRITE). This means that PrefixQuery
and WildcardQuery will now produce constant score for all matching
docs, equal to the boost of the query. (Mike McCandless)
API Changes
* LUCENE-1419: Add expert API to set custom indexing chain. This API is
package-protected for now, so we don't have to officially support it.
Yet, it will give us the possibility to try out different consumers
in the chain. (Michael Busch)
* LUCENE-1427: DocIdSet.iterator() is now allowed to throw
IOException. (Paul Elschot, Mike McCandless)
* LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called
AttributeSource instead of the Token class, which is now a utility class that
holds common Token attributes. All attributes that the Token class had have
been moved into separate classes: TermAttribute, OffsetAttribute,
PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute.
The new API is much more flexible; it allows Attributes to be combined
arbitrarily and custom Attributes to be defined. The new API has the same
performance as the old next(Token) approach. For conformance with this new
API Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
(Michael Busch, Uwe Schindler; additional contributions and bug fixes by
Daniel Shane, Doron Cohen)
* LUCENE-1467: Add nextDoc() and next(int) methods to OpenBitSetIterator.
These methods can be used to avoid additional calls to doc().
(Michael Busch)
* LUCENE-1468: Deprecate Directory.list(), which sometimes (in
FSDirectory) filters out files that don't look like index files, in
favor of new Directory.listAll(), which does no filtering. Also,
listAll() will never return null; instead, it throws an IOException
(or subclass). Specifically, FSDirectory.listAll() will throw the
newly added NoSuchDirectoryException if the directory does not
exist. (Marcel Reutegger, Mike McCandless)
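The listAll() contract (never null; a missing directory is an exception) can be sketched with java.io.File, whose list() does return null when the path is not a readable directory. `ListAll` is hypothetical, and the local `NoSuchDirectoryException` only models the exception named above (assumed here to extend FileNotFoundException).

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

// Models the exception named above; assumption: it extends FileNotFoundException.
class NoSuchDirectoryException extends FileNotFoundException {
    NoSuchDirectoryException(String message) {
        super(message);
    }
}

// Hypothetical listAll(): never returns null, unlike File.list().
final class ListAll {
    static String[] listAll(File dir) throws IOException {
        if (!dir.exists()) {
            throw new NoSuchDirectoryException("directory '" + dir + "' does not exist");
        }
        if (!dir.isDirectory()) {
            throw new NoSuchDirectoryException("file '" + dir + "' exists but is not a directory");
        }
        String[] names = dir.list(); // can still be null if an I/O error occurs
        if (names == null) {
            throw new IOException("directory '" + dir + "' could not be listed");
        }
        return names; // no filtering: every entry is returned
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listAll(new File(".")).length + " entries in the working directory");
    }
}
```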
* LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
you to record an opaque commitUserData (maps String -> String) into
the commit written by IndexReader. This matches IndexWriter's
commit methods. (Jason Rutherglen via Mike McCandless)
* LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
enable compressing & decompressing binary content, external to
Lucene's indexing. Deprecated Field.Store.COMPRESS.
* LUCENE-1561: Renamed Field.omitTf to Field.omitTermFreqAndPositions
(Otis Gospodnetic via Mike McCandless)
* LUCENE-1500: Added new InvalidTokenOffsetsException to Highlighter methods
to denote issues when offsets in TokenStream tokens exceed the length of the
provided text. (Mark Harwood)
* LUCENE-1575, LUCENE-1483: HitCollector is now deprecated in favor of
a new Collector abstract class. For easy migration, people can use
HitCollectorWrapper which translates (wraps) HitCollector into
Collector. Note that this class is also deprecated and will be
removed when HitCollector is removed. Also TimeLimitedCollector
is deprecated in favor of the new TimeLimitingCollector which
extends Collector. (Shai Erera, Mark Miller, Mike McCandless)
* LUCENE-1592: The method TermEnum.skipTo() was deprecated, because
it is used nowhere in core/contrib and there is only a very inefficient
default implementation available. If you want to position a TermEnum
to another Term, create a new one using IndexReader.terms(Term).
(Uwe Schindler)
* LUCENE-1621: MultiTermQuery.getTerm() has been deprecated as it does
not make sense for all subclasses of MultiTermQuery. Check individual
subclasses to see if they support getTerm(). (Mark Miller)
* LUCENE-1636: Make TokenFilter.input final so it's set only
once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
* LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
(but left an FSDirectory base class). Added an FSDirectory.open
static method to pick a good default FSDirectory implementation
given the OS. FSDirectories should now be instantiated using
FSDirectory.open or with public constructors rather than
FSDirectory.getDirectory(), which has been deprecated.
(Michael McCandless, Uwe Schindler, yonik)
* LUCENE-1665: Deprecate SortField.AUTO, to be removed in 3.0.
Instead, when sorting by field, the application should explicitly
state the type of the field. (Mike McCandless)
* LUCENE-1660: StopFilter, StandardAnalyzer, StopAnalyzer now
require up front specification of enablePositionIncrement (Mike
McCandless)
* LUCENE-1614: DocIdSetIterator's next() and skipTo() were deprecated in favor
of the new nextDoc() and advance(). The new methods return the doc Id they
landed on, saving an extra call to doc() in most cases.
For easy migration of the code, you can change the calls to next() to
nextDoc() != DocIdSetIterator.NO_MORE_DOCS and similarly for skipTo().
However it is advised that you take advantage of the returned doc ID and not
call doc() following those two.
Also, doc() was deprecated in favor of docID(). docID() should return -1 or
NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the
iterator has been exhausted. Otherwise it should return the current doc ID.
(Shai Erera via Mike McCandless)
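A minimal sketch of the nextDoc()/advance()/docID() contract and the migrated loop, assuming a sorted int array as the doc ID source. `ArrayDocIdIterator` is hypothetical, and NO_MORE_DOCS is set to Integer.MAX_VALUE here on the assumption that it matches Lucene's sentinel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical array-backed iterator following the nextDoc()/advance()/docID() contract.
final class ArrayDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel; assumed to match Lucene's value

    private final int[] docs; // sorted ascending, all < NO_MORE_DOCS
    private int index = -1;
    private int doc = -1;     // -1 until nextDoc()/advance() is first called

    ArrayDocIdIterator(int... docs) { this.docs = docs.clone(); }

    int docID() { return doc; }

    int nextDoc() {
        if (doc != NO_MORE_DOCS) {
            index++;
            doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
        }
        return doc;
    }

    // Moves past the current doc to the first doc >= target (linear scan for clarity).
    int advance(int target) {
        do {
            nextDoc();
        } while (doc < target);
        return doc;
    }

    // Migration pattern: "while (it.next()) { int d = it.doc(); ... }" becomes this loop.
    static List<Integer> drain(ArrayDocIdIterator it) {
        List<Integer> out = new ArrayList<>();
        int d;
        while ((d = it.nextDoc()) != NO_MORE_DOCS) {
            out.add(d); // use the returned ID directly instead of calling docID()
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(new ArrayDocIdIterator(2, 5, 9)));     // [2, 5, 9]
        System.out.println(new ArrayDocIdIterator(2, 5, 9).advance(6)); // 9
    }
}
```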
* LUCENE-1672: All ctors/opens and other methods using String/File to
specify the directory in IndexReader, IndexWriter, and IndexSearcher
were deprecated. You should instantiate the Directory manually before
and pass it to these classes (LUCENE-1451, LUCENE-1658).
(Uwe Schindler)
* LUCENE-1407: Move RemoteSearchable, RemoteCachingWrapperFilter out
of Lucene's core into new contrib/remote package. Searchable no
longer extends java.rmi.Remote (Simon Willnauer via Mike
McCandless)
* LUCENE-1677: The global property
org.apache.lucene.SegmentReader.class, and
ReadOnlySegmentReader.class are now deprecated, to be removed in
3.0. src/gcj/* has been removed. (Earwin Burrfoot via Mike
McCandless)
* LUCENE-1673: Deprecated NumberTools in favour of the new
NumericRangeQuery and its new indexing format for numeric or
date values. (Uwe Schindler)
* LUCENE-1630, LUCENE-1771: Weight is now an abstract class, and adds
a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /*
topScorer */) method instead of scorer(IndexReader). IndexSearcher uses
this method to obtain a scorer matching the capabilities of the Collector
with respect to docID order. Some Scorers (like BooleanScorer) are much more
efficient if the Collector allows documents to be scored out of order.
Collector must now implement acceptsDocsOutOfOrder. If you write a
Collector that does not care about doc ID order, it is recommended
that you return true. Weight has a scoresDocsOutOfOrder method, which by
default returns false. If you create a Weight which will score documents
out of order if requested, you should override that method to return true.
BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been
deprecated as they are not needed anymore. BooleanQuery will now score docs
out of order when used with a Collector that can accept docs out of order.
Finally, Weight#explain now takes a sub-reader and sub-docID, rather than
a top level reader and docID.
(Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
* LUCENE-1466, LUCENE-1906: Added CharFilter and MappingCharFilter, which allow
chaining & mapping of characters before tokenizers run. CharStream (subclass of
Reader) is the base class for custom java.io.Readers that support offset
correction. Tokenizers got an additional method correctOffset() that is passed
down to the underlying CharStream if input is a subclass of CharStream/-Filter.
(Koji Sekiguchi via Mike McCandless, Uwe Schindler)
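Offset correction means a token's offsets in the filtered text are mapped back to positions in the original input. The sketch below is a simplified, hypothetical `OffsetMapping` that records, for each output character, the input offset it came from; Lucene's MappingCharFilter uses its own bookkeeping, so treat this only as an illustration of the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical mapping filter; not Lucene's MappingCharFilter internals.
final class OffsetMapping {
    final String output;
    private final List<Integer> origOffsets = new ArrayList<>(); // per output char: source offset

    OffsetMapping(String input, Map<Character, String> mapping) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = mapping.getOrDefault(c, String.valueOf(c));
            for (int k = 0; k < replacement.length(); k++) {
                origOffsets.add(i); // every produced char points back to its source char
            }
            out.append(replacement); // an empty replacement deletes the char
        }
        origOffsets.add(input.length()); // the end offset maps to the end of the input
        output = out.toString();
    }

    // Offset in the filtered text -> offset in the original text.
    int correctOffset(int filteredOffset) {
        return origOffsets.get(filteredOffset);
    }

    public static void main(String[] args) {
        // Maps sharp s (U+00DF) to "ss", which makes the output one char longer.
        OffsetMapping m = new OffsetMapping("Stra\u00DFe", Map.of('\u00DF', "ss"));
        System.out.println(m.output);           // Strasse
        System.out.println(m.correctOffset(7)); // 6: token end maps back to the original end
    }
}
```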
* LUCENE-1703: Add IndexWriter.waitForMerges. (Tim Smith via Mike
McCandless)
* LUCENE-1625: CheckIndex's programmatic API now returns separate
classes detailing the status of each component in the index, and
includes more detailed status than previously. (Tim Smith via
Mike McCandless)
* LUCENE-1713: Deprecated RangeQuery and RangeFilter and renamed to
TermRangeQuery and TermRangeFilter. TermRangeQuery is in constant
score auto rewrite mode by default. The new classes also have new
ctors taking field and term ranges as Strings (see also
LUCENE-1424). (Uwe Schindler)