
Performance decrease 30x when running on .NET 8 #933

Closed
AddictedCS opened this issue Mar 22, 2024 · 37 comments · Fixed by #938

@AddictedCS

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

After upgrading to .NET 8 I've noticed a dramatic performance decrease. I've localized the issue to the SearcherTaxonomyManager.MaybeRefresh method. I don't know the internals of this method though, so any help would be of immense value.

Below is a benchmark that reproduces the issue. It simulates document updates, using MaybeRefresh to refresh the searcher. This code has been running absolutely fine on .NET 5, 6, and 7, but is completely unusable on .NET 8.

namespace Emy.Benchmark;

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Facet;
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;
using Directory = System.IO.Directory;

[SimpleJob(RuntimeMoniker.Net70, baseline: true)]
[SimpleJob(RuntimeMoniker.Net80)]
[RPlotExporter]
[GcServer(true)]
public class LuceneBenchmark
{
    private DirectoryTaxonomyWriter taxonomyWriter;
    private IndexWriter indexWriter;
    private Document[] documents;
    private FacetsConfig facetsConfig;
    private SearcherTaxonomyManager searcherManager;
    
    [GlobalSetup]
    public void Setup()
    {
        if (Directory.Exists("test_index"))
        {
            Directory.Delete("test_index", true);
        }

        if (Directory.Exists("test_facets"))
        {
            Directory.Delete("test_facets", true);
        }

        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        var luceneDirectory = new MMapDirectory("test_index");
        indexWriter = new IndexWriter(luceneDirectory, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer));
        taxonomyWriter = new DirectoryTaxonomyWriter(new MMapDirectory("test_facets"));
        facetsConfig = new FacetsConfig();
        searcherManager = new SearcherTaxonomyManager(indexWriter, true, new SearcherFactory(), taxonomyWriter);
        facetsConfig.SetRequireDimCount("track_id", true);
        
        documents = new Document[N];
        for (int i = 0; i < N; i++)
        {
            var facet = GenerateRandomString(5);
            documents[i] = new Document
            {
                new StringField("_id", i.ToString(), Field.Store.YES),
                new TextField("content", GenerateRandomString(10), Field.Store.YES),
                new FacetField("track_id", facet)
            };
        }
    }

    [Params(100)] 
    public int N;

    [Benchmark]
    public int IndexDocumentsBenchmark() => IndexDocuments();
    
    private int IndexDocuments()
    {
        for (int i = 0; i < documents.Length; ++i)
        {
            var taxonomyDocument = facetsConfig.Build(taxonomyWriter, documents[i]);
            indexWriter.UpdateDocument(new Term("_id", i.ToString()), taxonomyDocument);
            searcherManager.MaybeRefresh(); // maybe refresh causing dramatic performance drop on .NET 8.0
        }

        return documents.Length;
    }

    private static string GenerateRandomString(int length)
    {
        // more spaces added on purpose
        const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789         ";
        char[] stringChars = new char[length];

        for (int i = 0; i < length; i++)
        {
            stringChars[i] = chars[Random.Shared.Next(chars.Length)];
        }

        return new string(stringChars);
    }
}

The results are attached below.
[Screenshot: benchmark results, Mar 22 2024]

Attaching project details, for those who want to reproduce it locally:

<Project Sdk="Microsoft.NET.Sdk">

    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFrameworks>net7.0;net8.0</TargetFrameworks>
        <ImplicitUsings>enable</ImplicitUsings>
        <Nullable>enable</Nullable>
    </PropertyGroup>

    <ItemGroup>
        <PackageReference Include="BenchmarkDotNet" Version="0.13.12" />
        <PackageReference Include="Lucene.Net" Version="4.8.0-beta00016" />
        <PackageReference Include="Lucene.Net.Facet" Version="4.8.0-beta00016" />
        <PackageReference Include="Lucene.Net.QueryParser" Version="4.8.0-beta00016" />
        <PackageReference Include="Lucene.Net.Analysis.Common" Version="4.8.0-beta00016" />
    </ItemGroup>

</Project>

Any help on this will be greatly appreciated, as I'm currently blocked from migrating to .NET 8 because of this issue.
I've noticed the somewhat similar issue #929, but its fix (using server GC) didn't help.
Thanks!

Expected Behavior

Performance on par with .NET 7 and earlier runtimes.

Steps To Reproduce

Described in the attached benchmark.

Exceptions (if any)

No exceptions

Lucene.NET Version

4.8.0-beta00016

.NET Version

.NET 8

Operating System

macOS ARM

Anything else?

No response

@eladmarg
Contributor

I think the problem is related to the locking: MaybeRefresh has a lock inside to prevent multiple threads from refreshing when a single refresh is sufficient.
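
Roughly this pattern, i.e. a non-blocking try-lock so concurrent callers skip the refresh (a simplified sketch, not the actual Lucene.NET source; DoRefresh is a placeholder):

private readonly object refreshLock = new object();

public bool MaybeRefresh()
{
    // Non-blocking attempt: if another thread is already refreshing,
    // return immediately instead of queueing up behind the lock.
    if (!System.Threading.Monitor.TryEnter(refreshLock))
        return false;
    try
    {
        DoRefresh(); // placeholder for reopening and swapping in the new searcher
        return true;
    }
    finally
    {
        System.Threading.Monitor.Exit(refreshLock);
    }
}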

@NightOwl888 added this workaround to solve other locking problems, but I guess for the refresh we can avoid the expensive Monitor calls.

The question is how long we want to support the older frameworks.

@jeme
Contributor

jeme commented Mar 23, 2024

Not quite as severe a reduction on Windows/Intel (older CPU):

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.3930/22H2/2022Update)
Intel Core i9-7900X CPU 3.30GHz (Kaby Lake), 1 CPU, 20 logical and 10 physical cores
.NET SDK 8.0.202
  [Host]   : .NET 6.0.28 (6.0.2824.12007), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.17 (7.0.1724.11508), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX-512F+CD+BW+DQ+VL

Server=True

| Method                  | Job      | Runtime  | N   | Mean       | Error    | StdDev   | Ratio | RatioSD |
|------------------------ |--------- |--------- |---- |-----------:|---------:|---------:|------:|--------:|
| IndexDocumentsBenchmark | .NET 7.0 | .NET 7.0 | 100 |   562.9 ms | 11.23 ms | 24.89 ms |  1.00 |    0.00 |
| IndexDocumentsBenchmark | .NET 8.0 | .NET 8.0 | 100 | 2,184.3 ms | 43.26 ms | 94.04 ms |  3.89 |    0.25 |

Seems like Lucene.Net.Store.FSDirectory+FSIndexOutput.Dispose(Boolean) goes from taking up 1.9% of the execution time to taking 25.2%:

.NET 7.0: [dotTrace screenshots] (attached: LuceneBenchmarks.NET 7.0.zip)

.NET 8.0: [dotTrace screenshots] (attached: LuceneBenchmarks.NET 8.0.zip)

I don't have the time now to dig further, but you can get dotTrace outputs by:
https://blog.jetbrains.com/dotnet/2023/07/11/dottrace-comes-to-benchmarkdotnet/

Obviously you need access to dotTrace to open the traces from there, but it could be useful.

@AddictedCS Considering your results are from an M2, it might be interesting to add the trace logger. If you don't have dotTrace you can upload the files here instead, and others can have a look. The magnitude of the difference makes me wonder whether it's the same pattern or a different one.
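
For reference, hooking it up is just the BenchmarkDotNet.Diagnostics.dotTrace NuGet package plus one attribute on the benchmark class, e.g.:

using BenchmarkDotNet.Diagnostics.dotTrace;

[DotTraceDiagnoser] // collects a dotTrace snapshot per benchmark job
[SimpleJob(RuntimeMoniker.Net70, baseline: true)]
[SimpleJob(RuntimeMoniker.Net80)]
public class LuceneBenchmark
{
    // ... benchmark code unchanged ...
}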

Edit: After a bit of further digging, I end up in the "flushing" of file streams. At this point things seem to diverge between .NET 7.0 and .NET 8.0, but I can't make out why.

It also spends significantly more time in pieces I can't figure out how to dig deeper into (stack traces without user methods). I have not succeeded in extracting more there, but I am by no means an expert in dotTrace.

@paulirwin
Contributor

@AddictedCS Thank you for the report. I can reproduce a significant discrepancy on an M1 Pro MacBook Pro with beta 16:

BenchmarkDotNet v0.13.12, macOS Sonoma 14.3.1 (23D60) [Darwin 23.3.0]
Apple M1 Pro, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.202
  [Host]   : .NET 7.0.10 (7.0.1023.36312), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.10 (7.0.1023.36312), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method                  | Job      | Runtime  | N   | Mean       | Error     | StdDev    | Ratio | RatioSD |
|------------------------ |--------- |--------- |---- |-----------:|----------:|----------:|------:|--------:|
| IndexDocumentsBenchmark | .NET 7.0 | .NET 7.0 | 100 |   329.3 ms |   3.78 ms |   3.35 ms |  1.00 |    0.00 |
| IndexDocumentsBenchmark | .NET 8.0 | .NET 8.0 | 100 | 9,685.6 ms | 191.91 ms | 534.98 ms | 29.77 |    2.12 |

For me, the ratio is ~30x, not 12,000x, but that's still unacceptable.

I tried it against master with similar results (note: I dropped N to 25 just to speed up the test run, but similar ratio):

BenchmarkDotNet v0.13.12, macOS Sonoma 14.3.1 (23D60) [Darwin 23.3.0]
Apple M1 Pro, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.202
  [Host]   : .NET 7.0.10 (7.0.1023.36312), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.10 (7.0.1023.36312), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method                  | Job      | Runtime  | N  | Mean        | Error     | StdDev     | Ratio | RatioSD |
|------------------------ |--------- |--------- |--- |------------:|----------:|-----------:|------:|--------:|
| IndexDocumentsBenchmark | .NET 7.0 | .NET 7.0 | 25 |    62.54 ms |  1.192 ms |   1.464 ms |  1.00 |    0.00 |
| IndexDocumentsBenchmark | .NET 8.0 | .NET 8.0 | 25 | 2,276.04 ms | 89.073 ms | 262.632 ms | 32.70 |    2.25 |

I'll dig into this on my end as well and will update here if I find anything.

@jeme
Contributor

jeme commented Mar 23, 2024

@AddictedCS and @paulirwin Could either or both of you (since you have different chips) run with the DotTraceDiagnoser attached? If you don't have dotTrace, feel free to zip the reports, and then I (and/or perhaps others) can dig into them a little.

Seeing as my findings end up at the IO level, where the implementations obviously differ per platform, going from 300 ms -> 10,000 ms is a HUGE factor compared to Windows/Intel, where the factor is only about 4x (not to mention the 12,000x, which is insanity). So I am wondering if it's the same areas or if other areas light up.

@jeme
Contributor

jeme commented Mar 23, 2024

My findings are reproducible outside of Lucene.NET, so it might be something we need to work around or bring to Microsoft. The regression shows up when writing and flushing small buffers to FileStreams; the problem goes away for larger files. The test case here almost exclusively produces files in the 1-4 KB range, which is why I looked there.

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.3930/22H2/2022Update)
Intel Core i9-7900X CPU 3.30GHz (Kaby Lake), 1 CPU, 20 logical and 10 physical cores
.NET SDK 8.0.202
  [Host]   : .NET 6.0.28 (6.0.2824.12007), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.17 (7.0.1724.11508), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX-512F+CD+BW+DQ+VL

Server=True

| Method      | Job      | Runtime  | Mean         | Error      | StdDev     | Median       | Ratio | RatioSD |
|------------ |--------- |--------- |-------------:|-----------:|-----------:|-------------:|------:|--------:|
| Write1Kb    | .NET 7.0 | .NET 7.0 |     50.26 us |   0.945 us |   2.404 us |     49.53 us |  1.00 |    0.00 |
| Write1Kb    | .NET 8.0 | .NET 8.0 |    667.60 us |  13.219 us |  27.299 us |    670.53 us | 13.21 |    0.88 |
|             |          |          |              |            |            |              |       |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 |     52.46 us |   1.032 us |   1.060 us |     52.30 us |  1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 |    663.18 us |  13.111 us |  31.161 us |    666.55 us | 12.72 |    0.79 |
|             |          |          |              |            |            |              |       |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 |     52.38 us |   1.016 us |   1.170 us |     52.48 us |  1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 |    653.99 us |  12.956 us |  22.348 us |    654.32 us | 12.48 |    0.49 |
|             |          |          |              |            |            |              |       |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 |  1,085.58 us |  21.673 us |  43.780 us |  1,074.37 us |  1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 |  1,095.43 us |  21.008 us |  34.516 us |  1,092.27 us |  1.01 |    0.05 |
|             |          |          |              |            |            |              |       |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 |  1,616.97 us |  44.302 us | 130.625 us |  1,643.63 us |  1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 |  1,587.57 us |  39.897 us | 117.638 us |  1,596.55 us |  0.99 |    0.09 |
|             |          |          |              |            |            |              |       |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | 16,713.86 us | 331.006 us | 453.085 us | 16,754.16 us |  1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | 16,828.06 us | 320.014 us | 393.005 us | 16,844.23 us |  1.01 |    0.03 |

Could you try this as well:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnostics.dotTrace;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<FileStreamBenchmarks>();

[SimpleJob(RuntimeMoniker.Net70, baseline: true)]
[SimpleJob(RuntimeMoniker.Net80)]
[RPlotExporter]
[GcServer(true)]
[DotTraceDiagnoser]
public class FileStreamBenchmarks
{
    private readonly byte[] buffer1kb = 1.KiloBytes();
    private readonly byte[] buffer2kb = 2.KiloBytes();
    private readonly byte[] buffer4kb = 4.KiloBytes();
    private readonly byte[] buffer512kb = 512.KiloBytes();
    private readonly byte[] buffer1024Kb = 1.MegaBytes();
    private readonly byte[] buffer16Mb = 16.MegaBytes();

    [GlobalSetup]
    public void Setup()
    {
        if (Directory.Exists("test_dir"))
            Directory.Delete("test_dir", true);
        Directory.CreateDirectory("test_dir");
    }
    
    [Benchmark] public int Write1Kb() =>  WriteData(buffer1kb);
    [Benchmark] public int Write2Kb() =>  WriteData(buffer2kb);
    [Benchmark] public int Write4Kb() =>  WriteData(buffer4kb);
    [Benchmark] public int Write512Kb() =>  WriteData(buffer512kb);
    [Benchmark] public int Write1024Kb() =>  WriteData(buffer1024Kb);
    [Benchmark] public int Write16Mb() =>  WriteData(buffer16Mb);

    private int WriteData(byte[] buffer)
    {
        using FileStream stream = 
            new FileStream(
                path: Path.Combine("test_dir", $"test.{buffer.Length}.bin"),
                mode: FileMode.OpenOrCreate,
                access: FileAccess.Write,
                share: FileShare.ReadWrite,
                bufferSize: 16384);
        stream.Write(buffer, 0, buffer.Length);
        stream.Flush(flushToDisk: true);
        return buffer.Length;
    }
}

public static class Bytes
{
    public static byte[] KiloBytes(this int self)
    {
        byte[] buffer = new byte[1024 * self];
        Random.Shared.NextBytes(buffer);
        return buffer;
    }

    public static byte[] MegaBytes(this int self) => (self * 1024).KiloBytes();
}

I will probably keep expanding the above for different options.

@eladmarg
Contributor

From my analysis, the time is consumed in the flush method [profiler screenshots], which goes down to FileStream.Flush(). So this isn't related to the code, but to the framework/runtime.

As @jeme said, this is probably a regression Microsoft should investigate.

@paulirwin
Contributor

@jeme Awesome find, thanks! I ran the benchmarks on my M1 Pro and sure enough there's a large discrepancy, up to 179x slower on .NET 8:

BenchmarkDotNet v0.13.12, macOS Sonoma 14.3.1 (23D60) [Darwin 23.3.0]
Apple M1 Pro, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.202
  [Host]   : .NET 7.0.10 (7.0.1023.36312), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.10 (7.0.1023.36312), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method      | Job      | Runtime  | Mean         | Error      | StdDev     | Median       | Ratio  | RatioSD |
|------------ |--------- |--------- |-------------:|-----------:|-----------:|-------------:|-------:|--------:|
| Write1Kb    | .NET 7.0 | .NET 7.0 |     32.04 us |   0.303 us |   0.283 us |     32.02 us |   1.00 |    0.00 |
| Write1Kb    | .NET 8.0 | .NET 8.0 |  5,209.14 us | 143.338 us | 418.125 us |  5,123.13 us | 159.53 |   10.16 |
|             |          |          |              |            |            |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 |     31.66 us |   0.565 us |   0.501 us |     31.65 us |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 |  5,395.70 us | 149.807 us | 441.708 us |  5,269.80 us | 179.62 |   14.61 |
|             |          |          |              |            |            |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 |     31.57 us |   0.454 us |   0.379 us |     31.57 us |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 |  5,494.01 us | 159.845 us | 468.799 us |  5,481.13 us | 174.13 |   13.45 |
|             |          |          |              |            |            |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 |  5,272.88 us | 103.347 us | 169.802 us |  5,307.65 us |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 |  5,439.62 us | 107.396 us | 147.005 us |  5,404.71 us |   1.04 |    0.04 |
|             |          |          |              |            |            |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 |  5,825.75 us | 127.982 us | 365.140 us |  5,804.00 us |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 |  5,619.59 us | 111.783 us | 223.243 us |  5,591.22 us |   0.96 |    0.07 |
|             |          |          |              |            |            |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | 14,257.58 us | 282.783 us | 347.283 us | 14,285.54 us |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | 12,851.95 us | 302.271 us | 891.252 us | 12,353.02 us |   0.95 |    0.08 |

@eladmarg
Contributor

After some thinking, I believe this is a design bug: we shouldn't refresh the index so frequently.

At the end of the day, this method performs an actual write to disk, fsync(), which is expensive. There is nothing to do about that; many databases, and anyone else who works with the filesystem, suffer from it.

There should be a debounce that limits the number of calls per timeframe.

@jeme
Contributor

jeme commented Mar 23, 2024

@eladmarg I am curious how you ran your trace; it's a Dispose that has the hotspot when I attach the profiler to the test, and we can't really avoid paying that tax when disposing the FSIndexOutput. (Although calling flush(true) explicitly is probably unnecessary, as the same thing would happen on dispose.)

Now, I can't say there aren't other situations where flush might be called too frequently. However, in regards to this particular sample, it calls searcherManager.MaybeRefresh() for every single document update, and it's my understanding that this is actually warned against (well, phrased differently at least):

> In addition you should periodically call [ReferenceManager.maybeRefresh()](https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/search/ReferenceManager.html#maybeRefresh()). While it's possible to call this just before running each query, this is discouraged since it penalizes the unlucky queries that do the reopen. It's better to use a separate background thread, that periodically calls maybeReopen. Finally, be sure to call [ReferenceManager.close()](https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/search/ReferenceManager.html#close()) once you are done.

source: https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/search/SearcherManager.html

I would assume the same rule of thumb applies here, even though it's a SearcherTaxonomyManager and the docs do not explicitly mention it.
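
In other words, something like this instead of refreshing inline (a minimal sketch, assuming a dedicated background task; the 100 ms interval is arbitrary):

var cts = new CancellationTokenSource();
var refreshLoop = Task.Run(async () =>
{
    // Refresh on a background thread instead of after every UpdateDocument,
    // so indexing and queries never pay the reopen/flush cost inline.
    using var timer = new PeriodicTimer(TimeSpan.FromMilliseconds(100));
    while (await timer.WaitForNextTickAsync(cts.Token)) // throws OperationCanceledException when cts is cancelled
    {
        searcherManager.MaybeRefresh();
    }
});
// ... the indexing loop then only calls indexWriter.UpdateDocument(...) ...
// cts.Cancel(); searcherManager.Dispose(); // on shutdown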

I am not too familiar with the Taxonomy stuff, but since this hits only small files, I am hoping the regression will shrink as an index grows, unless we do flush(true) for small amounts of data in other cases as well.

I am currently executing a bigger test to see where the break-even point is (how many bytes before it equalizes).

@eladmarg
Contributor

It's not related to the index/code; all the performance degradation comes from the FileStream.

Just to validate this, I changed the FSDirectory file stream flags, and the problem is gone.

file = new FileStream(
    path: Path.Combine(parent.m_directory.FullName, name),
    mode: FileMode.OpenOrCreate,
    access: FileAccess.Write,
    share: FileShare.ReadWrite,
    bufferSize: CHUNK_SIZE,
    FileOptions.Asynchronous | FileOptions.RandomAccess | FileOptions.WriteThrough);

Any chance we can let Microsoft know about it? here

@AddictedCS
Author

AddictedCS commented Mar 23, 2024

Hey guys, thanks for helping.

I've rerun the benchmark multiple times now, and here are the results:

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 8.0.203
  [Host]   : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method                  | Job      | Runtime  | N  | Mean        | Error     | StdDev     | Ratio | RatioSD |
|------------------------ |--------- |--------- |--- |------------:|----------:|-----------:|------:|--------:|
| IndexDocumentsBenchmark | .NET 7.0 | .NET 7.0 | 25 |    59.25 ms |  0.839 ms |   0.862 ms |  1.00 |    0.00 |
| IndexDocumentsBenchmark | .NET 8.0 | .NET 8.0 | 25 | 2,174.27 ms | 47.201 ms | 139.172 ms | 33.67 |    1.25 |

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 8.0.203
  [Host]   : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method                  | Job      | Runtime  | N  | Mean        | Error     | StdDev     | Ratio | RatioSD |
|------------------------ |--------- |--------- |--- |------------:|----------:|-----------:|------:|--------:|
| IndexDocumentsBenchmark | .NET 7.0 | .NET 7.0 | 25 |    62.84 ms |  1.423 ms |   3.823 ms |  1.00 |    0.00 |
| IndexDocumentsBenchmark | .NET 8.0 | .NET 8.0 | 25 | 2,161.11 ms | 46.276 ms | 136.446 ms | 34.70 |    2.80 |

I can't get the 12k ratio now as I'm running it at home, and I concur with @paulirwin that it is "only" 30x, which is still unacceptable. I'm quite confident I got a 12,000x degradation originally, though I had it on a slightly modified benchmark, which I simplified when opening the GitHub issue. I didn't stash the changes, so they are gone now. If I manage to reproduce it again, I will post it.

@jeme thanks for the deeper benchmark, here are my results:

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 8.0.203
  [Host]   : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method      | Job      | Runtime  | Mean         | Error      | StdDev     | Median       | Ratio  | RatioSD |
|------------ |--------- |--------- |-------------:|-----------:|-----------:|-------------:|-------:|--------:|
| Write1Kb    | .NET 7.0 | .NET 7.0 |     28.18 μs |   0.337 μs |   0.299 μs |     28.17 μs |   1.00 |    0.00 |
| Write1Kb    | .NET 8.0 | .NET 8.0 |  4,675.38 μs |  89.870 μs | 150.152 μs |  4,667.45 μs | 164.77 |    6.64 |
|             |          |          |              |            |            |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 |     28.12 μs |   0.413 μs |   0.366 μs |     28.12 μs |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 |  4,633.83 μs |  90.353 μs | 253.360 μs |  4,624.14 μs | 171.17 |    7.64 |
|             |          |          |              |            |            |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 |     29.03 μs |   0.240 μs |   0.201 μs |     28.98 μs |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 |  4,449.07 μs |  88.012 μs | 179.785 μs |  4,459.38 μs | 151.98 |    6.81 |
|             |          |          |              |            |            |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 |  4,815.22 μs |  94.760 μs | 141.833 μs |  4,813.58 μs |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 |  4,997.23 μs |  97.949 μs | 120.290 μs |  5,009.46 μs |   1.03 |    0.04 |
|             |          |          |              |            |            |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 |  5,093.80 μs |  91.040 μs |  85.159 μs |  5,106.26 μs |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 |  5,299.68 μs | 104.049 μs |  92.237 μs |  5,292.41 μs |   1.04 |    0.02 |
|             |          |          |              |            |            |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | 12,569.03 μs | 248.962 μs | 572.032 μs | 12,473.78 μs |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | 13,267.28 μs | 264.196 μs | 714.270 μs | 13,017.40 μs |   1.06 |    0.06 |

The results corroborate @jeme's findings: ~150x performance degradation.

@eladmarg I respectfully disagree that this is a design bug. The pattern used in the initial benchmark is quite common among Lucene users, who call it a soft commit; the paradigm is used in Solr and other custom implementations. The index is not flushed to disk: by refreshing the searcher after an index writer update, the changes are reflected in new searches, which is important when near-realtime updates are required.
In case you know a better way to handle this use case, I'm willing to learn.

In any case, @jeme's benchmark points to a framework performance issue.

@jeme
Contributor

jeme commented Mar 23, 2024

> It's not related to the index/code; all the performance degradation comes from the FileStream.
>
> Just to validate this, I changed the FSDirectory file stream flags, and the problem is gone. [...]
>
> Any chance we can let Microsoft know about it? here

Ohh mo***** - now I have to run the tests again with different variations on those FileOptions :D...
Thanks though! ;)...

I will probably attempt to submit this once I have run those more detailed benchmarks to get a deeper picture.

@eladmarg
Contributor

@AddictedCS - I agree with you totally. What I meant is that even if you call MaybeRefresh() 1000 times per second (which is exactly a soft commit), the library shouldn't call fsync that often (fsync is a hard commit).

Hence, there should be a debounce mechanism in the library, which somehow isn't happening.

Of course this is another discussion and isn't related to this .NET runtime bug, which shouldn't happen in the first place.

@jeme
Contributor

jeme commented Mar 23, 2024

@eladmarg At least such a debounce can be implemented on the caller side; it doesn't exist in the Java version (if I am to judge from the documentation), so I guess the general stance would be that Lucene.NET would not add such a thing.

Just found this at random: https://github.com/coddicat/DebounceThrottle/blob/master/DebounceThrottle/DebounceDispatcherGeneric.cs

Maybe that can be of inspiration for people seeking this.
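
For example, a minimal caller-side throttle in that spirit (illustrative only, not part of Lucene.NET):

public sealed class RefreshThrottle
{
    private readonly long intervalMs;
    private long lastRefreshMs; // Environment.TickCount64 at the last refresh

    public RefreshThrottle(TimeSpan interval) => intervalMs = (long)interval.TotalMilliseconds;

    // At most one MaybeRefresh per interval, no matter how often this is called.
    public void TryRefresh(SearcherTaxonomyManager manager)
    {
        long now = Environment.TickCount64;
        long last = Interlocked.Read(ref lastRefreshMs);
        if (now - last < intervalMs)
            return; // refreshed recently enough, skip
        // Only the caller that wins the compare-and-swap performs the refresh.
        if (Interlocked.CompareExchange(ref lastRefreshMs, now, last) == last)
            manager.MaybeRefresh();
    }
}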

@eladmarg
Contributor

Just out of curiosity (after I changed the FileStream options): the problem was gone on .NET 8, but it seems that .NET 9 also has some performance degradation.

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3296/23H2/2023Update/SunValley3)
AMD Ryzen Threadripper 3970X, 1 CPU, 16 logical and 16 physical cores
.NET SDK 9.0.100-preview.2.24157.14
  [Host]   : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX2
  .NET 9.0 : .NET 9.0.0 (9.0.24.12805), X64 RyuJIT AVX2

Server=True

| Method      | Job      | Runtime  | Mean       | Error      | StdDev     | Median     | Ratio | RatioSD |
|------------ |--------- |--------- |-----------:|-----------:|-----------:|-----------:|------:|--------:|
| Write1Kb    | .NET 8.0 | .NET 8.0 |   4.970 ms |  0.3795 ms |  1.1130 ms |   4.857 ms |  1.00 |    0.00 |
| Write1Kb    | .NET 9.0 | .NET 9.0 |  10.373 ms |  0.7447 ms |  2.1839 ms |  10.273 ms |  2.20 |    0.70 |
|             |          |          |            |            |            |            |       |         |
| Write2Kb    | .NET 8.0 | .NET 8.0 |   4.566 ms |  0.3104 ms |  0.9105 ms |   4.601 ms |  1.00 |    0.00 |
| Write2Kb    | .NET 9.0 | .NET 9.0 |   9.956 ms |  1.2714 ms |  3.7486 ms |   8.913 ms |  2.21 |    0.82 |
|             |          |          |            |            |            |            |       |         |
| Write4Kb    | .NET 8.0 | .NET 8.0 |   4.340 ms |  0.3761 ms |  1.0970 ms |   4.229 ms |  1.00 |    0.00 |
| Write4Kb    | .NET 9.0 | .NET 9.0 |  12.500 ms |  0.8261 ms |  2.3835 ms |  12.313 ms |  3.09 |    0.98 |
|             |          |          |            |            |            |            |       |         |
| Write512Kb  | .NET 8.0 | .NET 8.0 |  11.235 ms |  0.7612 ms |  2.2205 ms |  10.852 ms |  1.00 |    0.00 |
| Write512Kb  | .NET 9.0 | .NET 9.0 |  40.424 ms |  3.9548 ms | 11.6608 ms |  36.904 ms |  3.90 |    1.77 |
|             |          |          |            |            |            |            |       |         |
| Write1024Kb | .NET 8.0 | .NET 8.0 |  14.783 ms |  0.6947 ms |  2.0484 ms |  14.620 ms |  1.00 |    0.00 |
| Write1024Kb | .NET 9.0 | .NET 9.0 |  52.843 ms |  1.4089 ms |  4.1321 ms |  53.069 ms |  3.64 |    0.60 |
|             |          |          |            |            |            |            |       |         |
| Write16Mb   | .NET 8.0 | .NET 8.0 | 724.461 ms | 14.1549 ms | 20.7481 ms | 726.672 ms |  1.00 |    0.00 |
| Write16Mb   | .NET 9.0 | .NET 9.0 | 529.327 ms | 10.2676 ms | 15.9854 ms | 528.721 ms |  0.73 |    0.03 |

So, after all, this isn't a Lucene.NET issue; it's a Microsoft bug.

This issue can be closed.

@AddictedCS
Author

AddictedCS commented Mar 24, 2024

Since I can't remember the exact benchmark setup that generated the 12,000x degradation, I will edit the issue to reflect the reproducible 30x degradation.

I agree, it seems to be a Microsoft issue, exacerbated when the code executes on macOS ARM. The 150x degradation in FileStream when writing small files is baffling.

BTW, I've executed @jeme's benchmark with the added flags in the FileStream initialization: FileOptions.Asynchronous | FileOptions.RandomAccess | FileOptions.WriteThrough. I was expecting the issue to be gone (if the FSDirectory change helped, why wouldn't it work in the localized benchmark?). The results show a slight improvement, but the performance is still unbearable: ~120x degradation on macOS ARM.

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 8.0.203
  [Host]   : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method      | Job      | Runtime  | Mean         | Error      | StdDev     | Median       | Ratio  | RatioSD |
|------------ |--------- |--------- |-------------:|-----------:|-----------:|-------------:|-------:|--------:|
| Write1Kb    | .NET 7.0 | .NET 7.0 |     41.62 μs |   0.226 μs |   0.211 μs |     41.62 μs |   1.00 |    0.00 |
| Write1Kb    | .NET 8.0 | .NET 8.0 |  4,943.71 μs |  97.288 μs | 180.330 μs |  4,986.08 μs | 121.49 |    4.02 |
|             |          |          |              |            |            |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 |     41.02 μs |   0.809 μs |   2.117 μs |     41.68 μs |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 |  4,860.25 μs |  94.275 μs | 129.045 μs |  4,846.69 μs | 119.22 |    9.06 |
|             |          |          |              |            |            |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 |     41.47 μs |   0.422 μs |   0.374 μs |     41.46 μs |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 |  4,907.08 μs |  97.922 μs | 146.565 μs |  4,886.79 μs | 119.32 |    3.41 |
|             |          |          |              |            |            |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 |  5,153.03 μs | 102.662 μs | 233.814 μs |  5,156.53 μs |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 |  5,154.92 μs |  84.612 μs |  97.439 μs |  5,161.52 μs |   1.01 |    0.03 |
|             |          |          |              |            |            |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 |  5,237.81 μs | 102.777 μs | 114.236 μs |  5,205.71 μs |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 |  5,219.92 μs |  91.374 μs |  85.472 μs |  5,228.99 μs |   1.00 |    0.02 |
|             |          |          |              |            |            |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | 10,028.00 μs | 198.229 μs | 404.930 μs |  9,891.89 μs |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | 10,148.77 μs | 202.662 μs | 465.649 μs | 10,070.86 μs |   1.02 |    0.02 |

@jeme let me know if you intend to open an issue in the dotnet/core repository; I will close this issue then.

AddictedCS changed the title from "Performance decrease 12000x when running on .NET 8" to "Performance decrease 30x when running on .NET 8" on Mar 24, 2024
@AddictedCS
Author

And here are the results for different sets of parameters. The Params options are set as follows:

[Params(FileOptions.Asynchronous | FileOptions.RandomAccess | FileOptions.WriteThrough,
        FileOptions.Asynchronous | FileOptions.RandomAccess,
        FileOptions.Asynchronous | FileOptions.WriteThrough,
        FileOptions.RandomAccess | FileOptions.WriteThrough,
        FileOptions.Asynchronous,
        FileOptions.RandomAccess,
        FileOptions.WriteThrough)]
public FileOptions Options;

The degradation varies between 120x and 180x.


BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 8.0.203
  [Host]   : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method      | Job      | Runtime  | Options              | Mean         | Error      | StdDev       | Median       | Ratio  | RatioSD |
|------------ |--------- |--------- |--------------------- |-------------:|-----------:|-------------:|-------------:|-------:|--------:|
| **Write1Kb**    | **.NET 7.0** | **.NET 7.0** | **WriteThrough**         |     **36.68 μs** |   **0.291 μs** |     **0.243 μs** |     **36.75 μs** |   **1.00** |    **0.00** |
| Write1Kb    | .NET 8.0 | .NET 8.0 | WriteThrough         |  4,973.23 μs |  86.943 μs |    81.327 μs |  4,998.96 μs | 135.48 |    2.83 |
|             |          |          |                      |              |            |              |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 | WriteThrough         |     40.38 μs |   0.792 μs |     1.323 μs |     40.69 μs |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 | WriteThrough         |  4,941.10 μs |  98.673 μs |   228.689 μs |  4,975.19 μs | 123.60 |    6.87 |
|             |          |          |                      |              |            |              |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 | WriteThrough         |     41.22 μs |   0.524 μs |     0.490 μs |     41.37 μs |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 | WriteThrough         |  4,929.82 μs |  97.550 μs |   267.042 μs |  5,001.38 μs | 115.77 |    6.19 |
|             |          |          |                      |              |            |              |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 | WriteThrough         |  4,920.57 μs |  95.867 μs |   257.541 μs |  4,963.36 μs |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 | WriteThrough         |  5,087.13 μs |  99.106 μs |   250.454 μs |  5,112.59 μs |   1.04 |    0.08 |
|             |          |          |                      |              |            |              |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 | WriteThrough         |  5,389.87 μs | 106.514 μs |   242.586 μs |  5,344.43 μs |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 | WriteThrough         |  5,399.41 μs | 106.635 μs |   275.259 μs |  5,349.17 μs |   1.00 |    0.07 |
|             |          |          |                      |              |            |              |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | WriteThrough         | 10,069.75 μs | 199.011 μs |   397.447 μs |  9,939.55 μs |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | WriteThrough         | 10,609.77 μs | 207.562 μs |   414.524 μs | 10,532.28 μs |   1.05 |    0.03 |
|             |          |          |                      |              |            |              |              |        |         |
| **Write1Kb**    | **.NET 7.0** | **.NET 7.0** | **Rando(...)rough [26]** |     **40.71 μs** |   **0.808 μs** |     **1.415 μs** |     **41.18 μs** |   **1.00** |    **0.00** |
| Write1Kb    | .NET 8.0 | .NET 8.0 | Rando(...)rough [26] |  5,069.47 μs |  98.337 μs |   153.100 μs |  5,086.67 μs | 125.11 |    6.63 |
|             |          |          |                      |              |            |              |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 | Rando(...)rough [26] |     39.86 μs |   0.642 μs |     0.502 μs |     39.70 μs |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 | Rando(...)rough [26] |  4,876.37 μs |  97.067 μs |   219.097 μs |  4,917.56 μs | 120.90 |    7.23 |
|             |          |          |                      |              |            |              |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 | Rando(...)rough [26] |     41.13 μs |   0.738 μs |     0.654 μs |     41.01 μs |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 | Rando(...)rough [26] |  5,010.03 μs |  97.431 μs |   108.294 μs |  4,998.43 μs | 122.20 |    3.19 |
|             |          |          |                      |              |            |              |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 | Rando(...)rough [26] |  5,150.66 μs | 109.929 μs |   320.668 μs |  5,148.15 μs |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 | Rando(...)rough [26] |  5,119.08 μs | 100.497 μs |   140.883 μs |  5,122.52 μs |   1.02 |    0.07 |
|             |          |          |                      |              |            |              |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 | Rando(...)rough [26] |  5,475.21 μs | 108.031 μs |   183.446 μs |  5,480.69 μs |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 | Rando(...)rough [26] |  5,193.73 μs | 103.179 μs |   160.638 μs |  5,182.32 μs |   0.95 |    0.06 |
|             |          |          |                      |              |            |              |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | Rando(...)rough [26] | 10,108.22 μs | 201.090 μs |   406.212 μs | 10,000.39 μs |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | Rando(...)rough [26] | 10,041.39 μs | 199.535 μs |   398.493 μs |  9,930.46 μs |   0.99 |    0.03 |
|             |          |          |                      |              |            |              |              |        |         |
| **Write1Kb**    | **.NET 7.0** | **.NET 7.0** | **Async(...)rough [26]** |     **40.99 μs** |   **0.609 μs** |     **0.540 μs** |     **40.92 μs** |   **1.00** |    **0.00** |
| Write1Kb    | .NET 8.0 | .NET 8.0 | Async(...)rough [26] |  4,951.12 μs |  95.386 μs |   228.538 μs |  4,981.84 μs | 123.35 |    5.46 |
|             |          |          |                      |              |            |              |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 | Async(...)rough [26] |     41.40 μs |   0.418 μs |     0.391 μs |     41.54 μs |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 | Async(...)rough [26] |  4,904.11 μs |  96.342 μs |   187.908 μs |  4,935.10 μs | 119.32 |    5.34 |
|             |          |          |                      |              |            |              |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 | Async(...)rough [26] |     40.69 μs |   0.796 μs |     0.852 μs |     40.71 μs |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 | Async(...)rough [26] |  5,124.55 μs | 101.950 μs |   297.392 μs |  5,121.22 μs | 124.31 |    9.37 |
|             |          |          |                      |              |            |              |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 | Async(...)rough [26] |  5,129.82 μs |  98.803 μs |   117.618 μs |  5,130.39 μs |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 | Async(...)rough [26] |  5,302.62 μs | 105.567 μs |   278.104 μs |  5,323.11 μs |   1.04 |    0.05 |
|             |          |          |                      |              |            |              |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 | Async(...)rough [26] |  5,414.26 μs | 107.056 μs |   187.501 μs |  5,455.81 μs |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 | Async(...)rough [26] |  5,625.89 μs | 111.672 μs |   261.030 μs |  5,609.76 μs |   1.05 |    0.07 |
|             |          |          |                      |              |            |              |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | Async(...)rough [26] | 10,061.14 μs | 201.102 μs |   396.956 μs |  9,938.46 μs |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | Async(...)rough [26] | 10,175.71 μs | 202.610 μs |   481.524 μs | 10,036.81 μs |   1.01 |    0.07 |
|             |          |          |                      |              |            |              |              |        |         |
| **Write1Kb**    | **.NET 7.0** | **.NET 7.0** | **Rando(...)rough [40]** |     **41.07 μs** |   **0.256 μs** |     **0.227 μs** |     **41.12 μs** |   **1.00** |    **0.00** |
| Write1Kb    | .NET 8.0 | .NET 8.0 | Rando(...)rough [40] |  5,051.85 μs |  94.100 μs |   122.357 μs |  5,036.35 μs | 122.72 |    3.31 |
|             |          |          |                      |              |            |              |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 | Rando(...)rough [40] |     40.25 μs |   0.366 μs |     0.342 μs |     40.19 μs |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 | Rando(...)rough [40] |  5,171.39 μs | 101.519 μs |   177.803 μs |  5,183.12 μs | 127.15 |    4.60 |
|             |          |          |                      |              |            |              |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 | Rando(...)rough [40] |     41.20 μs |   0.700 μs |     0.655 μs |     41.17 μs |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 | Rando(...)rough [40] |  5,076.50 μs | 101.333 μs |   160.725 μs |  5,106.31 μs | 122.77 |    4.92 |
|             |          |          |                      |              |            |              |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 | Rando(...)rough [40] |  5,235.58 μs | 102.545 μs |   140.365 μs |  5,232.39 μs |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 | Rando(...)rough [40] |  5,143.66 μs |  98.169 μs |   127.647 μs |  5,149.63 μs |   0.98 |    0.04 |
|             |          |          |                      |              |            |              |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 | Rando(...)rough [40] |  5,377.18 μs | 107.115 μs |   287.759 μs |  5,353.74 μs |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 | Rando(...)rough [40] |  5,292.95 μs | 103.324 μs |   118.988 μs |  5,291.10 μs |   0.98 |    0.05 |
|             |          |          |                      |              |            |              |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | Rando(...)rough [40] | 10,045.56 μs | 196.723 μs |   359.719 μs |  9,975.61 μs |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | Rando(...)rough [40] | 10,132.24 μs | 200.504 μs |   409.577 μs | 10,062.68 μs |   1.01 |    0.02 |
|             |          |          |                      |              |            |              |              |        |         |
| **Write1Kb**    | **.NET 7.0** | **.NET 7.0** | **RandomAccess**         |     **27.67 μs** |   **0.263 μs** |     **0.246 μs** |     **27.77 μs** |   **1.00** |    **0.00** |
| Write1Kb    | .NET 8.0 | .NET 8.0 | RandomAccess         |  4,458.41 μs |  87.946 μs |    97.752 μs |  4,467.41 μs | 161.07 |    4.47 |
|             |          |          |                      |              |            |              |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 | RandomAccess         |     27.98 μs |   0.146 μs |     0.136 μs |     27.98 μs |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 | RandomAccess         |  4,865.28 μs |  96.690 μs |   233.517 μs |  4,916.11 μs | 172.14 |   12.04 |
|             |          |          |                      |              |            |              |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 | RandomAccess         |     28.37 μs |   0.228 μs |     0.213 μs |     28.33 μs |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 | RandomAccess         |  4,769.64 μs |  93.957 μs |   181.022 μs |  4,769.61 μs | 162.15 |    5.40 |
|             |          |          |                      |              |            |              |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 | RandomAccess         |  5,007.21 μs |  80.269 μs |    71.156 μs |  5,015.11 μs |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 | RandomAccess         |  5,076.00 μs |  82.783 μs |    77.436 μs |  5,062.07 μs |   1.01 |    0.02 |
|             |          |          |                      |              |            |              |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 | RandomAccess         |  5,237.61 μs | 104.104 μs |   158.977 μs |  5,247.27 μs |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 | RandomAccess         |  5,219.69 μs | 100.639 μs |    89.214 μs |  5,219.81 μs |   1.01 |    0.04 |
|             |          |          |                      |              |            |              |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | RandomAccess         | 12,571.40 μs | 260.453 μs |   763.864 μs | 12,604.09 μs |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | RandomAccess         | 12,711.32 μs | 428.128 μs | 1,255.625 μs | 12,962.35 μs |   1.02 |    0.12 |
|             |          |          |                      |              |            |              |              |        |         |
| **Write1Kb**    | **.NET 7.0** | **.NET 7.0** | **Asynchronous**         |     **27.61 μs** |   **0.291 μs** |     **0.272 μs** |     **27.64 μs** |   **1.00** |    **0.00** |
| Write1Kb    | .NET 8.0 | .NET 8.0 | Asynchronous         |  5,026.22 μs |  98.316 μs |   100.964 μs |  5,003.53 μs | 182.20 |    4.59 |
|             |          |          |                      |              |            |              |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 | Asynchronous         |     28.23 μs |   0.538 μs |     0.576 μs |     28.34 μs |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 | Asynchronous         |  4,946.44 μs |  61.866 μs |    57.870 μs |  4,952.79 μs | 175.12 |    3.58 |
|             |          |          |                      |              |            |              |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 | Asynchronous         |     28.64 μs |   0.239 μs |     0.224 μs |     28.68 μs |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 | Asynchronous         |  4,969.19 μs |  94.254 μs |    88.165 μs |  4,971.37 μs | 173.51 |    2.59 |
|             |          |          |                      |              |            |              |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 | Asynchronous         |  4,971.69 μs |  53.480 μs |    47.409 μs |  4,978.96 μs |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 | Asynchronous         |  5,003.56 μs |  76.701 μs |    64.049 μs |  5,022.03 μs |   1.01 |    0.02 |
|             |          |          |                      |              |            |              |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 | Asynchronous         |  5,145.74 μs |  74.349 μs |    69.546 μs |  5,149.54 μs |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 | Asynchronous         |  5,232.12 μs |  87.522 μs |    77.586 μs |  5,208.61 μs |   1.02 |    0.02 |
|             |          |          |                      |              |            |              |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | Asynchronous         | 10,666.04 μs | 210.645 μs |   453.436 μs | 10,518.88 μs |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | Asynchronous         | 10,651.27 μs | 211.794 μs |   442.092 μs | 10,555.68 μs |   1.00 |    0.07 |
|             |          |          |                      |              |            |              |              |        |         |
| **Write1Kb**    | **.NET 7.0** | **.NET 7.0** | **Rando(...)onous [26]** |     **27.64 μs** |   **0.541 μs** |     **0.722 μs** |     **27.79 μs** |   **1.00** |    **0.00** |
| Write1Kb    | .NET 8.0 | .NET 8.0 | Rando(...)onous [26] |  4,903.27 μs |  89.176 μs |   148.993 μs |  4,925.48 μs | 176.32 |    7.55 |
|             |          |          |                      |              |            |              |              |        |         |
| Write2Kb    | .NET 7.0 | .NET 7.0 | Rando(...)onous [26] |     28.48 μs |   0.374 μs |     0.312 μs |     28.54 μs |   1.00 |    0.00 |
| Write2Kb    | .NET 8.0 | .NET 8.0 | Rando(...)onous [26] |  4,942.60 μs |  97.433 μs |   162.789 μs |  4,956.53 μs | 168.21 |    4.54 |
|             |          |          |                      |              |            |              |              |        |         |
| Write4Kb    | .NET 7.0 | .NET 7.0 | Rando(...)onous [26] |     28.51 μs |   0.229 μs |     0.203 μs |     28.44 μs |   1.00 |    0.00 |
| Write4Kb    | .NET 8.0 | .NET 8.0 | Rando(...)onous [26] |  4,910.03 μs |  58.850 μs |    55.048 μs |  4,930.39 μs | 172.19 |    2.28 |
|             |          |          |                      |              |            |              |              |        |         |
| Write512Kb  | .NET 7.0 | .NET 7.0 | Rando(...)onous [26] |  5,081.34 μs |  78.255 μs |    73.200 μs |  5,086.79 μs |   1.00 |    0.00 |
| Write512Kb  | .NET 8.0 | .NET 8.0 | Rando(...)onous [26] |  5,045.27 μs |  99.064 μs |   114.082 μs |  5,056.80 μs |   0.99 |    0.03 |
|             |          |          |                      |              |            |              |              |        |         |
| Write1024Kb | .NET 7.0 | .NET 7.0 | Rando(...)onous [26] |  5,276.99 μs | 104.845 μs |   136.329 μs |  5,274.49 μs |   1.00 |    0.00 |
| Write1024Kb | .NET 8.0 | .NET 8.0 | Rando(...)onous [26] |  5,228.08 μs | 103.547 μs |   286.930 μs |  5,200.77 μs |   0.96 |    0.05 |
|             |          |          |                      |              |            |              |              |        |         |
| Write16Mb   | .NET 7.0 | .NET 7.0 | Rando(...)onous [26] | 10,703.79 μs | 213.802 μs |   473.770 μs | 10,511.99 μs |   1.00 |    0.00 |
| Write16Mb   | .NET 8.0 | .NET 8.0 | Rando(...)onous [26] | 12,362.44 μs | 476.119 μs | 1,373.711 μs | 12,661.85 μs |   1.12 |    0.15 |

@AddictedCS
Author

AddictedCS commented Mar 24, 2024

I may have narrowed down the issue to the buffer size value. If the buffer size is larger than the payload, performance drops significantly on .NET 8. If the payload is at least as large as the buffer, performance is on par with .NET 7 or even slightly faster.
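
The change to @jeme's benchmark is roughly this (a sketch; the BufferSize values match the table below):

[Params(1024, 4096, 16384)]
public int BufferSize;

private int WriteData(byte[] buffer)
{
    using FileStream stream = new FileStream(
        path: Path.Combine("test_dir", $"test.{buffer.Length}.bin"),
        mode: FileMode.OpenOrCreate,
        access: FileAccess.Write,
        share: FileShare.ReadWrite,
        bufferSize: BufferSize); // was hardcoded to 16384; 0 disables buffering entirely
    stream.Write(buffer, 0, buffer.Length);
    stream.Flush(flushToDisk: true);
    return buffer.Length;
}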

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 8.0.203
  [Host]   : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method     | Job      | Runtime  | BufferSize | Mean        | Error      | StdDev     | Ratio  | RatioSD |
|----------- |--------- |--------- |----------- |------------:|-----------:|-----------:|-------:|--------:|
| Write1Kb   | .NET 7.0 | .NET 7.0 | 1024       | 4,887.55 μs |  93.198 μs | 117.866 μs |   1.00 |    0.00 |
| Write1Kb   | .NET 8.0 | .NET 8.0 | 1024       | 4,801.20 μs |  95.386 μs | 207.361 μs |   0.97 |    0.06 |
|            |          |          |            |             |            |            |        |         |
| Write2Kb   | .NET 7.0 | .NET 7.0 | 1024       | 5,008.96 μs |  98.276 μs | 140.944 μs |   1.00 |    0.00 |
| Write2Kb   | .NET 8.0 | .NET 8.0 | 1024       | 4,749.93 μs |  94.161 μs | 267.120 μs |   0.97 |    0.05 |
|            |          |          |            |             |            |            |        |         |
| Write4Kb   | .NET 7.0 | .NET 7.0 | 1024       | 4,961.12 μs |  98.620 μs | 216.472 μs |   1.00 |    0.00 |
| Write4Kb   | .NET 8.0 | .NET 8.0 | 1024       | 4,886.21 μs |  97.469 μs | 285.861 μs |   0.97 |    0.07 |
|            |          |          |            |             |            |            |        |         |
| Write512Kb | .NET 7.0 | .NET 7.0 | 1024       | 5,310.07 μs | 104.943 μs | 204.684 μs |   1.00 |    0.00 |
| Write512Kb | .NET 8.0 | .NET 8.0 | 1024       | 5,110.50 μs |  99.717 μs | 271.287 μs |   0.95 |    0.06 |
|            |          |          |            |             |            |            |        |         |
| Write1Kb   | .NET 7.0 | .NET 7.0 | 4096       |    29.20 μs |   0.479 μs |   0.425 μs |   1.00 |    0.00 |
| Write1Kb   | .NET 8.0 | .NET 8.0 | 4096       | 4,813.45 μs |  95.872 μs | 214.431 μs | 164.20 |    7.41 |
|            |          |          |            |             |            |            |        |         |
| Write2Kb   | .NET 7.0 | .NET 7.0 | 4096       |    27.79 μs |   0.549 μs |   1.018 μs |   1.00 |    0.00 |
| Write2Kb   | .NET 8.0 | .NET 8.0 | 4096       | 5,037.09 μs |  91.438 μs | 122.067 μs | 182.70 |    8.50 |
|            |          |          |            |             |            |            |        |         |
| Write4Kb   | .NET 7.0 | .NET 7.0 | 4096       | 4,860.28 μs |  96.641 μs | 203.848 μs |   1.00 |    0.00 |
| Write4Kb   | .NET 8.0 | .NET 8.0 | 4096       | 5,110.86 μs |  94.640 μs |  92.949 μs |   1.04 |    0.03 |
|            |          |          |            |             |            |            |        |         |
| Write512Kb | .NET 7.0 | .NET 7.0 | 4096       | 5,038.07 μs | 100.634 μs | 272.069 μs |   1.00 |    0.00 |
| Write512Kb | .NET 8.0 | .NET 8.0 | 4096       | 5,340.02 μs | 105.455 μs | 217.782 μs |   1.08 |    0.08 |
|            |          |          |            |             |            |            |        |         |
| Write1Kb   | .NET 7.0 | .NET 7.0 | 16384      |    28.26 μs |   0.295 μs |   0.261 μs |   1.00 |    0.00 |
| Write1Kb   | .NET 8.0 | .NET 8.0 | 16384      | 4,619.00 μs |  91.768 μs | 231.909 μs | 163.67 |    7.33 |
|            |          |          |            |             |            |            |        |         |
| Write2Kb   | .NET 7.0 | .NET 7.0 | 16384      |    28.54 μs |   0.447 μs |   0.419 μs |   1.00 |    0.00 |
| Write2Kb   | .NET 8.0 | .NET 8.0 | 16384      | 4,631.66 μs |  92.227 μs | 254.021 μs | 161.77 |   10.73 |
|            |          |          |            |             |            |            |        |         |
| Write4Kb   | .NET 7.0 | .NET 7.0 | 16384      |    28.88 μs |   0.571 μs |   1.166 μs |   1.00 |    0.00 |
| Write4Kb   | .NET 8.0 | .NET 8.0 | 16384      | 4,737.23 μs |  94.672 μs | 255.952 μs | 164.40 |   11.82 |
|            |          |          |            |             |            |            |        |         |
| Write512Kb | .NET 7.0 | .NET 7.0 | 16384      | 4,938.89 μs |  97.829 μs | 224.779 μs |   1.00 |    0.00 |
| Write512Kb | .NET 8.0 | .NET 8.0 | 16384      | 4,987.08 μs |  99.350 μs | 202.946 μs |   1.02 |    0.08 |

If we disable buffering altogether by specifying bufferSize: 0, performance is aligned between .NET 7 and .NET 8. This is a clear indication that something is off at the framework level.
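Concretely, disabling buffering just means changing the constructor argument in the sketch above (this is accepted on the .NET 7/8 targets tested here; older frameworks reject 0):

using var stream = new FileStream(
    path: Path.Combine("test_dir", "test.bin"),
    mode: FileMode.OpenOrCreate,
    access: FileAccess.Write,
    share: FileShare.ReadWrite,
    bufferSize: 0); // 0 disables FileStream's internal buffering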

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 8.0.203
  [Host]   : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method     | Job      | Runtime  | Mean     | Error     | StdDev    | Ratio | RatioSD |
|----------- |--------- |--------- |---------:|----------:|----------:|------:|--------:|
| Write1Kb   | .NET 7.0 | .NET 7.0 | 4.569 ms | 0.0902 ms | 0.2246 ms |  1.00 |    0.00 |
| Write1Kb   | .NET 8.0 | .NET 8.0 | 4.634 ms | 0.0925 ms | 0.2108 ms |  1.02 |    0.07 |
|            |          |          |          |           |           |       |         |
| Write2Kb   | .NET 7.0 | .NET 7.0 | 4.678 ms | 0.0930 ms | 0.1770 ms |  1.00 |    0.00 |
| Write2Kb   | .NET 8.0 | .NET 8.0 | 4.702 ms | 0.0905 ms | 0.0846 ms |  0.99 |    0.03 |
|            |          |          |          |           |           |       |         |
| Write4Kb   | .NET 7.0 | .NET 7.0 | 4.710 ms | 0.0929 ms | 0.1790 ms |  1.00 |    0.00 |
| Write4Kb   | .NET 8.0 | .NET 8.0 | 4.739 ms | 0.0944 ms | 0.1413 ms |  1.01 |    0.05 |
|            |          |          |          |           |           |       |         |
| Write512Kb | .NET 7.0 | .NET 7.0 | 4.905 ms | 0.0903 ms | 0.0844 ms |  1.00 |    0.00 |
| Write512Kb | .NET 8.0 | .NET 8.0 | 4.906 ms | 0.0930 ms | 0.0955 ms |  1.00 |    0.03 |

It seems the problem is in the FSync call; at least, that is what differs between the buffered, non-buffered, and buffer-smaller-than-payload implementations.

Buffered (buffer size larger than the payload) stack trace (slow performance):
[screenshot of profiler stack trace]

Non-buffered stack trace (fast performance):
[screenshot of profiler stack trace]

Buffered (buffer size less than the payload) stack trace (fast performance):
[screenshot of profiler stack trace]

In summary, the .NET runtime seems to call the FSync() method when the buffer is bigger than the payload, which leads to the performance degradation.

@paulirwin
Contributor

Would one of you like to submit an issue to the .NET team on this? Feel free to use my benchmark results as well and please tag me in it. Thanks!

@AddictedCS
Author

Closing this issue, moving discussion to dotnet/runtime#100229

@jeme
Contributor

jeme commented Mar 25, 2024

Unfortunately, when lining up the data, this is likely a bug in .NET 7 rather than in .NET 8, which means the performance will stay as it is going forward. (Disabling the buffers brings the .NET 7 numbers up to the .NET 8 numbers, not the .NET 8 numbers down to .NET 7's.)

@AddictedCS
Author

AddictedCS commented Mar 25, 2024

Yeah, can we then revive the discussion on soft commits? maybeRefresh should theoretically not touch the disk; only indexWriter.Flush and Commit should do that. Up until .NET 7 it seems this was inadvertently the case because of dotnet/runtime#77373; since .NET 8 it has become a problem.

@AddictedCS
Author

AddictedCS commented Mar 25, 2024

BTW, if we analyze dotnet/runtime#77374 and dotnet/runtime#77373, neither the bug report nor the fix provides a test case that reproduces the issue:

No simple reproduction is available but the problem is obvious from the method itself.

The decision to remove the if statement may not be that obvious if the original author had put it there on purpose as an optimization. It's suspicious that the bug existed for years and nobody reported a data corruption issue.

@paulirwin
Contributor

Does anyone know if this behaves the same in Java Lucene?

@jeme
Contributor

jeme commented Mar 25, 2024

@paulirwin On this level I guess it becomes hard to say, as you would need to know how Java's internal IO libraries work. It's not 1-to-1 for sure, and since Java and .NET have somewhat different IO layers, there can be reasoning behind the differences.

https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/store/FSDirectory.java#L411

@AddictedCS In most cases data will get flushed correctly anyway on dispose; however, it may be left to the OS or even the actual disk/disk drivers to manage (caches on the disk itself help speed up IO). That means you would find yourself in a rather special case if data loss were to happen, and depending on the software, I don't think you would ever get into a position where you would hit this bug. But if you are writing something that should be ACID compliant, you would need it.

@paulirwin
Contributor

IMO, if Java also flushes in this scenario and matches .NET 8, then I'm inclined to keep the behavior the same (at least for now...) and instead better document the performance considerations. If Java matches .NET <= 7, then I think we can explore a solution here for the 4.8 release, although given the part of the code this is in, this is a very risky change that would need to be tested sufficiently.

@jeme
Contributor

jeme commented Mar 25, 2024

@paulirwin As far as the hard flush goes, one can get rid of it like this:

using System;
using System.IO;
using Lucene.Net.Store;
using Lucene.Net.Support; // CRC32 (assumed namespace; adjust to your Lucene.NET version)
using Lucene.Net.Util;

public class CustomMMapDirectory : MMapDirectory
{
    public CustomMMapDirectory(DirectoryInfo path, LockFactory lockFactory) : base(path, lockFactory) { }
    public CustomMMapDirectory(DirectoryInfo path) : base(path) { }
    public CustomMMapDirectory(DirectoryInfo path, LockFactory lockFactory, int maxChunkSize) : base(path, lockFactory, maxChunkSize) { }
    public CustomMMapDirectory(string path, LockFactory lockFactory) : base(path, lockFactory) { }
    public CustomMMapDirectory(string path) : base(path) { }
    public CustomMMapDirectory(string path, LockFactory lockFactory, int maxChunkSize) : base(path, lockFactory, maxChunkSize) { }

    public override IndexOutput CreateOutput(string name, IOContext context)
    {
        EnsureOpen();

        EnsureCanWrite(name);
        return new CustomFSIndexOutput(this, name);
    }

    protected class CustomFSIndexOutput : IndexOutput
    {
        private readonly CustomMMapDirectory directory;
        public const int DEFAULT_BUFFER_SIZE = 16384;
        private const int CHUNK_SIZE = DEFAULT_BUFFER_SIZE;
        internal readonly string name;

        private readonly CRC32 crc = new CRC32();
        private readonly FileStream file;
        private volatile bool isOpen; // remember if the file is open, so that we don't try to close it more than once

        public override long Length => file.Length;
        public override long Position => file.Position;

        public override long Checksum => crc.Value;

        public CustomFSIndexOutput(CustomMMapDirectory directory, string name)
        {
            this.directory = directory;
            this.name = name;
            file = new FileStream(
                path: Path.Combine(this.directory.m_directory.FullName, name),
                mode: FileMode.OpenOrCreate,
                access: FileAccess.Write,
                share: FileShare.ReadWrite,
                bufferSize: CHUNK_SIZE);
            isOpen = true;
        }


        public override void WriteByte(byte b)
        {
            CheckDisposed();

            crc.Update(b);
            file.WriteByte(b);
        }

        public override void WriteBytes(byte[] b, int offset, int length)
        {
            CheckDisposed();

            crc.Update(b, offset, length);
            file.Write(b, offset, length);
        }

        public override void Flush()
        {
            CheckDisposed();
            file.Flush();
        }

        protected override void Dispose(bool disposing)
        {
            if (!disposing) return;

            //Method is empty
            //parent.OnIndexOutputClosed(this);
            if (!isOpen) return;

            Exception priorE = null; 
            try
            {
                file.Flush(flushToDisk: false);
            }
            catch (Exception ioe) when (ioe is IOException or UnauthorizedAccessException or ObjectDisposedException)
            {
                priorE = ioe;
            }
            finally
            {
                isOpen = false;
                IOUtils.DisposeWhileHandlingException(priorE, file);
            }
        }

        public override void Seek(long pos)
        {
            CheckDisposed();

            file.Seek(pos, SeekOrigin.Begin);
        }

        private void CheckDisposed()
        {
            if (!isOpen)
                throw new ObjectDisposedException("");
        }
    }
}

But you take on some of the responsibility instead of leaving it to the core library, and you have to accept that you're leaving the code more open to failure. So if your index is your only storage, then perhaps it's best not to do this, even though an error here would be considered very rare. But as a short-term fix, that should do.
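For reference, wiring it in is just a matter of swapping the directory type when building the writer; hypothetical usage:

// Hypothetical: use the custom directory in place of MMapDirectory.
var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var luceneDirectory = new CustomMMapDirectory("test_index");
var indexWriter = new IndexWriter(luceneDirectory, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer));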

However, that does not really change the behavior as @AddictedCS requests, and since we don't really want to diverge too much from the Java source, that is a bit more problematic. I did look into how easy or hard it might be to extend the index writer and maybe just override the GetReader part to avoid flushing; as it stands, that's a much more involved task, unfortunately.

@paulirwin
Contributor

I created a reproduction of the original benchmark using jmh. Forgive my quick and dirty Java port 😄 Please review and let me know if I made any mistakes.

Make a project using the jmh Maven archetype, replace the benchmark code with the code below, and add the Maven dependencies listed below.

package org.example;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.util.Version;
import org.openjdk.jmh.annotations.*;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ThreadLocalRandom;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
public class MyBenchmark {

    private DirectoryTaxonomyWriter taxonomyWriter;
    private IndexWriter indexWriter;
    private Document[] documents;
    private FacetsConfig facetsConfig;
    private SearcherTaxonomyManager searcherManager;

    @Setup
    public void setup() {
        if (Files.exists(Paths.get("test_index"))) {
            try {
                FileUtils.deleteDirectory(new File("test_index"));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        if (Files.exists(Paths.get("test_facets"))) {
            try {
                FileUtils.deleteDirectory(new File("test_facets"));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);

        try {
            Directory luceneDirectory = new MMapDirectory(new File("test_index"));
            indexWriter = new IndexWriter(luceneDirectory, new IndexWriterConfig(Version.LUCENE_48, analyzer));
            taxonomyWriter = new DirectoryTaxonomyWriter(new MMapDirectory(new File("test_facets")));
            searcherManager = new SearcherTaxonomyManager(indexWriter, true, new SearcherFactory(), taxonomyWriter);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

        facetsConfig = new FacetsConfig();
        facetsConfig.setRequireDimCount("track_id", true);

        documents = new Document[N];
        for (int i = 0; i < N; i++)
        {
            String facet = generateRandomString(5);
            documents[i] = new Document();
            documents[i].add(new StringField("_id", Integer.toString(i), Field.Store.YES));
            documents[i].add(new TextField("content", generateRandomString(10), Field.Store.YES));
            documents[i].add(new FacetField("track_id", facet));
        }
    }

    @Param({"25"})
    public int N;

    @Benchmark
    public void indexDocumentsBenchmark() {
        for (int i = 0; i < documents.length; ++i)
        {
            try {
                Document taxonomyDocument = facetsConfig.build(taxonomyWriter, documents[i]);
                indexWriter.updateDocument(new Term("_id", Integer.toString(i)), taxonomyDocument);
                searcherManager.maybeRefresh(); // maybe refresh causing dramatic performance drop on .NET 8.0
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

    private static String generateRandomString(int length)
    {
        // more spaces added on purpose
        final String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789         ";
        char[] stringChars = new char[length];

        for (int i = 0; i < length; i++)
        {
            stringChars[i] = chars.charAt(ThreadLocalRandom.current().nextInt(chars.length()));
        }

        return new String(stringChars);
    }
}

additional mvn deps:

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>4.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-facet</artifactId>
    <version>4.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>4.8.0</version>
</dependency>
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.15.1</version>
</dependency>

I ran it against JDK 11, 17, and 21 by specifying a manual path to the JVM with the -jvm arg, i.e. java -jar ./target/benchmarks.jar -jvm /opt/homebrew/Cellar/openjdk/21.0.2/bin/java

Java 21:

Benchmark                            (N)  Mode  Cnt  Score   Error  Units
MyBenchmark.indexDocumentsBenchmark   25  avgt   25  0.068 ± 0.004   s/op

Java 17:

Benchmark                            (N)  Mode  Cnt  Score   Error  Units
MyBenchmark.indexDocumentsBenchmark   25  avgt   25  0.062 ± 0.002   s/op

Java 11:

Benchmark                            (N)  Mode  Cnt  Score   Error  Units
MyBenchmark.indexDocumentsBenchmark   25  avgt   25  0.061 ± 0.001   s/op

The results, on the order of ~60-70 ms (see my N=25 results above from master), with no significant difference between Java versions, imply to me that Java behaves similarly to .NET <= 7, so we should consider a core library fix for this that is Lucene.NET specific. Unless I've done something wrong above...

@AddictedCS
Author

Changing to file.Flush(flushToDisk: false); may not be such a bad idea after all. I understand the implications (no ACID guarantees), but up until .NET 8 there weren't any either.

Just to add as an example: the vast majority of developers use the Dispose method to ensure the stream is closed and flushed, and by default, Dispose uses Flush(false).

Here is an example of the modified benchmark (I've removed the Flush(true) call, leaving the responsibility for closing and flushing the stream to the Dispose method):

    private static int WriteData(byte[] buffer, int bufferSize)
    {
        using var stream = 
            new FileStream(
                path: Path.Combine("test_dir", $"test.{buffer.Length}.bin"),
                mode: FileMode.OpenOrCreate,
                access: FileAccess.Write,
                share: FileShare.ReadWrite,
                bufferSize: bufferSize);
        stream.Write(buffer, 0, buffer.Length);
        // stream.Flush(flushToDisk: true);
        return buffer.Length;
    }

The results are much faster now, with both .NET 7 and .NET 8 showing great performance.


BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 8.0.203
  [Host]   : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method      | Job      | Runtime  | PayloadSize | BufferSize | Mean     | Error    | StdDev   | Median   | Ratio | RatioSD |
|------------ |--------- |--------- |------------ |----------- |---------:|---------:|---------:|---------:|------:|--------:|
| WriteStream | .NET 7.0 | .NET 7.0 | 512         | 1024       | 26.65 μs | 0.506 μs | 0.449 μs | 26.53 μs |  1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 512         | 1024       | 27.52 μs | 0.545 μs | 1.407 μs | 26.86 μs |  1.06 |    0.08 |
|             |          |          |             |            |          |          |          |          |       |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 512         | 4096       | 27.55 μs | 0.466 μs | 0.389 μs | 27.48 μs |  1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 512         | 4096       | 26.86 μs | 0.454 μs | 0.486 μs | 26.83 μs |  0.97 |    0.02 |
|             |          |          |             |            |          |          |          |          |       |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 1024        | 1024       | 26.10 μs | 0.500 μs | 0.513 μs | 26.28 μs |  1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 1024        | 1024       | 25.85 μs | 0.511 μs | 1.121 μs | 26.00 μs |  0.96 |    0.07 |
|             |          |          |             |            |          |          |          |          |       |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 1024        | 4096       | 27.15 μs | 0.461 μs | 0.616 μs | 27.03 μs |  1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 1024        | 4096       | 26.87 μs | 0.425 μs | 0.355 μs | 26.89 μs |  0.99 |    0.04 |

@paulirwin
Contributor

I think I agree with that. If this is supposed to semantically represent a soft commit, I don't think we should have to do a hard flush to disk.

@paulirwin
Contributor

Looks like we could add back the Fsync method as part of this change, if we were to do it:

// LUCENENET specific: Fsync is pointless in .NET, since we are
// calling FileStream.Flush(true) before the stream is disposed
// which means we never need it at the point in Java where it is called.
//protected virtual void Fsync(string name)
//{
// IOUtils.Fsync(Path.Combine(m_directory.FullName, name), false);
//}

Interestingly, the blame shows the most recent commit of that block to be a revert of a revert of a commit 😄 cc @NightOwl888

Also note this from the Java Lucene 4.8.0 release notes:

* LUCENE-5588: Lucene now calls fsync() on the index directory, ensuring
  that all file metadata is persisted on disk in case of power failure.
  This does not work on all file systems and operating systems, but Linux
  and MacOSX are known to work. On Windows, fsyncing a directory is not
  possible with Java APIs.  (Mike McCandless, Uwe Schindler)

I'm going to reopen this issue until we come up with a solution. My instinct is to support Flush(false) and add back the Fsync method, but I'm not an expert in this area and I don't fully understand the revert/revert history here yet. I'd like Shad's input for sure.
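If we do add it back, a rough sketch of the Sync flow mirroring Java Lucene 4.8's FSDirectory.sync() might look like this (assumptions: the commented-out IOUtils.Fsync(string, bool) helper above is restored, and m_staleFiles is a set tracking files written since the last sync, as in the Java source; this is a sketch, not the actual Lucene.NET API):

public override void Sync(ICollection<string> names)
{
    EnsureOpen();
    ISet<string> toSync = new HashSet<string>(names);
    toSync.IntersectWith(m_staleFiles); // only fsync files we actually wrote

    foreach (string name in toSync)
        Fsync(name); // per-file fsync first

    // LUCENE-5588: also fsync the directory itself so that file metadata
    // survives power loss (not possible on Windows, per the release notes).
    IOUtils.Fsync(m_directory.FullName, true); // true = path is a directory

    m_staleFiles.ExceptWith(toSync);
}

protected virtual void Fsync(string name)
{
    IOUtils.Fsync(Path.Combine(m_directory.FullName, name), false);
}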

@paulirwin paulirwin reopened this Mar 25, 2024
@NightOwl888
Contributor

@paulirwin - I agree. If it is possible to mimic an fsync in .NET, we should also do that. This may be the root cause of #894.

As for the history, this goes back to this conversation: https://lists.apache.org/[email protected]:2017-1 and anything from Vincent after that point. He helped pull what we had in the Store namespace together and make it more stable and efficient. It was around this time that we nixed fsync, but at that point .NET Core support was still in the works and we had no way to test on anything but Windows. So the benefit of supporting fsync seems clearer now. Whether providing fsync is feasible in .NET is another question. We had to drastically alter the way file locking works in order to provide support in .NET, and I am not sure whether that is also relevant to this conversation.

I haven't gone over the benchmarks in detail, but adding the FileOptions also sounds like a good idea, provided it is tested thoroughly. I'm not sure whether FileOptions.Asynchronous is going to shoot us in the foot. Also, we would need to review whether the same options make sense for all callers. If not, we may need to consider a way to change the options on the public or internal APIs.

Based on the LUCENE-5588 comment, it also seems that supporting that JRE crash test (ACID) on Windows is probably not possible. It would probably be worth a try to disable that test on Windows (and remove the [AwaitsFix] attribute) to see whether it always succeeds on Linux and macOS. It would have to be tested dozens of times to be sure though - it rarely fails.

@AddictedCS
Author

I just wanted to add that CustomMMapDirectory fixes the issue. Below are the results from the original benchmark with CustomMMapDirectory instead of MMapDirectory.

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4.1 (23E224) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 8.0.203
  [Host]   : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method                  | Job      | Runtime  | N  | Mean     | Error    | StdDev   | Ratio | RatioSD |
|------------------------ |--------- |--------- |--- |---------:|---------:|---------:|------:|--------:|
| IndexDocumentsBenchmark | .NET 7.0 | .NET 7.0 | 25 | 61.97 ms | 1.220 ms | 2.039 ms |  1.00 |    0.00 |
| IndexDocumentsBenchmark | .NET 8.0 | .NET 8.0 | 25 | 60.47 ms | 1.208 ms | 3.183 ms |  0.98 |    0.06 |

Thanks, @jeme, for the example!

@jeme
Contributor

jeme commented Mar 26, 2024

Just dropping this here as it may be worth knowing: https://ayende.com/blog/164673/the-difference-between-fsync-write-through-through-the-os-eyes
Ayende wrote large parts of a database (RavenDB) that AFAIK actually uses a branched version of Lucene.NET (with a different storage layer, Voron), so I guess it's worth taking into account.

What bugs me a little is that I thought the whole idea of getting the reader from the IndexWriter was to enable NRT, where updates would be made available prior to a commit. I implicitly assumed they also weren't necessarily flushed to storage (disk), specifically to speed things up, but that is clearly not the case (when we look at the Java source), so that seems like a huge thing to leave on the table.

@turowicz

Any fixes coming out anytime soon?

@eladmarg
Contributor

This is probably by design and not a Lucene-specific issue.
However, @AddictedCS, if possible, maybe we can add the CustomMMapDirectory to a contrib folder.

@NightOwl888 what do you think?

@NightOwl888
Contributor

I have done a bit of research on the fsync issue. The following info is helpful:

  1. Everything You Always Wanted To Know About fsync()
  2. Testing Lucene's index durability after crash or power loss

Here is the fsync approach in Lucene 4.8.0:

https://github.com/apache/lucene/blob/8fdf89690404c0e65784b2c5477552b9dec58591/lucene/core/src/java/org/apache/lucene/util/IOUtils.java#L372-L418

It supports either a file name or a directory name (a string) and is called like this:

https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/store/FSDirectory.java#L299-L316

Correct me if I am wrong, but it seems that .NET only supports fsync on individual files, but not on all of the files in a directory.

So, if that assumption is true, the only way to do a directory-level fsync is to use native code. And indeed, that is what is done in RavenDB.
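For the non-Windows platforms, the POSIX version of a directory fsync is small; a minimal sketch via P/Invoke into standard libc entry points (an illustrative sketch only: real code needs errno handling and platform checks, and this does not work on Windows):

using System.IO;
using System.Runtime.InteropServices;

internal static class PosixFsync
{
    private const int O_RDONLY = 0; // on Linux and macOS

    [DllImport("libc", SetLastError = true)]
    private static extern int open(string pathname, int flags);

    [DllImport("libc", SetLastError = true)]
    private static extern int fsync(int fd);

    [DllImport("libc", SetLastError = true)]
    private static extern int close(int fd);

    // Open the directory read-only and fsync its descriptor.
    public static void FsyncDirectory(string path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            throw new IOException($"open() failed for '{path}'");
        try
        {
            if (fsync(fd) != 0)
                throw new IOException($"fsync() failed for '{path}'");
        }
        finally
        {
            close(fd);
        }
    }
}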

It also looks like the fsync call needs to be synchronized with any writes or flushes to the file buffer, but maybe Lucene already does external synchronization.

Do note that our file locking approach depends on FileStream, so if for some reason we cannot use FileStream, we will also need to redesign our file locking approach.

Directory Level Fsync

RavenDB has a method that accepts a directory path and will do the fsync on the whole directory, so this appears to be a suitable replacement for the FileChannel.force() call in Java when the FileChannel is based on a directory.

Unfortunately, it seems that RavenDB isn't covered under one of the licenses that are acceptable to Apache. They were under a GNU license back in version 3.5 and prior, but it doesn't look like any of those versions had this set up yet on non-Windows OSes.

File Level Fsync

In Lucene, "stale files" are tracked in a synchronized HashSet<string>. These are fsynced first, before doing the fsync at the directory level.

I looked at the source of FileStream, and indeed, when you call Flush(false) it will simply flush into the OS cache but not do the fsync. It also does not fsync when you call Dispose(). Technically, we could open a new FileStream instance on a file path for writing and then call Flush(true) to do the file-level fsync, but there would be some overhead to instantiate the FileStream instance. And although there is an option that can be passed that is supposed to indicate not to create a buffer, it doesn't appear to be implemented, so there would need to be at least a 1-byte array allocated as well.
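In managed code, the file-level part might look like this (a sketch of the reopen-and-Flush(true) approach just described, with the unavoidable 1-byte buffer):

// Sketch: fsync a single file from managed code by reopening it
// and forcing a flush to disk. bufferSize: 1 because a true
// "no buffer" option doesn't appear to be honored.
private static void FsyncFile(string path)
{
    using var fs = new FileStream(path, FileMode.Open, FileAccess.Write,
        FileShare.ReadWrite, bufferSize: 1);
    fs.Flush(flushToDisk: true);
}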

So, it appears that a better option might be to use the same low-level native calls that RavenDB uses at the file level as well.

Synchronization

This is the tricky bit. In Java, the FileChannel instance does blocking to ensure that no write/flush/fsync operations can happen simultaneously according to the docs in Apache Harmony. I haven't dug much into the implementation as to how to allow individual files to be written during normal application use but then be able to gain an exclusive lock on the entire directory to do the directory-level fsync.

It also appears that the need to block makes it impossible (or at least extremely difficult) to use FileOptions.Asynchronous. It is kind of pointless anyway if you don't explicitly call the FlushAsync() and WriteAsync() methods, and it takes you down a completely different implementation path. Also, when it is enabled on Linux, the fsync happens automatically on FlushAsync() and on Dispose().


It would be great to get some feedback on this to see if anyone has any ideas that we could integrate. And of course setting up the project for deploying the native bits via NuGet on the various operating systems that .NET Core supports is not something I have looked into. If someone with experience doing this could provide some guidance, that would be great.

@paulirwin paulirwin self-assigned this May 10, 2024
paulirwin added a commit that referenced this issue Aug 12, 2024
* Restore fsync behavior in FSDirectory via P/Invoke

This restores the commented-out fsync behavior in FSDirectory to help
mitigate a performance regression in .NET 8.

* Use System.IO.Directory.Exists to avoid caching exists status

* Add unit test for ConcurrentHashSet.ExceptWith

* Improve errors thrown by CreateFileW

* Change FileSystemInfo use to string in IOUtils.Fsync

* Change Debug.Assert to Debugging use

* Lucene.Net.Index.TestIndexWriterOnJRECrash::TestNRTThreads_Mem(): Removed AwaitsFix attribute. The FSync implementation should fix this test.

* Make ExceptWith atomic

* Improve error handling if directory not found on Linux/macOS

* Refactor interop methods into separate partial class files

* Lucene.Net.Index.TestIndexWriterOnJRECrash::TestNRTThreads_Mem(): Added [Repeat(25)] attribute.

* Lucene.Net.Index.TestIndexWriterOnJRECrash: Instead of using a temp file to pass the process ID to kill back to the original test process, open a socket and listen for the process ID to be written.

* Synchronize access to stale files collection

This is necessary to prevent race conditions, even though this code is
not in the upstream Java code. A thread could try to add an item to the
collection after it has been synced in `Sync` but before it is removed
from the collection, then the file is removed from the collection,
resulting in a missed sync.

* Rename syncLock to m_syncLock

* Lucene.Net.Index.TestIndexWriterOnJRECrash: Added try/finally block and refactored to ensure the TcpListener and Process are cleaned up at the end of each test iteration. This makes it run ~20% faster.

* Refactor rename namespace to Lucene.Net.Native

* Mark JRE crash test as [AwaitsFix]

---------

Co-authored-by: Shad Storhaug <[email protected]>