Support for multithreading #362

Open
AlizerUncaged opened this issue Jul 29, 2023 · 8 comments · May be fixed by #369

@AlizerUncaged

I have the following code to seed a MongoDB instance with 500K+ objects:

        List<Member> members = new();

        for (int i = 0; i < 500000; i++)
        {
            var data = $"{_random.Next()}";
            members.Add(new Member()
            {
                Email = $"{data}@random.com",
            });
        }

        _applicationMongoDbContext.Members.AddRange(members);

        await _applicationMongoDbContext.SaveChangesAsync();

Creating the objects takes a few milliseconds, but the await _applicationMongoDbContext.SaveChangesAsync() call takes around 20 minutes. I checked my VPS and found that it seems to be utilizing only the first core.

[Screenshot: CPU core utilization]

@Turnerj
Member

Turnerj commented Jul 29, 2023

Is that screenshot of core utilization system wide or just the dotnet process?

@AlizerUncaged
Author

Pretty much just the dotnet process; this server is a fresh install.

@solo812

solo812 commented Sep 28, 2023

Any update on this?

@Turnerj
Member

Turnerj commented Sep 30, 2023

I'm looking into it right now, but I don't think the problem is SaveChangesAsync; it is the call to AddRange (at least in my own tests). A call to AddRange effectively calls Add internally, which then checks the state of the entity in the change tracker.

As part of the change tracking, it needs to check the ID values of each entry to ensure the entity doesn't already exist. While MongoFramework knows what the ID property is, it only knows it via the PropertyInfo type and uses GetValue. This is not a slow method per se, but it isn't something you want to call a lot.

With 500,000 entities, adding the 1st one only needs to get its own ID, the 2nd one needs to get its own and the 1st's, and the 3rd one needs to get all 3 IDs. That means even at 10,000 entities, adding the last one alone makes 10,000 calls, and the total across all adds is n(n+1)/2, or about 50 million calls.
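To make the growth concrete, here is a minimal sketch that counts ID reads when every add compares against each entry tracked so far. It is not MongoFramework's actual code: `NaiveTracker`, `ReadId`, and this `Member` class are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Add 10,000 entities with distinct IDs and count how often an ID is read.
var tracker = new NaiveTracker();
for (int i = 0; i < 10_000; i++)
{
    tracker.Add(new Member { Id = i.ToString() });
}
Console.WriteLine(tracker.IdReads); // prints 50005000, i.e. n(n+1)/2 for n = 10,000

// Hypothetical entity, loosely modelled on the Member type in the issue.
public class Member
{
    public string Id { get; set; } = "";
    public string Email { get; set; } = "";
}

// Hypothetical tracker: each Add reads the new entity's ID,
// then the ID of every entry that is already tracked.
public class NaiveTracker
{
    private readonly List<Member> _entries = new();

    public long IdReads { get; private set; }

    private string ReadId(Member member)
    {
        IdReads++;
        return member.Id;
    }

    public void Add(Member entity)
    {
        var id = ReadId(entity);
        foreach (var existing in _entries)
        {
            if (ReadId(existing) == id)
            {
                return; // an entry with this ID is already tracked
            }
        }
        _entries.Add(entity);
    }
}
```

The k-th add performs k reads, so the total is the sum 1 + 2 + ... + n, which is why 10x the entities costs roughly 100x the time.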

I likely have two options:

  • Attempt to cache the ID (this is problematic for reasons)
  • Find another way to access the real ID (this is likely the best bet for now)

As a temporary workaround, work in smaller batches of, say, 5,000 items: add them, save changes, clear the change tracker. Repeat this until you've worked through all your data.
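The batching workaround could look roughly like the sketch below. The `IBatchStore<T>` interface, `BatchSeeder`, and `InMemoryStore<T>` are hypothetical stand-ins so the example runs on its own; in real code each of the three calls would map onto whatever your context exposes for adding, saving, and clearing tracked entries. `Enumerable.Chunk` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var emails = Enumerable.Range(0, 12_000).Select(i => $"{i}@random.com").ToList();
var store = new InMemoryStore<string>();
await BatchSeeder.SeedAsync(store, emails, batchSize: 5_000);
Console.WriteLine($"{store.Saved.Count} saved in {store.SaveCalls} batches"); // prints "12000 saved in 3 batches"

// Hypothetical abstraction over the context (assumption, not a MongoFramework type).
public interface IBatchStore<T>
{
    void AddRange(IEnumerable<T> items);
    Task SaveChangesAsync();
    void ClearTracked();
}

public static class BatchSeeder
{
    // Add, save, and clear one batch at a time so the tracker never holds
    // more than batchSize entries when the next add happens.
    public static async Task SeedAsync<T>(IBatchStore<T> store, IEnumerable<T> items, int batchSize)
    {
        foreach (var batch in items.Chunk(batchSize))
        {
            store.AddRange(batch);
            await store.SaveChangesAsync();
            store.ClearTracked();
        }
    }
}

// In-memory fake used only to make the sketch runnable.
public class InMemoryStore<T> : IBatchStore<T>
{
    private readonly List<T> _tracked = new();

    public List<T> Saved { get; } = new();
    public int SaveCalls { get; private set; }

    public void AddRange(IEnumerable<T> items) => _tracked.AddRange(items);

    public Task SaveChangesAsync()
    {
        Saved.AddRange(_tracked);
        SaveCalls++;
        return Task.CompletedTask;
    }

    public void ClearTracked() => _tracked.Clear();
}
```

With quadratic tracking cost, 100 batches of 5,000 do far less comparison work than one batch of 500,000, at the price of more round trips to the database.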

@Turnerj
Member

Turnerj commented Sep 30, 2023

Here's the result of a quick benchmark I put together. At 100 entities it takes about 0.36ms; at 1000 it takes about 33ms. So for 10x the entities, it gets roughly 100x slower.

| Method | EntryCount | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| SetEntityState | 100 | 357.0 us | 7.02 us | 12.66 us | - |
| SetEntityState | 1000 | 32,937.1 us | 657.91 us | 1,043.52 us | 30 B |

@Turnerj
Member

Turnerj commented Sep 30, 2023

Added one more iteration: at 10000 entities it takes 3402ms. This is definitely the problem area.

| Method | EntryCount | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| SetEntityState | 100 | 352.7 us | 6.66 us | 6.84 us | 2 B |
| SetEntityState | 1000 | 32,571.1 us | 161.22 us | 134.62 us | 30 B |
| SetEntityState | 10000 | 3,402,924.9 us | 45,554.20 us | 40,382.61 us | 3656 B |

@Turnerj
Member

Turnerj commented Sep 30, 2023

One thing I'm experimenting with is, instead of using reflection, creating a delegate dynamically via expressions. That does improve performance quite a bit though the scaling is still pretty bad.

| Method | EntryCount | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| SetEntityState | 100 | 54.62 us | 1.080 us | 1.326 us | - |
| SetEntityState | 1000 | 4,291.31 us | 84.723 us | 104.047 us | 4 B |
| SetEntityState | 10000 | 392,415.23 us | 7,712.945 us | 8,252.765 us | 1280 B |

The worst case here, at 10000 entities, now takes 392ms, which is about 88% less time than before (roughly 8.7x faster). I'll see, though, if there is a nicer way I can improve the algorithm to avoid the calls in the first place.
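For readers unfamiliar with the technique, replacing reflection with a delegate built from expression trees can be sketched as follows. This is an illustration of the general approach, not the code from the linked pull request; `IdAccessor` and this `Member` class are hypothetical names.

```csharp
#nullable enable
using System;
using System.Linq.Expressions;
using System.Reflection;

PropertyInfo idProperty = typeof(Member).GetProperty(nameof(Member.Id))!;

// Build the delegate once, then reuse it for every entity.
Func<object, object?> getId = IdAccessor.Compile(idProperty);

var member = new Member { Id = "abc123" };
Console.WriteLine(idProperty.GetValue(member)); // reflection path, prints abc123
Console.WriteLine(getId(member));               // compiled-delegate path, prints abc123

// Hypothetical entity used for the demonstration.
public class Member
{
    public string Id { get; set; } = "";
}

public static class IdAccessor
{
    // Compiles: (object instance) => (object)((DeclaringType)instance).Property
    public static Func<object, object?> Compile(PropertyInfo property)
    {
        var instance = Expression.Parameter(typeof(object), "instance");
        var typed = Expression.Convert(instance, property.DeclaringType!);
        var read = Expression.Property(typed, property);
        var boxed = Expression.Convert(read, typeof(object));
        return Expression.Lambda<Func<object, object?>>(boxed, instance).Compile();
    }
}
```

Compiling the expression is itself expensive, so the gain only appears when the resulting delegate is cached and called many times. It lowers the cost of each call but does not change how many calls are made, which matches the observation that the scaling is still poor.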

@Turnerj
Member

Turnerj commented Oct 1, 2023

Managed to find an algorithmic improvement so I don't need to check the ID of every entry if the entry we're setting doesn't have an ID defined:

| Method | EntryCount | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| SetEntityState | 100 | 30.62 us | 0.588 us | 0.764 us | - |
| SetEntityState | 1000 | 1,061.44 us | 20.910 us | 19.559 us | 2 B |
| SetEntityState | 10000 | 87,913.98 us | 1,750.216 us | 2,395.715 us | 80 B |

And if I stack the algorithmic improvement with the created delegate:

| Method | EntryCount | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| SetEntityState | 100 | 22.62 us | 0.450 us | 0.645 us | - |
| SetEntityState | 1000 | 911.66 us | 17.737 us | 18.979 us | 1 B |
| SetEntityState | 10000 | 82,002.73 us | 1,611.556 us | 3,104.921 us | 69 B |
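The idea behind the algorithmic improvement, skipping the ID comparison when the incoming entity has no ID, can be sketched like this. It is a simplified model under stated assumptions, not the actual change: `SkippingTracker` and this `Member` class are hypothetical, "no ID defined" is modelled as a null `Id`, and a real tracker would still need to handle the same object instance being added twice.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Seed 10,000 entities without IDs, as in the original seeding code.
var tracker = new SkippingTracker();
for (int i = 0; i < 10_000; i++)
{
    tracker.Add(new Member { Email = $"{i}@random.com" });
}
Console.WriteLine(tracker.IdReads); // prints 10000: one read per add, no scan

// Hypothetical entity; Id stays null until something assigns it.
public class Member
{
    public string? Id { get; set; }
    public string Email { get; set; } = "";
}

// Hypothetical tracker that only scans existing entries when the new entity has an ID.
public class SkippingTracker
{
    private readonly List<Member> _entries = new();

    public long IdReads { get; private set; }

    private string? ReadId(Member member)
    {
        IdReads++;
        return member.Id;
    }

    public void Add(Member entity)
    {
        var id = ReadId(entity);
        if (id is null)
        {
            // No ID to compare, so no tracked entry can match it by ID.
            _entries.Add(entity);
            return;
        }

        foreach (var existing in _entries)
        {
            if (ReadId(existing) == id)
            {
                return; // an entry with this ID is already tracked
            }
        }
        _entries.Add(entity);
    }
}
```

In this toy model, entities without IDs cost one read each, so the work grows linearly. The benchmark numbers above still grow faster than that, so the real tracker evidently does other per-entry work that this sketch leaves out.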

Turnerj linked a pull request on Oct 1, 2023 that will close this issue