Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: DAG sync and move merge outside of net package #2658

Merged

Conversation

fredcarle
Copy link
Collaborator

Relevant issue(s)

Resolves #2624

Description

This PR simplifies the DAG sync process withing the net package and moves the merge functionality to the db package. The merge is now initiated via an event channel.

Note: I did a search and replace for SchemaVersionId to SchemaVersionID. It's in its own commit. I've also remove the tests/integration/net/order tests as they are now annoying to maintain an will become even more irrelevant when we refactor the WaitForSync functionality of our test framework.

Another note: I've reduced the severity of the race condition on my Mac. We had a lot of leaking go routines and what is left of them is WaitForSync methods that sometimes seem to leak and also badger cache and libp2p transport that seem to leak go routines on close but I'm not sure how to handle these last two.

Tasks

  • I made sure the code is well commented, particularly hard-to-understand areas.
  • I made sure the repository-held documentation is changed accordingly.
  • I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
  • I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

make test

Specify the platform(s) on which this was tested:

  • MacOS

@fredcarle fredcarle added refactor This issue specific to or requires *notable* refactoring of existing codebases and components area/p2p Related to the p2p networking system labels May 31, 2024
@fredcarle fredcarle added this to the DefraDB v0.12 milestone May 31, 2024
@fredcarle fredcarle requested a review from a team May 31, 2024 04:47
@fredcarle fredcarle self-assigned this May 31, 2024
@fredcarle fredcarle marked this pull request as ready for review May 31, 2024 04:48
if err != nil {
return err
}
mt, err := mp.getHeads(ctx)
Copy link
Contributor

@AndrewSisley AndrewSisley May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: The flow and concept-status of mergeProcessor could be improved I think. At the moment it has the following problems:

  1. It is only called in this function, and given that it's functions do not return anything at any stage, it is unclear why it is not wrapped up into a single func on db
  2. Party because most of the functions do not return anything, it is unclear without reading the implementation of mergeProcessor as to whether the order in which each function is called matters or not, and if assuming it does matter, what that order needs to be.
  3. It is strange that getComposites takes the return value of getHeads, when the host object is essentially a state-machine whose primary reason to be appears to manage state produced by it's host functions.
  4. That getComposites does not return anything despite being called getFoo confuses things further, and casts more doubt onto how many side affects each function has, making me want to read the implementation of getHeads to see if it also stores state in the state-machine as well as returning mt.
  5. Partly because of the uncertainty around state-machineness and mt, and partly because it is quite dense and recursive code, I found it quite hard to be sure what getComposites was actually doing when reading it's implementation.

Overall I think mergeProcessor needs some work, although what kind of work is required I am not fully sure. Maybe removing the semi-statemachineness and some more documentation would be enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this. I pushed it as is because I was tired of all the troubleshooting I did but I will improve it before merge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you like the change

Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, and it is very definitely an improvement, but I think mergeProcessor needs some love before this can be merged.

return nil, err
}

col := cols[0].(*collection)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: Please document this [0]: why it is okay to do so now, why it is safe (and we are sure that there will be at least one), and why it might need to change in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment. Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks :)

internal/db/merge.go Outdated Show resolved Hide resolved
@fredcarle fredcarle force-pushed the fredcarle/refactor/dag-sync branch 3 times, most recently from a7d3b0a to 524c547 Compare May 31, 2024 18:06
Copy link
Member

@nasdf nasdf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice changes! Should make future network improvements much easier.

// DeleteDocIndex deletes the index for the given document.
// WARNING: This method is only for internal use and is not supposed to be called by the client
// as it might compromise the integrity of the database. This method will be removed in the future
DeleteDocIndex(context.Context, *Document) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: nice to see these removed from the client interface

internal/db/merge.go Outdated Show resolved Hide resolved
bp.handleChildBlocks(ctx, session, block)
// Initiate a sync of the block's children
bp.wg.Add(1)
bp.handleChildBlocks(ctx, block)

return nil
}

func (bp *blockProcessor) handleChildBlocks(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: this is much nicer

net/process.go Outdated Show resolved Hide resolved
}

// If the CRDT is nil, it means the field is not part
// of the schema and we can safely ignore it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: When can this happen? The only time I can think of is if the block was created on a higher schema version with a new field, and then we would still want the block synced - I cant remember if this was fully working, or if there is a ticket for handling the datastore, but I thought the blocks would still be synced (they affect the composite cid, so I don't think it is right to drop them even if we cant yet handle the datastore side).

Copy link
Collaborator Author

@fredcarle fredcarle Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the block sync process. That has already been taken care of in the net package. Here we merge the synced blocks into the datastore. Like you said, this can only happen if the block was created on a different schema version. If the node merging doesn't know about this field, it can't merge it into the datastore.

Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good Fred - thanks a load for this :) Just one more question for you before merge :)

@fredcarle fredcarle force-pushed the fredcarle/refactor/dag-sync branch 3 times, most recently from 525783d to 6cf9e4f Compare June 3, 2024 16:09
Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

submitting now (will continue later today)

events/dag_sync.go Outdated Show resolved Hide resolved
internal/db/merge.go Outdated Show resolved Hide resolved
internal/db/merge.go Outdated Show resolved Hide resolved
Copy link
Contributor

@islamaliev islamaliev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! just have couple questions and suggestions

internal/db/merge.go Outdated Show resolved Hide resolved
internal/db/merge.go Show resolved Hide resolved
net/node.go Outdated Show resolved Hide resolved
@fredcarle fredcarle force-pushed the fredcarle/refactor/dag-sync branch 2 times, most recently from f1b6cab to c5660b4 Compare June 7, 2024 00:02
@fredcarle fredcarle force-pushed the fredcarle/refactor/dag-sync branch from c5660b4 to 9a817eb Compare June 7, 2024 01:16
Copy link

codecov bot commented Jun 7, 2024

Codecov Report

Attention: Patch coverage is 69.03553% with 122 lines in your changes missing coverage. Please review.

Project coverage is 77.97%. Comparing base (a7004b2) to head (6aaeb51).

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2658      +/-   ##
===========================================
- Coverage    78.06%   77.97%   -0.10%     
===========================================
  Files          308      308              
  Lines        23077    23134      +57     
===========================================
+ Hits         18015    18037      +22     
- Misses        3690     3714      +24     
- Partials      1372     1383      +11     
Flag Coverage Δ
all-tests 77.97% <69.04%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
datastore/memory/memory.go 95.43% <100.00%> (+0.06%) ⬆️
events/events.go 100.00% <ø> (ø)
http/client_collection.go 41.83% <ø> (+1.17%) ⬆️
internal/core/crdt/counter.go 67.05% <100.00%> (ø)
internal/core/crdt/lwwreg.go 85.19% <100.00%> (+7.41%) ⬆️
internal/core/key.go 84.33% <100.00%> (ø)
internal/db/collection_index.go 91.30% <100.00%> (+3.03%) ⬆️
internal/db/config.go 100.00% <100.00%> (ø)
internal/db/errors.go 64.98% <ø> (ø)
internal/merkle/clock/clock.go 66.67% <100.00%> (ø)
... and 9 more

... and 12 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a7004b2...6aaeb51. Read the comment docs.

@fredcarle fredcarle merged commit 0c134e5 into sourcenetwork:develop Jun 7, 2024
36 of 37 checks passed
@fredcarle fredcarle deleted the fredcarle/refactor/dag-sync branch June 7, 2024 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/p2p Related to the p2p networking system refactor This issue specific to or requires *notable* refactoring of existing codebases and components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move post DAG-sync merge mechanics outside the net package.
5 participants