Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert reference counted resources to a new Ref API #417

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

dcoutts
Copy link
Collaborator

@dcoutts dcoutts commented Oct 4, 2024

Instead of a referencing counting API, where we must increment and decrement reference counts correctly, we have an API organised around a singular Ref. The rules for Ref are simple and can be dynamically checked (in test/debug builds): release exactly once, do not use after release. And refs can be duplicated to give new refs (instead of incrementing reference counts). Under the hood it's the same reference counting operations.

@dcoutts dcoutts force-pushed the dcoutts/refcount-vs-reference branch 2 times, most recently from 30aa13a to 0f240ba Compare October 28, 2024 10:13
Copy link
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following are just a few comments that came up while scanning the PR. I'll let @mheinzel and you discuss the details of the approach, if that's okay

Overall, I think the sketch could benefit from a small example (with instances for HasFinaliser and HasSharedFinaliser), so that it is clear how all the moving parts interact

src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
Copy link
Collaborator

@mheinzel mheinzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice!

The debugging/assertion functionality could still be improved, I think. Would an error from assertNotReleased currently tell you much about which (type of) resource wasn't released? Also, we don't find out yet if some Ref just never gets closed, right? For that we'd need some global state or rely on GHC finalisers, as we discussed at some point.

The API seems sensible, though, which I think is the main point of the PR.

I also left some comments on minor details.

src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
@dcoutts dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 0f240ba to 9f6e3e4 Compare November 21, 2024 11:33
@dcoutts dcoutts marked this pull request as ready for review November 21, 2024 11:34
@dcoutts dcoutts changed the title WIP: sketch out a reference API Convert reference counted resources to a new Ref API Nov 21, 2024
@dcoutts
Copy link
Collaborator Author

dcoutts commented Nov 21, 2024

I think the existing review comments have been addressed by the various FIXUP patches.

Obviously, expecting plenty more from a full review of the no-longer-draft PR.

🤞 on CI now passing on all ghc versions.

@dcoutts
Copy link
Collaborator Author

dcoutts commented Nov 22, 2024

Demo:

Consider this bug:

--- a/src/Database/LSMTree/Internal/MergeSchedule.hs
+++ b/src/Database/LSMTree/Internal/MergeSchedule.hs
@@ -1315,7 +1315,7 @@ expectCompletedMerge reg (Merging (mr@(DeRef MergingRun {..}))) = do
     -- return a fresh reference to the run
     r' <- allocateTemp reg (dupRef r) releaseRef
     freeTemp reg (releaseRef mr)
-    pure r'
+    pure r -- oops! should be returning r', now r' is forgotten

Can we detect it in tests? Yes!

  Normal.StateMachine
    propLockstep_RealImpl_MockFS_IO: FAIL (0.49s)
      *** Failed! (after 14 tests and 1 shrink):
      Exception:
        RefNeverReleased RefId 233
        Allocation site: CallStack (from HasCallStack):
          dupRef, called at src/Database/LSMTree/Internal/MergeSchedule.hs:1316:29 in lsm-tree-0.1.0.0-inplace:Database.LSMTree.Internal.MergeSchedule
      do pure ()
      Use --quickcheck-replay="(SMGen 11928930769735925559 10864872194820762253,13)" to reproduce.

and it pinpoints the exact line where the reference that we forgot is allocated.

Furthermore, there were actually two references forgotten here, the one from dupRef, but also an inner ref to a blob file. Identifying the latter would not be useful, since it was the outer ref being forgotten that was the problem. So bonus points here for identifying the outer reference. We do this by reporting the last to be allocated, since outer references are allocated after inner ones.

@dcoutts dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 2a052da to 2ffd6b8 Compare November 22, 2024 12:17
Copy link
Collaborator

@mheinzel mheinzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just minor comments.

src-control/Control/RefCount.hs Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src/Database/LSMTree/Internal/MergeSchedule.hs Outdated Show resolved Hide resolved
src/Database/LSMTree/Common.hs Show resolved Hide resolved
test-control/Test/Control/RefCount.hs Outdated Show resolved Hide resolved
test/Main.hs Show resolved Hide resolved
test/Test/Database/LSMTree/Internal/Merge.hs Outdated Show resolved Hide resolved
@dcoutts
Copy link
Collaborator Author

dcoutts commented Nov 25, 2024

@mheinzel thanks muchly for the detailed review!

Copy link
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice that we can now check references dynamically!

I'm suggesting some changes, but nothing fundamental

I included style nitpicks, but you don't necessarily have to apply those suggestions. I think it's still useful to post the nitpicks, if only to help us understand each other's different styles of coding

src/Database/LSMTree/Internal/Run.hs Show resolved Hide resolved
src/Database/LSMTree/Common.hs Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src-control/Control/RefCount.hs Outdated Show resolved Hide resolved
src/Database/LSMTree/Internal/BlobRef.hs Show resolved Hide resolved
src/Database/LSMTree/Internal/Snapshot.hs Show resolved Hide resolved
test/Database/LSMTree/Class/Normal.hs Outdated Show resolved Hide resolved
test/Test/Database/LSMTree/Internal/Merge.hs Show resolved Hide resolved
@dcoutts
Copy link
Collaborator Author

dcoutts commented Nov 27, 2024

@jorisdral Thanks for the detailed review. Still going through it all. I mostly agree with all the suggestions.

All use cases provide a finaliser. The new API will require it.
Pass just the necesary fields to the finaliser, rather than passing the
whole Run (which creates an apparent recursive knot).
We already do singular reference counting, so these functions were
already largely unused. Last uses were mkRefCounterN with (RefCount 1)
and similar, or tests for the functionality.

In particular remove the RefCount arg to Run.fromMutable, which was
always being used with (RefCount 1).

Remove RefCount type, and demote readRefCount to a test-only API,
returning Int.
Instead the RunReader just needs to retain various bits from the Run. In
particular the only thing it needs to retain (dupRef) is the BlobFile,
and a bit of caching config.

There's a bunch of class constraints that propagate and add a bit of
noise to this patch.

This will simplify matters once the Run is converted to the Ref API.
@dcoutts dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 2ffd6b8 to 2483272 Compare November 28, 2024 12:45
Copy link
Collaborator

@mheinzel mheinzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, the only point I'm not really sure about is how we want to handle #417 (comment), but I think either way is okay.

@dcoutts
Copy link
Collaborator Author

dcoutts commented Nov 29, 2024

the only point I'm not really sure about is how we want to handle #417 (comment), but I think either way is okay.

Yeah. I added some TODOs about this. I think we should look at it in a follow-up.

When we switch Run to use the Ref API, these access functions will avoid
having to make changes at all use sites (using DeRef).
Make NumEntries a monoid for addition.
@dcoutts dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 4e23829 to 4e2998f Compare November 30, 2024 00:17
This is centered around the notion of singular references, as opposed
to actions to manage a reference count.

Under the hood, a (Ref obj) does manage a reference count within the
obj, but each reference is a single thing. So instead of operations to
increment and decrement reference counts, we have functions to create a
Ref, duplicate a new Ref and release a Ref.

The weak reference API is given similar treatment.
Including the mechanism to check that:
* Refs are not freed more than once
* Refs are not used after being freed
* Refs are not freed less than once
Done in two parts:
1. generally, for all tests, but exceptions ma be thrown after the test
2. specifically for the state machine tests, properly integrated
   so that failure reporting and shrinking work.

Doing 1. properly for all tests would be quite an effort, because it
needs setup and teardown for each test run.

Without proper teardown for each test run, exceptions from refs that are
forgotten before being released are thrown in any subsequent Ref
operation, and finally in checkForgottenRefs at the very end of the
testsuite. So this will reliably catch the errors, but will not identify
where they come from (not even which test!).
A BlobRef references a BlobFile, specifically:
* RawBlobRef still directly uses a (BlobFile m h)
* StrongBlobRef now uses a (Ref (BlobFile m h))
* WeakBlobRef now uses a (WeakRef (BlobFile m h))

This representation change helps to clarify the implementation of the
various BlobRef functions.

Run and WriteBufferBlobs also contain a BlobFile, now a Ref BlobFile.
In particular, WBB.removeReference is replaced by releaseRef, and
WBB.addReference by dupRef.

The TableContent and Cursor contain a WriteBufferBlobs, which becomes
a Ref WriteBufferBlobs.
Most of the changes are just replacing Run m h with Ref (Run m h) all
over the place, and replacing Run.removeReference with releaseRef.

The slightly more interesting changes are in the modules Run and
MergeSchedule, where we have to change the style of duplication from
incrementing reference countsto returning new references.
Now that it is no longer used.

The low level API is also not used except in tests. These could perhaps
be adapted to use the high level API only.
It's the only place it is used now.
And add a TODO about how to do it more cleanly.
In particular the rules for RunReader, and Readers and Curors with
BlobRef validity is much more subtle and complex than it should be.
Document what it is, but add some TODOs to simplify it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants