Convert reference counted resources to a new Ref API #417

dcoutts · 2024-10-04T14:43:15Z

Instead of a reference counting API, where we must increment and decrement reference counts correctly, we have an API organised around a singular Ref. The rules for Ref are simple and can be dynamically checked (in test/debug builds): release exactly once, do not use after release. Refs can be duplicated to give new refs (instead of incrementing reference counts). Under the hood it's the same reference counting operations.
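
To make the shape of such an API concrete, here is a minimal sketch. It is not the code from Control.RefCount: the Shared record, its field names, and the argument order of newRef are assumptions made for illustration, and the dynamic checks are left out. It only shows that duplicating and releasing a Ref are the usual increment and decrement of a shared count.

```haskell
import Control.Monad (when)
import Data.IORef

-- Hypothetical names throughout; this is an illustration, not the PR's code.

-- | State shared by every Ref that points at the same object.
data Shared = Shared
  { sharedCount     :: IORef Int  -- number of Refs not yet released
  , sharedFinaliser :: IO ()      -- run when the count reaches zero
  }

-- | A single reference to an object.
data Ref obj = Ref
  { refObj    :: obj
  , refShared :: Shared
  }

-- | Create the first Ref to an object. The count starts at 1.
newRef :: IO () -> obj -> IO (Ref obj)
newRef finaliser obj = do
  count <- newIORef 1
  pure (Ref obj (Shared count finaliser))

-- | Duplicate a Ref. Under the hood this increments the shared count.
dupRef :: Ref obj -> IO (Ref obj)
dupRef (Ref obj shared) = do
  atomicModifyIORef' (sharedCount shared) (\n -> (n + 1, ()))
  pure (Ref obj shared)

-- | Release a Ref. Under the hood this decrements the shared count and
-- runs the finaliser when the last Ref is released.
releaseRef :: Ref obj -> IO ()
releaseRef (Ref _ shared) = do
  remaining <- atomicModifyIORef' (sharedCount shared) (\n -> (n - 1, n - 1))
  when (remaining == 0) (sharedFinaliser shared)
```

In this sketch a duplicated Ref is indistinguishable from the original, so it cannot detect a double release by itself; checking the rules needs some per-Ref state on top of this.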

jorisdral

The following are just a few comments that came up while scanning the PR. I'll let @mheinzel and you discuss the details of the approach, if that's okay

Overall, I think the sketch could benefit from a small example (with instances for HasFinaliser and HasSharedFinaliser), so that it is clear how all the moving parts interact

src-control/Control/RefCount.hs

mheinzel

Looks nice!

The debugging/assertion functionality could still be improved, I think. Would an error from assertNotReleased currently tell you much about which (type of) resource wasn't released? Also, we don't find out yet if some Ref just never gets closed, right? For that we'd need some global state or rely on GHC finalisers, as we discussed at some point.

The API seems sensible, though, which I think is the main point of the PR.

I also left some comments on minor details.

src-control/Control/RefCount.hs

dcoutts · 2024-11-21T17:14:22Z

I think the existing review comments have been addressed by the various FIXUP patches.

Obviously, expecting plenty more from a full review of the no-longer-draft PR.

🤞 on CI now passing on all ghc versions.

dcoutts · 2024-11-22T11:12:29Z

Demo:

Consider this bug:

--- a/src/Database/LSMTree/Internal/MergeSchedule.hs
+++ b/src/Database/LSMTree/Internal/MergeSchedule.hs
@@ -1315,7 +1315,7 @@ expectCompletedMerge reg (Merging (mr@(DeRef MergingRun {..}))) = do
     -- return a fresh reference to the run
     r' <- allocateTemp reg (dupRef r) releaseRef
     freeTemp reg (releaseRef mr)
-    pure r'
+    pure r -- oops! should be returning r', now r' is forgotten

Can we detect it in tests? Yes!

  Normal.StateMachine
    propLockstep_RealImpl_MockFS_IO: FAIL (0.49s)
      *** Failed! (after 14 tests and 1 shrink):
      Exception:
        RefNeverReleased RefId 233
        Allocation site: CallStack (from HasCallStack):
          dupRef, called at src/Database/LSMTree/Internal/MergeSchedule.hs:1316:29 in lsm-tree-0.1.0.0-inplace:Database.LSMTree.Internal.MergeSchedule
      do pure ()
      Use --quickcheck-replay="(SMGen 11928930769735925559 10864872194820762253,13)" to reproduce.

and it pinpoints the exact line where the reference that we forgot is allocated.

Furthermore, there were actually two references forgotten here, the one from dupRef, but also an inner ref to a blob file. Identifying the latter would not be useful, since it was the outer ref being forgotten that was the problem. So bonus points here for identifying the outer reference. We do this by reporting the last to be allocated, since outer references are allocated after inner ones.
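
The "report the last to be allocated" rule can be sketched as follows. This is only an illustration: the ForgottenRef record is hypothetical, and it assumes that RefIds are handed out from an increasing counter, so that a larger id means a later allocation.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Assumption: ids come from an increasing counter.
newtype RefId = RefId Int
  deriving (Eq, Ord, Show)

-- | Hypothetical record of a ref that was never released.
data ForgottenRef = ForgottenRef
  { forgottenId   :: RefId
  , forgottenSite :: String  -- allocation site, e.g. a rendered call stack
  } deriving (Eq, Show)

-- | Pick the forgotten ref to report: the one allocated last, which is the
-- outermost one because outer refs are allocated after inner ones.
-- Returns Nothing when no refs were forgotten.
outermostForgotten :: [ForgottenRef] -> Maybe ForgottenRef
outermostForgotten []   = Nothing
outermostForgotten refs = Just (maximumBy (comparing forgottenId) refs)
```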

mheinzel

Looks great, just minor comments.

src-control/Control/RefCount.hs

src/Database/LSMTree/Internal/MergeSchedule.hs

src/Database/LSMTree/Common.hs

test-control/Test/Control/RefCount.hs

test/Main.hs

test/Test/Database/LSMTree/Internal/Merge.hs

dcoutts · 2024-11-25T14:11:16Z

@mheinzel thanks muchly for the detailed review!

jorisdral

Very nice that we can now check references dynamically!

I'm suggesting some changes, but nothing fundamental

I included style nitpicks, but you don't necessarily have to apply those suggestions. I think it's still useful to post the nitpicks, if only to help us understand each other's different styles of coding

src/Database/LSMTree/Internal/Run.hs

src/Database/LSMTree/Common.hs

src-control/Control/RefCount.hs

src/Database/LSMTree/Internal/BlobRef.hs

src/Database/LSMTree/Internal/Snapshot.hs

test/Database/LSMTree/Class/Normal.hs

test/Test/Database/LSMTree/Normal/StateMachine.hs

test/Test/Database/LSMTree/Internal/Merge.hs

dcoutts · 2024-11-27T16:48:42Z

@jorisdral Thanks for the detailed review. Still going through it all. I mostly agree with all the suggestions.

All use cases provide a finaliser. The new API will require it.

Pass just the necessary fields to the finaliser, rather than passing the whole Run (which creates an apparent recursive knot).
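
A rough sketch of the idea, using simplified stand-in types rather than the real Run: the finaliser is built from the fields it needs before the Run value exists, so no recursive binding is required. All names here are hypothetical, and "closing" a file is simulated by appending its path to a log.

```haskell
import Data.IORef

-- Hypothetical, simplified stand-ins for the real types.
newtype RefCounter = RefCounter { finaliser :: IO () }

data Run = Run
  { runRefCounter :: RefCounter
  , runFiles      :: [FilePath]
  }

-- | The finaliser takes only the fields it needs (here the file paths),
-- not the whole Run. Closing is simulated by recording the paths.
closeRunFiles :: IORef [FilePath] -> [FilePath] -> IO ()
closeRunFiles closedLog files = modifyIORef' closedLog (++ files)

-- | Construct a Run without mdo: the finaliser does not mention the Run,
-- so it can be built first and stored inside it.
newRun :: IORef [FilePath] -> [FilePath] -> IO Run
newRun closedLog files = do
  let counter = RefCounter (closeRunFiles closedLog files)
  pure (Run counter files)
```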

We already do singular reference counting, so these functions were already largely unused. Last uses were mkRefCounterN with (RefCount 1) and similar, or tests for the functionality. In particular remove the RefCount arg to Run.fromMutable, which was always being used with (RefCount 1). Remove RefCount type, and demote readRefCount to a test-only API, returning Int.

Instead the RunReader just needs to retain various bits from the Run. In particular the only thing it needs to retain (dupRef) is the BlobFile, and a bit of caching config. There's a bunch of class constraints that propagate and add a bit of noise to this patch. This will simplify matters once the Run is converted to the Ref API.

src-control/Control/RefCount.hs

mheinzel

Looks good, the only point I'm not really sure about is how we want to handle #417 (comment), but I think either way is okay.

dcoutts · 2024-11-29T16:23:35Z

the only point I'm not really sure about is how we want to handle #417 (comment), but I think either way is okay.

Yeah. I added some TODOs about this. I think we should look at it in a follow-up.

When we switch Run to use the Ref API, these access functions will avoid having to make changes at all use sites (using DeRef).

Make NumEntries a monoid for addition.
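
As an illustration of what a monoid for addition buys, here is a small sketch. The representation of NumEntries as a newtype over Int and the helper totalEntries are assumptions for the example, not taken from the patch.

```haskell
-- Assumed representation; the real NumEntries may differ.
newtype NumEntries = NumEntries Int
  deriving (Eq, Ord, Show)

-- Combining two sizes adds them.
instance Semigroup NumEntries where
  NumEntries a <> NumEntries b = NumEntries (a + b)

-- The empty size is zero.
instance Monoid NumEntries where
  mempty = NumEntries 0

-- | Hypothetical helper: summing sizes becomes a plain mconcat
-- (or foldMap over a collection of runs).
totalEntries :: [NumEntries] -> NumEntries
totalEntries = mconcat
```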

This is centered around the notion of singular references, as opposed to actions to manage a reference count. Under the hood, a (Ref obj) does manage a reference count within the obj, but each reference is a single thing. So instead of operations to increment and decrement reference counts, we have functions to create a Ref, duplicate a Ref to get a new Ref, and release a Ref. The weak reference API is given similar treatment.

Including the mechanism to check that:
* Refs are not freed more than once
* Refs are not used after being freed
* Refs are not freed less than once
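
The first two checks can be sketched with a per-reference "released" flag. This is an assumed design for illustration only: CheckedRef, RefError and the function names are hypothetical and are not the PR's API. The third check (a ref that is never released) needs tracking beyond a single ref and is not covered by this sketch.

```haskell
import Control.Exception (Exception, throwIO, try)
import Control.Monad (when)
import Data.IORef

-- Hypothetical error type for violations of the Ref rules.
data RefError = RefDoubleRelease | RefUseAfterRelease
  deriving (Eq, Show)

instance Exception RefError

-- | Each reference carries its own flag, so each one is checked individually.
data CheckedRef obj = CheckedRef
  { checkedObj      :: obj
  , checkedReleased :: IORef Bool
  }

newCheckedRef :: obj -> IO (CheckedRef obj)
newCheckedRef obj = CheckedRef obj <$> newIORef False

-- | Release a reference; throws if this reference was already released.
releaseChecked :: CheckedRef obj -> IO ()
releaseChecked ref = do
  already <- atomicModifyIORef' (checkedReleased ref) (\r -> (True, r))
  when already (throwIO RefDoubleRelease)

-- | Use the object; throws if this reference was already released.
withChecked :: CheckedRef obj -> (obj -> IO a) -> IO a
withChecked ref action = do
  released <- readIORef (checkedReleased ref)
  when released (throwIO RefUseAfterRelease)
  action (checkedObj ref)

-- | Run an action and return a RefError instead of propagating it,
-- which is convenient when testing the checks themselves.
catchRefError :: IO a -> IO (Either RefError a)
catchRefError = try
```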

Done in two parts:
1. generally, for all tests, but exceptions may be thrown after the test
2. specifically for the state machine tests, properly integrated so that failure reporting and shrinking work.

Doing 1. properly for all tests would be quite an effort, because it needs setup and teardown for each test run. Without proper teardown for each test run, exceptions from refs that are forgotten before being released are thrown in any subsequent Ref operation, and finally in checkForgottenRefs at the very end of the testsuite. So this will reliably catch the errors, but will not identify where they come from (not even which test!).

A BlobRef references a BlobFile, specifically:
* RawBlobRef still directly uses a (BlobFile m h)
* StrongBlobRef now uses a (Ref (BlobFile m h))
* WeakBlobRef now uses a (WeakRef (BlobFile m h))

This representation change helps to clarify the implementation of the various BlobRef functions. Run and WriteBufferBlobs also contain a BlobFile, now a Ref BlobFile.

In particular, WBB.removeReference is replaced by releaseRef, and WBB.addReference by dupRef. The TableContent and Cursor contain a WriteBufferBlobs, which becomes a Ref WriteBufferBlobs.

Most of the changes are just replacing Run m h with Ref (Run m h) all over the place, and replacing Run.removeReference with releaseRef. The slightly more interesting changes are in the modules Run and MergeSchedule, where we have to change the style of duplication from incrementing reference counts to returning new references.

Now that it is no longer used. The low level API is also not used except in tests. These could perhaps be adapted to use the high level API only.

It's the only place it is used now.

And add a TODO about how to do it more cleanly.

In particular, the rules for RunReader, and for Readers and Cursors with BlobRef validity, are much more subtle and complex than they should be. Document what they are, but add some TODOs to simplify them.

dcoutts force-pushed the dcoutts/refcount-vs-reference branch 2 times, most recently from 30aa13a to 0f240ba Compare October 28, 2024 10:13

jorisdral reviewed Oct 28, 2024

mheinzel reviewed Oct 28, 2024

dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 0f240ba to 9f6e3e4 Compare November 21, 2024 11:33

dcoutts marked this pull request as ready for review November 21, 2024 11:34

dcoutts requested review from recursion-ninja and wenkokke as code owners November 21, 2024 11:34

dcoutts changed the title ~~WIP: sketch out a reference API~~ Convert reference counted resources to a new Ref API Nov 21, 2024

dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 2a052da to 2ffd6b8 Compare November 22, 2024 12:17

mheinzel reviewed Nov 25, 2024

jorisdral requested changes Nov 26, 2024

dcoutts added 4 commits November 28, 2024 12:43

Make ref counter finaliser mandatory, not optional

8c5ebaa

All use cases provide a finaliser. The new API will require it.

Eliminate use of RecursiveDo in Run construction

041b816

Pass just the necessary fields to the finaliser, rather than passing the whole Run (which creates an apparent recursive knot).

dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 2ffd6b8 to 2483272 Compare November 28, 2024 12:45

jorisdral reviewed Nov 28, 2024

jorisdral approved these changes Nov 28, 2024

jorisdral mentioned this pull request Nov 28, 2024

Revisit snapshot labels #473

Open

dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 2483272 to 41861b2 Compare November 28, 2024 16:52

mheinzel approved these changes Nov 29, 2024

dcoutts added 3 commits November 30, 2024 00:16

Drop unused MonadFix class constraints everywhere

92d4637

Introduce a few Run access functions to simplify later refactoring

59082b7

When we switch Run to use the Ref API, these access functions will avoid having to make changes at all use sites (using DeRef).

Minor refactor: simplify summing run sizes

e825bf9

Make NumEntries a monoid for addition.

dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 4e23829 to 4e2998f Compare November 30, 2024 00:17

dcoutts added 12 commits December 1, 2024 01:45

Add tests for the new Ref API

9866986

Including the mechanism to check that:
* Refs are not freed more than once
* Refs are not used after being freed
* Refs are not freed less than once

Convert WriteBufferBlobs to Ref style, and update users

ebf3908

In particular, WBB.removeReference is replaced by releaseRef, and WBB.addReference by dupRef. The TableContent and Cursor contain a WriteBufferBlobs, which becomes a Ref WriteBufferBlobs.

Add a TODO in RunReader

6123e93

Remove the old API for RefCount

f736a36

Now that it is no longer used. The low level API is also not used except in tests. These could perhaps be adapted to use the high level API only.

Move readRefCount out of RefCount API and into tests

5a1bf0b

It's the only place it is used now.

Document BlobRef validity strategy for Cursors

bd3fcae

And add a TODO about how to do it more cleanly.

Update various API docs that talk about reference counting.

098910e

In particular, the rules for RunReader, and for Readers and Cursors with BlobRef validity, are much more subtle and complex than they should be. Document what they are, but add some TODOs to simplify them.

Add more strictness to resolve NoThunks test failures

be3770e

dcoutts force-pushed the dcoutts/refcount-vs-reference branch from 4e2998f to be3770e Compare December 1, 2024 01:47

dcoutts enabled auto-merge December 1, 2024 01:48
