[DataApi] Get BlobCount By AccountID #541

siddimore · 2024-05-03T02:48:24Z

Why are these changes needed?

These changes enable querying the blob count by AccountId.

Changes

Add a GSI on AccountId
Add a method to GetBlobMetadataCount by AccountId
Assign accountKey to accountId in BlobAuthHeader
Update DataApiHandler to getBlobMetadataCount by AccountId
Define a DataApi endpoint for getting BlobCount by AccountId

Checks

I've made sure the lint is passing in this PR.
I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
Testing Strategy
- Unit tests
- Integration tests
- This PR is not tested :(

ian-shim

Overall looks good to me.
Is the primary purpose of adding the new field and index to support count query?
I feel like getting the actual metadata might be more useful. Maybe we can support that later?

disperser/dataapi/server.go

dmanc · 2024-05-08T00:08:57Z

disperser/common/inmem/store.go

+	q.mu.RLock()
+	defer q.mu.RUnlock()
+	count := int32(0)
+	for _, meta := range q.Metadata {


This for loop can potentially be really large, right?

True, but this is only for testing purposes.
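
For readers without the full diff, the loop under discussion can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: `memStore`, `blobMetadata`, and `CountByAccountID` are hypothetical names, and the real store's metadata type carries more fields. It shows why the cost grows linearly with the number of stored entries.

```go
package main

import (
	"fmt"
	"sync"
)

// blobMetadata is a hypothetical, pared-down stand-in for the store's metadata record.
type blobMetadata struct {
	AccountID string
}

// memStore is a hypothetical in-memory store guarded by a read/write mutex.
type memStore struct {
	mu       sync.RWMutex
	Metadata map[string]*blobMetadata
}

// CountByAccountID scans every entry under a read lock, so it is O(n)
// in the size of the store. That is acceptable for a test double only.
func (q *memStore) CountByAccountID(accountID string) int32 {
	q.mu.RLock()
	defer q.mu.RUnlock()
	count := int32(0)
	for _, meta := range q.Metadata {
		if meta.AccountID == accountID {
			count++
		}
	}
	return count
}

func main() {
	s := &memStore{Metadata: map[string]*blobMetadata{
		"k1": {AccountID: "acct-a"},
		"k2": {AccountID: "acct-b"},
		"k3": {AccountID: "acct-a"},
	}}
	fmt.Println(s.CountByAccountID("acct-a")) // prints 2
}
```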

siddimore · 2024-05-08T05:08:24Z

Overall looks good to me. Is the primary purpose of adding the new field and index to support count query? I feel like getting the actual metadata might be more useful. Maybe we can support that later?

@ian-shim Yes, that is correct. This is a stopgap solution to get blobCount by AccountID. I have suggested a couple more approaches in a document that make it closer to real time, based on DynamoDB Streams.

mooselumph · 2024-05-08T21:50:15Z

disperser/common/blobstore/blob_metadata_store.go

+// GetBlobMetadataByAccount Count returns the count of all the metadata with the given status
+// Because this function scans the entire index, it should only be used for status with a limited number of items.
+// It should only be used to filter "Processing" status. To support other status, a streaming version should be implemented.
+func (s *BlobMetadataStore) GetBlobMetadataCountByAccountID(ctx context.Context, accountID core.AccountID) (int32, error) {


We probably want the total amount of data rather than the total number of blobs. Or perhaps both.

Let's do another sync on the exact use case for this work!

@mooselumph I think if we want the total amount of data every time, then a better approach is to have a Lambda invoked on the DynamoDB stream, which only processes INSERT_EVENT and can be used to update:

- Amount of data
- Count of blobs

For now, as a temporary measure, I will just update the Batcher to update the amount of data and increment the count after a blob is confirmed.
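
The stream-driven aggregation described above could look roughly like the following. This is a sketch under stated assumptions, not the proposed implementation: `streamRecord`, `accountUsage`, and `applyRecords` are hypothetical names, the record shape is simplified rather than taken from any SDK, and the event name `"INSERT"` is assumed to correspond to the insert event mentioned in the comment.

```go
package main

import "fmt"

// streamRecord is a hypothetical, simplified view of one stream record.
// A real handler would decode these fields from the stream payload.
type streamRecord struct {
	EventName string // assumed values: "INSERT", "MODIFY", "REMOVE"
	AccountID string
	BlobSize  uint64 // size of the blob in bytes
}

// accountUsage holds the two aggregates discussed in the thread.
type accountUsage struct {
	BlobCount uint64
	Bytes     uint64
}

// applyRecords folds a batch of records into the per-account totals.
// Only inserts are counted; every other event type is skipped.
func applyRecords(usage map[string]accountUsage, records []streamRecord) {
	for _, r := range records {
		if r.EventName != "INSERT" {
			continue
		}
		u := usage[r.AccountID]
		u.BlobCount++
		u.Bytes += r.BlobSize
		usage[r.AccountID] = u
	}
}

func main() {
	usage := map[string]accountUsage{}
	applyRecords(usage, []streamRecord{
		{EventName: "INSERT", AccountID: "acct-a", BlobSize: 100},
		{EventName: "MODIFY", AccountID: "acct-a", BlobSize: 100},
		{EventName: "INSERT", AccountID: "acct-a", BlobSize: 200},
	})
	u := usage["acct-a"]
	fmt.Printf("blobs=%d bytes=%d\n", u.BlobCount, u.Bytes)
}
```

Keeping both aggregates in one record means a single pass over the stream serves both the count and the data-volume use case raised in the review.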

anupsv · 2024-05-09T19:07:06Z

disperser/apiserver/server.go

+		// Update AccountID to accountKey
+		// This is a combination of origin and authenticatedAddress
+		// AccountId is later used to track blobs sent by the same account
+		blob.RequestHeader.BlobAuthHeader.AccountID = accountKey


This might need to change depending on whether the authenticated endpoint is being used, right?

anupsv · 2024-05-09T19:12:12Z

disperser/dataapi/server.go

@@ -244,6 +250,7 @@ func (s *server) Start() error {
 		{
 			feed.GET("/blobs", s.FetchBlobsHandler)
 			feed.GET("/blobs/:blob_key", s.FetchBlobHandler)
+			feed.GET("/blobs/count/:accountId", s.FetchBlobCountByAccountIdHandler)


Should definitely cache these endpoints.
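
One way to act on the caching suggestion is a small time-to-live cache in front of the count lookup. The sketch below is illustrative only: `countCache`, `newCountCache`, `defaultTTL`, and the `fetch` callback are hypothetical names, and the server may already have its own caching layer that would be a better fit.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultTTL is an arbitrary illustrative value, not a recommendation.
const defaultTTL = 30 * time.Second

type cachedCount struct {
	value   int32
	expires time.Time
}

// countCache memoizes per-account counts for a fixed TTL.
type countCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	items map[string]cachedCount
	fetch func(accountID string) (int32, error) // hypothetical backing lookup
}

func newCountCache(ttl time.Duration, fetch func(string) (int32, error)) *countCache {
	return &countCache{ttl: ttl, items: make(map[string]cachedCount), fetch: fetch}
}

// Get returns a cached count while it is fresh and otherwise calls fetch.
// The lock is held during fetch for simplicity, which serializes lookups.
func (c *countCache) Get(accountID string) (int32, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.items[accountID]; ok && time.Now().Before(e.expires) {
		return e.value, nil
	}
	v, err := c.fetch(accountID)
	if err != nil {
		return 0, err // errors are not cached
	}
	c.items[accountID] = cachedCount{value: v, expires: time.Now().Add(c.ttl)}
	return v, nil
}

func main() {
	calls := 0
	cache := newCountCache(defaultTTL, func(string) (int32, error) {
		calls++
		return 7, nil
	})
	cache.Get("acct-a")
	v, _ := cache.Get("acct-a")
	fmt.Println(v, calls)
}
```

The trade-off is staleness: a count served from the cache can lag the store by up to the TTL.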

Assign acconuntKey to accountId

0562b5a

siddimore requested review from mooselumph and ian-shim May 3, 2024 02:48

Add GetBlobMetadataCountByCountId

8e6c7d0

siddimore marked this pull request as draft May 5, 2024 22:01

Siddharth More added 5 commits May 5, 2024 15:26

fix test

e655eb2

Fix test failure

badbd96

Fix SharedStorage Test

6efc495

update docs

6f33481

fix test

99c05b0

siddimore marked this pull request as ready for review May 6, 2024 00:43

siddimore requested a review from dmanc May 6, 2024 04:43

siddimore changed the title ~~[Disperser] Assign acconuntKey to accountId~~ [DataApi] Get BlobCount By AccountID May 6, 2024

siddimore requested review from jianoaix and pschork May 7, 2024 01:18

ian-shim reviewed May 7, 2024

dmanc reviewed May 8, 2024

fix pr comment

bde7ab2

mooselumph reviewed May 8, 2024

Fix Comment

21440ec

anupsv reviewed May 9, 2024

dmanc mentioned this pull request Jun 7, 2024

Use accountKey as AccountID #602

Merged

5 tasks

pschork closed this Oct 28, 2024
