Store issuer chains #132
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #132      +/-   ##
==========================================
- Coverage   35.80%   34.65%   -1.16%
==========================================
  Files          16       34      +18
  Lines        1363     2955    +1592
==========================================
+ Hits          488     1024     +536
- Misses        801     1820    +1019
- Partials       74      111      +37

☔ View full report in Codecov by Sentry.
personalities/sctfe/storage.go (Outdated)

func (cts *CTStorage) AddIssuerChain(ctx context.Context, chain []*x509.Certificate) error {
	errG := errgroup.Group{}
	for _, c := range chain {
		// One goroutine per certificate in the chain, each doing its own write.
		errG.Go(func() error {
Did you consider pushing this down into the storage implementation to decide how best to store these entries? (e.g. some storage infra may do better with some sort of non-transactional batch insert as opposed to several concurrent write transactions).
This may be more of an issue for fresh logs, and become less so once the chain storage is close to being maximally populated.
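
To illustrate the suggestion (a sketch only; the type and method names below are assumptions, not this PR's API), pushing the decision down could look like the personality handing the whole chain to the storage layer in one call:

```go
// Sketch only: IssuerStorage, AddIssuers and the issuers field are assumed
// names for illustration, not this PR's API.
package sketch

import (
	"context"
	"crypto/x509"
)

// IssuerStorage lets each backend decide how to persist a whole chain,
// e.g. one non-transactional batch insert vs. several concurrent writes.
type IssuerStorage interface {
	AddIssuers(ctx context.Context, chain []*x509.Certificate) error
}

// CTStorage stands in for the personality-level storage wrapper.
type CTStorage struct {
	issuers IssuerStorage
}

// AddIssuerChain no longer fans out per certificate here; that decision is
// pushed down into the storage implementation.
func (cts *CTStorage) AddIssuerChain(ctx context.Context, chain []*x509.Certificate) error {
	return cts.issuers.AddIssuers(ctx, chain)
}
```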
Storage implementations can, if they want to, stage all the requests and then batch-write them. Each goroutine will block until then. That batch would span all AddChain requests though, which means that two issuers.Add calls for the same chain might not end up in the same batch.
I could change the API to accept multiple k,v pairs instead, but since map keys can't be of type []byte, I'd need to pass a map[string][]byte, or two [][]byte slices that are meant to track one another. Can you think of a better way of doing this?
The PR that will come after this one adds a local cache. That means that issuers.Add won't necessarily be called on all the issuers. So if we modify the API to take multiple k,v pairs, I'd have to batch caching as well, and then call issuers.Add with a potential "skipped chain".
It feels to me that this would add a non-trivial amount of complexity. What do you think?
> Storage implementations can, if they want to, stage all the requests and then batch-write them. Each goroutine will block until then. That batch would span all AddChain requests though, which means that two issuers.Add calls for the same chain might not end up in the same batch.

I agree that storage implementations should be free to batch up writes if that's what's best for them, but your proposal here is a multiplier on the goroutines that are serving the HTTP requests these certs came from - i.e. if the queue is configured to flush at 1000 entries, there will be 1000 goroutines parked on the futures, and if each of those chains has 3 entries then with this change we'll have 1000 + 3*1000 parked goroutines.
Obviously that's fine if that really is the best way for the storage impl to handle the calls to Set, but if we're saying that storage impls which would prefer a native "batch" write then have to block and "fan in" all those extra 3000 goroutines just to get back to a slice of entries, it seems like we've let an implementation detail of "lots of goroutines"-style storage types leak out into this higher level.
Feels like the "fan-out" should be done by the storage implementation iff that's what's needed?

> I could change the API to accept multiple k,v pairs instead, but since map keys can't be of type []byte, I'd need to pass a map[string][]byte, or two [][]byte slices that are meant to track one another. Can you think of a better way of doing this?

Hmm, could you pass a []struct{K []byte, V []byte}?
As you say, these would need to be independently added rather than transactional (i.e. I think that means the semantics of the API become a bit more AddTheseIntermediates() rather than AddIssuerChain()), both for local caching reasons, but also because of distributed systems in the case where there's a central shared intermediates cache.
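
A rough sketch of that shape, with illustrative names and an assumed choice of key (SHA-256 of the certificate's DER bytes), might look like:

```go
// Sketch only: the names and the choice of key are illustrative.
package sketch

import (
	"context"
	"crypto/sha256"
	"crypto/x509"
)

// KV is a key/value pair; a slice of these avoids both map[string][]byte
// (string-converted keys) and two parallel [][]byte slices.
type KV struct {
	K []byte
	V []byte
}

// IssuerStorage adds each pair independently rather than transactionally,
// i.e. "add these intermediates" rather than "add this chain"; implementations
// are free to batch, parallelize, or skip pairs that already exist.
type IssuerStorage interface {
	AddIssuersIfNotExist(ctx context.Context, kvs []KV) error
}

// chainToKVs shows how a caller might build the pairs, assuming the key is
// the SHA-256 of the certificate's DER bytes and the value is the DER itself.
func chainToKVs(chain []*x509.Certificate) []KV {
	kvs := make([]KV, 0, len(chain))
	for _, c := range chain {
		h := sha256.Sum256(c.Raw)
		kvs = append(kvs, KV{K: h[:], V: c.Raw})
	}
	return kvs
}
```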
Yeah - I wanted to avoid defining a new struct because it leads to more dependencies. Go lacks tuples :/
I had a go at it (I still need to add comments and co.), but have a look and let me know what you think. I could go even further and leave it to the storage implementation to do the "Exists" calls.
While I'm starting with the SCTFE here, I'm thinking of making deduplication a feature that other personalities can use. That probably means we'll need a base "personalitystorage" package where KV will be defined?
I think it's either this, or a map[string]string / map[string][]byte, and we do some type conversion.
I think it's probably ok for the storage impls to take a dep on a struct here - they have to meet the implicit interface requirements specified here anyway.
I think having the storage do the Exists inline with the Add would be good; they might be able to optimize that away - e.g. GCS might do a single WriteIfNotExists and save a round-trip, etc.
For now, I'd keep the chain storage and dedup storages separate, and have one do []byte -> []byte and the other do []byte -> uint64 natively, rather than prematurely push them together just yet.
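
To make the "keep them separate" point concrete, a sketch with illustrative names could be two narrow interfaces, one per store; the method names and the meaning of the uint64 are assumptions here:

```go
// Sketch only: two deliberately separate, narrow stores.
package sketch

import "context"

// IssuerStore holds issuer certificates, mapping []byte keys to []byte values.
type IssuerStore interface {
	Exists(ctx context.Context, key []byte) (bool, error)
	Add(ctx context.Context, key, value []byte) error
}

// DedupStore maps []byte keys to uint64 values natively, e.g. the index a
// previously-seen entry was assigned.
type DedupStore interface {
	Get(ctx context.Context, key []byte) (idx uint64, ok bool, err error)
	Set(ctx context.Context, key []byte, idx uint64) error
}
```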
> I think having the storage do the Exists inline with the Add would be good; they might be able to optimize that away - e.g. GCS might do a single WriteIfNotExists and save a round-trip, etc.

Maybe, yes. I'm not 100% sure, since writes are more expensive than reads, and eventually the list of issuers will stop growing, so it will be cheaper to check for existence first. But also, caching should make all of this even cheaper. There's already a TODO somewhere to evaluate this. I've also left a TODO for parallel writes; let's keep them out for now. I've pushed it down to the storage layer.

> For now, I'd keep the chain storage and dedup storages separate, and have one do []byte -> []byte and the other do []byte -> uint64 natively, rather than prematurely push them together just yet.

In practice, I had written a wrapper around map.go to handle the conversion: https://github.com/phbnf/trillian-tessera/blob/dedup/personalities/sctfe/storage/gcp/dedup.go. Honestly, I'm not even super attached to this dedup code on GCS; I don't think any CT log will use it, as it doesn't scale price-wise. It might be useful for a test personality though, or for other applications once deduplication becomes a "standard" personality plugin.
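
For illustration only (this is not the linked dedup.go), such a wrapper can be little more than encoding the uint64 into the value of a generic []byte -> []byte store:

```go
// Sketch only: the shape of a uint64-over-bytes dedup wrapper.
package sketch

import (
	"context"
	"encoding/binary"
)

// KVStore is an assumed generic []byte -> []byte key/value store.
type KVStore interface {
	Get(ctx context.Context, key []byte) (value []byte, ok bool, err error)
	Set(ctx context.Context, key, value []byte) error
}

// Dedup presents a []byte -> uint64 API on top of a KVStore by encoding the
// uint64 into the stored value.
type Dedup struct {
	kv KVStore
}

func (d *Dedup) Set(ctx context.Context, key []byte, idx uint64) error {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, idx)
	return d.kv.Set(ctx, key, buf)
}

func (d *Dedup) Get(ctx context.Context, key []byte) (uint64, bool, error) {
	v, ok, err := d.kv.Get(ctx, key)
	if err != nil || !ok {
		return 0, false, err
	}
	return binary.BigEndian.Uint64(v), true, nil
}
```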