
Add support for PS sharding in compute #5837

Closed
wants to merge 26 commits
Conversation

Contributor

@knizhnik knizhnik commented Nov 9, 2023

Problem

This is preliminary implementation of #5508

Summary of changes

Compute now calculates a hash of the buffer tag and sends the request to the corresponding shard.

Open issues:

  1. I currently implement the hash in this way:
#if PG_MAJORVERSION_NUM < 16
	hash = murmurhash32(tag->rnode.spcNode);
	hash = hash_combine(hash, murmurhash32(tag->rnode.dbNode));
	hash = hash_combine(hash, murmurhash32(tag->rnode.relNode));
	hash = hash_combine(hash, murmurhash32(tag->blockNum/STRIPE_SIZE));
#else
	hash = murmurhash32(tag->spcOid);
	hash = hash_combine(hash, murmurhash32(tag->dbOid));
	hash = hash_combine(hash, murmurhash32(tag->relNumber));
	hash = hash_combine(hash, murmurhash32(tag->blockNum/STRIPE_SIZE));
#endif

It is not consistent with how the hash is currently calculated in Rust.
I think we should change the Rust implementation to make it compatible with Postgres.

A separate question is how the block number should be hashed.
There was a discussion about it: #5432 (comment)

So in any case the Rust and C hash implementations should be made consistent... (a sketch of the resulting page-to-shard mapping follows this list)

  2. Stripe size is currently hardcoded; should it be taken from the control plane?
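
For illustration, here is a minimal sketch of how the hash above could be turned into a shard number. The function name tag_to_shard, the n_shards parameter, and the STRIPE_SIZE value are assumptions for the example, not the final code of this PR:

#include "postgres.h"
#include "common/hashfn.h"            /* murmurhash32, hash_combine */
#include "storage/buf_internals.h"    /* BufferTag (PG16 field names) */

#define STRIPE_SIZE 32768             /* hypothetical value; hardcoded in this PR */

/* Sketch only: map a buffer tag to a shard by hashing the relation identity plus the stripe index. */
static uint32
tag_to_shard(const BufferTag *tag, uint32 n_shards)
{
	uint32		hash;

	hash = murmurhash32(tag->spcOid);
	hash = hash_combine(hash, murmurhash32(tag->dbOid));
	hash = hash_combine(hash, murmurhash32(tag->relNumber));
	hash = hash_combine(hash, murmurhash32(tag->blockNum / STRIPE_SIZE));

	return hash % n_shards;
}

Because blockNum is divided by STRIPE_SIZE before hashing, all blocks of one stripe hash to the same shard; this is the property the Rust side would need to reproduce for compatibility.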

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@knizhnik knizhnik requested a review from a team as a code owner November 9, 2023 06:18
@knizhnik knizhnik requested review from conradludgate and removed request for a team November 9, 2023 06:18
@vadim2404 vadim2404 requested a review from jcsp November 9, 2023 07:36
@jcsp jcsp changed the title from "Add support for PS shardoing in compute" to "Add support for PS sharding in compute" Nov 9, 2023
@jcsp jcsp added c/cloud/compute t/feature Issue type: feature, for new features or requests labels Nov 9, 2023
@knizhnik knizhnik force-pushed the compute_sharding_support branch from 0aeba9f to 59ec475 Compare November 9, 2023 17:33

github-actions bot commented Nov 9, 2023

2184 tests run: 2099 passed, 0 failed, 85 skipped (full report)


Flaky tests (3)

Postgres 16

Postgres 15

  • test_statvfs_pressure_usage: debug
  • test_delete_timeline_client_hangup: release

Code coverage (full report)

  • functions: 55.0% (9711 of 17672 functions)
  • lines: 82.2% (55813 of 67887 lines)

The comment gets automatically updated with the latest test results.
8de5b63 at 2023-12-20T07:53:46.499Z

@knizhnik knizhnik force-pushed the compute_sharding_support branch from eedc4d1 to c6ac0a1 Compare November 29, 2023 10:07
knizhnik pushed a commit that referenced this pull request Dec 2, 2023
@knizhnik knizhnik force-pushed the compute_sharding_support branch from 11367b2 to 1e26621 Compare December 3, 2023 06:36
jcsp pushed a commit that referenced this pull request Dec 4, 2023
jcsp pushed a commit that referenced this pull request Dec 5, 2023
Collaborator

jcsp commented Dec 8, 2023

@knizhnik please can you resolve the merge conflicts.

@knizhnik knizhnik force-pushed the compute_sharding_support branch from 8a6655a to 54a0b6a Compare December 8, 2023 15:01
Contributor

@hlinnaka hlinnaka left a comment


Is there a single prefetch ring for all shards? The responses might come in different order across the pageservers, so I would assume there to be a separate prefetch ring for each shard. Connections to different pageservers will also be lost and re-established at different times.

prfh_hash *prf_hash;
int max_shard_no;
uint8 shard_bitmap[(MAX_SHARDS + 7)/8];
Contributor


What is shard_bitmap?

Contributor Author


We have a number of buffered prefetch requests for different shards. Before waiting for a response we need to flush all pending requests, and to do that we need to know which shards are affected. That is why I am using this bitmap: the corresponding bit is set when we send a request to a shard and cleared by the flush.
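
For illustration, a minimal sketch of how such a bitmap could be maintained; the helper names and the MAX_SHARDS value are hypothetical, and only the shard_bitmap declaration comes from the actual diff:

#define MAX_SHARDS 128	/* hypothetical cap; the diff only declares shard_bitmap[(MAX_SHARDS + 7)/8] */

/* One bit per shard: set when a prefetch request is sent, cleared on flush. */
static uint8 shard_bitmap[(MAX_SHARDS + 7) / 8];

static inline void
shard_mark_pending(int shard_no)
{
	shard_bitmap[shard_no / 8] |= (uint8) (1 << (shard_no % 8));
}

static inline bool
shard_has_pending(int shard_no)
{
	return (shard_bitmap[shard_no / 8] & (1 << (shard_no % 8))) != 0;
}

static inline void
shard_clear_pending(int shard_no)
{
	shard_bitmap[shard_no / 8] &= (uint8) ~(1 << (shard_no % 8));
}

Before waiting on any prefetch response, the code would iterate over the set bits and flush the connection to each marked shard.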

Contributor Author


Is there a single prefetch ring for all shards? The responses might come in different order across the pageservers, so I would assume there to be a separate prefetch ring for each shard. Connections to different pageservers will also be lost and re-established at different times.

No, there is still a single prefetch ring for all shards, but each prefetch slot stores a shard number. When we wait on a slot, we wait for the response from that particular shard. @jcsp said that we may have up to a hundred shards in the future; having a separate ring for each shard would be very inefficient in that case.
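
As an illustration of that design (field names here are hypothetical, not the exact struct in pagestore_smgr.c), each slot in the single ring simply remembers which shard it is waiting on:

#include "storage/buf_internals.h"	/* BufferTag */
#include "access/xlogdefs.h"		/* XLogRecPtr */

/* Sketch only: one slot of the shared prefetch ring. */
typedef struct PrefetchSlot
{
	BufferTag	buftag;		/* page being prefetched */
	uint16		shard_no;	/* shard the request was sent to */
	XLogRecPtr	request_lsn;	/* LSN at which the request was issued */
	bool		received;	/* response already consumed? */
} PrefetchSlot;

Waiting on a slot then means waiting on the connection to slot->shard_no only.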

Collaborator

jcsp commented Dec 12, 2023

@knizhnik found a couple more log lines that need to be made shard-aware:

2023-12-12 12:07:14.042 GMT [231054] ERROR:  could not read block 0 in rel 1664/0/1262.0 from page server at lsn 0/014FEDE8
2023-12-12 12:07:14.042 GMT [231054] DETAIL:  page server returned error: Request routed to wrong shard

(either in this PR or a followup)

Collaborator

jcsp commented Dec 12, 2023

@knizhnik please cherry pick 2f2a865 to update the hashing to match what's in the pageserver. This is the change that enables initdb with a template to work.

@knizhnik
Contributor Author

@knizhnik please cherry pick 2f2a865 to update the hashing to match what's in the pageserver. This is the change that enables initdb with a template to work.

Done

@knizhnik
Contributor Author

@knizhnik found a couple more log lines that need to be made shard-aware:

2023-12-12 12:07:14.042 GMT [231054] ERROR:  could not read block 0 in rel 1664/0/1262.0 from page server at lsn 0/014FEDE8
2023-12-12 12:07:14.042 GMT [231054] DETAIL:  page server returned error: Request routed to wrong shard

(either in this PR or a followup)

Fixed

@knizhnik knizhnik force-pushed the compute_sharding_support branch from f994cb1 to 322dd3c Compare December 12, 2023 13:55
@knizhnik knizhnik force-pushed the compute_sharding_support branch from 9458996 to b43ae6e Compare December 19, 2023 07:22
@vadim2404
Contributor

@hlinnaka, @jcsp, can you review it again? It seems that it's done.

Collaborator

jcsp commented Dec 19, 2023

I don't have any open threads, but I'm not the right person to green tick this -- I think @hlinnaka is the right approver for this.

@vadim2404
Contributor

@hlinnaka, only your review is left 🙏

Collaborator

jcsp commented Dec 19, 2023

This still isn't running CI:

Expected postgres-v14 rev to be at 'null', but it is at '03358bb0b5e0d33c238710139e768db9e75cfcc8'
Expected postgres-v15 rev to be at 'de8242c400f7870084861ac5796e0b5088b1898d', but it is at 'a2dc225ddfc8cae1849aa2316f435c58f0333d8c'
Please update vendors/revisions.json if these changes are intentional
Error: Process completed with exit code 1.

Collaborator

jcsp commented Dec 19, 2023

The commit to tolerate empty pageserver strings is not making the test pass: it still hits an assertion:

TRAP: failed Assert("i > 0"), File: "/home/neon/neon//pgxn/neon/libpagestore.c", Line: 642, PID: 1407128

Please use the test branch that I DM'd you earlier today to test your changes.

@knizhnik
Contributor Author

I failed to rebase this PR, so I have to replace it with a new one: #6205

@knizhnik knizhnik closed this Dec 20, 2023
knizhnik added a commit that referenced this pull request Jan 25, 2024
refer #5508

replaces #5837

## Problem

This PR implements sharding support on the compute side. Relations are
split into stripes, and `get_page` requests are redirected to the
particular shard where the stripe is located. All other requests (e.g. get
relation or database size) are always sent to shard 0.

## Summary of changes

Support for sharding on the compute side includes three things (an illustrative routing sketch follows this list):
1. Make it possible to specify, and change at runtime, connections to more
than one page server
2. Send `get_page` requests to the particular shard (determined by a hash
of the page key)
3. Support multiple servers in prefetch ring requests
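
For illustration, a sketch of the routing rule described above; the names choose_pageserver, page_servers, and n_shards are assumptions for the example (not the actual neon extension API), and tag_to_shard is the illustrative hash-to-shard helper sketched earlier on this page:

#include "storage/buf_internals.h"	/* BufferTag */

typedef struct PageServer PageServer;	/* one connection handle per shard */

extern PageServer page_servers[];
extern uint32	  n_shards;		/* assumed to come from the connection-string configuration */

/* Sketch only: pick the connection an outgoing request should use. */
static PageServer *
choose_pageserver(bool is_getpage, const BufferTag *tag)
{
	if (is_getpage)
		return &page_servers[tag_to_shard(tag, n_shards)];	/* shard owning this page's stripe */

	return &page_servers[0];	/* relation/database size requests always go to shard 0 */
}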

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
Co-authored-by: John Spray <[email protected]>
Co-authored-by: Heikki Linnakangas <[email protected]>
@stepashka stepashka added the c/compute Component: compute, excluding postgres itself label Jun 21, 2024