pageserver: detach ancestor should copy files from unshard portion of the timeline #9667

Closed
skyzh opened this issue Nov 6, 2024 · 0 comments · Fixed by #9669
Labels
t/bug Issue Type: Bug

skyzh commented Nov 6, 2024

Description

See INC-320 (slack channel).

Steps to reproduce

  1. write to a timeline so that it creates some layer files
  2. do a shard split; the new shards keep references to those layer files
  3. create a branch (either before or after the shard split; it doesn't matter)
  4. do a detach ancestor for that child branch

Expected result

no errors

Actual result

The detach ancestor step that copies layers in S3 fails and keeps retrying. In a test environment that uses LocalFs (reproducer from #9669), the log shows:

WARN request{method=PUT path=/v1/tenant/a10c21187d817bd102ed45865d27d4d8-0304/timeline/30a22fa5bb9b4057597733fedc7ea2a3/detach_ancestor request_id=acce4b63-bba1-4cee-bfc1-09f3c20a74c5}:detach_ancestor{tenant_id=a10c21187d817bd102ed45865d27d4d8 shard_id=0304 timeline_id=30a22fa5bb9b4057597733fedc7ea2a3}: copy timeline layer failed, will retry (attempt 6): copy layer tenants/a10c21187d817bd102ed45865d27d4d8-0304/timelines/433ed006163e534f6cc8bf2aa2f4d549/000000067F00000001000004E70000000020-000000067F00000001000004E70000000030__00000000014F2918-00000001 to tenants/a10c21187d817bd102ed45865d27d4d8-0304/timelines/30a22fa5bb9b4057597733fedc7ea2a3/000000067F00000001000004E70000000020-000000067F00000001000004E70000000030__00000000014F2918-00000001: Failed to copy file [...]: No such file or directory (os error 2)

With S3, the "Failed to copy file" part instead reads something like: service error: unhandled error (NoSuchKey): Error { code: "NoSuchKey", message: "The specified key does not exist.", [...] }.

skyzh added the t/bug (Issue Type: Bug) label Nov 6, 2024
arpad-m added a commit that referenced this issue Nov 7, 2024
…9669)

We need to use the shard associated with the layer file, not the shard
associated with our current tenant shard ID.

Due to shard splits, the shard IDs can refer to older files.

close #9667
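
The difference described in the commit message can be illustrated with a minimal sketch. This is not the pageserver code: `ShardIndex`, `LayerMetadata`, `remote_layer_path`, `copy_source_buggy` and `copy_source_fixed` are hypothetical names invented here. The `-0304` suffix format is taken from the log above; the two-digit hex encoding and the empty suffix for an unsharded tenant are assumptions.

```rust
/// Hypothetical stand-in for a shard index: shard number plus total shard count.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShardIndex {
    pub number: u8,
    pub count: u8,
}

impl ShardIndex {
    /// Assumption: a count of 0 marks a tenant that has never been split.
    pub const UNSHARDED: ShardIndex = ShardIndex { number: 0, count: 0 };

    /// Suffix appended to the tenant ID in remote paths, e.g. "-0304".
    /// Assumption: unsharded tenants get no suffix at all.
    pub fn path_suffix(&self) -> String {
        if self.count == 0 {
            String::new()
        } else {
            format!("-{:02x}{:02x}", self.number, self.count)
        }
    }
}

/// Hypothetical per-layer metadata: remembers which shard wrote the layer.
pub struct LayerMetadata {
    pub shard: ShardIndex,
}

/// Builds a remote path in the shape seen in the log:
/// tenants/<tenant><suffix>/timelines/<timeline>/<layer file>
pub fn remote_layer_path(
    tenant_id: &str,
    shard: ShardIndex,
    timeline_id: &str,
    layer_file: &str,
) -> String {
    format!(
        "tenants/{}{}/timelines/{}/{}",
        tenant_id,
        shard.path_suffix(),
        timeline_id,
        layer_file
    )
}

/// Pre-fix behaviour as described in the issue: the copy source is derived
/// from the current tenant shard, so it misses layers written before the split.
pub fn copy_source_buggy(
    tenant_id: &str,
    current_shard: ShardIndex,
    ancestor_timeline_id: &str,
    layer_file: &str,
) -> String {
    remote_layer_path(tenant_id, current_shard, ancestor_timeline_id, layer_file)
}

/// Post-fix behaviour as described in the commit message: the copy source is
/// derived from the shard recorded for the layer file itself.
pub fn copy_source_fixed(
    tenant_id: &str,
    layer: &LayerMetadata,
    ancestor_timeline_id: &str,
    layer_file: &str,
) -> String {
    remote_layer_path(tenant_id, layer.shard, ancestor_timeline_id, layer_file)
}
```

Under these assumptions, a layer written while the tenant was unsharded resolves to a path without a shard suffix in `copy_source_fixed`, whereas `copy_source_buggy` called from shard 03 of 04 looks under the `-0304` prefix, where the file was never uploaded. That matches the "No such file or directory" and NoSuchKey errors above.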