Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-7665] Support upgrade downgrade to or from table version 8 #12327

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

codope
Copy link
Member

@codope codope commented Nov 24, 2024

Change Logs

  • Migrating table properties including partition fields, key generators, payload type, bootstrap index type. Handling both upgrade and downgrade
  • Migrating timeline and timeline history to new layout
  • Full compact the table, to get rid of log files. Both in case of upgrade and downgrade.
  • Some tests for above.

Impact

Support upgrade downgrade to or from table version 8

Risk level (write none, low medium or high below)

high

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Nov 24, 2024
@@ -153,6 +153,21 @@ public int archiveIfRequired(HoodieEngineContext context, boolean acquireLock) t
txnManager.beginTransaction(Option.empty(), Option.empty());
}
List<HoodieInstant> instantsToArchive = getInstantsToArchive().collect(Collectors.toList());
return archiveInstants(context, instantsToArchive, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will be locking twice in this code-path. Can we instead introduce a private method which archives a given list of instants and the locking/unlocking managed in the public methods.

Copy link
Contributor

@danny0405 danny0405 Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have similiar concerns, the multiple-level falgs for lock control is error-prone, we should avoid that as much as possible, and why we need a lock for upgrade/downgrade then, I don't think there is concurrency here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I have refactored. Not taking lock here.

@@ -58,17 +66,31 @@ public class SevenToEightUpgradeHandler implements UpgradeHandler {
@Override
public Map<ConfigProperty, String> upgrade(HoodieWriteConfig config, HoodieEngineContext context,
String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
config.setValue(HoodieWriteConfig.WRITE_TABLE_VERSION, String.valueOf(HoodieTableVersion.SIX.versionCode()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why set up this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so that readers can continue reading at this time

};
try {
HoodieTimelineArchiver archiver = TimelineArchivers.getInstance(TimelineLayoutVersion.LAYOUT_VERSION_2, config, table);
archiver.archiveInstants(engineContext, archivedTimeline.getInstants(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can read the legady archived instants in one shot. The right manner is reading them in mini-batchs then flush as the new LSM timeline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And be caution that the legacy archived instants can be large amount, just reading the latest log should be enough for the new file slicing.

Copy link
Contributor

@danny0405 danny0405 Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read the legacy archived instants as a iterator and flush as a accumulated mini-batch. ArchivedTimelineLoader.loadInstants should be used instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed. Now, using LegacyArchivedMetaEntryReader and LSMTimelineWriter directly. Please check latest commit.

timelineLayoutVersion -> ValidationUtils.checkState(TimelineLayoutVersion.LAYOUT_VERSION_2.equals(timelineLayoutVersion),
"Downgrade from LSM timeline is only supported for layout version 2. Given version: " + timelineLayoutVersion));
HoodieArchivedTimeline lsmArchivedTimeline = table.getMetaClient().getArchivedTimeline();
if (lsmArchivedTimeline.getInstants().isEmpty()) {
Copy link
Contributor

@danny0405 danny0405 Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read the LSM archived instants as a iterator and flush as a accumulated mini-batch. ArchivedTimelineLoader.loadInstants should be used instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised, please check.

@@ -645,7 +648,9 @@ private static void createTableLayoutOnStorage(StorageConfiguration<?> storageCo
}

initializeBootstrapDirsIfNotExists(basePath, storage);
HoodieTableConfig.create(storage, metaPathDir, props);
if (shouldCreateTableConfig) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we always create the table properties file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just need to create the timeline/hstory folder without updating the configs.

@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Nov 27, 2024
address review comments

batch instants and use LegacyArchivedMetaEntryReader and LSMTimelineWriter directly

Fix archive timeline API and TestSevenToEightUpgradeHandler#testPropertyUpgrade

More fixes to TimelineArchiver V2 and V1

Fix the targer archive path for timeline migration[C

fix upgrade when auto upgrade is disabled

fix upgrade downgrade tests

fix testAutoUpgradeFailure

Fix the schema mismatch and commit metadata enconding for archived timeline

Fix the legacy archival reader for the instant

Add a FULL read mode for lsm timeline reader

archivePath as an attribute of LSMTimelineWriter and couple other fixes

fix archived instant read schema

More fix to the LSMTimelineWriter archive path
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XL PR with lines of changes > 1000
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants