Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MESDK-88 Minimal v2 persistence #2801

Merged
merged 31 commits into from
May 31, 2024
Merged

MESDK-88 Minimal v2 persistence #2801

merged 31 commits into from
May 31, 2024

Conversation

jldec
Copy link
Contributor

@jldec jldec commented May 18, 2024

This is step 1 toward full persistence and apis for v2 MessageBundles and resolves opral/inlang-sdk#50.

The (bare-bones) store dumps messageBundles into a single JSON file.

The goal of this PR is to be able to run the load-test and multi-project-test with v2 MessageBundles persisted to disk using the experimental persistence feature flag.

  • open a message store via loadProject
  • separately run inlang machine translate, adapted to a new CRUD api.
  • check the results, also using a new CRUD api.

Only the minimum read/write CRUD api necessary for the above will be included in this PR. This api should be considered internal / temporary.

  • load/store JSON v2 MessageBundles under experimental persistence feature flag
  • hook into loadProject to return a project.store.messageBundles CRUD api.
  • stub out the existing messages project.query.* api
  • implement minimal store api for v2 MessageBundles
  • minimal store api tests
  • reintroduce helper from #2655 to convert between v1 Message and v2 MessageBundle
  • adapt inlang machine translate to call the store api behind experimental persistence flag.
  • adapt load-test to support experimental persistence
  • review comments
  • batch writes when there are large numbers of updates
    • durable update api: use batchedIO to await until fs write completes
    • test for batchedIO
  • load-test with 10k messages
  • experimental json with slots and newlines for improved merge conflict avoidance
  • create issues for TODOs
  • changeset

to test

$ DEBUG=sdk:store,sdk:fileLock,sdk:batchedIO pnpm --filter @inlang/sdk-load-test test

not included in this PR (but planned in future PRs)

Note

Because the existing messagesQueryApi is stubbed out, existing apps will no longer work in projects with the experimental persistence feature flag turned on.

Copy link

changeset-bot bot commented May 18, 2024

🦋 Changeset detected

Latest commit: 075d664

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 30 packages
Name Type
@inlang/cli Patch
@inlang/sdk Patch
next-js-testapp Patch
@inlang/editor Patch
@inlang/plugin-i18next Patch
@inlang/plugin-json Patch
@inlang/plugin-m-function-matcher Patch
@inlang/plugin-next-intl Patch
@inlang/plugin-t-function-matcher Patch
@inlang/sdk-load-test Patch
@inlang/sdk-multi-project-test Patch
@inlang/badge Patch
@inlang/doc-layout-component Patch
@inlang/github-lint-action Patch
vs-code-extension Patch
@inlang/message-bundle-component Patch
@inlang/rpc Patch
@inlang/settings-component Patch
@inlang/telemetry Patch
@inlang/cross-sell-ninja Patch
@inlang/paraglide-unplugin Patch
@inlang/paraglide-js-e2e Patch
@inlang/paraglide-next-e2e Patch
@inlang/server Patch
@inlang/paraglide-rollup Patch
@inlang/paraglide-vite Patch
@inlang/paraglide-webpack Patch
@inlang/paraglide-astro Patch
@inlang/paraglide-sveltekit Patch
@inlang/paraglide-sveltekit-example Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@jldec
Copy link
Contributor Author

jldec commented May 25, 2024

Tests are now green, however dumping all messages into a single JSON file on every update is not a tenable design even for this first PR. When there a multiple writers, the large-file save lock-contention eventually produces errors (see below).

options

  1. use mutliple files e.g per message bundle or by language
  2. batch writes (like we do in v1)

cc: @janfjohannes @martin-lysk @samuelstroschein

$ pnpm --filter @inlang/sdk-load-test test

  load-test load-test start - using experimental persistence +0ms
  load-test generating 1000 messages +0ms
  load-test opening repo and loading project +2ms
Using existing cloned repo
  load-test subscribing to project.errors +6s
  load-test loaded 1000 v2 MessageBundles +0ms
  load-test translating messages with inlang cli +0ms
Using existing cloned repo

 ERROR   save exceeded maximum retries (10) to acquire lockfile 11                                            2:31:13 PM

  at acquireFileLock (/Users/jldec/opral/monorepo/inlang/source-code/cli/dist/main.js:74839:11)
  at Timeout._onTimeout (/Users/jldec/opral/monorepo/inlang/source-code/cli/dist/main.js:74904:33)

  load-test load-test done - exiting +9s

@janfjohannes
Copy link
Contributor

@martin-lysk @jldec my last status was that you use many small files with a locking mechanism for persistnece. what changed since then?

@samuelstroschein
Copy link
Member

samuelstroschein commented May 26, 2024

go for option 1: split by bundle (we have to do this anyways). for prototyping you likely don't need to re-implement @martin-lysk's namespacing algorithm to avoid more than 1000 files per directory.

@janfjohannes my last status was that you use many small files with a locking mechanism for persistnece. what changed since then?

i think @jldec wants to incrementally build the feature. dropping in one file seemed good enough but turned out to be insufficient even for prototyping

@jldec
Copy link
Contributor Author

jldec commented May 27, 2024

go for option 1: split by bundle (we have to do this anyways)

@samuelstroschein could you clarify the reason why you think we have to do this?
I understand in the short term it would be nice to use separate files to help git to avoid merge conflicts, but once we have sqlite support, lix will have to do this differently right?

I will look at both options to see which will unblock us to ship this PR soonest. File-per-message has more complexity which may not be justified given our immediate goal of shipping the v2 persistence api for apps asap.

my last status was that you use many small files with a locking mechanism for persistnece. what changed since then?

@janfjohannes, the POC for new sdk persistence with a file-per-message was removed from PR 2108 in order to limit the scope of that PR and address data-loss issues first. The plan to build bidirectional data-sync between plugins and sdk files was simplified to ship the new persistence separately, behind a feature flag, which is what we're doing in this PR.

The locking mechanism currently in place is a global (project-level) lock. We never implemented per-file locking.
@janfjohannes - b.t.w. it would be nice to have a simpler way to do atomic file writes through lix fs - e.g. by writing to a temp file and renaming.. That would allow us to avoid at least some of the complextity of the current lockfile with stale-lock detection.

@jldec
Copy link
Contributor Author

jldec commented May 27, 2024

One more comment about file-per-message persistence, and watching those files for udpates:

We currently use fs.watch to detect changes on disk (e.g. when the cli or git modifies message files directly), and we know that the current fs watcher is not reliable (see MESDK-91)

I think we can ship this PR without implementing file watching - but thinking ahead:

I would like to avoid registering a separate watcher per file per message for the same scalability reasons as avoiding reactivity per message in the core. There are other ways to do this e.g. by watching the root directory of the persisted tree or by using a different store-level write-coordination mechanism.

@janfjohannes
Copy link
Contributor

i think a good batching algorithm is needed in any case and would always be my first step before going into the other optimizations.

Copy link
Contributor

@martin-lysk martin-lysk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had a look into the PR to clarify questions.

With a look at the query api interface and with the discussion @jldec and me had via around I feel like we are not 100% alinged about the steps forward and the subscribe functionality exposed by the SDK.

* E.g. `await project.messageBundles.get({ id: "..." })`
**/
export interface Query<T> {
get: (args: { id: string }) => Promise<T | undefined>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for clearification - will the returned object T expose a .subscribe later on?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why on T and not a separate subscribe method?

query.get("id")
query.subscribe("id")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Samuel's answer is closer to what I was planning.
You won't be able to subscribe directly to stored entities at this level of the api, but a layer above this should be able to provide this capability so that apps don't need to implement their own change tracking. As mentioned below, I will rename this api to Store (instead of Query) to make this clearer.

**/
export interface Query<T> {
get: (args: { id: string }) => Promise<T | undefined>
set: (args: { data: T }) => Promise<void>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be split into insert, update and maybe upsert?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The api of rxdb looks pretty clean and carefully drafted see I use this as an inspiration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I would just copy the mango query builder and be done with discussions how our query Api should look like.

Copy link
Member

@samuelstroschein samuelstroschein May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the argument is that a mango query builder would take 1-2 weeks longer to implement, i would invest those 1-2 weeks:

  1. having a basic query builder now will ship the SDK faster but slow down apps (aka no speed advantage)
  2. breaking change in the future because apps will ask for more sophisticated querys

the implementation can be non-performance optimized. as long as the API is set, apps have an easy time querying now and we avoid a breaking change

Copy link
Contributor Author

@jldec jldec May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's helpful guidance.

The query builder api needs to live in a layer above this single entity, low-level store api, so that it can include other entities like lint reports. I will rename this api to Store<T> (instead of Query<T>) to make this clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay Store instead of Query makes sense.

the implementation can be non-performance optimized. as long as the API is set, apps have an easy time querying now and we avoid a breaking change

Addding to this remark: The query API doesn't even need to be complete. Functionality like filters ($gte) can be added incrementally.

@samuelstroschein
Copy link
Member

samuelstroschein commented May 28, 2024

@martin-lysk i understand that @jldec (correct me if i am wrong) want's to unblock the message component (cc @NilsJacobsen) by shipping persistency in this PR without backwards compatibility faster, then think about backwards compatibility afterwards (if it is even needed)

1. PR have the AST types in place
2. PR minimal/non-backwards compatible persistency to unblock message component (<- this PR)
3. PR (if required) backwards compatibility 

@samuelstroschein samuelstroschein temporarily deployed to v2-persistence - git-proxy PR #2801 May 30, 2024 23:44 — with Render Destroyed
@samuelstroschein samuelstroschein temporarily deployed to v2-persistence - badge-service PR #2801 May 31, 2024 10:17 — with Render Destroyed
@samuelstroschein samuelstroschein temporarily deployed to v2-persistence - fink-editor PR #2801 May 31, 2024 18:49 — with Render Destroyed
@samuelstroschein samuelstroschein temporarily deployed to v2-persistence - inlang-website PR #2801 May 31, 2024 18:49 — with Render Destroyed
@samuelstroschein samuelstroschein temporarily deployed to v2-persistence - inlang-manage PR #2801 May 31, 2024 18:49 — with Render Destroyed
@jldec jldec marked this pull request as ready for review May 31, 2024 18:50
@jldec jldec merged commit 90c7464 into main May 31, 2024
3 checks passed
@jldec jldec deleted the v2-persistence branch May 31, 2024 19:14
@github-actions github-actions bot locked and limited conversation to collaborators May 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Persist v2 MessageBundles with variants
4 participants