Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate forms to use offline entities #692

Closed
matthew-white opened this issue Jul 26, 2024 · 6 comments · Fixed by getodk/central-backend#1210
Closed

Migrate forms to use offline entities #692

matthew-white opened this issue Jul 26, 2024 · 6 comments · Fixed by getodk/central-backend#1210
Assignees
Labels
backend Requires a change to the API server behavior verified Behavior has been manually verified entities Multiple Encounter workflows frontend Requires a change to the UI

Comments

@matthew-white
Copy link
Member

matthew-white commented Jul 26, 2024

One issue around offline entities that we've discussed is what would happen if a Central project has a mix of forms such that some forms use offline entities, while others don't. That could end up being a very confusing experience to data collectors. For example, imagine a project that has four forms and two entity lists: form A updates entity list X, form B also updates X, form C updates entity list Y, and form D reads from both X and Y (but doesn't update them). There are a number of possibilities then with the potential for confusion:

  • Form A makes offline updates, but form B makes non-offline updates, so A's updates to entity list X are persisted across forms/submissions on the client, but B's updates are not. X could end up in a weird state on the client.
  • Forms A and B both make offline updates to entity list X, but form C makes non-offline updates to entity list Y. In that case, form D will use the latest offline updates from X, but not the latest local updates from Y (only the latest server update).

The solution to this issue that we've discussed is to migrate all existing forms that modify (or even use?) entities. The idea is to automatically convert forms to the latest entities spec (2024.1) so that they are all capable of making offline changes. If they are all converted at the same time, then clients shouldn't end up with a mix of forms, avoiding this issue.

Once forms are migrated, we will need to check going forward that all form definitions that are uploaded (for new or existing forms) are capable of making offline changes. Specifically, we should reject form definitions that specify an entities spec before 2024.1. That shouldn't be an issue for users using XLSForms (things should work seamlessly), because pyxform will automatically specify 2024.1. (UPDATE: This will be done separately as part of issue #730.)

In the future, users may wish to opt out of offline entities. We're thinking that that's something that would be configured on the client (perhaps via a QR code setting) and not something that the XForm would determine. In other words, even if we migrate forms now, that doesn't close off the possibility of users opting out of offline entities in the future.

I think we should convert both the current form def (the most recently published def) and the form draft. That's what we do when we convert forms when enabling managed encryption. Central will autogenerate the form version string as part of the migration. If there is an XLSForm on the form def, Central should carry it forward to the migrated form def (related: #729).

This might not be needed, but I personally think that it'd be nice if Frontend were to show a message above the forms table when the forms table includes a form that has been recently migrated. The message would inform users that their form definitions have been changed and will need to be re-downloaded to clients. If the autogenerated form version string includes a particular prefix or suffix, then Frontend could check for that prefix/suffix and a timestamp in the last X days when determining whether to show the message.

One thing to note is that even if Central migrates all forms at once, there are rare cases where a client could end up with a mix of forms. That could happen if the form download process is interrupted partway through such that only some forms are updated. If a client does end up with a mix of forms, then it's possible that a submission would specify a non-offline entity update that would result in an entity processing error on the server. That should be a rare case, but once we implement #688, such errors would be surfaced in Frontend.

Another thing to think about is forms that used to be allowed in Central, but are not anymore. For example, it's no longer possible to upload a form definition without a <meta> field, but a server could have an old form definition without a <meta> field. It's possible that such a form definition would cause the migration to fail, so we should check that case. Could we just skip forms like that or bypass requirements like the existence of a <meta> field? How does that work currently when enabling managed encryption? Another example of a form that used to be allowed is a form definition that includes an <entity> field without any action attribute (neither create nor update). Such form definitions were never valid according to the entities spec and would be rejected on upload today, but at some point, it was possible to upload such a form.

@matthew-white matthew-white added backend Requires a change to the API server entities Multiple Encounter workflows labels Jul 26, 2024
@github-project-automation github-project-automation bot moved this to 🕒 backlog in ODK Central Jul 26, 2024
@matthew-white
Copy link
Member Author

Currently, I'm thinking that the only thing that the database migration should do is add events to the audit log like upgrade.process.form and upgrade.process.form.draft. That way, it'd be the worker rather than the database migration that would ultimately convert forms to the latest entities spec. In order to convert forms, we'll need to call something like Forms.createVersion(), but as noted in getodk/central-backend#1084, it's best for database migrations not to rely on application code. We could use a worker to call Forms.createVersion() instead.

This strategy was used previously to backfill Enketo IDs for forms and form drafts.

Another thing to think about is forms that used to be allowed in Central, but are not anymore. … It's possible that such a form definition would cause the migration to fail.

I think this is another advantage of using a worker. If a worker tries to create a new version of a form, and it fails for some reason, then that'd be handled in the usual way and won't bring anything down. On the other hand, if a database migration fails, then the server won't be able to complete the upgrade to .2.

@ktuite
Copy link
Member

ktuite commented Oct 3, 2024

Entity spec version history:
2022.1.0: <entity dataset="foo" id="" create=""> (create only)
2023.1.0: <entity dataset="people" id="" update="" baseVersion=""> (update now allowed, baseVersion required for update)
2024.1.0: <entity dataset="people" id="" create="" update="" baseVersion="" trunkVersion="" branchId=""> (offline updates now possible, trunkVersion and branchId required for that)

Questions:

1. Maybe just upgrade 2023.1 (entity update) forms

2023.1.0 is the only version with the problem because it is the only released version that supports updating. (Collect doesn't fill in extra fields like baseVersion, branchId, and trunkVersion on create anyway.) Maybe we only update 2023.1.0 forms? Or only forms with the update action (a subset of 2023.1.0 forms)? When we restrict certain versions of the entity spec, we could just exclude 2023.1.0 but continue to accept 2022.1.0 just to minimize the disruption?

I was first thinking about this when considering the 2022.1.0 spec <entity dataset="foo" id="" create=""> that just has create and no baseVersion. It felt weird to add branchId and trunkVersion to this definition without baseVersion. Would we add that, too? Or would we just take no action in this case?

2. Published form only vs. draft only vs. both vs. other edge cases

Looking back at the PR that upgraded forms with the upgrade.process.form action, that touched every form and form draft (acting on each FormDef).

For what we're doing now, I think we just want to do one action per Form. Here are the cases:

a. Published form only: introduce new version that has xml with new entity spec and probably a version suffix, carry xlsx attachment over
b. Draft form only: it seems like the draft has to be updated in place. otherwise any draft submissions associated with that draft will get lost.
c. Published form and active draft: make new version of published form, update the draft in place (combo of above)
d. Encrypted form: don't do anything with entities in encrypted forms anyway so skip it
e. Previous version of a form dealt with entities but current/active form does not: skip it

Does that seem right?

@matthew-white
Copy link
Member Author

matthew-white commented Oct 4, 2024

2023.1.0 is the only version with the problem because it is the only released version that supports updating. … Maybe we only update 2023.1.0 forms? Or only forms with the update action (a subset of 2023.1.0 forms)?

That makes sense to me. 👍 I think the idea of updating as few forms as possible is nice. Targeting forms with an update attribute sounds ideal / like the minimal subset to me.

When we restrict certain versions of the entity spec, we could just exclude 2023.1.0 but continue to accept 2022.1.0 just to minimize the disruption?

That also makes sense to me. 👍

Draft form only: it seems like the draft has to be updated in place. otherwise any draft submissions associated with that draft will get lost.

I wasn't so sure initially about updating in place, but now I think that it's the least confusing thing. I think we should carry forward the XLSForm of the draft that's being replaced.

The other cases make sense to me too. So does the idea of one action per form rather than one action per form def.

@matthew-white matthew-white added the needs testing Needs manual testing label Oct 16, 2024
@matthew-white
Copy link
Member Author

If a client does end up with a mix of forms, then it's possible that a submission would specify a non-offline entity update that would result in an entity processing error on the server.

Rereading this issue, I'm not sure why I thought a non-offline entity update would result in an entity processing error. As far as I know, we're not planning to require that all entity updates via submission be offline updates.

@lognaturel
Copy link
Member

we're not planning to require that all entity updates via submission be offline updates.

Right! It's totally fine for a client to still have old form versions and be submitting updates that are not offline-aware. In that case the client won't have an offline representation and everything will be consistent.

@srujner
Copy link

srujner commented Nov 6, 2024

Tested with Success!

@srujner srujner added behavior verified Behavior has been manually verified and removed needs testing Needs manual testing labels Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Requires a change to the API server behavior verified Behavior has been manually verified entities Multiple Encounter workflows frontend Requires a change to the UI
Projects
Status: ✅ done
Development

Successfully merging a pull request may close this issue.

4 participants