Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore JSONField columns on DOI model. #523

Merged
merged 10 commits into from
Jun 16, 2023
Merged

Conversation

edkeeble
Copy link
Contributor

@edkeeble edkeeble commented Jun 13, 2023

As documented in NASA-IMPACT/admg-casei#538, the CMR fields on DOI models no longer store valid JSON, so they cannot be deserialized during the frontend build. The reason for this goes back to #459, where we converted the CMR fields from JSONFields to TextFields, due to a serialization bug in the MI which was causing the JSONField data to be serialized and escaped each time it was saved, resulting in excessively large field values over time (escape characters being escaped, etc).

This was not a bug in JSONField, but rather in the way we serialize data prior to saving a Change object when users edit Drafts in the MI. Rather than using the form's cleaned_data, which correctly deserializes the JSON string submitted by the user, we were storing the string directly. To illustrate:

What we want to store: {"a": "b"}
What we were storing: '{"a": "b"}'

This is all valid, because update is a JSONField and you can store a string as a value, of course:

{
  "cmr_projects": "{\"a\": \"b\"}"
}

However, when the user later wants to edit this data, they access the ChangeUpdateView, which loads the Change model data via the ChangeModelFormMixin. The mixin generates a ModelForm for the target model (e.g. DOI), which is confusingly named DOIForm, even though we have a separately defined DOIForm, which has nothing to do with this process and good luck figuring that out in your debugger, but I digress...

The generated ModelForm correctly makes use of Django's JSONField to bind the data from the cmr fields to the form as a string. BUT when presented with a string instead of a deserialized object, what do you suppose JSONField does with it? Does it parse it the string first and then serialize it? Does it pass it along untouched, assuming it's already serialized? Nope. It json.dumps it again. And what happens if we JSON dump a serialized JSON object?

>>> s = '{"a": "b"}'
>>> json.dumps(s)
'"{\\"a\\": \\"b\\"}"'

Ooof. So, we could use a custom JSONField in our form to handle cases where we get JSON strings rather than JSON objects, but it's probably better to just store the correct data in the DB to begin with, which is what this PR does.


Update: Issues below are resolved

Initial investigation suggests the form fields in the MI are being treated as strings (understandably--they're text fields), which JSONField (on Django v4.1.5) processes directly with json.dumps: https://github.com/django/django/blob/eba81c83906fd966204b9853aaa793f0516ccd4b/django/db/models/fields/json.py#L93

>>> s = '{"a": "b"}'
>>> json.dumps(s)
'"{\\"a\\": \\"b\\"}"'

It looks like this may be fixed in Django 4.2.2, which first checks if the provided value is a string and attempts to json.loads it first, if so. https://github.com/django/django/blob/6218ed34541bfc1e49c0f169c5b3a216a63cd985/django/db/models/fields/json.py#L101

Effectively:

>>> s = '{"a": "b"}'
>>> json.dumps(json.loads(s))
'{"a": "b"}'

Update:

Django 4.2.2 doesn't resolve the issue on its own. The model does seem to be storing the correct, unescaped string, but when the value is presented in the update form, it is escaped. Looking into crispy forms and django form fields to see if that's where the issue occurs.

Edward Keeble and others added 9 commits June 13, 2023 16:54
…roblem with unnecessary escaping of JSON strings.
- M2M fields with CollectionPeriods cause 1000+ DB queries, since the string renderer for that model references three related models.
-  ShortNamefromUUIDColumn made one query per UUID, even if provided a list of UUIDs. Updated to make 1-2 queries total.
- Published tables for models were not making use of select_related for foreign key columns used in table display.

Added debugpy to local requirements and update local docker compose file to run Django through debugpy to allow for attaching remotely from VSCode.
@edkeeble edkeeble marked this pull request as ready for review June 15, 2023 19:48
if f.remote_field.model == models.CollectionPeriod:
kwargs.update(
{
"queryset": models.CollectionPeriod.objects.select_related(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bonus performance improvement for accessing change forms in the MI. The forms by default render related models with str(model) and the CollectionPeriod string method references deployment, deployment.campaign and platform. This resulted in three individual SQL queries for each item when rendering the form, and since there are a few hundred CollectionPeriods at the moment, that resulted in ~1500 DB queries just to render one form. This change gets the number of queries down to ~22.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this change, the form is unusable locally, where the debug toolbar inspecting each query causes the page load time to increase to 10 minutes or something.

@@ -57,6 +57,9 @@ class IOPPublishedTable(tables.Table):
start_date = tables.Column(verbose_name="Start Date", accessor="start_date")
end_date = tables.Column(verbose_name="End Date", accessor="end_date")

def __init__(self, data, *args, **kwargs):
super().__init__(data.select_related("deployment"), *args, **kwargs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are similar performance improvements to the one above. When presenting the tables, each row resulted in an additional query for each related field, which can add up.

@@ -111,10 +111,37 @@ def get_short_name(self, potential_uuid):
# this really should never happen
return potential_uuid

def get_short_names(self, potential_uuids):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another performance improvement for rendering tables. When the ShortNamefromUUID column is provided with a list of UUIDs, it would previously execute a query per UUID. This fetches them all at once and reconstructs the list in the original order before returning it.

Comment on lines +49 to +50
elif isinstance(field, JSONField):
update[name] = model_form.cleaned_data[name]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the functional change for this PR. cleaned_data contains an object parsed from the user-provided JSON string.

@@ -46,7 +46,7 @@
MIDDLEWARE += ["debug_toolbar.middleware.DebugToolbarMiddleware"] # noqa F405
# https://django-debug-toolbar.readthedocs.io/en/latest/configuration.html#debug-toolbar-config
DEBUG_TOOLBAR_CONFIG = {
"DISABLE_PANELS": ["debug_toolbar.panels.redirects.RedirectsPanel"],
# "DISABLE_PANELS": ["debug_toolbar.panels.redirects.RedirectsPanel"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is, I think, a performance improvement for the debug toolbar. It was so slow as to be unusable when this panel was disabled. I'm not sure why, but saw some github issues around it, so removed this line.

@@ -5,7 +5,7 @@ services:
build:
context: ./
dockerfile: Dockerfile.local
command: python manage.py runserver 0.0.0.0:8001
command: python -m debugpy --listen 0.0.0.0:5678 manage.py runserver 0.0.0.0:8001
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows us to attach a debugger in vscode to the container

@edkeeble edkeeble enabled auto-merge June 16, 2023 14:07
@@ -1,5 +1,5 @@
# pull official base image
FROM python:3.11.1 AS builder
FROM --platform=linux/amd64 python:3.11.1 AS builder
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the repercussions of this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They only matter if you're running an arm processor where you build the image. It makes it explicit that we want the amd64 version of the image. I was seeing some build errors on the arm version.

@@ -290,3 +290,5 @@ cmr/config.py
*.sql
hosts/*
!hosts/*.sample

.backup
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have a system you are using locally for backups that's worth sharing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I've just been storing the backup json file in the src dir and got tired of avoiding committing it.

Copy link
Collaborator

@CarsonDavis CarsonDavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@edkeeble edkeeble merged commit 5615989 into dev Jun 16, 2023
@edkeeble edkeeble deleted the bugfix/casei-538-jsonfields branch June 16, 2023 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants