-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore JSONField columns on DOI model. #523
Conversation
…roblem with unnecessary escaping of JSON strings.
- M2M fields with CollectionPeriods cause 1000+ DB queries, since the string renderer for that model references three related models. - ShortNamefromUUIDColumn made one query per UUID, even if provided a list of UUIDs. Updated to make 1-2 queries total. - Published tables for models were not making use of select_related for foreign key columns used in table display. Added debugpy to local requirements and update local docker compose file to run Django through debugpy to allow for attaching remotely from VSCode.
…ned JSON data without serializing it.
if f.remote_field.model == models.CollectionPeriod: | ||
kwargs.update( | ||
{ | ||
"queryset": models.CollectionPeriod.objects.select_related( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bonus performance improvement for accessing change forms in the MI. The forms by default render related models with str(model)
and the CollectionPeriod
string method references deployment
, deployment.campaign
and platform
. This resulted in three individual SQL queries for each item when rendering the form, and since there are a few hundred CollectionPeriods at the moment, that resulted in ~1500 DB queries just to render one form. This change gets the number of queries down to ~22.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this change, the form is unusable locally, where the debug toolbar inspecting each query causes the page load time to increase to 10 minutes or something.
@@ -57,6 +57,9 @@ class IOPPublishedTable(tables.Table): | |||
start_date = tables.Column(verbose_name="Start Date", accessor="start_date") | |||
end_date = tables.Column(verbose_name="End Date", accessor="end_date") | |||
|
|||
def __init__(self, data, *args, **kwargs): | |||
super().__init__(data.select_related("deployment"), *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are similar performance improvements to the one above. When presenting the tables, each row resulted in an additional query for each related field, which can add up.
@@ -111,10 +111,37 @@ def get_short_name(self, potential_uuid): | |||
# this really should never happen | |||
return potential_uuid | |||
|
|||
def get_short_names(self, potential_uuids): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another performance improvement for rendering tables. When the ShortNamefromUUID column is provided with a list of UUIDs, it would previously execute a query per UUID. This fetches them all at once and reconstructs the list in the original order before returning it.
elif isinstance(field, JSONField): | ||
update[name] = model_form.cleaned_data[name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the functional change for this PR. cleaned_data
contains an object parsed from the user-provided JSON string.
@@ -46,7 +46,7 @@ | |||
MIDDLEWARE += ["debug_toolbar.middleware.DebugToolbarMiddleware"] # noqa F405 | |||
# https://django-debug-toolbar.readthedocs.io/en/latest/configuration.html#debug-toolbar-config | |||
DEBUG_TOOLBAR_CONFIG = { | |||
"DISABLE_PANELS": ["debug_toolbar.panels.redirects.RedirectsPanel"], | |||
# "DISABLE_PANELS": ["debug_toolbar.panels.redirects.RedirectsPanel"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is, I think, a performance improvement for the debug toolbar. It was so slow as to be unusable when this panel was disabled. I'm not sure why, but saw some github issues around it, so removed this line.
@@ -5,7 +5,7 @@ services: | |||
build: | |||
context: ./ | |||
dockerfile: Dockerfile.local | |||
command: python manage.py runserver 0.0.0.0:8001 | |||
command: python -m debugpy --listen 0.0.0.0:5678 manage.py runserver 0.0.0.0:8001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows us to attach a debugger in vscode to the container
@@ -1,5 +1,5 @@ | |||
# pull official base image | |||
FROM python:3.11.1 AS builder | |||
FROM --platform=linux/amd64 python:3.11.1 AS builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the repercussions of this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They only matter if you're running an arm processor where you build the image. It makes it explicit that we want the amd64 version of the image. I was seeing some build errors on the arm version.
@@ -290,3 +290,5 @@ cmr/config.py | |||
*.sql | |||
hosts/* | |||
!hosts/*.sample | |||
|
|||
.backup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have a system you are using locally for backups that's worth sharing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I've just been storing the backup json file in the src dir and got tired of avoiding committing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
As documented in NASA-IMPACT/admg-casei#538, the CMR fields on DOI models no longer store valid JSON, so they cannot be deserialized during the frontend build. The reason for this goes back to #459, where we converted the CMR fields from JSONFields to TextFields, due to a serialization bug in the MI which was causing the JSONField data to be serialized and escaped each time it was saved, resulting in excessively large field values over time (escape characters being escaped, etc).
This was not a bug in JSONField, but rather in the way we serialize data prior to saving a Change object when users edit Drafts in the MI. Rather than using the form's
cleaned_data
, which correctly deserializes the JSON string submitted by the user, we were storing the string directly. To illustrate:What we want to store:
{"a": "b"}
What we were storing:
'{"a": "b"}'
This is all valid, because
update
is a JSONField and you can store a string as a value, of course:However, when the user later wants to edit this data, they access the
ChangeUpdateView
, which loads the Change model data via theChangeModelFormMixin
. The mixin generates a ModelForm for the target model (e.g.DOI
), which is confusingly namedDOIForm
, even though we have a separately definedDOIForm
, which has nothing to do with this process and good luck figuring that out in your debugger, but I digress...The generated ModelForm correctly makes use of Django's JSONField to bind the data from the cmr fields to the form as a string. BUT when presented with a string instead of a deserialized object, what do you suppose JSONField does with it? Does it parse it the string first and then serialize it? Does it pass it along untouched, assuming it's already serialized? Nope. It
json.dumps
it again. And what happens if we JSON dump a serialized JSON object?Ooof. So, we could use a custom JSONField in our form to handle cases where we get JSON strings rather than JSON objects, but it's probably better to just store the correct data in the DB to begin with, which is what this PR does.
Update: Issues below are resolved
Initial investigation suggests the form fields in the MI are being treated as strings (understandably--they're text fields), which JSONField (on Django v4.1.5) processes directly with
json.dumps
: https://github.com/django/django/blob/eba81c83906fd966204b9853aaa793f0516ccd4b/django/db/models/fields/json.py#L93It looks like this may be fixed in Django 4.2.2, which first checks if the provided value is a string and attempts to json.loads it first, if so. https://github.com/django/django/blob/6218ed34541bfc1e49c0f169c5b3a216a63cd985/django/db/models/fields/json.py#L101
Effectively:
Update:
Django 4.2.2 doesn't resolve the issue on its own. The model does seem to be storing the correct, unescaped string, but when the value is presented in the update form, it is escaped. Looking into crispy forms and django form fields to see if that's where the issue occurs.