Jadudm/materialized views (#3511)
* Adds materialized views to the startup

Uses the run.sh refactored form to add materialized views.

These are dropped first, then created on every deploy. This is because
we might choose to change the view between deploys. In that case, we
should completely destroy it.

The Django command

fac materialized_views --refresh

can be used to refresh the view(s) on any cycle desired.

* Linting

* Added unmanaged model for materialized view

* Switched from using dissemination tables in search module to materialized view

* An increment on the materialized view.

* Adding *all* the columns.

Because, if we're doing it, we should go all the way.

* Code cleaning

* Added passthrough to materialized view

* Adding in the .profile change

We dropped/added views in run.sh, but did not add it to .profile.

Before deploying, this would be a good idea.

* Incremental

* Adds the workflow for materialized views

Configures a matrix for the 3 core environments on a schedule, and allows us to run
via workflow_dispatch for a single target environment

* Add materialized view migration file.

* Linting fixes.

* Updated test cases to use Dissemination Combined view

* Ensure search is run against dissemination combined

* Updated test cases to use dissemination combined

* Code improvement

* Added testing environment to TestAdminAPI

* Skipping TestAdminAPI for now

* Disabling more tests for now. Before going to prod, all these tests must pass.

* Remove skipped API tests

* Add source for materialized view sh functions.

* Remove materialized view commands from .profile.

* Fixing names query

The names query was potentially not right.

It appears to now perform better (in terms of time to execute) as well
as "do the right thing".

* Adding in index creation.

Timing data in summary reports.

* Approximate 4x speedup in SF-SAC generation

This walks the DisseminationCombined table *only once*, which
reduces the number of times we traverse 4.4M rows.

This currently has a weird alternation in the view, to test the two
different exports.

* Adding .profile

I wonder if this actually is an issue?

* Yep, it matters.

This causes a timeout. It has to run post-deploy.

* Allows for alternation...

In seconds 0-9 of a minute, we get the original report generator.

10-19, the new...

20-29 the old...

This makes testing the two against each other in preview possible.

* Workflow changes for preview

* For want of a 'd'

* Updates tests to accommodate timing info

* Removing "TESTING"

We should be using a default dockerized set of values for the connection
string.

* Linting.

* Revert workflow test changes

* Set materialized views creation post deployment

* Updating tests

* Troubleshooting commit. And some linting fixes.

* Troubleshooting commit.

* Troubleshooting commit.

* Proof of concept

* Switched Admin API Test to using django.db library

* More tests

* Still testing

* ....more testing

* bug fix

* Trying another approach of creating test tables

* Temporarily skipping for speed of troubleshooting

* Undo some changes made for testing purpose

* Commented out the new code in an attempt to isolate the issue

* Commenting more code to isolate the issue

* Looking for env variable values

* Reverting back previous changes now that the issue is isolated

* Typo

* Fixed linting

* Ensure summary report is using combined view

* Re-structure script to ensure db tables are created before materialized views

* Code cleaning + improvement

* Reverting tests that were skipped for debugging purpose

* Switched to concurrent refresh mode for materialized view

* Fixing PK

* Removing model fields to align with view

* Removing ID

* Possibly correct?

* Hotswap the view.

* Removing unique column

* remove unused files and commands

* clean up stray comments

* lint

* bring refresh back. it's used for a lot of tests.

* restore MV shell

* re-remove shell file, remove concurrently from MV refresh sql

* build MV on start i guess

* bring back passthrough

* lint

* no more shell script

* Update backend/run.sh
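
The "walks the DisseminationCombined table *only once*" item above describes replacing several full scans with a single traversal. The report generator's diff is not shown on this page, so the following is only an illustrative sketch of that general idea, not the committed code. The row shape, the `section` key, and the function names are all hypothetical.

```python
# Hypothetical stand-in for rows of the combined view; column names are illustrative.
ROWS = [
    {"report_id": "R1", "section": "general", "value": 1},
    {"report_id": "R1", "section": "finding", "value": 2},
    {"report_id": "R2", "section": "general", "value": 3},
    {"report_id": "R2", "section": "finding", "value": 4},
]
SECTIONS = ("general", "finding")


def multi_pass(rows, sections):
    """One full scan of the rows per section: len(sections) traversals."""
    return {s: [r for r in rows if r["section"] == s] for s in sections}


def single_pass(rows, sections):
    """One scan in total; each row is routed to its section as it is seen."""
    out = {s: [] for s in sections}
    for row in rows:
        bucket = out.get(row["section"])
        if bucket is not None:  # ignore rows for sections we did not ask for
            bucket.append(row)
    return out
```

Both functions return the same grouping; the second touches each row once, which matters when the table holds millions of rows.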

---------

Co-authored-by: Matt Jadud <[email protected]>
Co-authored-by: Hassan D. M. Sambo <[email protected]>
Co-authored-by: Alex Steel <[email protected]>
Co-authored-by: Daniel Swick <[email protected]>
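
The "Allows for alternation..." item describes picking the report generator from the current second of the minute. A minimal sketch of that selection rule follows. It assumes the ten-second pattern simply continues through the rest of the minute (30-39 new, 40-49 old, 50-59 new), which the commit message does not state, and `use_new_generator` is a hypothetical name.

```python
from datetime import datetime


def use_new_generator(now: datetime) -> bool:
    """Return True in odd ten-second slots (10-19, 30-39, 50-59).

    Assumption: the 0-9 old / 10-19 new / 20-29 old pattern from the
    commit message repeats for the whole minute.
    """
    return (now.second // 10) % 2 == 1
```

Called with `datetime.now()`, this would let two requests a few seconds apart exercise both code paths in the same environment.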
5 people authored Mar 22, 2024
1 parent 1d3e326 commit 336286f
Showing 22 changed files with 1,656 additions and 176 deletions.
32 changes: 32 additions & 0 deletions .github/workflows/materialize-views.yml
@@ -0,0 +1,32 @@
---
name: Run the Materialize Views Django Function
on:
  workflow_dispatch:
    inputs:
      environment:
        required: true
        type: choice
        description: The environment the workflow should run on.
        options:
          - dev
          - staging
          - preview
          - production

jobs:
  dispatch-materialize-views:
    if: ${{ github.event.inputs.environment != '' }}
    name: Run Materialize Views on ${{ inputs.environment }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    env:
      space: ${{ inputs.environment }}
    steps:
      - name: Run Command
        uses: cloud-gov/cg-cli-tools@main
        with:
          cf_username: ${{ secrets.CF_USERNAME }}
          cf_password: ${{ secrets.CF_PASSWORD }}
          cf_org: gsa-tts-oros-fac
          cf_space: ${{ env.space }}
          command: cf run-task gsa-fac -k 2G -m 2G --name dispatch_create_materialized_views --command "python manage.py materialized_views --create"
2 changes: 1 addition & 1 deletion backend/audit/test_workbooks_should_fail.py
@@ -1,7 +1,7 @@
from django.test import SimpleTestCase
import os
from functools import reduce
import re
from django.test import SimpleTestCase
from django.core.exceptions import ValidationError

from audit.intakelib import (
11 changes: 11 additions & 0 deletions backend/dissemination/api_versions.py
@@ -1,6 +1,7 @@
from psycopg2._psycopg import connection
from config import settings
import logging
import os

logger = logging.getLogger(__name__)

@@ -23,6 +24,16 @@ def get_conn_string():
    return conn_string


def exec_sql_at_path(dir, filename):
    conn = connection(get_conn_string())
    conn.autocommit = True
    path = os.path.join(dir, filename)
    with conn.cursor() as curs:
        logger.info(f"EXEC SQL {path}")
        sql = open(path, "r").read()
        curs.execute(sql)


def exec_sql(location, version, filename):
    conn = connection(get_conn_string())
    conn.autocommit = True
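
As committed, `exec_sql_at_path` reads the script with a bare `open(...).read()` and never closes the connection explicitly. The sketch below shows the same read-then-execute pattern with both resources released deterministically. It is not the project's code: it uses the standard library's `sqlite3` as a stand-in for psycopg2 so that it runs without a database server, and `exec_sql_file` and `demo` are hypothetical names.

```python
import os
import sqlite3
import tempfile


def exec_sql_file(conn, directory, filename):
    """Read a SQL script from disk and run it on an already-open connection."""
    path = os.path.join(directory, filename)
    with open(path, "r") as f:  # file handle is closed on exit from the block
        sql = f.read()
    conn.executescript(sql)  # sqlite3-specific; psycopg2 would use cursor.execute
    return path


def demo():
    """Write a small script to a temp directory, run it, and count the rows."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "create.sql"), "w") as f:
            f.write("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1), (2);")
        conn = sqlite3.connect(":memory:")
        try:
            exec_sql_file(conn, d, "create.sql")
            return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        finally:
            conn.close()  # always release the connection, even on error
```

Passing the connection in, rather than opening it inside the helper, also makes the function easy to exercise in tests.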
22 changes: 22 additions & 0 deletions backend/dissemination/management/commands/materialized_views.py
@@ -0,0 +1,22 @@
from django.core.management.base import BaseCommand
from dissemination import api_versions


class Command(BaseCommand):
    help = """
    Runs sql scripts to recreate access tables for the postgrest API.
    """

    def add_arguments(self, parser):
        parser.add_argument("-c", "--create", action="store_true", default=False)
        parser.add_argument("-d", "--drop", action="store_true", default=False)
        parser.add_argument("-r", "--refresh", action="store_true", default=False)

    def handle(self, *args, **options):
        path = "dissemination/sql"
        if options["create"]:
            api_versions.exec_sql_at_path(path, "create_materialized_views.sql")
        elif options["drop"]:
            api_versions.exec_sql_at_path(path, "drop_materialized_views.sql")
        elif options["refresh"]:
            api_versions.exec_sql_at_path(path, "refresh_materialized_views.sql")
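
With the `if`/`elif` chain above, passing several flags runs only the first match, and passing none exits silently without doing anything. One possible tightening, sketched here with plain `argparse` so it runs outside Django, is a required mutually exclusive group. Django's command parser is an `argparse.ArgumentParser` subclass, so the same `add_mutually_exclusive_group` call should be usable inside `add_arguments`. `build_parser`, `pick_sql_file`, and `SQL_FILES` are hypothetical names, not part of the commit.

```python
import argparse

# Flag name -> SQL script, mirroring the filenames used in the command above.
SQL_FILES = {
    "create": "create_materialized_views.sql",
    "drop": "drop_materialized_views.sql",
    "refresh": "refresh_materialized_views.sql",
}


def build_parser():
    parser = argparse.ArgumentParser(prog="materialized_views")
    # Exactly one of the three actions must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-c", "--create", action="store_true")
    group.add_argument("-d", "--drop", action="store_true")
    group.add_argument("-r", "--refresh", action="store_true")
    return parser


def pick_sql_file(argv):
    """Return the SQL filename for the single action selected in argv."""
    options = vars(build_parser().parse_args(argv))
    for action, filename in SQL_FILES.items():
        if options[action]:
            return filename
```

Here `pick_sql_file(["--refresh"])` selects the refresh script, while an empty argument list or a combination such as `-c -d` makes `argparse` report a usage error and exit.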
623 changes: 623 additions & 0 deletions backend/dissemination/migrations/0015_disseminationcombined.py

Large diffs are not rendered by default.
