Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: reduce query complexity for getting total grants #1848

Merged
merged 5 commits into from
Aug 29, 2023
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 53 additions & 38 deletions packages/server/src/db/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -507,6 +507,47 @@ function buildFiltersQuery(queryBuilder, filters, agencyId) {
);
}

function grantsQuery(queryBuilder, filters, agencyId, orderingParams, paginationParams) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function grantsQuery has 5 arguments (exceeds 4 allowed). Consider refactoring.

as1729 marked this conversation as resolved.
Show resolved Hide resolved
if (filters) {
if (filters.reviewStatuses?.length) {
queryBuilder.join(TABLES.grants_interested, `${TABLES.grants}.grant_id`, `${TABLES.grants_interested}.grant_id`)
.join(TABLES.interested_codes, `${TABLES.interested_codes}.id`, `${TABLES.grants_interested}.interested_code_id`);
}
if (parseInt(filters.assignedToAgencyId, 10) >= 0) {
queryBuilder.join(TABLES.assigned_grants_agency, `${TABLES.grants}.grant_id`, `${TABLES.assigned_grants_agency}.grant_id`);
}
buildKeywordQuery(queryBuilder, filters.includeKeywords, filters.excludeKeywords);
buildFiltersQuery(queryBuilder, filters, agencyId);
}
if (orderingParams.orderBy && orderingParams.orderBy !== 'undefined') {
if (orderingParams.orderBy.includes('interested_agencies')) {
// Only perform the join if it was not already performed above.
if (!filters.reviewStatuses?.length) {
queryBuilder.leftJoin(TABLES.grants_interested, `${TABLES.grants}.grant_id`, `${TABLES.grants_interested}.grant_id`);
}
const orderArgs = orderingParams.orderBy.split('|');
queryBuilder.orderBy(`${TABLES.grants_interested}.grant_id`, orderArgs[1]);
queryBuilder.orderBy(`${TABLES.grants}.grant_id`, orderArgs[1]);
} else if (orderingParams.orderBy.includes('viewed_by')) {
const orderArgs = orderingParams.orderBy.split('|');
queryBuilder.leftJoin(TABLES.grants_viewed, `${TABLES.grants}.grant_id`, `${TABLES.grants_viewed}.grant_id`);
queryBuilder.orderBy(`${TABLES.grants_viewed}.grant_id`, orderArgs[1]);
queryBuilder.orderBy(`${TABLES.grants}.grant_id`, orderArgs[1]);
} else {
const orderArgs = orderingParams.orderBy.split('|');
const orderDirection = ((orderingParams.orderDesc === 'true') ? 'desc' : 'asc');
if (orderArgs.length > 1) {
console.log(`Too many orderArgs: ${orderArgs}`);
}
queryBuilder.orderBy(orderArgs[0], orderDirection);
}
}
if (paginationParams) {
queryBuilder.limit(paginationParams.perPage);
queryBuilder.offset((paginationParams.currentPage - 1) * paginationParams.perPage);
}
}

/*
filters: {
reviewStatuses: List[Enum['Interested', 'Result', 'Rejected']],
Expand All @@ -530,46 +571,20 @@ function buildFiltersQuery(queryBuilder, filters, agencyId) {
*/
async function getGrantsNew(filters, paginationParams, orderingParams, tenantId, agencyId) {
console.log(filters, paginationParams, orderingParams, tenantId, agencyId);
const { data, pagination } = await knex(TABLES.grants)
const data = await knex(TABLES.grants)
.select(`${TABLES.grants}.*`)
.distinct()
.modify((queryBuilder) => {
if (filters) {
if (filters.reviewStatuses?.length) {
queryBuilder.join(TABLES.grants_interested, `${TABLES.grants}.grant_id`, `${TABLES.grants_interested}.grant_id`)
.join(TABLES.interested_codes, `${TABLES.interested_codes}.id`, `${TABLES.grants_interested}.interested_code_id`);
}
if (parseInt(filters.assignedToAgencyId, 10) >= 0) {
queryBuilder.join(TABLES.assigned_grants_agency, `${TABLES.grants}.grant_id`, `${TABLES.assigned_grants_agency}.grant_id`);
}
buildKeywordQuery(queryBuilder, filters.includeKeywords, filters.excludeKeywords);
buildFiltersQuery(queryBuilder, filters, agencyId);
}
if (orderingParams.orderBy && orderingParams.orderBy !== 'undefined') {
if (orderingParams.orderBy.includes('interested_agencies')) {
// Only perform the join if it was not already performed above.
if (!filters.reviewStatuses?.length) {
queryBuilder.leftJoin(TABLES.grants_interested, `${TABLES.grants}.grant_id`, `${TABLES.grants_interested}.grant_id`);
}
const orderArgs = orderingParams.orderBy.split('|');
queryBuilder.orderBy(`${TABLES.grants_interested}.grant_id`, orderArgs[1]);
queryBuilder.orderBy(`${TABLES.grants}.grant_id`, orderArgs[1]);
} else if (orderingParams.orderBy.includes('viewed_by')) {
const orderArgs = orderingParams.orderBy.split('|');
queryBuilder.leftJoin(TABLES.grants_viewed, `${TABLES.grants}.grant_id`, `${TABLES.grants_viewed}.grant_id`);
queryBuilder.orderBy(`${TABLES.grants_viewed}.grant_id`, orderArgs[1]);
queryBuilder.orderBy(`${TABLES.grants}.grant_id`, orderArgs[1]);
} else {
const orderArgs = orderingParams.orderBy.split('|');
const orderDirection = ((orderingParams.orderDesc === 'true') ? 'desc' : 'asc');
if (orderArgs.length > 1) {
console.log(`Too many orderArgs: ${orderArgs}`);
}
queryBuilder.orderBy(orderArgs[0], orderDirection);
}
}
})
.paginate(paginationParams);
.modify((qb) => grantsQuery(qb, filters, agencyId, orderingParams, paginationParams));

const counts = await knex(TABLES.grants)
.distinct()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly a nit because I don't think it will affect performance too much, but I think you can remove the .distinct() call here – the underlying query is only selecting a grants table row once, so rows should already be inherently distinct.

Removing the DISTINCT operator from the query will result in a slight (although likely negligible) performance improvement because it removes a step from the query plan that compares the final values to determine uniqueness.

Comparison:

  • With DISTINCT:
    • Actual:
      SELECT DISTINCT Count("grant_id") AS "total_grants"
      FROM            "grants"
      CROSS JOIN      To_tsquery('english', '(health) & !(fire)') AS tsq
      WHERE           (
                                      "tsq" @@ title_ts
                      OR              "tsq" @@ description_ts)
      AND             (
                                      "eligibility_codes" ~ '00|01'
                      AND             "grants"."opportunity_status" IN ('posted'));
       total_grants 
      --------------
                300
      (1 row)
      
    • EXPLAIN ANALYZE output:
       Unique  (cost=15111.27..15111.28 rows=1 width=8) (actual time=80.060..80.062 rows=1 loops=1)
         ->  Sort  (cost=15111.27..15111.27 rows=1 width=8) (actual time=80.059..80.060 rows=1 loops=1)
               Sort Key: (count(grants.grant_id))
               Sort Method: quicksort  Memory: 25kB
               ->  Aggregate  (cost=15111.25..15111.26 rows=1 width=8) (actual time=80.051..80.051 rows=1 loops=1)
                     ->  Seq Scan on grants  (cost=0.00..15110.90 rows=140 width=6) (actual time=0.333..80.006 rows=300 loops=1)
                           Filter: ((eligibility_codes ~ '00|01'::text) AND (opportunity_status = 'posted'::text) AND (('''health'' & !''fire'''::tsquery @@ title_ts) OR ('''health'' & !''fire'''::tsq
      uery @@ description_ts)))
                           Rows Removed by Filter: 73769
       Planning Time: 0.360 ms
       Execution Time: 80.111 ms
      (10 rows)
      
  • Without DISTINCT:
    • Actual:
      SELECT  Count("grant_id") AS "total_grants"
      FROM            "grants"
      CROSS JOIN      To_tsquery('english', '(health) & !(fire)') AS tsq
      WHERE           (
                                      "tsq" @@ title_ts
                      OR              "tsq" @@ description_ts)
      AND             (
                                      "eligibility_codes" ~ '00|01'
                      AND             "grants"."opportunity_status" IN ('posted'));
       total_grants 
      --------------
                300
      (1 row)
      
    • EXPLAIN ANALYZE output:
      Aggregate  (cost=15111.25..15111.26 rows=1 width=8) (actual time=89.836..89.837 rows=1 loops=1)
         ->  Seq Scan on grants  (cost=0.00..15110.90 rows=140 width=6) (actual time=0.334..89.793 rows=300 loops=1)
               Filter: ((eligibility_codes ~ '00|01'::text) AND (opportunity_status = 'posted'::text) AND (('''health'' & !''fire'''::tsquery @@ title_ts) OR ('''health'' & !''fire'''::tsquery @@ desc
      ription_ts)))
               Rows Removed by Filter: 73769
       Planning Time: 0.333 ms
       Execution Time: 89.866 ms
      (6 rows)
      

Copy link
Member

@TylerHendrickson TylerHendrickson Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@as1729

Edit: On second glance, it appears the DISTINCT clause is only looking at the total_grants result "column" (difference between SELECT Count(DISTINCT "grant_id")) AS "total_grants" vs SELECT DISTINCT Count("grants_id") AS "total_grants"). I still think the above suggestion is valid for the same reasons, but more to the point DISTINCT isn't really doing anything on this query other than adding an unnecessary planning step – i.e. a count query that only produces a single row/column will always be unique.

Copy link
Contributor Author

@as1729 as1729 Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the perf improvement here. One concern of removing the distinct is that there are many variations of joins in the filters (depending on what parameters are passed in) and we don't have good unit test coverage (or any coverage) on all these combinations. So its possible that one of the variations of the query isn't grouped which may cause a wrong total to show up.

Edit: I don't think we should do this given that the original query does not have any group-by clauses and relies on the distinct to get individual results. I think we can do this as an over-haul of the original filters.

.modify((qb) => grantsQuery(qb, filters, agencyId, { orderBy: undefined }, null))
.count('grant_id as total_grants');

const pagination = {
total: counts[0].total_grants,
lastPage: Math.ceil(parseInt(counts[0].total_grants, 10) / parseInt(paginationParams.perPage, 10)),
};

const dataWithAgency = await enhanceGrantData(tenantId, data);

Expand Down
6 changes: 6 additions & 0 deletions packages/server/src/routes/grants.js
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,12 @@ router.get('/closestGrants/:perPage/:currentPage', requireUser, async (req, res)
res.json(rows);
});

router.get('/exportCSVNext', requireUser, async (req, res) => {
console.log(req, res);
// Make a CSV file and upload it to S3
// Send email to user with a signed link to download the file
});

// For API tests, reduce the limit to 100 -- this is so we can test the logic around the limit
// without the test having to insert 10k rows, which slows down the test.
const MAX_CSV_EXPORT_ROWS = process.env.NODE_ENV !== 'test' ? 10000 : 100;
Expand Down