feat: add/fix fields in csv export #2180

replicantSocks · 2023-11-02T21:49:00Z

Ticket #2107

Description

The following fields were added:
Funding Type
Eligibility
Appropriations Bill
Agency Code
The "Status" field was changed to "Opportunity Status"

Screenshots / Demo Video

grants (15).csv

Testing

In testing, adding the 'Eligibility' field significantly degraded query performance: an example search went from <2 ms to >9 ms (1.799 ms vs. 9.114 ms execution time in the plans below). As a result, fetching CSV-only fields was made conditional.
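A minimal sketch of what "conditional" could look like: the eligibility join is only added when the caller asks for CSV-only fields. The function name, the `includeCsvFields` flag, the selected columns, and the `string_agg` aggregate are illustrative assumptions, not taken from the diff. Only the join condition mirrors the query plan shown further down.

```javascript
// Hypothetical sketch: function name, flag, and column list are assumptions.
function buildGrantsSql({ includeCsvFields = false } = {}) {
    const columns = ['grants.grant_id', 'grants.title', 'grants.open_date'];
    const clauses = ['FROM grants'];

    if (includeCsvFields) {
        // Only the CSV export pays for the join and the aggregation.
        columns.push("string_agg(eligibility_codes.label, '|') AS eligibility");
        clauses.push(
            'LEFT JOIN eligibility_codes'
            + " ON eligibility_codes.code = ANY(string_to_array(grants.eligibility_codes, ' '))",
        );
        clauses.push('GROUP BY grants.grant_id');
    }

    return `SELECT ${columns.join(', ')} ${clauses.join(' ')}`;
}

// Regular search: no join. CSV export: join plus aggregation.
console.log(buildGrantsSql());
console.log(buildGrantsSql({ includeCsvFields: true }));
```

The point is only that the expensive join sits behind a flag, so regular paginated searches keep the cheaper plan. The generated SQL is not meant to be run as-is against the real schema.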

Without the eligibility field:

~/USDR/usdr-gost$ docker exec gost-postgres psql postgresql://postgres:password123@postgres:5432/usdr_grants -c "EXPLAIN ANALYZE $(cat query2)"
                                                                                        QUERY PLAN                                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=50.70..50.70 rows=1 width=1285) (actual time=1.571..1.576 rows=50 loops=1)                 
   ->  Sort  (cost=50.70..50.70 rows=1 width=1285) (actual time=1.570..1.572 rows=50 loops=1)                       
         Sort Key: open_date DESC NULLS LAST   
         Sort Method: top-N heapsort  Memory: 196kB                                                                       
         ->  WindowAgg  (cost=50.66..50.69 rows=1 width=1285) (actual time=1.185..1.280 rows=219 loops=1)
               ->  Group  (cost=50.66..50.66 rows=1 width=1237) (actual time=0.761..0.875 rows=219 loops=1)
                     Group Key: grant_id                                                                                                                                                                                                             
                     ->  Sort  (cost=50.66..50.66 rows=1 width=1237) (actual time=0.758..0.771 rows=219 loops=1)                                                                                                                                     
                           Sort Key: grant_id                                                                             
                           Sort Method: quicksort  Memory: 374kB                                                                                                                                                                                     
                           ->  Seq Scan on grants  (cost=0.00..50.65 rows=1 width=1237) (actual time=0.024..0.209 rows=219 loops=1)                                                             
                                 Filter: (CASE WHEN (archive_date <= now()) THEN 'archived'::text WHEN (close_date <= now()) THEN 'closed'::text ELSE 'posted'::text END = 'posted'::text)
                                 Rows Removed by Filter: 32                                                                                                                                                                                          
 Planning Time: 0.680 ms
 Execution Time: 1.799 ms
(15 rows)

With eligibility field added:

~/USDR/usdr-gost$ docker exec gost-postgres psql postgresql://postgres:password123@postgres:5432/usdr_grants -c "EXPLAIN ANALYZE $(cat query2)"
                                                                                           QUERY PLAN                                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.32..52.32 rows=1 width=1349) (actual time=8.682..8.688 rows=50 loops=1)
   ->  Sort  (cost=52.32..52.32 rows=1 width=1349) (actual time=8.681..8.684 rows=50 loops=1)
         Sort Key: grants.open_date DESC NULLS LAST
         Sort Method: top-N heapsort  Memory: 263kB
         ->  WindowAgg  (cost=52.25..52.31 rows=1 width=1349) (actual time=8.096..8.336 rows=219 loops=1)
               ->  GroupAggregate  (cost=52.25..52.27 rows=1 width=1269) (actual time=6.865..7.698 rows=219 loops=1)
                     Group Key: grants.grant_id
                     ->  Sort  (cost=52.25..52.26 rows=1 width=1292) (actual time=6.850..6.929 rows=1575 loops=1)
                           Sort Key: grants.grant_id
                           Sort Method: quicksort  Memory: 2701kB
                           ->  Nested Loop Left Join  (cost=0.00..52.24 rows=1 width=1292) (actual time=0.041..3.754 rows=1575 loops=1)
                                 Join Filter: (eligibility_codes.code = ANY (string_to_array(grants.eligibility_codes, ' '::text)))
                                 Rows Removed by Join Filter: 2148
                                 ->  Seq Scan on grants  (cost=0.00..50.65 rows=1 width=1237) (actual time=0.025..0.224 rows=219 loops=1)
                                       Filter: (CASE WHEN (archive_date <= now()) THEN 'archived'::text WHEN (close_date <= now()) THEN 'closed'::text ELSE 'posted'::text END = 'posted'::text)
                                       Rows Removed by Filter: 32
                                 ->  Seq Scan on eligibility_codes  (cost=0.00..1.17 rows=17 width=58) (actual time=0.001..0.002 rows=17 loops=219)
 Planning Time: 0.787 ms
 Execution Time: 9.114 ms
(19 rows)
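
Reading the second plan: the nested loop checks each of the 219 filtered grants against all 17 eligibility_codes rows, i.e. 219 × 17 = 3723 comparisons, which matches 1575 rows kept plus 2148 rows removed by the join filter. The sort before the GroupAggregate then handles 1575 rows instead of 219. The sketch below imitates that counting in plain JavaScript. The function name and the sample codes are made up for illustration, and it is not how Postgres implements the join.

```javascript
// Counts what a nested-loop left join does for a space-delimited code column.
// Hypothetical helper with placeholder data, for illustration only.
function nestedLoopJoinStats(grants, codes) {
    let kept = 0;
    let removed = 0;
    for (const grant of grants) {
        const wanted = new Set((grant.eligibility_codes || '').split(' ').filter(Boolean));
        let matched = 0;
        for (const { code } of codes) {
            if (wanted.has(code)) {
                matched += 1;
            } else {
                removed += 1;
            }
        }
        // A left join still emits one (null-extended) row for a grant with no match.
        kept += Math.max(matched, 1);
    }
    return { comparisons: grants.length * codes.length, kept, removed };
}

const sampleCodes = [{ code: 'A' }, { code: 'B' }, { code: 'C' }];
const sampleGrants = [
    { eligibility_codes: 'A B' },
    { eligibility_codes: 'B' },
    { eligibility_codes: '' },
];
console.log(nestedLoopJoinStats(sampleGrants, sampleCodes));
```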

Automated and Unit Tests

Added Unit tests

Manual tests for Reviewer

Added steps to test feature/functionality manually

Checklist

Provided ticket and description
Provided screenshots/demo
Provided testing information
Provided adequate test coverage for all new code
Added PR reviewers

TylerHendrickson

@replicantSocks Good catch on the performance issues – based on the query plan you provided, it seems like the performance hit is coming from needing to resolve each item in the grants table's string array of eligibility codes to its corresponding eligibility_codes.label value, which involves a lot more joins.

I think restricting this to CSV-only should be fine for now, given that there's a hard upper limit on the number of rows. But since the eligibility_codes table is fairly fixed in terms of its total row count, an alternative (which, for the record, is usually bad practice) could be to resolve the labels in application code, something like this (e.g. in the grants.js CSV export route handler):

// Load the small, mostly static eligibility_codes table once and index labels by code.
const ecMap = (await db('eligibility_codes').select('code', 'label')).reduce(
    (map, row) => { map[row.code] = row.label; return map; },
    {},
);

// Generate CSV
const formattedData = data.map((grant) => ({
    ...grant,
    interested_agencies: grant.interested_agencies
        .map((v) => v.agency_abbreviation)
        .join(', '),
    viewed_by: grant.viewed_by_agencies
        .map((v) => v.agency_abbreviation)
        .join(', '),
    // Resolve each space-delimited code to its label; a missing value yields an empty string.
    eligibility_codes: (grant.eligibility_codes || '').split(' ').filter(Boolean)
        .map((code) => ecMap[code]).join('|'),
    open_date: new Date(grant.open_date).toLocaleDateString('en-US', { timeZone: 'UTC' }),
    close_date: new Date(grant.close_date).toLocaleDateString('en-US', { timeZone: 'UTC' }),
    url: `https://www.grants.gov/web/grants/view-opportunity.html?oppId=${grant.grant_id}`,
}));

This may not be worth reimplementing as above just yet, though. It might be worth looking at real-world performance first, but I'm putting it here for consideration.
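
For reference, the lookup step on its own can be sketched as a small pure function that is easy to unit test without a database. The function name, the fallback to the raw code for unknown entries, and the placeholder codes and labels are assumptions for illustration. They are not real eligibility_codes rows.

```javascript
// Hypothetical helper: turns a space-delimited code string into joined labels.
function resolveEligibility(codes, ecMap, separator = '|') {
    return (codes || '')
        .split(' ')
        .filter(Boolean) // tolerate null, empty, or doubled spaces
        .map((code) => ecMap[code] ?? code) // assumption: fall back to the raw code
        .join(separator);
}

// Placeholder data, not actual rows from the eligibility_codes table.
const exampleMap = { A: 'Label A', B: 'Label B' };
console.log(resolveEligibility('A B', exampleMap));
console.log(resolveEligibility(null, exampleMap));
```

Falling back to the raw code keeps an unrecognized value visible in the CSV instead of silently dropping it. Whether that is the desired behavior is a judgment call.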

codeclimate · 2023-11-21T15:50:40Z

Code Climate has analyzed commit 335b507 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 58.1% (0.0% change).

* feat: add/fix fields in csv export
* fix: column header flipped in test

feat: add/fix fields in csv export

5b88667

replicantSocks requested a review from a team November 2, 2023 21:49

fix: column header flipped in test

2ae8713

replicantSocks requested a review from as1729 November 13, 2023 19:36

TylerHendrickson approved these changes Nov 21, 2023

Merge branch '_staging' into add-more-csv-columns

335b507

as1729 enabled auto-merge (squash) November 21, 2023 15:48

as1729 merged commit 30826a4 into usdigitalresponse:_staging Nov 21, 2023
7 checks passed

adele-usdr pushed a commit that referenced this pull request Nov 28, 2023

feat: add/fix fields in csv export (#2180)

09d97f8

sanason mentioned this pull request Nov 30, 2023

feat: add category of funding activity as search field #2267

Merged

7 tasks