
feat: add/fix fields in csv export #2180

Merged


@replicantSocks commented on Nov 2, 2023

Ticket #2107

Description

The following fields were added to the CSV export:
  • Funding Type
  • Eligibility
  • Appropriations Bill
  • Agency Code
The "Status" field was renamed to "Opportunity Status".
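Here is a minimal sketch of how the export's columns could be declared in a single header-to-field table. With that layout, adding the four new fields and renaming "Status" each take one entry. The field keys, the `CSV_COLUMNS` table, and the `csvEscape`/`toCsv` helpers are hypothetical and are not taken from the usdr-gost code. Only the header titles come from this PR.

```javascript
// Hypothetical column definitions for the columns touched by this PR;
// existing columns are omitted. `header` is the CSV column title and
// `key` is the (assumed) property name on each formatted grant object.
const CSV_COLUMNS = [
  { header: 'Opportunity Status', key: 'opportunity_status' }, // was "Status"
  { header: 'Funding Type', key: 'funding_type' },
  { header: 'Eligibility', key: 'eligibility' },
  { header: 'Appropriations Bill', key: 'appropriations_bill' },
  { header: 'Agency Code', key: 'agency_code' },
];

// Quote a value RFC 4180-style: wrap it in double quotes if it contains a
// comma, double quote, or line break, doubling any embedded quotes.
function csvEscape(value) {
  const s = value == null ? '' : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Build a CSV string with one header row followed by one row per record.
function toCsv(records, columns = CSV_COLUMNS) {
  const lines = [columns.map((c) => csvEscape(c.header)).join(',')];
  for (const record of records) {
    lines.push(columns.map((c) => csvEscape(record[c.key])).join(','));
  }
  return lines.join('\r\n');
}
```

Because all header text lives in one table, a rename touches a single entry. A test that checks the exact header row will also catch two headers being swapped.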

Screenshots / Demo Video

grants (15).csv

Testing

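In testing, adding the 'Eligibility' field appeared to significantly impact performance: an example search went from under 2 ms to over 9 ms of execution time (1.799 ms vs. 9.114 ms in the EXPLAIN ANALYZE output). As a result, the CSV-only fields are now fetched conditionally, only when generating the CSV export.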
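The conditional fetch described above could look roughly like the following sketch. The caller passes a flag, and the expensive eligibility join is added only for the export. The `buildGrantQuerySpec` helper, the `includeCsvOnlyFields` option, and all column names are hypothetical illustrations, not the code merged in this PR.

```javascript
// Hypothetical column lists; names are illustrative, not the real schema.
const BASE_COLUMNS = ['grant_id', 'title', 'open_date', 'close_date'];

// Columns only the CSV export needs. The eligibility labels require a join
// against eligibility_codes, which is what made the search query slower.
const CSV_ONLY_COLUMNS = ['funding_type', 'appropriations_bill', 'agency_code', 'eligibility_labels'];

// Describe what the grants query should select and join. The expensive join
// is added only when the caller asks for the CSV-only fields.
function buildGrantQuerySpec({ includeCsvOnlyFields = false } = {}) {
  const spec = { columns: [...BASE_COLUMNS], joins: [] };
  if (includeCsvOnlyFields) {
    spec.columns.push(...CSV_ONLY_COLUMNS);
    spec.joins.push('eligibility_codes');
  }
  return spec;
}

// Search page: no eligibility join.
const searchSpec = buildGrantQuerySpec();
// CSV export: same base columns plus the CSV-only columns and the join.
const csvSpec = buildGrantQuerySpec({ includeCsvOnlyFields: true });
```

With a knex query builder, like the `db(...)` calls used later in this thread, the same branching could be applied directly to the query. A plain `if` or the builder's `.modify()` callback would work, so the join is only emitted for exports.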

Without the eligibility field:

~/USDR/usdr-gost$ docker exec gost-postgres psql postgresql://postgres:password123@postgres:5432/usdr_grants -c "EXPLAIN ANALYZE $(cat query2)"
                                                                                        QUERY PLAN                                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=50.70..50.70 rows=1 width=1285) (actual time=1.571..1.576 rows=50 loops=1)
   ->  Sort  (cost=50.70..50.70 rows=1 width=1285) (actual time=1.570..1.572 rows=50 loops=1)
         Sort Key: open_date DESC NULLS LAST
         Sort Method: top-N heapsort  Memory: 196kB
         ->  WindowAgg  (cost=50.66..50.69 rows=1 width=1285) (actual time=1.185..1.280 rows=219 loops=1)
               ->  Group  (cost=50.66..50.66 rows=1 width=1237) (actual time=0.761..0.875 rows=219 loops=1)
                     Group Key: grant_id
                     ->  Sort  (cost=50.66..50.66 rows=1 width=1237) (actual time=0.758..0.771 rows=219 loops=1)
                           Sort Key: grant_id
                           Sort Method: quicksort  Memory: 374kB
                           ->  Seq Scan on grants  (cost=0.00..50.65 rows=1 width=1237) (actual time=0.024..0.209 rows=219 loops=1)
                                 Filter: (CASE WHEN (archive_date <= now()) THEN 'archived'::text WHEN (close_date <= now()) THEN 'closed'::text ELSE 'posted'::text END = 'posted'::text)
                                 Rows Removed by Filter: 32
 Planning Time: 0.680 ms
 Execution Time: 1.799 ms
(15 rows)

With eligibility field added:

~/USDR/usdr-gost$ docker exec gost-postgres psql postgresql://postgres:password123@postgres:5432/usdr_grants -c "EXPLAIN ANALYZE $(cat query2)"
                                                                                           QUERY PLAN                                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.32..52.32 rows=1 width=1349) (actual time=8.682..8.688 rows=50 loops=1)
   ->  Sort  (cost=52.32..52.32 rows=1 width=1349) (actual time=8.681..8.684 rows=50 loops=1)
         Sort Key: grants.open_date DESC NULLS LAST
         Sort Method: top-N heapsort  Memory: 263kB
         ->  WindowAgg  (cost=52.25..52.31 rows=1 width=1349) (actual time=8.096..8.336 rows=219 loops=1)
               ->  GroupAggregate  (cost=52.25..52.27 rows=1 width=1269) (actual time=6.865..7.698 rows=219 loops=1)
                     Group Key: grants.grant_id
                     ->  Sort  (cost=52.25..52.26 rows=1 width=1292) (actual time=6.850..6.929 rows=1575 loops=1)
                           Sort Key: grants.grant_id
                           Sort Method: quicksort  Memory: 2701kB
                           ->  Nested Loop Left Join  (cost=0.00..52.24 rows=1 width=1292) (actual time=0.041..3.754 rows=1575 loops=1)
                                 Join Filter: (eligibility_codes.code = ANY (string_to_array(grants.eligibility_codes, ' '::text)))
                                 Rows Removed by Join Filter: 2148
                                 ->  Seq Scan on grants  (cost=0.00..50.65 rows=1 width=1237) (actual time=0.025..0.224 rows=219 loops=1)
                                       Filter: (CASE WHEN (archive_date <= now()) THEN 'archived'::text WHEN (close_date <= now()) THEN 'closed'::text ELSE 'posted'::text END = 'posted'::text)
                                       Rows Removed by Filter: 32
                                 ->  Seq Scan on eligibility_codes  (cost=0.00..1.17 rows=17 width=58) (actual time=0.001..0.002 rows=17 loops=219)
 Planning Time: 0.787 ms
 Execution Time: 9.114 ms
(19 rows)
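The second plan's join filter is `eligibility_codes.code = ANY(string_to_array(grants.eligibility_codes, ' '))`. As a result, the nested loop compares each of the 219 grant rows with all 17 eligibility codes, giving 219 × 17 = 3,723 comparisons. Of these, 2,148 were removed by the join filter and 1,575 survived. The sketch below mimics that nested-loop left join in plain JavaScript to show why the input to the `grant_id` sort grows from 219 rows to 1,575. It is an illustration of the plan's semantics, not how PostgreSQL implements it internally, and the test data is hypothetical.

```javascript
// Emulate a nested-loop LEFT JOIN whose join filter is
//   eligibility_codes.code = ANY(string_to_array(grants.eligibility_codes, ' '))
// Returns the joined rows plus how many inner rows the filter rejected.
function nestedLoopLeftJoin(grants, eligibilityCodes) {
  const joined = [];
  let removedByJoinFilter = 0;
  for (const grant of grants) {
    // string_to_array(..., ' ') splits the space-separated code list.
    const codes = (grant.eligibility_codes ?? '').split(' ');
    let matched = false;
    for (const ec of eligibilityCodes) {
      if (codes.includes(ec.code)) {
        joined.push({ grant_id: grant.grant_id, label: ec.label });
        matched = true;
      } else {
        removedByJoinFilter += 1;
      }
    }
    // LEFT JOIN semantics: keep the grant once with a NULL label if nothing matched.
    if (!matched) joined.push({ grant_id: grant.grant_id, label: null });
  }
  return { joined, removedByJoinFilter };
}
```

On the real data these are the numbers the plan reports. The 1,575 joined rows then have to be sorted (2701kB in memory vs. 374kB without the join) and re-aggregated per `grant_id`.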

Automated and Unit Tests

  • Added unit tests

Manual tests for Reviewer

  • Added steps to test feature/functionality manually

Checklist

  • Provided ticket and description
  • Provided screenshots/demo
  • Provided testing information
  • Provided adequate test coverage for all new code
  • Added PR reviewers

@replicantSocks requested a review from a team on November 2, 2023 at 21:49

@TylerHendrickson left a comment


@replicantSocks Good catch on the performance issues. Based on the query plan you provided, it seems the performance hit comes from resolving each code in the grants table's space-separated eligibility_codes string to its corresponding eligibility_codes.label value. The nested loop join matches every grant row against every eligibility code. That multiplies the number of rows that then have to be sorted and re-grouped by grant_id.

I think restricting this to the CSV export should be fine for now, given that there's a hard upper limit on the number of rows. That said, since the eligibility_codes table is fairly fixed in its total row count, an alternative (which, for the record, is usually bad practice) could be to do something like this (e.g. in the grants.js CSV export route handler):

// Inside the async route handler: load the small, mostly static
// eligibility_codes table once and index labels by code.
const ecMap = Object.fromEntries(
    (await db('eligibility_codes').select('code', 'label')).map((row) => [row.code, row.label]),
);

// Generate CSV
const formattedData = data.map((grant) => ({
    ...grant,
    interested_agencies: grant.interested_agencies
        .map((v) => v.agency_abbreviation)
        .join(', '),
    viewed_by: grant.viewed_by_agencies
        .map((v) => v.agency_abbreviation)
        .join(', '),
    // Resolve space-separated codes to labels in memory instead of joining in SQL;
    // tolerate a missing value and fall back to the raw code if a label is unknown.
    eligibility_codes: (grant.eligibility_codes || '')
        .split(' ')
        .filter((code) => code !== '')
        .map((code) => ecMap[code] || code)
        .join('|'),
    open_date: new Date(grant.open_date).toLocaleDateString('en-US', { timeZone: 'UTC' }),
    close_date: new Date(grant.close_date).toLocaleDateString('en-US', { timeZone: 'UTC' }),
    url: `https://www.grants.gov/web/grants/view-opportunity.html?oppId=${grant.grant_id}`,
}));

This may not be worth reimplementing as above just yet, though. It might be worth looking at real-world performance first; I'm just putting it here for consideration.
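Below is a self-contained sketch of just the code-to-label lookup from this suggestion. A small hardcoded array in the test stands in for the result of `db('eligibility_codes').select('code', 'label')`. The helper names, the use of a `Map`, and the specific edge-case handling are assumptions for illustration. The edge cases covered are an empty or missing value, repeated spaces, and unknown codes falling back to the raw code.

```javascript
// Build a Map of code -> label from rows shaped like the results of
// db('eligibility_codes').select('code', 'label').
function buildEligibilityMap(rows) {
  return new Map(rows.map((row) => [row.code, row.label]));
}

// Convert a space-separated code string (as stored on grants) into a
// '|'-separated list of labels. Unknown codes fall back to the raw code;
// a null or empty value yields an empty string.
function formatEligibility(codeString, ecMap) {
  if (!codeString) return '';
  return codeString
    .split(' ')
    .filter((code) => code !== '')
    .map((code) => ecMap.get(code) ?? code)
    .join('|');
}
```

Because eligibility_codes is small and rarely changes, this costs one extra query per export rather than a join per grant row. The trade-off the reviewer flags is that the code-to-label mapping now lives in application code instead of SQL.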

@as1729 enabled auto-merge (squash) on November 21, 2023 at 15:48

codeclimate bot commented Nov 21, 2023

Code Climate has analyzed commit 335b507 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 58.1% (0.0% change).


@as1729 merged commit 30826a4 into usdigitalresponse:_staging on Nov 21, 2023
7 checks passed
adele-usdr pushed a commit that referenced this pull request Nov 28, 2023
* feat: add/fix fields in csv export

* fix: column header flipped in test

---------