Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check all columns work with release_type = 'record' (compiled releases) #43

Closed
duncandewhurst opened this issue Oct 22, 2019 · 13 comments · Fixed by #211
Closed

Check all columns work with release_type = 'record' (compiled releases) #43

duncandewhurst opened this issue Oct 22, 2019 · 13 comments · Fixed by #211
Labels
bug Something isn't working sql Relating to the SQL files in the sql directory

Comments

@duncandewhurst
Copy link
Contributor

When working with the Moldova records data, which contains compiled releases, I noticed that the release_tag column in views.release_summary_with_data is empty.

This looks like it is due to the same issue as in #36 - needing to look in data -> 'compiledRelease' -> 'tag' rather than just data -> 'tag' to populate the column.

Can we check that all columns are set up to support compiled releases?

@duncandewhurst duncandewhurst added the bug Something isn't working label Oct 22, 2019
@duncandewhurst duncandewhurst changed the title Check all aggregates work with records (compiled releases) Check all columns work with records (compiled releases) Oct 22, 2019
@robredpath
Copy link
Contributor

@duncandewhurst
Copy link
Contributor Author

This came up again when working with the Portugal records dataset - views.release_id is not populated

@jpmckinney jpmckinney added the sql Relating to the SQL files in the sql directory label Feb 18, 2020
@jpmckinney
Copy link
Member

The missing release tag was due to #85. However, for a record dataset, the tag will always be ['compiled'], under the current logic in the SQL.

In the case of the Portugal records dataset, the logic is:

  • For the release table in Kingfisher, copy the release_id column
  • For the record table in Kingfisher, store NULL
  • For the compiled_release table in Kingfisher, store NULL

If not NULL, then what value should there be for release_id for the last two? A record can have many releases, and a compiled release doesn't necessarily need an id (can discuss in open-contracting/standard#834), so I'm not sure any value makes sense.

@jpmckinney
Copy link
Member

Also, instead of adding a lot of support for looking in a record's compiled release in every query, we should just fix open-contracting/kingfisher-process#63, which would copy a record's compiledRelease to the compiled_release table, which is treated the same as a row in the release table here.

Once that's done, we can revert #38 and maybe other occurrences of 'compiledRelease'.

@ghost
Copy link

ghost commented Apr 20, 2020

We have just deployed open-contracting/kingfisher-process#63 and now the "compile releases" transform will work with records as well as releases - in light of that, this may need review.

@jpmckinney
Copy link
Member

@duncandewhurst Is there still an issue here?

@aVolpe
Copy link

aVolpe commented Jul 31, 2020

@jpmckinney We are having a similar issue when using the Paraguay records scrapper (I open the PR #129)

@jpmckinney
Copy link
Member

@aVolpe Can you have a look at #110, where I think we are working on this?

@yolile
Copy link
Member

yolile commented Jul 31, 2020

@jpmckinney I think this is about the use of the compiledRelease within a record and #110 is about the use of releases within a record

@jpmckinney
Copy link
Member

Can you provide some minimal data to test with (e.g. not all of Paraguay's data)? Looking at the code in 002...sql, it should work already.

@jpmckinney jpmckinney changed the title Check all columns work with records (compiled releases) Check all columns work with release_type = 'record' (compiled releases) Jul 31, 2020
@duncandewhurst
Copy link
Contributor Author

@duncandewhurst Is there still an issue here?

I'm not sure. The issue was to check that compiled releases published as part of records were being summarized correctly in all the views tables/columns.

It looks like two of the issues I thought were due to the format of the source data (release_tag and release_id) were actually due to other issues.

If someone has time to do some testing that would be great. I collected a sample of the Moldova records data in collection 1389 and generated view_data_collection_1389 in case that is useful for testing.

@jpmckinney
Copy link
Member

jpmckinney commented Aug 4, 2020

Aha, I think the fix we need to do here is to check all occurrences of JOIN data (the SQL is now consistently formatted, so these are now easy to find), and make sure that we have an appropriate CASE statement to get the data from the right location. (This applies, e.g., to the creation of the tender_summary_with_data table that was the subject of @aVolpe's #129).

Example CASE statement from 002...sql:

CREATE VIEW parties_summary AS
SELECT
    parties_summary_no_data.*,
    CASE WHEN release_type = 'record' THEN
        data #> ARRAY['compiledRelease', 'parties', party_index::text]
    WHEN release_type = 'embedded_release' THEN
        data -> 'releases' -> (mod(parties_summary_no_data.id / 10, 1000000)::integer) -> 'parties' -> party_index::integer
    ELSE
        data #> ARRAY['parties', party_index::text]
    END AS party
FROM
    parties_summary_no_data
    JOIN data ON data.id = data_id;

The occurrences to fix are in 005, 006, 007, and we should review the JOIN data in 008.

Update: Alternately, we should join tmp_release_summary_with_release_data instead of data, since the former already puts the right data in the data column. However, tmp_release_summary_with_release_data is presently never joined and only selected from, so I'd rather @kindly look into this when available. If urgent, we can pursue the above fix.

@jpmckinney
Copy link
Member

Update: Alternately, we should join tmp_release_summary_with_release_data instead of data, since the former already puts the right data in the data column. However, tmp_release_summary_with_release_data is presently never joined and only selected from, so I'd rather kindly look into this when available. If urgent, we can pursue the above fix.

tmp_release_summary_with_release_data is a view based on a temporary table (tmp_release_summary), so that's why we join directly to data instead (even if it means repeating a similar CASE statement in multiple places).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working sql Relating to the SQL files in the sql directory
Projects
None yet
5 participants