Explore generation of a single EITI summary data file per country #28

Open
anderspeders opened this issue May 5, 2017 · 8 comments

@anderspeders

Why

In addition to the split files currently available, it would be great to explore the following route for creating a single file when the data match.

Please see instructions from Development Gateway below.
There's a chance to eliminate the double counting in the flattened files under the following conditions:
1 - The Excel sheet should have disaggregated company information.
2 - The government-reported revenue and the company-reported revenue should match.
3 - Then you group by GFS code + name of revenue stream.
4 - Within each group, the sum of value_reported (column G) over the rows of type "company" should equal the value_reported of the row of type "government".
If all those conditions are met, you can copy the value in column name_of_recieving_agency from the "government" type row to the "company" type rows where it is empty, and then delete the "government" row, effectively eliminating the double counting.
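The deduplication rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Development Gateway's implementation: each row is assumed to be a dict with hypothetical simplified keys (`type`, `gfs_code`, `revenue_stream`, `value_reported`, and `agency` standing in for `name_of_recieving_agency`), condition 1 is approximated as "the group has at least one company row", and totals are compared with a small absolute tolerance (also an assumption) to absorb rounding.

```python
from collections import defaultdict
from math import isclose

# Hypothetical simplified field names; the real flat files use their own headers
# ("agency" stands in for name_of_recieving_agency).


def collapse_double_counting(rows, tol=0.01):
    """Merge each government row into its matching company rows.

    Returns (new_rows, matched_groups, total_groups). Groups that fail the
    checks are passed through unchanged, so their double counting remains.
    """
    # Step 3: group by GFS code + name of revenue stream.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["gfs_code"], row["revenue_stream"])].append(row)

    new_rows = []
    matched = 0
    for members in groups.values():
        gov = [r for r in members if r["type"] == "government"]
        companies = [r for r in members if r["type"] == "company"]
        company_total = sum(r["value_reported"] for r in companies)
        # Steps 1, 2 and 4: company rows exist, there is exactly one
        # government row, and the company total equals the government value.
        if (
            companies
            and len(gov) == 1
            and isclose(company_total, gov[0]["value_reported"], abs_tol=tol)
        ):
            for r in members:
                if r["type"] == "government":
                    continue  # drop it: its total is now carried by the company rows
                merged = dict(r)  # copy so the input rows are not mutated
                if r["type"] == "company" and not merged.get("agency"):
                    merged["agency"] = gov[0]["agency"]
                new_rows.append(merged)
            matched += 1
        else:
            new_rows.extend(members)
    return new_rows, matched, len(groups)


# Illustrative data only (hypothetical GFS codes and agencies).
rows = [
    {"type": "government", "gfs_code": "X1", "revenue_stream": "Royalties",
     "value_reported": 300.0, "agency": "Ministry of Finance"},
    {"type": "company", "gfs_code": "X1", "revenue_stream": "Royalties",
     "value_reported": 100.0, "agency": ""},
    {"type": "company", "gfs_code": "X1", "revenue_stream": "Royalties",
     "value_reported": 200.0, "agency": ""},
    {"type": "government", "gfs_code": "X2", "revenue_stream": "Licence fees",
     "value_reported": 50.0, "agency": "Mining Cadastre"},
    {"type": "company", "gfs_code": "X2", "revenue_stream": "Licence fees",
     "value_reported": 40.0, "agency": ""},
]
new_rows, matched, total = collapse_double_counting(rows)
print(matched, total, len(new_rows))  # 1 2 4
```

Here the X1 group passes (100 + 200 = 300), so its government row is removed and its agency is copied onto both company rows; the X2 group fails (40 vs 50) and is left split. Returning the matched and total group counts makes it easy to measure what share of a report can be collapsed.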

What

@mattfullerton Would you be able to take a look at this and see what share of EITI files would pass this test?

We should then discuss whether to provide this as a supplementary flat file or simply replace the split files whenever a single file can be generated.

Notes

@mattfullerton
Contributor

First look at this: 1921 reports have disaggregated company information (condition 1) and 624 do not. The remaining three conditions will need a good deal more time to figure out.

@mattfullerton mattfullerton changed the title Explore generation of a single EITI summary data file per country IMPORT (EITI): Explore generation of a single EITI summary data file per country May 9, 2017
@anderspeders
Author

Ok, please prioritise the RGI source tool. We can keep this pending given that it seems a bit tricky to solve.

@mattfullerton mattfullerton changed the title IMPORT (EITI): Explore generation of a single EITI summary data file per country Explore generation of a single EITI summary data file per country May 10, 2017
@anderspeders
Author

I believe that we can close this now as this has been done, right?

@mattfullerton
Contributor

It hasn't, no

@mattfullerton
Contributor

@anderspeders Could you comment on how urgent/important this is? And could you (or @moman822) send a small example of the split files that illustrates points 1-4 using real data?

@anderspeders
Author

Following today's call, please give an approximate cost estimate for this, and we can then sign off and move this item forward.

@anderspeders
Author

anderspeders commented Nov 13, 2017

Trying to recap where we are on this ticket. I have not been able to follow the conversation in Slack and do not see anything captured in GitHub.

My understanding, however, is that most files cannot currently be generated as a single file because the data do not fully match. If that is the case for more than half of the countries, I suggest we keep the current setup as is and wait to implement this until the data from the EITI API reach a higher quality.

Thoughts are welcome. Until then, I am removing the priority label.

@mattfullerton
Contributor

Preliminary results are as follows (share of reports, rounded to one decimal place):

All matched (%) | None matched (%) | Partially matched (%)
6.9 | 43.9 | 49.2

I have already posted the results per transaction in Slack and will post an update on that and a summary per report. We could dig a bit deeper to see if there's still something we're (=I'm) missing, but the fact that the method works very well for some reports and partially for others makes me think we're doing it right.
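The thread does not define the three categories in the table above. A plausible reading, assumed here, is that a report is "all matched" when every (GFS code, revenue stream) group, or transaction, passes the check, "none matched" when none does, and "partially matched" otherwise. The minimal Python sketch below computes the percentages under that assumption; the report ids and per-group results are hypothetical.

```python
from collections import Counter

LABELS = ("all matched", "none matched", "partially matched")


def classify_report(group_matched):
    """Classify one report from its per-group results (True = group passed)."""
    if not group_matched:
        raise ValueError("report has no (GFS code, revenue stream) groups")
    if all(group_matched):
        return "all matched"
    if not any(group_matched):
        return "none matched"
    return "partially matched"


def summarise(reports):
    """Return the percentage of reports in each category.

    `reports` maps a report id to its list of per-group booleans.
    """
    counts = Counter(classify_report(groups) for groups in reports.values())
    total = sum(counts.values())
    return {label: 100 * counts[label] / total for label in LABELS}


# Hypothetical per-group results for four reports.
example = {
    "report-a": [True, True],
    "report-b": [False],
    "report-c": [True, False, True],
    "report-d": [False, False],
}
print(summarise(example))
# {'all matched': 25.0, 'none matched': 50.0, 'partially matched': 25.0}
```

Under this reading, a large "partially matched" share means many reports could be collapsed group by group, with only the failing groups left split.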
