Add report ordering #13

Open
3 tasks
thibautjombart opened this issue May 25, 2019 · 3 comments
Labels
enhancement New feature or request

Comments

@thibautjombart
Collaborator

thibautjombart commented May 25, 2019

To handle dependencies between reports, it would be useful to implement an optional ordering of reports. This could be stored in a file .order or .reports_order at the root of a factory. The order would relate to the file names, without the dates, and default to alphanumeric. I would imagine:

  • get_order(): returns (undated) Rmd files in their order of compilation, defaulting to alphanumeric

  • set_order(x): sets the order of compilation of the reports; x could be a vector of names, which then needs validation against the names of existing reports, or a vector of integers, in which case this is applied to the output of get_order(); the output will be saved in .order

  • reset_order(): resets the order of compilation of the reports to default, i.e. removes the file .order
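A minimal sketch of how these three functions might fit together, for discussion only. Everything beyond the function names and the .order file is an assumption: the `factory` and `report_names` arguments are hypothetical, report names are passed in explicitly rather than discovered on disk, and reports missing from .order are appended in default order.

```r
# Hypothetical sketch, not the package's implementation.
# `factory` is the path to the factory root; `report_names` are undated names.

order_file <- function(factory) file.path(factory, ".order")

get_order <- function(report_names, factory = ".") {
  # default: alphanumeric order (C locale, so results do not depend on locale)
  default <- sort(unique(report_names), method = "radix")
  path <- order_file(factory)
  if (!file.exists(path)) {
    return(default)
  }
  saved <- readLines(path)
  # keep saved names that still exist, then append reports not listed
  saved <- saved[saved %in% default]
  c(saved, setdiff(default, saved))
}

set_order <- function(x, report_names, factory = ".") {
  current <- get_order(report_names, factory)
  if (anyNA(x) || anyDuplicated(x) > 0) {
    stop("`x` must not contain missing or duplicated values")
  }
  if (is.numeric(x)) {
    # integers are applied to the output of get_order()
    if (any(x != round(x)) || any(x < 1) || any(x > length(current))) {
      stop("integer positions must be between 1 and ", length(current))
    }
    new_order <- current[x]
  } else {
    # names are validated against existing reports
    new_order <- as.character(x)
    unknown <- setdiff(new_order, current)
    if (length(unknown) > 0) {
      stop("unknown report(s): ", paste(unknown, collapse = ", "))
    }
  }
  writeLines(new_order, order_file(factory))
  invisible(get_order(report_names, factory))
}

reset_order <- function(factory = ".") {
  # back to the default order, i.e. remove the file .order
  path <- order_file(factory)
  if (file.exists(path)) {
    file.remove(path)
  }
  invisible(NULL)
}
```

Appending unlisted reports means a partial order such as set_order(c("clean_data", "analyse_data"), ...) stays valid when new reports are added later; a stricter version could require every report to be listed.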

Comments and ideas welcome. I may be able to get a head start on this if @zkamvar is really not keen on it, unless we can get help from others?

@zkamvar zkamvar removed their assignment Jun 27, 2019
@sgetalbo
Contributor

sgetalbo commented Nov 4, 2019

Hey @thibautjombart I believe this falls under Locke Data's remit now. Could I get a quick rundown of a use case for this? I'm not sure I understand the process - e.g.

  • why do you want to implement it/what scenario might it be used in?

  • When you refer to handling dependencies, do you mean the data that was used in the report?

From what I understand, get_order() is simply a list of .Rmd files in a factory, ordered by their compilation date, Y/N?

  • Would it be useful to have any more information provided, such as the version of the data that was used etc?

TIA.

@thibautjombart
Collaborator Author

Hey @sgetalbo

This boils down to outputs of some reports being used as inputs of others. The simplest use case I have encountered is:

  • clean_data_[date].Rmd processes raw data and outputs some clean data in a specific folder, e.g. data/clean/my_data.rds

  • analyse_data_[date].Rmd runs some analyses on the clean dataset, reading it from data/clean/my_data.rds

However, when calling update_reports() the order is by default alphanumeric, so that the analyses would be done before the cleaning; in this case we'd like to specify the order of these files. Note that it is not predicated on the [date], only on the base name of the report, e.g.:

set_order(c("clean_data", "analyse_data"))

Currently the workaround is to rename clean_data... to aaa_clean_data....
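To illustrate ordering on the base name only, here is a rough sketch of how dated files could be sorted by a user-supplied order. The helper names `undated_name()` and `order_reports()` are hypothetical, and the sketch assumes that [date] is a trailing `_YYYY-MM-DD` before the `.Rmd` extension, which may not match every naming scheme.

```r
# Hypothetical helpers; assumes files are named <base>_<YYYY-MM-DD>.Rmd

# Strip the directory, the date and the extension to get the base name
undated_name <- function(files) {
  sub("_[0-9]{4}-[0-9]{2}-[0-9]{2}\\.Rmd$", "", basename(files))
}

# Sort dated files by the position of their base name in `report_order`
order_reports <- function(files, report_order) {
  rank <- match(undated_name(files), report_order)
  # reports not mentioned in `report_order` are compiled last
  rank[is.na(rank)] <- length(report_order) + 1L
  # ties (same base name, or unlisted reports) fall back to alphanumeric order
  files[order(rank, files, method = "radix")]
}

files <- c(
  "analyse_data_2019-05-25.Rmd",
  "clean_data_2019-05-25.Rmd",
  "clean_data_2019-06-01.Rmd"
)
order_reports(files, c("clean_data", "analyse_data"))
```

With this ordering both clean_data files come before analyse_data, whatever their dates.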

@reconverse reconverse deleted a comment from zkamvar Nov 6, 2019
@reconverse reconverse deleted a comment from zkamvar Nov 6, 2019
@thibautjombart
Collaborator Author

This might be something worth looking into in the future. To adapt it to the current implementation, we could think of having priorities defined in the config file, e.g.:

compile_first:
  get_data
  clean_data

So that list_reports() would return reports with files matching the regexp get_data first, then clean_data, then the rest.
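A rough sketch of that priority logic, assuming compile_first has already been read from the config file into a character vector of regular expressions. The function name `prioritise_reports()` is hypothetical and this is not how list_reports() is currently implemented.

```r
# Hypothetical sketch: put files matching `compile_first` patterns first,
# in the order the patterns are given, then the rest alphanumerically.
prioritise_reports <- function(files, compile_first) {
  files <- sort(files, method = "radix")
  # files matching no pattern get the lowest priority
  rank <- rep(length(compile_first) + 1L, length(files))
  # loop from the last pattern to the first, so that a file matching
  # several patterns keeps the rank of the earliest one
  for (i in rev(seq_along(compile_first))) {
    rank[grepl(compile_first[i], files)] <- i
  }
  # order() leaves ties in their original (alphanumeric) order
  files[order(rank)]
}

files <- c("analyse_data.Rmd", "clean_data.Rmd", "get_data.Rmd")
prioritise_reports(files, c("get_data", "clean_data"))
```

Because the patterns are regular expressions, an entry such as "data" would match every file above, so anchored patterns like "^get_data" may be safer in practice.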

@thibautjombart thibautjombart added the enhancement New feature or request label Jan 14, 2021

3 participants