ENH: Crawl dataset's metadata only once and before Nipype's workflow #1317
Conversation
Force-pushed 82597a5 to ef40015.
@effigies @mgxd it'd be nice to get feedback on this one - maybe not so much on the code itself (if time is tight), but on the idea of pushing all metadata crawling to just once at the beginning. This way:
WDYT? This PR is still marked as a draft and I'm testing locally -- but the base implementation should be solid enough to take a look at.
Yeah, no objections to the overall strategy.
So essentially this shifts to keeping all metadata in memory? What if the fields get pared down to only the ones relevant to the pipeline?
Force-pushed 2b6049f to 233d7bd.
Yes, that's correct. If memory becomes a concern (although the metadata's size should be negligible), you can set the fields to None and only load them when needed from the pickle file.
Not sure I understand -- do you mean further filtering the metadata within the new loop so only the relevant fields are kept?
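The set-to-None-and-reload idea above can be sketched as follows. This is a minimal illustration with a hypothetical `MetadataCache` helper -- it is not MRIQC's actual API, just one way to persist the crawled metadata to a pickle file and release the in-memory copy until it is next accessed:

```python
import pickle
from pathlib import Path


class MetadataCache:
    """Hypothetical helper: keep crawled metadata on disk, reload lazily."""

    def __init__(self, cache_path):
        self.cache_path = Path(cache_path)
        self._metadata = None

    def dump(self, metadata):
        # Persist the metadata and drop the in-memory reference
        with self.cache_path.open("wb") as f:
            pickle.dump(metadata, f)
        self._metadata = None

    @property
    def metadata(self):
        # Reload from the pickle file only on first access after a dump
        if self._metadata is None:
            with self.cache_path.open("rb") as f:
                self._metadata = pickle.load(f)
        return self._metadata
```

Whether the saving is worth the extra I/O depends on how large the aggregated metadata actually gets; as noted above, it is likely negligible.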
Force-pushed f5c2037 to 991eb50.
I realize I'm not familiar with what MRIQC actually needs the metadata for - is it using the information to calculate something, or just aggregating it into the report?
Force-pushed fe47aa2 to de9629d.
Aggregating it into the report. However, fMRIPrep does something similar, as it attaches all the metadata to the output. We could have some dictionary of relevant metadata, or leave it to the user to find non-critical, unmodified metadata (in fMRIPrep). MRIQC does filter some metadata when submitting to the web API, but I'm a bit sceptical that doing so would actually shave off a lot of memory.
Force-pushed da3b412 to cf1ea8f.
Aggregates dataset-wise operations that typically traverse the list of input files (datalad get, biggest-file-size extraction, and metadata extraction) into a single step.
Resolves #1316.
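The single-pass traversal the PR describes could look roughly like this. The function name and the sidecar-lookup convention are assumptions for illustration (real BIDS metadata resolution also honours inheritance, and the `datalad get` step is omitted); the point is that one loop covers what previously required several traversals:

```python
import json
from pathlib import Path


def crawl_dataset(bids_files):
    """Hypothetical single pass over the inputs: gather sidecar metadata
    and track the biggest file size in the same loop."""
    metadata = {}
    biggest_size = 0
    for path in bids_files:
        path = Path(path)
        # A datalad get call would go here for sparse checkouts (omitted)
        biggest_size = max(biggest_size, path.stat().st_size)
        # Simplified sidecar lookup: same stem, .json extension
        sidecar = path.with_suffix("").with_suffix(".json")
        if sidecar.exists():
            metadata[path.name] = json.loads(sidecar.read_text())
    return metadata, biggest_size
```

With this shape, the Nipype workflow receives the crawled metadata and the biggest file size as plain inputs, instead of re-traversing the dataset inside individual nodes.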