Add mapping tester repetitions #195

Open
wants to merge 4 commits into develop from add-mapping-tester-repetitions
Conversation

fsimonis (Member) commented Aug 8, 2024

Main changes of this PR

This PR:

  • adds repetitions to the mapping tester. They default to 1 and form the final folder level in the case tree, which makes gathering more samples for performance data easy (see the sketch after this list).
  • plots the average over all runs
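As an illustration of the resulting layout, with the repetition index as the trailing level (the other level names are hypothetical, not necessarily the exact tree the tester generates):

<mesh pair>/
  <mapping>/
    <constraint>/
      0/    (repetition 0, with its own profiling data)
      1/    (repetition 1)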

I have a further change coming up that makes writing the output mesh on B optional, which prevents excessive storage use. See #197.

Based on #210

Author's checklist

  • I used the pre-commit hook and used pre-commit run --all to apply all available hooks.
  • I added a test to cover the proposed changes in our test suite.
  • I updated the documentation in docs/README.md.
  • I updated potential breaking changes in the tutorial precice/tutorials/aste-turbine.

fsimonis (Member, Author) commented Aug 8, 2024

I am wondering if I should write a documentation page in docs/ and link to it from the README instead. Is this important/useful enough for a separate tooling page on the website?

fsimonis (Member, Author) commented
I split the gatherstats improvement (#210) and the documentation (#209) into separate PRs.

fsimonis force-pushed the add-mapping-tester-repetitions branch from eb1c09b to ef98748 on October 25, 2024
fsimonis changed the title from "Add mapping tester repetitions and docs" to "Add mapping tester repetitions" on Oct 25, 2024
davidscn (Member) left a comment

What I like about this approach:

  • the repetitions are part of the json file

What I don't like about this approach:

  • the hierarchy of the generated run tree becomes yet another level deeper
  • each run (with the exact same configuration) is duplicated in another directory. This is particularly problematic in terms of memory. In the current setup, having larger runs can easily lead to $\mathcal{O}(10)$ GB. Having 5 repetitions then would lead to excessive memory usage for no reason.

Now there is another suggestion of making the generation of output files (which occupy this memory) optional. However, the use case of the mapping-tester is to test mappings, which have error and runtime as their integral properties. One of the main features of all this is in fact that it measures both with one execution. Therefore, disabling the error measurement seems rather like a different use case.

In my previous setups, I simply repeated the whole pipeline "repetitions" times to generate the same statistical data for individual runs. This gets rid of the two downsides mentioned above, but comes at the disadvantage that the temporarily generated profiling data of individual runs is no longer accessible after all repetitions have completed.

I think a clean approach here would be to have the same directory hierarchy as we have right now, and store the profiling information of each run in individual precice-profiling-<N> directories. Then we would have (the same) memory problems as before, but avoid duplicating runs and directory structures while keeping the profiling information persistent.

fsimonis (Member, Author) commented

the hierarchy of the generated run tree becomes yet another level deeper

True. In practice this doesn't matter, though. The directory structure is consistent, and it is easy to use globbing expressions or tools like find or fd to locate all case files and directories.

Currently, we extract some job information from the path, but fundamentally this could also go into a case.json; then the path wouldn't matter much and could even be replaced by a hash.
The purpose of this path scheme was to make it easy to rerun entire mappings, which was useful for dialing in RBF mappings.
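As a minimal sketch of the kind of globbing meant here (the "cases" root and the tree depth are assumptions, not the tester's actual layout):

from pathlib import Path

# Hypothetical: collect all leaf case directories assuming a fixed-depth tree
# rooted at "cases"; adjust the pattern to the actual generated structure.
case_dirs = sorted(p for p in Path("cases").glob("*/*/*/*/*") if p.is_dir())
for d in case_dirs:
    print(d)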

each run (with the exact same configuration) is duplicated in another directory. This is particularly problematic in terms of memory. In the current setup, having larger runs can easily lead to O(10) GB . Having 5 repetitions then would lead to an excessive memory usage for no reason.

I assume that you mean storage, which depends on the goal of using the tool:

  • If mapping accuracy is a concern, then output is mandatory and repetitions are pointless. A single output is all it takes to get useful results.
  • If mapping performance is a concern, then repetitions are mandatory and the output is pointless. Optional output of mapped mesh #197 allows disabling outputs if only timings are of interest.

However, the use case of the mapping-tester is to test mappings, which have error and runtime as their integral properties. One of the main features of all this is in fact that it measures both with one execution.
...

The mapping-tester was built to test accuracy and runtime for the mappings section of the preCICE v2 paper. Back then, I thankfully didn't run into storage issues, but the process was nevertheless wasteful in terms of storage and runtime (pointless post-processing).
For runtime measurements, this repetition loop and the aggregation need to exist somewhere. It may be manual rerunning and saving of outputs, it may be a loop over multiple test case directories, or it may be baked directly into the tool.

I think a clean approach here would be to have the same directory hierarchy as we have right now, and store the profiling information of each run in individual precice-profiling-<N> directories.

Reusing the same case files for multiple runs comes with the severe downside that post-processing needs to be interleaved with the runs, which forces the entire job to wait. This is wasteful in terms of project CPU hours and needlessly blocks the partition, especially since the post-processing isn't even parallelized.
The login nodes or the micro partition exist for such post-processing tasks.

This feature has always been part of the purpose of this tool. Up to now, we only got away with not having it built-in.

davidscn (Member) commented

If mapping accuracy is a concern, then output is mandatory and repetitions are pointless.
If mapping performance is a concern, then repetitions are mandatory and the output is pointless

You successfully summarized the issue: both are a concern.

Reusing the same case files for multiple runs comes with the severe downside that post-processing needs to be interleaved with the runs, which forces the entire job to wait.

That's not the case, please read again. If I have a profiling directory for each repetition, then I can handle everything afterwards.

fsimonis (Member, Author) commented Oct 28, 2024

You successfully summarized the issue: both are a concern.

I am trying to make the point that both are not necessarily a concern at the same time.

If I am interested only in measuring runtime, then I shouldn't have to pay for measuring mapping accuracy.

That's not the case, please read again. If I have a profiling directory for each repetition, then I can handle everything afterwards.

I get it now. You essentially move the “repetitions” loop into the run script of a case itself like this:

import shutil

for run in range(repetitions):
    runCase()
    shutil.move("precice-profiling", f"precice-profiling-{run}")

So no interleaving of run and post-processing 👍

There are still some open points:

  • this still writes the output mesh n times, which is unnecessary I/O and storage usage. Outputting the mapped mesh only for the first repetition may be an option, even though it will lead to a different global time for the writing solver B.
  • the memory measurement also needs to be replicated (which is pretty questionable anyhow for anything running on multiple cores).
  • this complicates gathering data from cases, as a case now results in multiple data points. The solution proposed here is a drop-in.

Can you point to a branch which uses your current workflow? It would be interesting to see an implementation of this.

davidscn (Member) commented Oct 28, 2024

I get it now. You essentially move the “repetitions” loop into the run script of a case itself like this:

My first idea was to make use of the <profiling directory="" /> attribute, but honestly speaking, a simple renaming as suggested above seems to be much easier to apply and has the same effect.

this still writes the output mesh n times, which is unnecessary I/O and storage usage.

Yes, it is unnecessary I/O, but no additional storage usage: the output mesh files generated in the first run will simply be overwritten later on. Having the I/O multiple times shouldn't hurt the runtime, though.

the memory measurement also needs to be replicated

Each measurement should give the same result, right?

this complicates gathering data from cases, as a case now results in multiple data points. The solution proposed here is a drop-in.

I see what you have in mind. What I did last time was to generate a separate statistics file for each run and later average over all generated statistics files. Another option would be to load all frames immediately in each case and then aggregate, e.g., the mean timing. Either way, one would look at a single timing or at measurements derived from the timings; it's mostly a question of where to aggregate the data.

That's what I did last time, after storing each run as a separate statistics file:

https://github.com/davidscn/aste/blob/mapping-paper-setups/tools/mapping-tester/aggregate-timings.py
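For illustration only (this is not the linked script): a minimal sketch of averaging a timing column over per-run statistics files, where the file pattern and the "time" column name are assumptions:

import glob
import pandas as pd

# Assumed layout: one CSV per repetition, each with a "time" column of equal length.
frames = [pd.read_csv(f) for f in sorted(glob.glob("stats-run-*.csv"))]
mean_time = sum(df["time"] for df in frames) / len(frames)
print(mean_time)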

fsimonis force-pushed the add-mapping-tester-repetitions branch from ef98748 to 7adf960 on October 28, 2024