Dataset consistency, validation, execution comparison and benchmarking between hosting environments #46

Open · 19 tasks
JimCircadian opened this issue Apr 6, 2023 · 1 comment
Labels: enhancement (New feature or request)

@JimCircadian (Member) commented Apr 6, 2023

The previous dev16 runs, a set intended to be comparative across multiple HPC platforms (BAS and JASMIN), were marred by various issues: limited wall times, an underlying data consistency problem, IO issues and the like.

On reflection, those runs were abandoned in favour of a development push that allows multiple environments to be validated, step by step, when executed with identical configurations on different underlying platforms. This issue captures the tasks needed to improve various elements of running such workflows, implement consistency checking between the data stores and the assets generated by executions on different HPC platforms, and demonstrate the workflow through a notebook that can be run on HPCa and HPCb, with those runs then compared using the tooling.

We are creating a new run that is smaller and consistent on both HPCs whilst we solve the problems that stopped dev16 from working (it was fairly large!). There are also requests to do full-tilt training runs for a conservation project, which means several long-running pipeline issues need sorting.

In the first instance we should use demonstrators that are small and to the point, as there are future runs that will scale usage considerably. High-level discussion should be captured in this issue, with functional requirements, detailed discussion and performance improvements addressed in the issues spread across the repositories.

There is a lot to capture here and many issues can be absorbed into this project, so they might not all be linked in yet.

  • Dataset validation
  • Data / execution pipeline issues to address
    • Automated analysis reporting for data in pipeline environments as part of runs (assuming low cost/impact on performance)
    • Missing dates wrap-up and documentation - need to ensure that outputs for this are comparable
    • Linear trend overproduction issue - ensure linear trend outputs are comparable between environments
    • Parameter validation for ENV files and configuration comparisons - if these don't match, we shouldn't expect the preprocessed or cached data to match either!
    • Ensuring metadata adequately captures the source platform and is displayed in downstream applications (e.g. icenet-application)
  • Add new dataset definition that works for dual hemisphere training runs - this is captured somewhere in the pipeline/library as the original configurations can't easily encompass dual hemisphere runs
  • Add basic benchmarking framework at various execution stages
    • icenet CLI commands, start to finish (see the timing sketch after this list)
    • Clarify/review outputs from dataset generation
    • icenet-pipeline reference implementation
  • Solve pipeline issues relating to wall times (JASMIN) and premature / unstable training runs - if a run ends prematurely, we need to pick up and go again
    • Automate resubmission: model-ensemble can repeat based on external conditions, so for the reference implementation this can be automated
  • Demonstrator notebook for validating consistency of environments across multiple platforms (TODO: capture in icenet-notebooks issue)
    • Run end-to-end in BAS
    • Run end-to-end in JASMIN
    • Perform consistency check across all available HPCs
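
A minimal sketch of the benchmarking idea above, assuming stages are invoked as CLI commands via `subprocess`; the stage commands and output filename are illustrative placeholders, not confirmed icenet entry points:

```python
# benchmark_stages.py - hypothetical wall-clock timing wrapper for pipeline stages.
# The stage commands below are placeholders; substitute the actual icenet /
# icenet-pipeline invocations used in a given run.
import json
import platform
import subprocess
import time

STAGES = {
    "download": ["echo", "download stage placeholder"],
    "preprocess": ["echo", "preprocess stage placeholder"],
    "train": ["echo", "train stage placeholder"],
}

results = {"host": platform.node(), "stages": {}}

for name, cmd in STAGES.items():
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    results["stages"][name] = {
        "command": " ".join(cmd),
        "returncode": proc.returncode,
        "wall_seconds": round(time.monotonic() - start, 3),
    }

# Machine- and human-parsable output that is easy to transfer between HPCs
with open(f"benchmark_{platform.node()}.json", "w") as fh:
    json.dump(results, fh, indent=2)
```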

Some rules of thumb:

  • This is to be part of the 0.3 development push; don't retrofit it to the existing 0.2.* series of developments
  • Direct file checking is not possible: all validation and comparison must be some level of naive statistical comparison, since we have to account for acceptable differences across files due to the platform (see the sketch after this list)
  • Never rely on pinned underlying environments; we don't want that to be a prerequisite, as different HPCs will have differing requirements for hosting the pipeline
  • Outputs should be machine- and human-parsable and, if possible, easy to transfer for comparison (e.g. JSON)
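
To illustrate the statistical (rather than byte-wise) comparison rule, a rough sketch assuming xarray/numpy are available; the file paths, variable name and tolerances are placeholders:

```python
# compare_outputs.py - hypothetical consistency check between two platforms' outputs.
# Paths, the variable name and tolerances are placeholders for illustration.
import json
import sys

import numpy as np
import xarray as xr

def summarise(path, var="sic_mean"):
    """Reduce one output file to summary statistics for a named variable."""
    values = xr.open_dataset(path)[var].values
    return {
        "mean": float(np.nanmean(values)),
        "std": float(np.nanstd(values)),
        "min": float(np.nanmin(values)),
        "max": float(np.nanmax(values)),
    }

def compare(path_a, path_b, rtol=1e-5, atol=1e-8):
    a, b = summarise(path_a), summarise(path_b)
    return {
        "hpc_a": a,
        "hpc_b": b,
        # Naive statistical check: summary statistics agree within tolerance,
        # rather than requiring bit-identical files across platforms.
        "consistent": all(np.isclose(a[k], b[k], rtol=rtol, atol=atol) for k in a),
    }

if __name__ == "__main__":
    print(json.dumps(compare(sys.argv[1], sys.argv[2]), indent=2))
```
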
@JimCircadian JimCircadian self-assigned this Apr 6, 2023
@JimCircadian JimCircadian changed the title from "New model run for comparsions between single and dual hemisphere training" to "Dev16 fixes and model run for comparsions between single and dual hemisphere training" Apr 6, 2023
@JimCircadian JimCircadian changed the title from "Dev16 fixes and model run for comparsions between single and dual hemisphere training" to "Dataset fixups, data validation and comparsion model runs between single and dual hemisphere training" Jun 21, 2023
@JimCircadian JimCircadian removed their assignment Jul 24, 2023
@JimCircadian JimCircadian changed the title from "Dataset fixups, data validation and comparsion model runs between single and dual hemisphere training" to "Dataset consistency, validation and comparsion between environments and hosts" Dec 29, 2023
@JimCircadian JimCircadian added the enhancement (New feature or request) label and added and removed the bug (Something isn't working) label Dec 29, 2023
@JimCircadian (Member, Author) commented:
@bnubald we should have a chat about this, but I've reworked the issue to explain the primary goal. This links into various other streams of work but will be the priority moving forward. Ping me a DM to discuss further.

All contributions to individual issues welcome from others! 😆

@JimCircadian JimCircadian changed the title from "Dataset consistency, validation and comparsion between hosting environments" to "Dataset consistency, validation, execution comparison and benchmarking between hosting environments" Jan 2, 2024
@bnubald bnubald moved this to Ready in IceNet Roadmap Aug 6, 2024