Merge branch 'main' into functional_testing_ci

NVIDIA · May 15, 2024 · d3d7d30
2 parents 3ea51b1 + f3c6951

Showing 126 changed files with 285 additions and 28 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,20 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: ''
+assignees: ''
+---
+
+## Describe the Bug
+> A clear and concise description of what the bug is.
+
+## Steps to Reproduce
+> Steps to reproduce the behavior.
+> Please include the version information where the bug was observed.
+
+## Expected Behavior
+> A clear and concise description of what you expected to happen.
+
+## Screenshots
+> If applicable, add screenshots to help explain your problem.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,13 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: ''
+assignees: ''
+---
+
+## Problem Related to the Feature
+> A clear and concise description of what the problem is.
+
+## Proposed Solution
+> A clear and concise description of what you want to happen.
diff --git a/.github/ISSUE_TEMPLATE/general_question.md b/.github/ISSUE_TEMPLATE/general_question.md
@@ -0,0 +1,9 @@
+---
+name: General question
+about: Ask a question or seek clarification about the project
+title: ''
+labels: 'question'
+assignees: ''
+---
+
+> Please provide a detailed description of your question or the information you seek.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,13 @@
+## Summary
+Provide a concise summary of the changes introduced by this pull request. Detail the purpose and scope of the changes, referencing any relevant issues or discussions. Explain how these changes address the problem or improve the project.
+
+## Test Plan
+In this section, describe the testing you have performed to verify the changes. Include:
+- A clear description of the testing environment.
+- The steps you followed to test the new features or bug fixes.
+- Any specific commands used during testing, along with their outputs.
+- A description of the results and observations from your testing.
+This information is crucial for reviewers to understand how the changes have been validated.
+
+## Additional Notes
+Include any other notes or comments about the pull request here. This can include challenges faced, future considerations, or context that reviewers might find helpful.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,55 @@
+name: CI
+
+on:
+  pull_request:
+
+jobs:
+
+  lint:
+    name: Linting
+
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: 3.9
+
+      - name: Install dependencies
+        run: pip install -r requirements-dev.txt
+
+      - name: Run ruff linter
+        run: ruff check .
+
+      - name: Run ruff formatter
+        run: ruff format --check --diff .
+
+      - name: Run pyright
+        run: pyright .
+
+      - name: Run vulture check
+        run: vulture src/ tests/ ci_tools/ --min-confidence 100
+
+  test:
+    name: Run pytest
+
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: 3.9
+
+      - name: Install dependencies
+        run: pip install pytest -r requirements-dev.txt
+
+      - name: Run pytest
+        run: pytest -vv
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,91 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# pytype static type analyzer
+.pytype/
+
+# pycharm
+.idea/
+
+# VSCode
+.vscode/
+
+# Editors and IDEs
+*.swp
+*.bak
+*.tmp
+*~
+*.sublime-project
+*.sublime-workspace
+
+# OS generated files
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+*.log
+install/
+results/
+.*
diff --git a/README.md b/README.md
@@ -33,43 +33,43 @@ Cloud AI supports five modes: install, dry-run, run, generate-report, and uninst
 * Use the generate-report mode to generate reports under the test directories alongside the raw data.
 * Use the uninstall mode to remove installed test templates.
 
-To install test templates, run main.py in install mode.
+To install test templates, run the Cloud AI CLI (`cloudai`) in install mode.
 Please make sure to use the correct system configuration file that corresponds to your current setup for installation and experiments.
 ```bash
-python main.py\
+cloudai\
     --mode install\
     --system_config_path conf/v0.6/general/system/example_slurm_cluster.toml
 ```
 
 To simulate running experiments without execution, use the dry-run mode:
 ```bash
-python main.py\
+cloudai\
     --mode dry-run\
     --system_config_path conf/v0.6/general/system/example_slurm_cluster.toml\
     --test_scenario_path conf/v0.6/general/test_scenario/sleep/test_scenario.toml
 ```
 
-To run experiments, execute main.py in run mode:
+To run experiments, execute the Cloud AI CLI in run mode:
 ```bash
-python main.py\
+cloudai\
     --mode run\
     --system_config_path conf/v0.6/general/system/example_slurm_cluster.toml\
     --test_scenario_path conf/v0.6/general/test_scenario/sleep/test_scenario.toml
 ```
 
-To generate reports, execute main.py in generate-report mode:
+To generate reports, execute the Cloud AI CLI in generate-report mode:
 ```bash
-python main.py\
+cloudai\
     --mode generate-report\
     --system_config_path conf/v0.6/general/system/example_slurm_cluster.toml\
     --output_path /path/to/output_directory
 ```
 In the generate-report mode, use the --output_path argument to specify a subdirectory under the result directory.
 This subdirectory is usually named with a timestamp for unique identification.
 
-To uninstall test templates, run main.py in uninstall mode:
+To uninstall test templates, run the Cloud AI CLI in uninstall mode:
 ```bash
-python main.py\
+cloudai\
     --mode uninstall\
     --system_config_path conf/v0.6/general/system/example_slurm_cluster.toml
 ```

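The README now calls a `cloudai` command that takes a `--mode` flag and several path options. The sketch below is hypothetical. It is not the actual `cloudai/__main__.py`, whose contents this diff does not show. It illustrates how an argparse-based parser could accept the five documented modes and the flags used in the README examples. The names `build_parser` and `parse_cli` and the per-mode checks are assumptions.

```python
import argparse
from typing import List, Optional

# The five modes listed in the README.
MODES = ["install", "dry-run", "run", "generate-report", "uninstall"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the README examples."""
    parser = argparse.ArgumentParser(prog="cloudai")
    parser.add_argument("--mode", choices=MODES, required=True)
    parser.add_argument("--system_config_path", required=True)
    parser.add_argument("--test_scenario_path")
    parser.add_argument("--output_path")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (sys.argv[1:] when None) and apply assumed per-mode checks."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: dry-run and run both need a test scenario, as in the README.
    if args.mode in ("dry-run", "run") and args.test_scenario_path is None:
        parser.error(f"--mode {args.mode} requires --test_scenario_path")
    # Assumption: generate-report needs the result subdirectory to read from.
    if args.mode == "generate-report" and args.output_path is None:
        parser.error("--mode generate-report requires --output_path")
    return args
```

For example, `parse_cli(["--mode", "install", "--system_config_path", "conf/v0.6/general/system/example_slurm_cluster.toml"])` returns a namespace with `mode == "install"`. A single flat parser with post-parse checks fits the README's `--mode` flag style. argparse subparsers would suit a `cloudai install ...` syntax instead, which the README does not use.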
diff --git a/ci_tools/functional_tests/run_functional_test.sh b/ci_tools/functional_tests/run_functional_test.sh
@@ -1,6 +1,7 @@
 #!/bin/bash
 
 VERBOSE=true
+CLOUDAI_PATH="./src/"
 
 if [ $# -ne 2 ]; then
     echo "Usage: $0 <test_scenario_path> <expected_output_path>"
@@ -51,9 +52,12 @@ dirs_diff() {
     fi
 }
 
+export PYTHONPATH=$PYTHONPATH:$CLOUDAI_PATH
+scenario_path=$1
+expected_output_path=$2
 
-scenario_path="$1"
-expected_output_path="$2"
+$VERBOSE && echo "Scenario path: $scenario_path"
+$VERBOSE && echo "Expected output dir: $expected_output_path"
 
 if [ ! -f "$scenario_path" ]; then
     >&2 echo "Error: Scenario $scenario is not valid, can't find path $scenario_path."
@@ -64,19 +68,24 @@ fi
 
 last_result_before=$(ls results/ -la -X | tail -n 3 | head -n 1 | awk '{print $NF}')
 
-python main.py \
+python -m cloudai \
     --mode dry-run\
     --system_config_path "ci_tools/functional_tests/system_config.toml" \
     --test_scenario_path $scenario_path
 
+if [ $? -ne 0 ]; then
+    echo "Tests failed"
+    exit 1
+fi
+
 last_result=$(ls results/ -la -X | tail -n 3 | head -n 1 | awk '{print $NF}')
 
 if [ "$last_result_before" == "$last_result" ]; then
     >&2 echo "No new result added after running cloudai dry run."
     exit 1
 fi
 
-last_result_path="results/$last_result"
+last_result_path="results/$last_result/"
 
 dirs_diff "$expected_output_path" "$last_result_path"
 is_diff=$?

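The functional test script detects a new result by comparing the last entry of `ls results/ -la -X` before and after the dry run. It then compares that directory with the expected output using `dirs_diff`, whose body is elided in this diff.

The following is a minimal Python sketch of the same two checks. It assumes each run creates exactly one new subdirectory under `results/`. It replaces the `ls` heuristic with a before/after set difference and uses the standard-library `filecmp` module for a recursive, byte-for-byte comparison. The helper names `snapshot`, `new_result`, and `dirs_differences` are hypothetical.

```python
import filecmp
import os
from typing import List, Set


def snapshot(results_dir: str) -> Set[str]:
    """Names of the subdirectories currently under results_dir."""
    return {
        name
        for name in os.listdir(results_dir)
        if os.path.isdir(os.path.join(results_dir, name))
    }


def new_result(before: Set[str], after: Set[str]) -> str:
    """The one directory added between two snapshots (assumed: one per run)."""
    added = after - before
    if len(added) != 1:
        raise RuntimeError(f"expected exactly one new result, found {sorted(added)}")
    return added.pop()


def dirs_differences(expected: str, actual: str) -> List[str]:
    """Recursively list relative paths that are missing, extra, or different."""
    diffs: List[str] = []

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        for name in cmp.left_only:
            diffs.append(f"missing: {os.path.join(prefix, name)}")
        for name in cmp.right_only:
            diffs.append(f"extra: {os.path.join(prefix, name)}")
        # Compare contents byte for byte; dircmp's own diff_files trusts
        # matching os.stat() signatures (shallow comparison) by default.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False
        )
        # common_funny holds names that are a file on one side, a dir on the other.
        for name in mismatch + errors + cmp.common_funny:
            diffs.append(f"differs: {os.path.join(prefix, name)}")
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(expected, actual), "")
    return sorted(diffs)
```

A set difference does not depend on how `ls -X` orders entries, and it fails loudly if a run produces zero or several new directories.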
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,9 +1,16 @@
 [project]
 name = "cloudai"
 version = "0.6"
+dependencies = [
+    "bokeh==3.4.1",
+    "pandas==2.2.1",
+    "requests==2.31.0",
+    "tbparse==0.0.8",
+    "toml==0.10.2",
+]
 
-[tool.setuptools.packages.find]
-where = ["cloudai"]
+[project.scripts]
+cloudai = "cloudai.__main__:main"
 
 [tool.ruff]
 target-version = "py39"

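The new `[project.scripts]` table declares a console script named `cloudai` with the target `cloudai.__main__:main`, meaning the `main` callable in the `cloudai.__main__` module. After the package is installed (for example with `pip install .`), the generated `cloudai` launcher imports that module and calls `main()`. By contrast, `python -m cloudai` runs `cloudai/__main__.py` directly, which is why the functional test script puts `./src/` on `PYTHONPATH`.

The sketch below illustrates only the `module:attribute` resolution step, using the standard library. `resolve_entry_point` is a hypothetical helper, not part of setuptools or pip.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve an entry-point string such as 'package.module:attr' or 'mod:Class.method'."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name.strip() or not attr_path.strip():
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj: Any = importlib.import_module(module_name.strip())
    # Walk any dotted attributes after the colon, e.g. 'Class.method'.
    for attr in attr_path.strip().split("."):
        obj = getattr(obj, attr)
    return obj
```

For example, `resolve_entry_point("os.path:join")` returns `os.path.join`. Resolving `"cloudai.__main__:main"` the same way works only when `cloudai` is importable, either installed or with `src/` on the path.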
diff --git a/requirements-dev.txt b/requirements-dev.txt
@@ -3,3 +3,6 @@ pytest==8.1.1
 ruff==0.3.7
 pandas-stubs==2.2.*
 pyright==1.1.359
+build==1.2.*
+vulture==2.11
+
diff --git a/cloudai/__init__.py → src/cloudai/__init__.py b/cloudai/__init__.py → src/cloudai/__init__.py
diff --git a/main.py → src/cloudai/__main__.py b/main.py → src/cloudai/__main__.py
diff --git a/cloudai/grader/__init__.py → src/cloudai/grader/__init__.py b/cloudai/grader/__init__.py → src/cloudai/grader/__init__.py
diff --git a/cloudai/grader/grader.py → src/cloudai/grader/grader.py b/cloudai/grader/grader.py → src/cloudai/grader/grader.py
diff --git a/cloudai/installer/__init__.py → src/cloudai/installer/__init__.py b/cloudai/installer/__init__.py → src/cloudai/installer/__init__.py
diff --git a/cloudai/installer/base_installer.py → src/cloudai/installer/base_installer.py b/cloudai/installer/base_installer.py → src/cloudai/installer/base_installer.py
diff --git a/cloudai/installer/installer.py → src/cloudai/installer/installer.py b/cloudai/installer/installer.py → src/cloudai/installer/installer.py
diff --git a/cloudai/installer/slurm_installer.py → src/cloudai/installer/slurm_installer.py b/cloudai/installer/slurm_installer.py → src/cloudai/installer/slurm_installer.py
diff --git a/cloudai/installer/standalone_installer.py → ...cloudai/installer/standalone_installer.py b/cloudai/installer/standalone_installer.py → ...cloudai/installer/standalone_installer.py
diff --git a/cloudai/parser/__init__.py → src/cloudai/parser/__init__.py b/cloudai/parser/__init__.py → src/cloudai/parser/__init__.py
diff --git a/cloudai/parser/core/__init__.py → src/cloudai/parser/core/__init__.py b/cloudai/parser/core/__init__.py → src/cloudai/parser/core/__init__.py
diff --git a/...dai/parser/core/base_multi_file_parser.py → ...dai/parser/core/base_multi_file_parser.py b/...dai/parser/core/base_multi_file_parser.py → ...dai/parser/core/base_multi_file_parser.py
diff --git a/cloudai/parser/core/base_system_parser.py → ...cloudai/parser/core/base_system_parser.py b/cloudai/parser/core/base_system_parser.py → ...cloudai/parser/core/base_system_parser.py
diff --git a/cloudai/parser/core/parser.py → src/cloudai/parser/core/parser.py b/cloudai/parser/core/parser.py → src/cloudai/parser/core/parser.py
diff --git a/cloudai/parser/core/system_parser.py → src/cloudai/parser/core/system_parser.py b/cloudai/parser/core/system_parser.py → src/cloudai/parser/core/system_parser.py
diff --git a/cloudai/parser/core/test_parser.py → src/cloudai/parser/core/test_parser.py b/cloudai/parser/core/test_parser.py → src/cloudai/parser/core/test_parser.py
diff --git a/cloudai/parser/core/test_scenario_parser.py → ...oudai/parser/core/test_scenario_parser.py b/cloudai/parser/core/test_scenario_parser.py → ...oudai/parser/core/test_scenario_parser.py
diff --git a/cloudai/parser/core/test_template_parser.py → ...oudai/parser/core/test_template_parser.py b/cloudai/parser/core/test_template_parser.py → ...oudai/parser/core/test_template_parser.py
diff --git a/cloudai/parser/system_parser/__init__.py → src/cloudai/parser/system_parser/__init__.py b/cloudai/parser/system_parser/__init__.py → src/cloudai/parser/system_parser/__init__.py
diff --git a/...rser/system_parser/slurm_system_parser.py → ...rser/system_parser/slurm_system_parser.py b/...rser/system_parser/slurm_system_parser.py → ...rser/system_parser/slurm_system_parser.py
diff --git a/...system_parser/standalone_system_parser.py → ...system_parser/standalone_system_parser.py b/...system_parser/standalone_system_parser.py → ...system_parser/standalone_system_parser.py
diff --git a/cloudai/report_generator/__init__.py → src/cloudai/report_generator/__init__.py b/cloudai/report_generator/__init__.py → src/cloudai/report_generator/__init__.py
diff --git a/cloudai/report_generator/report_generator.py → ...udai/report_generator/report_generator.py b/cloudai/report_generator/report_generator.py → ...udai/report_generator/report_generator.py
diff --git a/cloudai/report_generator/tool/__init__.py → ...cloudai/report_generator/tool/__init__.py b/cloudai/report_generator/tool/__init__.py → ...cloudai/report_generator/tool/__init__.py
diff --git a/...eport_generator/tool/bokeh_report_tool.py → ...eport_generator/tool/bokeh_report_tool.py b/...eport_generator/tool/bokeh_report_tool.py → ...eport_generator/tool/bokeh_report_tool.py
diff --git a/...t_generator/tool/report_tool_interface.py → ...t_generator/tool/report_tool_interface.py b/...t_generator/tool/report_tool_interface.py → ...t_generator/tool/report_tool_interface.py
diff --git a/...generator/tool/tensorboard_data_reader.py → ...generator/tool/tensorboard_data_reader.py b/...generator/tool/tensorboard_data_reader.py → ...generator/tool/tensorboard_data_reader.py
diff --git a/cloudai/report_generator/util.py → src/cloudai/report_generator/util.py b/cloudai/report_generator/util.py → src/cloudai/report_generator/util.py
diff --git a/cloudai/runner/__init__.py → src/cloudai/runner/__init__.py b/cloudai/runner/__init__.py → src/cloudai/runner/__init__.py
diff --git a/cloudai/runner/core/__init__.py → src/cloudai/runner/core/__init__.py b/cloudai/runner/core/__init__.py → src/cloudai/runner/core/__init__.py
diff --git a/cloudai/runner/core/base_job.py → src/cloudai/runner/core/base_job.py b/cloudai/runner/core/base_job.py → src/cloudai/runner/core/base_job.py
diff --git a/cloudai/runner/core/base_runner.py → src/cloudai/runner/core/base_runner.py b/cloudai/runner/core/base_runner.py → src/cloudai/runner/core/base_runner.py
diff --git a/cloudai/runner/core/runner.py → src/cloudai/runner/core/runner.py b/cloudai/runner/core/runner.py → src/cloudai/runner/core/runner.py
diff --git a/cloudai/runner/slurm/__init__.py → src/cloudai/runner/slurm/__init__.py b/cloudai/runner/slurm/__init__.py → src/cloudai/runner/slurm/__init__.py
diff --git a/cloudai/runner/slurm/slurm_job.py → src/cloudai/runner/slurm/slurm_job.py b/cloudai/runner/slurm/slurm_job.py → src/cloudai/runner/slurm/slurm_job.py
diff --git a/cloudai/runner/slurm/slurm_runner.py → src/cloudai/runner/slurm/slurm_runner.py b/cloudai/runner/slurm/slurm_runner.py → src/cloudai/runner/slurm/slurm_runner.py
diff --git a/cloudai/runner/standalone/__init__.py → src/cloudai/runner/standalone/__init__.py b/cloudai/runner/standalone/__init__.py → src/cloudai/runner/standalone/__init__.py
diff --git a/cloudai/runner/standalone/standalone_job.py → ...oudai/runner/standalone/standalone_job.py b/cloudai/runner/standalone/standalone_job.py → ...oudai/runner/standalone/standalone_job.py
diff --git a/...ai/runner/standalone/standalone_runner.py → ...ai/runner/standalone/standalone_runner.py b/...ai/runner/standalone/standalone_runner.py → ...ai/runner/standalone/standalone_runner.py
diff --git a/cloudai/schema/__init__.py → src/cloudai/schema/__init__.py b/cloudai/schema/__init__.py → src/cloudai/schema/__init__.py
diff --git a/cloudai/schema/core/__init__.py → src/cloudai/schema/core/__init__.py b/cloudai/schema/core/__init__.py → src/cloudai/schema/core/__init__.py
diff --git a/cloudai/schema/core/strategy/__init__.py → src/cloudai/schema/core/strategy/__init__.py b/cloudai/schema/core/strategy/__init__.py → src/cloudai/schema/core/strategy/__init__.py
diff --git a/...ema/core/strategy/command_gen_strategy.py → ...ema/core/strategy/command_gen_strategy.py b/...ema/core/strategy/command_gen_strategy.py → ...ema/core/strategy/command_gen_strategy.py
diff --git a/.../schema/core/strategy/grading_strategy.py → .../schema/core/strategy/grading_strategy.py b/.../schema/core/strategy/grading_strategy.py → .../schema/core/strategy/grading_strategy.py
diff --git a/.../schema/core/strategy/install_strategy.py → .../schema/core/strategy/install_strategy.py b/.../schema/core/strategy/install_strategy.py → .../schema/core/strategy/install_strategy.py
diff --git a/...ore/strategy/job_id_retrieval_strategy.py → ...ore/strategy/job_id_retrieval_strategy.py b/...ore/strategy/job_id_retrieval_strategy.py → ...ore/strategy/job_id_retrieval_strategy.py
diff --git a/...re/strategy/report_generation_strategy.py → ...re/strategy/report_generation_strategy.py b/...re/strategy/report_generation_strategy.py → ...re/strategy/report_generation_strategy.py
diff --git a/...schema/core/strategy/strategy_registry.py → ...schema/core/strategy/strategy_registry.py b/...schema/core/strategy/strategy_registry.py → ...schema/core/strategy/strategy_registry.py
diff --git a/...a/core/strategy/test_template_strategy.py → ...a/core/strategy/test_template_strategy.py b/...a/core/strategy/test_template_strategy.py → ...a/core/strategy/test_template_strategy.py
diff --git a/cloudai/schema/core/system.py → src/cloudai/schema/core/system.py b/cloudai/schema/core/system.py → src/cloudai/schema/core/system.py
diff --git a/cloudai/schema/core/test.py → src/cloudai/schema/core/test.py b/cloudai/schema/core/test.py → src/cloudai/schema/core/test.py
diff --git a/cloudai/schema/core/test_scenario.py → src/cloudai/schema/core/test_scenario.py b/cloudai/schema/core/test_scenario.py → src/cloudai/schema/core/test_scenario.py
diff --git a/cloudai/schema/core/test_template.py → src/cloudai/schema/core/test_template.py b/cloudai/schema/core/test_template.py → src/cloudai/schema/core/test_template.py
diff --git a/cloudai/schema/system/__init__.py → src/cloudai/schema/system/__init__.py b/cloudai/schema/system/__init__.py → src/cloudai/schema/system/__init__.py
diff --git a/cloudai/schema/system/slurm/__init__.py → src/cloudai/schema/system/slurm/__init__.py b/cloudai/schema/system/slurm/__init__.py → src/cloudai/schema/system/slurm/__init__.py
diff --git a/cloudai/schema/system/slurm/slurm_node.py → ...cloudai/schema/system/slurm/slurm_node.py b/cloudai/schema/system/slurm/slurm_node.py → ...cloudai/schema/system/slurm/slurm_node.py
diff --git a/cloudai/schema/system/slurm/slurm_system.py → ...oudai/schema/system/slurm/slurm_system.py b/cloudai/schema/system/slurm/slurm_system.py → ...oudai/schema/system/slurm/slurm_system.py
diff --git a/.../schema/system/slurm/strategy/__init__.py → .../schema/system/slurm/strategy/__init__.py b/.../schema/system/slurm/strategy/__init__.py → .../schema/system/slurm/strategy/__init__.py
diff --git a/...rm/strategy/slurm_command_gen_strategy.py → ...rm/strategy/slurm_command_gen_strategy.py b/...rm/strategy/slurm_command_gen_strategy.py → ...rm/strategy/slurm_command_gen_strategy.py
diff --git a/.../slurm/strategy/slurm_install_strategy.py → .../slurm/strategy/slurm_install_strategy.py b/.../slurm/strategy/slurm_install_strategy.py → .../slurm/strategy/slurm_install_strategy.py
diff --git a/cloudai/schema/system/standalone_system.py → ...loudai/schema/system/standalone_system.py b/cloudai/schema/system/standalone_system.py → ...loudai/schema/system/standalone_system.py
diff --git a/cloudai/schema/test_template/__init__.py → src/cloudai/schema/test_template/__init__.py b/cloudai/schema/test_template/__init__.py → src/cloudai/schema/test_template/__init__.py
diff --git a/...a/test_template/chakra_replay/__init__.py → ...a/test_template/chakra_replay/__init__.py b/...a/test_template/chakra_replay/__init__.py → ...a/test_template/chakra_replay/__init__.py
diff --git a/...emplate/chakra_replay/grading_strategy.py → ...emplate/chakra_replay/grading_strategy.py b/...emplate/chakra_replay/grading_strategy.py → ...emplate/chakra_replay/grading_strategy.py
diff --git a/...akra_replay/report_generation_strategy.py → ...akra_replay/report_generation_strategy.py b/...akra_replay/report_generation_strategy.py → ...akra_replay/report_generation_strategy.py
diff --git a/...akra_replay/slurm_command_gen_strategy.py → ...akra_replay/slurm_command_gen_strategy.py b/...akra_replay/slurm_command_gen_strategy.py → ...akra_replay/slurm_command_gen_strategy.py
diff --git a/...e/chakra_replay/slurm_install_strategy.py → ...e/chakra_replay/slurm_install_strategy.py b/...e/chakra_replay/slurm_install_strategy.py → ...e/chakra_replay/slurm_install_strategy.py
diff --git a/...a/test_template/chakra_replay/template.py → ...a/test_template/chakra_replay/template.py b/...a/test_template/chakra_replay/template.py → ...a/test_template/chakra_replay/template.py
diff --git a/...i/schema/test_template/common/__init__.py → ...i/schema/test_template/common/__init__.py b/...i/schema/test_template/common/__init__.py → ...i/schema/test_template/common/__init__.py
diff --git a/...common/slurm_job_id_retrieval_strategy.py → ...common/slurm_job_id_retrieval_strategy.py b/...common/slurm_job_id_retrieval_strategy.py → ...common/slurm_job_id_retrieval_strategy.py
diff --git a/...n/standalone_job_id_retrieval_strategy.py → ...n/standalone_job_id_retrieval_strategy.py b/...n/standalone_job_id_retrieval_strategy.py → ...n/standalone_job_id_retrieval_strategy.py
diff --git a/...ema/test_template/jax_toolbox/__init__.py → ...ema/test_template/jax_toolbox/__init__.py b/...ema/test_template/jax_toolbox/__init__.py → ...ema/test_template/jax_toolbox/__init__.py
diff --git a/..._template/jax_toolbox/grading_strategy.py → ..._template/jax_toolbox/grading_strategy.py b/..._template/jax_toolbox/grading_strategy.py → ..._template/jax_toolbox/grading_strategy.py
diff --git a/...jax_toolbox/report_generation_strategy.py → ...jax_toolbox/report_generation_strategy.py b/...jax_toolbox/report_generation_strategy.py → ...jax_toolbox/report_generation_strategy.py
@@ -49,38 +49,41 @@ def generate_report(self, directory_path: str, sol: Optional[float] = None) -> N
                 "max": max(times),
                 "average": sum(times) / len(times),
                 "median": statistics.median(times),
+                "stdev": statistics.stdev(times) if len(times) > 1 else 0,
             }
             self._write_report(directory_path, stats)
 
     def _extract_times(self, directory_path: str) -> List[float]:
         """
         Extracts elapsed times from all error files matching the pattern in the directory,
-        excluding the first time value recorded in each file.
+        starting after the 10th occurrence of a line matching the "[PAX STATUS]: train_step() took" pattern.
 
         Args:
             directory_path (str): Directory containing error files.
 
         Returns:
-            List[float]: List of extracted times as floats, after excluding the first time from each file.
+            List[float]: List of extracted times as floats, starting from the 11th matching line in each file.
         """
         times = []
         error_files = glob.glob(os.path.join(directory_path, "error-*.txt"))
         for stderr_path in error_files:
             file_times = []
+            epoch_count = 0
             with open(stderr_path, "r") as file:
                 for line in file:
-                    if "Elapsed time for" in line and "run" in line and ":434" in line:
-                        parts = line.split()
-                        time_str = parts[parts.index("<run>:") + 1]
-                        try:
-                            time_value = float(time_str.split("seconds")[0])
-                            file_times.append(time_value)
-                        except ValueError:
-                            continue  # Skip any lines where conversion fails
+                    if "[PAX STATUS]: train_step() took" in line:
+                        epoch_count += 1
+                        if epoch_count > 10:  # Skip the first 10 matching lines in each file
+                            # Extract the time value right after the keyword
+                            parts = line.split("took")
+                            time_str = parts[1].strip().split("seconds")[0].strip()
+                            try:
+                                time_value = float(time_str)
+                                file_times.append(time_value)
+                            except ValueError:
+                                continue  # Skip any lines where conversion fails
 
-            # Exclude the first time record from each file if it exists
-            if file_times:
-                times.extend(file_times[1:])
+            times.extend(file_times)
 
         return times
 

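The updated `_extract_times` ignores the first ten `[PAX STATUS]: train_step() took` lines in each `error-*.txt` file and parses the number between `took` and `seconds`. `generate_report` now also records a sample standard deviation, falling back to 0 when there is a single sample.

The standalone sketch below reproduces that parsing and the statistics visible in the diff on in-memory lines, so the behavior can be checked without a results directory. The function names `parse_step_times` and `summarize` are hypothetical, and the log format is inferred only from the strings matched in this diff.

```python
import statistics
from typing import Dict, Iterable, List

MARKER = "[PAX STATUS]: train_step() took"
SKIP = 10  # leading matching lines to ignore, as in the diff


def parse_step_times(lines: Iterable[str], skip: int = SKIP) -> List[float]:
    """Return step times from matching lines, ignoring the first `skip` matches."""
    times: List[float] = []
    seen = 0
    for line in lines:
        if MARKER not in line:
            continue
        seen += 1
        if seen <= skip:
            continue
        # Text after "took" and before "seconds", e.g. " 1.25 seconds" -> "1.25".
        time_str = line.split("took", 1)[1].split("seconds")[0].strip()
        try:
            times.append(float(time_str))
        except ValueError:
            continue  # skip malformed values, as the original code does
    return times


def summarize(times: List[float]) -> Dict[str, float]:
    """Statistics for the keys visible in the diff; expects a non-empty list."""
    return {
        "max": max(times),
        "average": sum(times) / len(times),
        "median": statistics.median(times),
        # statistics.stdev needs at least two samples; fall back to 0 like the diff.
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
    }
```

Note that a malformed line past the tenth match still counts toward the skip threshold. This mirrors the diff, where `epoch_count` is incremented before the value is parsed.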
diff --git a/...jax_toolbox/slurm_command_gen_strategy.py → ...jax_toolbox/slurm_command_gen_strategy.py b/...jax_toolbox/slurm_command_gen_strategy.py → ...jax_toolbox/slurm_command_gen_strategy.py
diff --git a/...ate/jax_toolbox/slurm_install_strategy.py → ...ate/jax_toolbox/slurm_install_strategy.py b/...ate/jax_toolbox/slurm_install_strategy.py → ...ate/jax_toolbox/slurm_install_strategy.py
diff --git a/...ema/test_template/jax_toolbox/template.py → ...ema/test_template/jax_toolbox/template.py b/...ema/test_template/jax_toolbox/template.py → ...ema/test_template/jax_toolbox/template.py
diff --git a/...hema/test_template/nccl_miner/__init__.py → ...hema/test_template/nccl_miner/__init__.py b/...hema/test_template/nccl_miner/__init__.py → ...hema/test_template/nccl_miner/__init__.py
diff --git a/...t_template/nccl_miner/grading_strategy.py → ...t_template/nccl_miner/grading_strategy.py b/...t_template/nccl_miner/grading_strategy.py → ...t_template/nccl_miner/grading_strategy.py
diff --git a/.../nccl_miner/report_generation_strategy.py → .../nccl_miner/report_generation_strategy.py b/.../nccl_miner/report_generation_strategy.py → .../nccl_miner/report_generation_strategy.py
diff --git a/.../nccl_miner/slurm_command_gen_strategy.py → .../nccl_miner/slurm_command_gen_strategy.py b/.../nccl_miner/slurm_command_gen_strategy.py → .../nccl_miner/slurm_command_gen_strategy.py
diff --git a/...late/nccl_miner/slurm_install_strategy.py → ...late/nccl_miner/slurm_install_strategy.py b/...late/nccl_miner/slurm_install_strategy.py → ...late/nccl_miner/slurm_install_strategy.py
diff --git a/...hema/test_template/nccl_miner/template.py → ...hema/test_template/nccl_miner/template.py b/...hema/test_template/nccl_miner/template.py → ...hema/test_template/nccl_miner/template.py
diff --git a/...chema/test_template/nccl_test/__init__.py → ...chema/test_template/nccl_test/__init__.py b/...chema/test_template/nccl_test/__init__.py → ...chema/test_template/nccl_test/__init__.py
diff --git a/...st_template/nccl_test/grading_strategy.py → ...st_template/nccl_test/grading_strategy.py b/...st_template/nccl_test/grading_strategy.py → ...st_template/nccl_test/grading_strategy.py
diff --git a/...e/nccl_test/report_generation_strategy.py → ...e/nccl_test/report_generation_strategy.py b/...e/nccl_test/report_generation_strategy.py → ...e/nccl_test/report_generation_strategy.py
diff --git a/...e/nccl_test/slurm_command_gen_strategy.py → ...e/nccl_test/slurm_command_gen_strategy.py b/...e/nccl_test/slurm_command_gen_strategy.py → ...e/nccl_test/slurm_command_gen_strategy.py
diff --git a/...plate/nccl_test/slurm_install_strategy.py → ...plate/nccl_test/slurm_install_strategy.py b/...plate/nccl_test/slurm_install_strategy.py → ...plate/nccl_test/slurm_install_strategy.py
diff --git a/...chema/test_template/nccl_test/template.py → ...chema/test_template/nccl_test/template.py b/...chema/test_template/nccl_test/template.py → ...chema/test_template/nccl_test/template.py
diff --git a/...a/test_template/nemo_launcher/__init__.py → ...a/test_template/nemo_launcher/__init__.py b/...a/test_template/nemo_launcher/__init__.py → ...a/test_template/nemo_launcher/__init__.py
diff --git a/...emplate/nemo_launcher/grading_strategy.py → ...emplate/nemo_launcher/grading_strategy.py b/...emplate/nemo_launcher/grading_strategy.py → ...emplate/nemo_launcher/grading_strategy.py
diff --git a/...mo_launcher/report_generation_strategy.py → ...mo_launcher/report_generation_strategy.py b/...mo_launcher/report_generation_strategy.py → ...mo_launcher/report_generation_strategy.py
diff --git a/...mo_launcher/slurm_command_gen_strategy.py → ...mo_launcher/slurm_command_gen_strategy.py b/...mo_launcher/slurm_command_gen_strategy.py → ...mo_launcher/slurm_command_gen_strategy.py
diff --git a/...e/nemo_launcher/slurm_install_strategy.py → ...e/nemo_launcher/slurm_install_strategy.py b/...e/nemo_launcher/slurm_install_strategy.py → ...e/nemo_launcher/slurm_install_strategy.py
diff --git a/...uncher/slurm_job_id_retrieval_strategy.py → ...uncher/slurm_job_id_retrieval_strategy.py b/...uncher/slurm_job_id_retrieval_strategy.py → ...uncher/slurm_job_id_retrieval_strategy.py
diff --git a/...a/test_template/nemo_launcher/template.py → ...a/test_template/nemo_launcher/template.py b/...a/test_template/nemo_launcher/template.py → ...a/test_template/nemo_launcher/template.py
diff --git a/...ai/schema/test_template/sleep/__init__.py → ...ai/schema/test_template/sleep/__init__.py b/...ai/schema/test_template/sleep/__init__.py → ...ai/schema/test_template/sleep/__init__.py
diff --git a/...a/test_template/sleep/grading_strategy.py → ...a/test_template/sleep/grading_strategy.py b/...a/test_template/sleep/grading_strategy.py → ...a/test_template/sleep/grading_strategy.py
diff --git a/...plate/sleep/report_generation_strategy.py → ...plate/sleep/report_generation_strategy.py b/...plate/sleep/report_generation_strategy.py → ...plate/sleep/report_generation_strategy.py
diff --git a/.../sleep/standalone_command_gen_strategy.py → .../sleep/standalone_command_gen_strategy.py b/.../sleep/standalone_command_gen_strategy.py → .../sleep/standalone_command_gen_strategy.py
diff --git a/...late/sleep/standalone_install_strategy.py → ...late/sleep/standalone_install_strategy.py b/...late/sleep/standalone_install_strategy.py → ...late/sleep/standalone_install_strategy.py
diff --git a/...ai/schema/test_template/sleep/template.py → ...ai/schema/test_template/sleep/template.py b/...ai/schema/test_template/sleep/template.py → ...ai/schema/test_template/sleep/template.py
diff --git a/...schema/test_template/ucc_test/__init__.py → ...schema/test_template/ucc_test/__init__.py b/...schema/test_template/ucc_test/__init__.py → ...schema/test_template/ucc_test/__init__.py
diff --git a/...est_template/ucc_test/grading_strategy.py → ...est_template/ucc_test/grading_strategy.py b/...est_template/ucc_test/grading_strategy.py → ...est_template/ucc_test/grading_strategy.py
diff --git a/...te/ucc_test/report_generation_strategy.py → ...te/ucc_test/report_generation_strategy.py b/...te/ucc_test/report_generation_strategy.py → ...te/ucc_test/report_generation_strategy.py
diff --git a/...te/ucc_test/slurm_command_gen_strategy.py → ...te/ucc_test/slurm_command_gen_strategy.py b/...te/ucc_test/slurm_command_gen_strategy.py → ...te/ucc_test/slurm_command_gen_strategy.py
diff --git a/...mplate/ucc_test/slurm_install_strategy.py → ...mplate/ucc_test/slurm_install_strategy.py b/...mplate/ucc_test/slurm_install_strategy.py → ...mplate/ucc_test/slurm_install_strategy.py
diff --git a/...schema/test_template/ucc_test/template.py → ...schema/test_template/ucc_test/template.py b/...schema/test_template/ucc_test/template.py → ...schema/test_template/ucc_test/template.py
diff --git a/cloudai/system_object_updater/__init__.py → ...cloudai/system_object_updater/__init__.py b/cloudai/system_object_updater/__init__.py → ...cloudai/system_object_updater/__init__.py
diff --git a/...ect_updater/base_system_object_updater.py → ...ect_updater/base_system_object_updater.py b/...ect_updater/base_system_object_updater.py → ...ect_updater/base_system_object_updater.py
diff --git a/...ct_updater/slurm_system_object_updater.py → ...ct_updater/slurm_system_object_updater.py b/...ct_updater/slurm_system_object_updater.py → ...ct_updater/slurm_system_object_updater.py
diff --git a/...dater/standalone_system_object_updater.py → ...dater/standalone_system_object_updater.py b/...dater/standalone_system_object_updater.py → ...dater/standalone_system_object_updater.py
diff --git a/...m_object_updater/system_object_updater.py → ...m_object_updater/system_object_updater.py b/...m_object_updater/system_object_updater.py → ...m_object_updater/system_object_updater.py
diff --git a/cloudai/util/__init__.py → src/cloudai/util/__init__.py b/cloudai/util/__init__.py → src/cloudai/util/__init__.py
diff --git a/cloudai/util/command_shell.py → src/cloudai/util/command_shell.py b/cloudai/util/command_shell.py → src/cloudai/util/command_shell.py