Commit ab5da50: Update to better docs and add metric threshold

thepycoder committed Nov 21, 2022 (1 parent: ae43702)
Showing 4 changed files with 40 additions and 19 deletions.
README.md (24 additions, 13 deletions)

@@ -1,16 +1,25 @@
-# GitHub Action For Comparing Model Performance Between Current PR and Main Branch
+# GitHub Action For Detecting Model Degradation
 
 ![tags screenshot](images/screenshot.png)
 
-Search ClearML for a task corresponding to the current PR and automatically compare its performance to the previous best task.
+The goal of this GitHub action is to check for possible model degradation and to prevent a PR from being merged if, given its changes to the codebase, the model performs worse than before. It is designed to run on an open PR as a quality check, ensuring the performance consistency of the main branch. The workflow is detailed in the diagram below.
 
-The action will identify a ClearML task as "corresponding" to the current PR if:
+![Workflow Diagram](images/detect_model_degradation_diagram.excalidraw.png)
+
+The proposed workflow is as follows.
+
+The repository contains the model training code, and a data scientist or ML engineer creates a new branch to build a new feature. We want to protect the main branch against model degradation, so we add this action as a required check for merging a PR. While developing the feature, we expect the code to be run many times. Each run is captured by the ClearML experiment manager, but not every run corresponds to a clean commit ID: during development one might run the code with uncommitted changes, which are captured by ClearML but not by git.
+
+When the PR is opened or updated, the first thing we want to do is search for a ClearML task that was run with the exact code from the latest commit in the PR. We can do this by querying ClearML and checking that:
 - The commit hash captured in the task is equal to the commit hash of the current PR
 - There are NO uncommitted changes logged on the ClearML task
 - The ClearML task was successful

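The three requirements above boil down to plain comparisons on values that ClearML records for each task. Here is a minimal, hypothetical sketch of that check; the function and argument names are illustrative, and in the real action the commit hash, uncommitted diff, and status would be read from the ClearML task object:

```python
def matches_pr_commit(task_commit: str, task_diff: str,
                      task_status: str, pr_commit: str) -> bool:
    """Check whether a ClearML task ran exactly the code of the PR's head commit.

    task_commit: commit hash recorded on the task
    task_diff:   uncommitted changes captured by ClearML ('' if none)
    task_status: final status of the task, e.g. 'completed' or 'failed'
    pr_commit:   head commit hash of the open PR
    """
    return (
        task_commit == pr_commit        # same commit hash as the current PR
        and not task_diff               # NO uncommitted changes logged
        and task_status == "completed"  # the task finished successfully
    )
```

If no task in the project satisfies this check, the action fails the PR, since the code has never been shown to run successfully.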
-The previous best task in this case is defined as the latest task in ClearML that has the required tag.
-Another way to do this could be to do a similar lookup as above but for the last commit-id of the main branch.
+If all three of these requirements are met, we can assume that the code in the commit has successfully run on a developer machine at least once. If not, the pipeline fails: we do not want to merge code that has not been proven to run successfully before. If it has, we move on to the next stage.
+
+Now we can get the required performance metrics from this new task and compare them to the previous best task. We find the previous best task by searching ClearML for the task that carries the "best" tag (the name of this tag can be customized). Much like the task-selection requirements above, the previous best task should have the commit ID of the last commit in the main branch and no uncommitted changes.
+
+The relevant performance metric is extracted from both the new task and this previous best one, and the two are compared directly. If their difference is within the given threshold, or the new one is better, the "best" tag is moved from the previous best task to the new task, so that this cycle can continue.
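The threshold rule described above can be sketched as a small pure function. This is an illustrative reimplementation, not the action's exact code, and it assumes the metric values and threshold arrive as plain numbers (the action itself reads them from ClearML scalars and environment variables):

```python
def passes_threshold(current: float, best: float,
                     mode: str, threshold_pct: float = 0.0) -> bool:
    """Decide whether the current task may take over the "best" tag.

    mode:          'MAX' (larger is better) or 'MIN' (smaller is better)
    threshold_pct: allowed degradation as a percentage of `best`
    """
    if mode == "MAX":
        # e.g. best=100, threshold=3 -> anything >= 97 still passes
        return current >= best * (1 - threshold_pct / 100)
    if mode == "MIN":
        # for losses/error rates: best=100, threshold=3 -> anything <= 103 passes
        return current <= best * (1 + threshold_pct / 100)
    raise ValueError(f"mode must be 'MIN' or 'MAX', got {mode!r}")
```

With the example numbers used in this README, `passes_threshold(97, 100, "MAX", 3)` passes while `passes_threshold(96.9, 100, "MAX", 3)` does not.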

## Example usage

@@ -22,21 +31,22 @@ on:
   types: [ assigned, opened, edited, reopened, synchronize ]
 
 jobs:
-  compare-models:
+  detect-model-degradation:
     runs-on: ubuntu-20.04
     steps:
       - name: Compare models
-        uses: thepycoder/clearml-actions-compare-models@main
+        uses: thepycoder/clearml-actions-detect-model-degradation@main
         with:
-          CLEARML_PROJECT: 'my_project'
-          CLEARML_TASK_NAME: 'my_task'
-          CLEARML_SCALAR_TITLE: 'Performance Metrics'
-          CLEARML_SCALAR_SERIES: 'mAP'
-          CLEARML_SCALAR_MIN_MAX: 'MAX'
-          CLEARML_BEST_TAGNAME: 'GOODEST BOI'
+          CLEARML_PROJECT: 'my_project' # CHANGE ME
+          CLEARML_TASK_NAME: 'my_task' # CHANGE ME
+          CLEARML_SCALAR_TITLE: 'Performance Metrics' # CHANGE ME
+          CLEARML_SCALAR_SERIES: 'mAP' # CHANGE ME
+          CLEARML_SCALAR_MIN_MAX: 'MAX' # CHANGE ME
+          CLEARML_BEST_TAGNAME: 'BESTEST MODEL' # CHANGE ME (OR NOT ^^)
+          CLEARML_SCALAR_THRESHOLD: 3 # CHANGE ME
          CLEARML_API_ACCESS_KEY: ${{ secrets.ACCESS_KEY }}
          CLEARML_API_SECRET_KEY: ${{ secrets.SECRET_KEY }}
          CLEARML_API_HOST: ${{ secrets.CLEARML_API_HOST }}
        env:
          COMMIT_ID: ${{ github.event.pull_request.head.sha }}
```
@@ -52,3 +62,4 @@ jobs:
 7. CLEARML_SCALAR_SERIES: Which scalar to use for comparison. Series to use within the plot given by the title.
 8. CLEARML_SCALAR_MIN_MAX: Whether smaller is better (MIN) or larger is better (MAX). (default: "MAX")
 9. CLEARML_BEST_TAGNAME: The name of the tag to be given to the best task. Every task that is checked and is equal to or better than the previous best will get this tag. (default: "Best Performance")
+10. CLEARML_SCALAR_THRESHOLD: The allowed difference between the previous best and the current commit, as a percentage of the previous best. E.g. with CLEARML_SCALAR_THRESHOLD=3, if the previous best metric value is 100 and CLEARML_SCALAR_MIN_MAX is MAX, the current PR can have a metric value as low as 97 and still pass. (default: 0)
action.yml (4 additions, 0 deletions)

@@ -29,6 +29,9 @@ inputs:
   CLEARML_BEST_TAGNAME:
     description: 'The name of tag to be given to the best task. Every task that is checked and is equal or better than the previous best will get this tag. (default: "Best Performance")'
     default: 'Best Performance'
+  CLEARML_SCALAR_THRESHOLD:
+    description: 'The threshold for the difference between the previous best and the current commit. This should be a percentage of the previous best. E.g. CLEARML_SCALAR_THRESHOLD=3 -> if the previous best metric value is 100 and CLEARML_SCALAR_MIN_MAX is MAX, then the current PR can have a minimum metric value of 97 while still passing. (default: 0)'
+    default: '0'
 runs:
   using: 'composite'
   steps:
@@ -52,4 +55,5 @@ runs:
       CLEARML_SCALAR_SERIES: ${{ inputs.CLEARML_SCALAR_SERIES }}
       CLEARML_BEST_TAGNAME: ${{ inputs.CLEARML_BEST_TAGNAME }}
       CLEARML_SCALAR_MIN_MAX: ${{ inputs.CLEARML_SCALAR_MIN_MAX }}
+      CLEARML_SCALAR_THRESHOLD: ${{ inputs.CLEARML_SCALAR_THRESHOLD }}

compare_models.py (12 additions, 6 deletions)

@@ -44,30 +44,36 @@ def compare_and_tag_task(commit_hash):
         tags=[os.getenv('CLEARML_BEST_TAGNAME')]
     )
     if best_task:
-        best_metric = max(
+        best_metric = (
             best_task.get_reported_scalars()
             .get(os.getenv('CLEARML_SCALAR_TITLE'))
             .get(os.getenv('CLEARML_SCALAR_SERIES')).get('y')
         )
-        current_metric = max(
+        current_metric = (
             current_task.get_reported_scalars()
             .get(os.getenv('CLEARML_SCALAR_TITLE'))
             .get(os.getenv('CLEARML_SCALAR_SERIES')).get('y')
         )
-        print(f"Best metric in the system is: {best_metric} and current metric is {current_metric}")
         if os.getenv('CLEARML_SCALAR_MIN_MAX') == 'MIN':
-            flag = current_metric <= best_metric
+            flag = min(current_metric) <= min(best_metric) * (1 + float(os.getenv('CLEARML_SCALAR_THRESHOLD')) / 100)
         elif os.getenv('CLEARML_SCALAR_MIN_MAX') == 'MAX':
-            flag = current_metric >= best_metric
+            flag = max(current_metric) >= max(best_metric) * (1 - float(os.getenv('CLEARML_SCALAR_THRESHOLD')) / 100)
         else:
             raise ValueError(f"Cannot parse value of CLEARML_SCALAR_MIN_MAX: {os.getenv('CLEARML_SCALAR_MIN_MAX')}"
                              " Should be 'MIN' or 'MAX'")
+        print(f"Previous best: {best_metric}")
+        print(f"Current task: {current_metric}")
         if flag:
-            print("This means current metric is better or equal! Tagging as such.")
+            print("Congratulations, you are now the best performing task :)")
             # Remove the tag from the previous best task
             best_task.set_tags([])
             # Set the current task as the new best task
             current_task.add_tags([os.getenv('CLEARML_BEST_TAGNAME')])
         else:
-            print("This means current metric is worse! Not tagging.")
+            print("Current metric is worse! Not tagging.")
     else:
         # Set the current task as the new best task
         current_task.add_tags([os.getenv('CLEARML_BEST_TAGNAME')])


(The fourth changed file could not be displayed.)
