[SDK] Create Unify Training Client (kubeflow#1719)

* Remove legacy ksonnet tests
* [SDK] Create Unify Training Client
* Modify E2E tests
* Rename Training to Operator version
* Add missing exception
* Add API server timeout parameter
* Add delete options
* Fix import for V1JobCondition
* Fix container name in e2e tests
* Fix mxnet container
* Import all Kubernetes models
* Update SDK Examples
* Verify other SDK APIs in e2e tests
* Add replica types to const
* Use logging in e2e tests
* Fix logging for status
* Use const for job types
opendatahub-io · Jan 17, 2023 · b87c6fa
1 parent 6eaf3a3
Showing 152 changed files with 2,523 additions and 396,440 deletions.
diff --git a/.gitattributes b/.gitattributes
diff --git a/docs/development/developer_guide.md b/docs/development/developer_guide.md
@@ -96,10 +96,16 @@ This command will re-generate the api and model files together with the document
 The following files/folders in `sdk/python` are auto-generated and should not be modified directly:
 
 ```
-docs
-kubeflow/training/models
-kubeflow/training/*.py
-test/*.py
+sdk/python/docs
+sdk/python/kubeflow/training/models
+sdk/python/kubeflow/training/*.py
+sdk/python/test/*.py
+```
+
+The Training Operator client and public APIs are located here:
+
+```
+sdk/python/kubeflow/training/api
 ```
 
 ## Code Style

diff --git a/docs/testing/e2e_debugging.md b/docs/testing/e2e_debugging.md
@@ -1,7 +1,9 @@
 # How to debug an E2E test for Kubeflow Training Operator
 
-[E2E Testing](./e2e_testing.md) gives an overview of writing e2e tests. This guidance concentrates more on the e2e failure debugging.
+TODO (andreyvelich): This doc is outdated. Currently, E2Es are located here:
+[`sdk/python/test/e2e`](../../sdk/python/test/e2e)
 
+[E2E Testing](./e2e_testing.md) gives an overview of writing e2e tests. This guidance concentrates more on the e2e failure debugging.
 
 ## Prerequsite
 
@@ -16,7 +18,8 @@ wget https://github.com/ksonnet/ksonnet/releases/download/v0.13.1/ks_0.13.1_linu
 tar -xvzf ks_0.13.1_linux_amd64.tar.gz
 sudo cp ks_0.13.1_linux_amd64/ks /usr/local/bin/ks-13
 ```
-> We would like to deprecate `ksonnet` but may takes some time. Feel free to pick up [the issue](https://github.com/kubeflow/training-operator/issues/1468) if you are interested in it. 
+
+> We would like to deprecate `ksonnet` but may takes some time. Feel free to pick up [the issue](https://github.com/kubeflow/training-operator/issues/1468) if you are interested in it.
 > If your platform is darwin or windows, feel free to download binaries in [ksonnet v0.13.1](https://github.com/ksonnet/ksonnet/releases/tag/v0.13.1)
 
 4. Deploy HEAD training operator version in your environment
@@ -33,23 +36,24 @@ kubectl set image deployment.v1.apps/training-operator training-operator=kubeflo
 ## Run E2E Tests locally
 
 1. Set environments
+
 ```
 export KUBEFLOW_PATH=$GOPATH/src/github.com/kubeflow
 export KUBEFLOW_TRAINING_REPO=$KUBEFLOW_PATH/training-operator
 export KUBEFLOW_TESTING_REPO=$KUBEFLOW_PATH/testing
 export PYTHONPATH=$KUBEFLOW_TRAINING_REPO:$KUBEFLOW_TRAINING_REPO/py:$KUBEFLOW_TESTING_REPO/py:$KUBEFLOW_TRAINING_REPO/sdk/python
 ```
 
-
 2. Install python dependencies
+
 ```
 pip3 install -r $KUBEFLOW_TESTING_REPO/py/kubeflow/testing/requirements.txt
 ```
 
 > Note: if you have meet problem install requirement, you may need to `sudo apt-get install libffi-dev`. Feel free to share error logs if you don't know how to handle it.
 
-
 3. Run Tests
+
 ```
 # enter the ksonnet app to run tests
 cd $KUBEFLOW_TRAINING_REPO/test/workflows
@@ -60,10 +64,9 @@ python3 -m kubeflow.tf_operator.cleanpod_policy_tests --app_dir=$KUBEFLOW_TRAINI
 python3 -m kubeflow.tf_operator.simple_tfjob_tests  --app_dir=$KUBEFLOW_TRAINING_REPO/test/workflows --params=name=simple-tfjob-tests-v1,namespace=kubeflow --tfjob_version=v1 --num_trials=2 --artifacts_path=/tmp/output/artifact
 ```
 
-
 ## Check results
 
-You can either check logs or check results in `/tmp/output/artifact`. 
+You can either check logs or check results in `/tmp/output/artifact`.
 
 ```
 $ ls -al /tmp/output/artifact
@@ -75,7 +78,7 @@ $ cat /tmp/output/artifact/junit_test_simple_tfjob_cpu.xml
 
 ## Common issues
 
-1. ksonnet is not installed 
+1. ksonnet is not installed
 
 ```
 ERROR|2021-11-16T03:06:06|/home/jiaxin.shan/go/src/github.com/kubeflow/training-operator/py/kubeflow/tf_operator/test_runner.py|57| There was a problem running the job; Exception [Errno 2] No such file or directory: 'ks-13': 'ks-13'
@@ -97,7 +100,6 @@ FileNotFoundError: [Errno 2] No such file or directory: 'ks-13': 'ks-13'
 
 Please check `Prerequsite` section to install ksonnet.
 
-
 2. TypeError: load() missing 1 required positional argument: 'Loader'
 
 ```

diff --git a/docs/testing/e2e_testing.md b/docs/testing/e2e_testing.md
@@ -1,5 +1,8 @@
 # How to Write an E2E Test for Kubeflow Training Operator
 
+TODO (andreyvelich): This doc is outdated. Currently, E2Es are located here:
+[`sdk/python/test/e2e`](../../sdk/python/test/e2e)
+
 The E2E tests for Kubeflow Training operator are implemented as Argo workflows. For more background and details
 about Argo (not required for understanding the rest of this document), please take a look at
 [this link](https://github.com/kubeflow/testing/blob/master/README.md).

diff --git a/hack/python-sdk/post_gen.py b/hack/python-sdk/post_gen.py
@@ -46,21 +46,30 @@ def fix_test_files() -> None:
     for test_file in test_files:
         print(f"Precessing file {test_file}")
         if test_file.endswith(".py"):
-            with fileinput.FileInput(os.path.join(test_folder_dir, test_file), inplace=True) as file:
+            with fileinput.FileInput(
+                os.path.join(test_folder_dir, test_file), inplace=True
+            ) as file:
                 for line in file:
-                    print(_apply_regex(line), end='')
+                    print(_apply_regex(line), end="")
 
 
 def add_imports() -> None:
-    with open(os.path.join(sdk_dir, "kubeflow/training/__init__.py"), "a") as init_file:
-        init_file.write("from kubeflow.training.api.tf_job_client import TFJobClient\n")
-        init_file.write("from kubeflow.training.api.py_torch_job_client import PyTorchJobClient\n")
-        init_file.write("from kubeflow.training.api.xgboost_job_client import XGBoostJobClient\n")
-        init_file.write("from kubeflow.training.api.mpi_job_client import MPIJobClient\n")
-        init_file.write("from kubeflow.training.api.mx_job_client import MXJobClient\n")
-        init_file.write("from kubeflow.training.api.paddle_job_client import PaddleJobClient\n")
-    with open(os.path.join(sdk_dir, "kubeflow/__init__.py"), "a") as init_file:
-        init_file.write("__path__ = __import__('pkgutil').extend_path(__path__, __name__)")
+    with open(os.path.join(sdk_dir, "kubeflow/training/__init__.py"), "a") as f:
+        f.write("from kubeflow.training.api.training_client import TrainingClient\n")
+    with open(os.path.join(sdk_dir, "kubeflow/__init__.py"), "a") as f:
+        f.write("__path__ = __import__('pkgutil').extend_path(__path__, __name__)")
+
+    # Add Kubernetes models to proper deserialization of Training models.
+    with open(os.path.join(sdk_dir, "kubeflow/training/models/__init__.py"), "r") as f:
+        new_lines = []
+        for line in f.readlines():
+            new_lines.append(line)
+            if line.startswith("from __future__ import absolute_import"):
+                new_lines.append("\n")
+                new_lines.append("# Import Kubernetes models.\n")
+                new_lines.append("from kubernetes.client import *\n")
+    with open(os.path.join(sdk_dir, "kubeflow/training/models/__init__.py"), "w") as f:
+        f.writelines(new_lines)
 
 
 def _apply_regex(input_str: str) -> str:
@@ -69,5 +78,5 @@ def _apply_regex(input_str: str) -> str:
     return input_str
 
 
-if __name__ == '__main__':
+if __name__ == "__main__":
     main()
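
Review note on the `add_imports` change above: the new block re-reads `kubeflow/training/models/__init__.py` and writes a wildcard `kubernetes.client` import right after the `from __future__ import absolute_import` line. The same line-injection step can be sketched as a pure function on a list of lines, which makes it easy to check in isolation. The function name `inject_kubernetes_import` and the early return that skips files already containing the import are assumptions added for illustration; they are not part of this commit.

```python
from typing import List, Sequence

# Marker and injected lines are copied from the diff above.
MARKER = "from __future__ import absolute_import"
INJECTED: Sequence[str] = (
    "\n",
    "# Import Kubernetes models.\n",
    "from kubernetes.client import *\n",
)


def inject_kubernetes_import(lines: List[str]) -> List[str]:
    """Return a copy of `lines` with the Kubernetes import after the marker.

    Hypothetical helper: the skip-if-present check is an assumption,
    not behaviour taken from the commit.
    """
    # Leave the content unchanged if the import was already injected.
    if INJECTED[-1] in lines:
        return list(lines)

    new_lines: List[str] = []
    for line in lines:
        new_lines.append(line)
        # Same condition as in the diff: match on the line prefix.
        if line.startswith(MARKER):
            new_lines.extend(INJECTED)
    return new_lines


if __name__ == "__main__":
    sample = [
        "# coding: utf-8\n",
        "from __future__ import absolute_import\n",
        "from kubeflow.training.models.example import Example\n",
    ]
    print("".join(inject_kubernetes_import(sample)), end="")
```

Keeping the transformation separate from the file I/O means the read and write calls in `add_imports` could stay as they are, with only the loop body delegated to such a helper.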
diff --git a/hack/python-sdk/swagger_config.json b/hack/python-sdk/swagger_config.json
@@ -1,13 +1,8 @@
 {
-    "packageName" : "kubeflow.training",
-    "projectName" : "training",
-    "packageVersion": "1.5.0",
-    "importMappings": {
-        "V1Container": "from kubernetes.client import V1Container",
-        "V1ObjectMeta": "from kubernetes.client import V1ObjectMeta",
-        "V1ListMeta": "from kubernetes.client import V1ListMeta",
-        "V1ResourceRequirements": "from kubernetes.client import V1ResourceRequirements",
-        "V1JobCondition": "from kubernetes.client import V1JobCondition",
-        "V1PodTemplateSpec": "from kubernetes.client import V1PodTemplateSpec"
-    }
+  "packageName": "kubeflow.training",
+  "projectName": "training",
+  "packageVersion": "1.5.0",
+  "typeMappings": {
+    "V1Time": "datetime"
+  }
 }
diff --git a/py/kubeflow/__init__.py b/py/kubeflow/__init__.py
diff --git a/py/kubeflow/tf_operator/Pipfile b/py/kubeflow/tf_operator/Pipfile