[SDK] Create Unify Training Client (kubeflow#1719)
* Remove legacy ksonnet tests

* [SDK] Create Unify Training Client

* Modify E2E tests

* Rename Training to Operator version

* Add missing exception

* Add API server timeout parameter

* Add delete options

* Fix import for V1JobCondition

* Fix container name in e2e tests

* Fix mxnet container

* Import all Kubernetes models

* Update SDK Examples

* Verify other SDK APIs in e2e tests

* Add replica types to const

* Use logging in e2e tests

* Fix logging for status

* Use const for job types
andreyvelich authored Jan 17, 2023
1 parent 6eaf3a3 commit b87c6fa
Showing 152 changed files with 2,523 additions and 396,440 deletions.
1 change: 0 additions & 1 deletion .gitattributes

This file was deleted.

14 changes: 10 additions & 4 deletions docs/development/developer_guide.md
@@ -96,10 +96,16 @@ This command will re-generate the api and model files together with the document
The following files/folders in `sdk/python` are auto-generated and should not be modified directly:

```
docs
kubeflow/training/models
kubeflow/training/*.py
test/*.py
sdk/python/docs
sdk/python/kubeflow/training/models
sdk/python/kubeflow/training/*.py
sdk/python/test/*.py
```

The Training Operator client and public APIs are located here:

```
sdk/python/kubeflow/training/api
```

## Code Style
18 changes: 10 additions & 8 deletions docs/testing/e2e_debugging.md
@@ -1,7 +1,9 @@
# How to debug an E2E test for Kubeflow Training Operator

[E2E Testing](./e2e_testing.md) gives an overview of writing e2e tests. This guidance concentrates more on the e2e failure debugging.
TODO (andreyvelich): This doc is outdated. Currently, E2Es are located here:
[`sdk/python/test/e2e`](../../sdk/python/test/e2e)

[E2E Testing](./e2e_testing.md) gives an overview of writing e2e tests. This guidance concentrates more on the e2e failure debugging.

## Prerequsite

@@ -16,7 +18,8 @@ wget https://github.com/ksonnet/ksonnet/releases/download/v0.13.1/ks_0.13.1_linu
tar -xvzf ks_0.13.1_linux_amd64.tar.gz
sudo cp ks_0.13.1_linux_amd64/ks /usr/local/bin/ks-13
```
> We would like to deprecate `ksonnet` but may takes some time. Feel free to pick up [the issue](https://github.com/kubeflow/training-operator/issues/1468) if you are interested in it.

> We would like to deprecate `ksonnet` but may takes some time. Feel free to pick up [the issue](https://github.com/kubeflow/training-operator/issues/1468) if you are interested in it.
> If your platform is darwin or windows, feel free to download binaries in [ksonnet v0.13.1](https://github.com/ksonnet/ksonnet/releases/tag/v0.13.1)
4. Deploy HEAD training operator version in your environment
@@ -33,23 +36,24 @@ kubectl set image deployment.v1.apps/training-operator training-operator=kubeflo
## Run E2E Tests locally

1. Set environments

```
export KUBEFLOW_PATH=$GOPATH/src/github.com/kubeflow
export KUBEFLOW_TRAINING_REPO=$KUBEFLOW_PATH/training-operator
export KUBEFLOW_TESTING_REPO=$KUBEFLOW_PATH/testing
export PYTHONPATH=$KUBEFLOW_TRAINING_REPO:$KUBEFLOW_TRAINING_REPO/py:$KUBEFLOW_TESTING_REPO/py:$KUBEFLOW_TRAINING_REPO/sdk/python
```


2. Install python dependencies

```
pip3 install -r $KUBEFLOW_TESTING_REPO/py/kubeflow/testing/requirements.txt
```

> Note: if you have meet problem install requirement, you may need to `sudo apt-get install libffi-dev`. Feel free to share error logs if you don't know how to handle it.

3. Run Tests

```
# enter the ksonnet app to run tests
cd $KUBEFLOW_TRAINING_REPO/test/workflows
@@ -60,10 +64,9 @@ python3 -m kubeflow.tf_operator.cleanpod_policy_tests --app_dir=$KUBEFLOW_TRAINI
python3 -m kubeflow.tf_operator.simple_tfjob_tests --app_dir=$KUBEFLOW_TRAINING_REPO/test/workflows --params=name=simple-tfjob-tests-v1,namespace=kubeflow --tfjob_version=v1 --num_trials=2 --artifacts_path=/tmp/output/artifact
```


## Check results

You can either check logs or check results in `/tmp/output/artifact`.
You can either check logs or check results in `/tmp/output/artifact`.

```
$ ls -al /tmp/output/artifact
@@ -75,7 +78,7 @@ $ cat /tmp/output/artifact/junit_test_simple_tfjob_cpu.xml

## Common issues

1. ksonnet is not installed
1. ksonnet is not installed

```
ERROR|2021-11-16T03:06:06|/home/jiaxin.shan/go/src/github.com/kubeflow/training-operator/py/kubeflow/tf_operator/test_runner.py|57| There was a problem running the job; Exception [Errno 2] No such file or directory: 'ks-13': 'ks-13'
@@ -97,7 +100,6 @@ FileNotFoundError: [Errno 2] No such file or directory: 'ks-13': 'ks-13'

Please check `Prerequsite` section to install ksonnet.


2. TypeError: load() missing 1 required positional argument: 'Loader'

```
3 changes: 3 additions & 0 deletions docs/testing/e2e_testing.md
@@ -1,5 +1,8 @@
# How to Write an E2E Test for Kubeflow Training Operator

TODO (andreyvelich): This doc is outdated. Currently, E2Es are located here:
[`sdk/python/test/e2e`](../../sdk/python/test/e2e)

The E2E tests for Kubeflow Training operator are implemented as Argo workflows. For more background and details
about Argo (not required for understanding the rest of this document), please take a look at
[this link](https://github.com/kubeflow/testing/blob/master/README.md).
33 changes: 21 additions & 12 deletions hack/python-sdk/post_gen.py
@@ -46,21 +46,30 @@ def fix_test_files() -> None:
     for test_file in test_files:
         print(f"Precessing file {test_file}")
         if test_file.endswith(".py"):
-            with fileinput.FileInput(os.path.join(test_folder_dir, test_file), inplace=True) as file:
+            with fileinput.FileInput(
+                os.path.join(test_folder_dir, test_file), inplace=True
+            ) as file:
                 for line in file:
-                    print(_apply_regex(line), end='')
+                    print(_apply_regex(line), end="")


 def add_imports() -> None:
-    with open(os.path.join(sdk_dir, "kubeflow/training/__init__.py"), "a") as init_file:
-        init_file.write("from kubeflow.training.api.tf_job_client import TFJobClient\n")
-        init_file.write("from kubeflow.training.api.py_torch_job_client import PyTorchJobClient\n")
-        init_file.write("from kubeflow.training.api.xgboost_job_client import XGBoostJobClient\n")
-        init_file.write("from kubeflow.training.api.mpi_job_client import MPIJobClient\n")
-        init_file.write("from kubeflow.training.api.mx_job_client import MXJobClient\n")
-        init_file.write("from kubeflow.training.api.paddle_job_client import PaddleJobClient\n")
-    with open(os.path.join(sdk_dir, "kubeflow/__init__.py"), "a") as init_file:
-        init_file.write("__path__ = __import__('pkgutil').extend_path(__path__, __name__)")
+    with open(os.path.join(sdk_dir, "kubeflow/training/__init__.py"), "a") as f:
+        f.write("from kubeflow.training.api.training_client import TrainingClient\n")
+    with open(os.path.join(sdk_dir, "kubeflow/__init__.py"), "a") as f:
+        f.write("__path__ = __import__('pkgutil').extend_path(__path__, __name__)")
+
+    # Add Kubernetes models to proper deserialization of Training models.
+    with open(os.path.join(sdk_dir, "kubeflow/training/models/__init__.py"), "r") as f:
+        new_lines = []
+        for line in f.readlines():
+            new_lines.append(line)
+            if line.startswith("from __future__ import absolute_import"):
+                new_lines.append("\n")
+                new_lines.append("# Import Kubernetes models.\n")
+                new_lines.append("from kubernetes.client import *\n")
+    with open(os.path.join(sdk_dir, "kubeflow/training/models/__init__.py"), "w") as f:
+        f.writelines(new_lines)


 def _apply_regex(input_str: str) -> str:
@@ -69,5 +78,5 @@ def _apply_regex(input_str: str) -> str:
     return input_str


-if __name__ == '__main__':
+if __name__ == "__main__":
     main()
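
The new block in `add_imports()` reads `kubeflow/training/models/__init__.py`, copies every line, and appends a wildcard import of the Kubernetes models directly after the `from __future__ import absolute_import` line. The following is a minimal in-memory sketch of that insertion step, not the script itself: the names `MARKER`, `INJECTED`, `inject_after_marker`, and `inject_once` are hypothetical and do not appear in the commit, and the re-run guard in `inject_once` is an assumption added for illustration (the committed code has no such guard).

```python
from typing import List

# Line that the generated models/__init__.py is expected to contain.
MARKER = "from __future__ import absolute_import"

# Lines appended after the marker, mirroring the strings used in the commit.
INJECTED = [
    "\n",
    "# Import Kubernetes models.\n",
    "from kubernetes.client import *\n",
]


def inject_after_marker(lines: List[str]) -> List[str]:
    """Return a copy of `lines` with INJECTED placed after each MARKER line."""
    new_lines: List[str] = []
    for line in lines:
        new_lines.append(line)
        if line.startswith(MARKER):
            new_lines.extend(INJECTED)
    return new_lines


def inject_once(lines: List[str]) -> List[str]:
    """Hypothetical guard (not in the commit): skip if the import already exists."""
    if INJECTED[-1] in lines:
        return list(lines)
    return inject_after_marker(lines)


if __name__ == "__main__":
    generated = [
        "# coding: utf-8\n",
        "from __future__ import absolute_import\n",
        "from kubeflow.training.models.example import Example\n",  # placeholder line
    ]
    print("".join(inject_once(generated)), end="")
```

Working on a list of lines keeps the transformation separate from file I/O, so it can be checked without generating the SDK. Because the committed script rewrites the file unconditionally, running it twice on the same generated file would presumably insert the import twice; a guard like `inject_once` is one way to avoid that.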
17 changes: 6 additions & 11 deletions hack/python-sdk/swagger_config.json
@@ -1,13 +1,8 @@
 {
-  "packageName" : "kubeflow.training",
-  "projectName" : "training",
-  "packageVersion": "1.5.0",
-  "importMappings": {
-    "V1Container": "from kubernetes.client import V1Container",
-    "V1ObjectMeta": "from kubernetes.client import V1ObjectMeta",
-    "V1ListMeta": "from kubernetes.client import V1ListMeta",
-    "V1ResourceRequirements": "from kubernetes.client import V1ResourceRequirements",
-    "V1JobCondition": "from kubernetes.client import V1JobCondition",
-    "V1PodTemplateSpec": "from kubernetes.client import V1PodTemplateSpec"
-  }
+  "packageName": "kubeflow.training",
+  "projectName": "training",
+  "packageVersion": "1.5.0",
+  "typeMappings": {
+    "V1Time": "datetime"
+  }
 }
1 change: 0 additions & 1 deletion py/kubeflow/__init__.py

This file was deleted.

18 changes: 0 additions & 18 deletions py/kubeflow/tf_operator/Pipfile

This file was deleted.
