
chore: [ADR] Testing Strategy #2

Merged 2 commits on Aug 14, 2024
67 changes: 67 additions & 0 deletions docs/adrs/002-testing-strategy.md
@@ -0,0 +1,67 @@
# Testing Strategy

Status: Accepted, except for the open TBD on the automated AI Core API [wiremock tests](#wiremock-unit-tests).

## Background

Like any product, the AI SDK needs to be tested.

## General Strategy

We classify our tests into regular unit, wiremock unit, integration, and E2E tests.
Overall, we aim for at least 80% test coverage.
Excluded from this is the AI Core API module, which is completely generated; only selected functionality of it can be tested.

Contributor:

I wonder whether we should generalize from the start..?

Suggested change:
- Excluded from this is the AI Core API module, which is completely generated and only selected functionality can be tested.
+ Excluded from this are modules, which are completely generated and only selected functionality can be tested.

Member Author:

We probably could, but the difference between e.g. AI Core and Orchestration is that AI Core is huge, while in Orchestration we can probably cover most of the functionality.


## Regular Unit Tests

Anything that can be covered by unit tests should be, following the [testing pyramid principle](https://martinfowler.com/articles/practical-test-pyramid.html).
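
For illustration, a minimal sketch of such a regular unit test (JUnit 5 with AssertJ; the `PromptTemplate` class is hypothetical and inlined only so the sketch is self-contained):

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;

class PromptTemplateTest {

  // Hypothetical production class, inlined here only so the sketch compiles on its own.
  record PromptTemplate(String pattern) {
    String resolve(String key, String value) {
      return pattern.replace("{" + key + "}", value);
    }
  }

  @Test
  void placeholdersAreResolved() {
    var template = new PromptTemplate("Hello, {name}!");

    assertThat(template.resolve("name", "World")).isEqualTo("Hello, World!");
  }
}
```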

## Wiremock Unit Tests

We use wiremock tests to verify (de-)serialization, HTTP headers, and other behavior on the HTTP layer.

We construct these tests only with data that has been obtained from real systems.
This guarantees that, at least at the time of creating the test, a real system behaved in the exact same way the wiremock test stub does.
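
As an illustration, a minimal sketch of such a wiremock test (WireMock with its JUnit 5 extension; the path, header, and JSON payload are made up for the sketch, and a raw `HttpClient` stands in for the SDK client to keep it self-contained):

```java
import static com.github.tomakehurst.wiremock.client.WireMock.*;
import static org.assertj.core.api.Assertions.assertThat;

import com.github.tomakehurst.wiremock.junit5.WireMockRuntimeInfo;
import com.github.tomakehurst.wiremock.junit5.WireMockTest;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;

@WireMockTest
class DeploymentsWireMockTest {

  @Test
  void listDeployments(WireMockRuntimeInfo wm) throws Exception {
    // Stub payload; a real test would use a response recorded from a real system.
    stubFor(get(urlPathEqualTo("/v2/lm/deployments"))
        .willReturn(okJson("{\"count\":1,\"resources\":[{\"id\":\"d123\"}]}")));

    // A real test would go through the SDK client instead of a plain HttpClient.
    var response = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(URI.create(wm.getHttpBaseUrl() + "/v2/lm/deployments"))
            .header("AI-Resource-Group", "default")
            .build(),
        HttpResponse.BodyHandlers.ofString());

    // Assertions on the HTTP layer: payload (de-)serialization and headers sent.
    assertThat(response.body()).contains("d123");
    verify(getRequestedFor(urlPathEqualTo("/v2/lm/deployments"))
        .withHeader("AI-Resource-Group", equalTo("default")));
  }
}
```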

TBD: How much of the generated code for the AI Core API do we want to test, and how do we want to test it?
There are various tools available to automate parts of this, see [this list](https://openapi.tools/#data-validators).

Comment on lines +26 to +27

Contributor @newtork, Aug 8, 2024:

(Comment)

This is not easy to decide. Of course we don't want to write tests only for checking OpenAPI generator functionality: We expect this to happen in (1) OpenAPI Generator and/or (2) SAP Cloud SDK.

However, there are additional things to consider:

- We deviate from the vanilla OpenAPI Generator templates.
- (We are not certain whether the spec itself is 100% correct. We can only assume, of course.)

Some ideas:

- For wiremock unit test cases we could try injecting some validator logic into the HttpClientFactory that checks whether request/response are valid. This could happen (globally?) without a dedicated unit test.

Member Author:

Agreed; in an ideal world, testing a single CRUD example should be more than sufficient. Unfortunately, as you point out, (1) the OpenAPI generator still has some issues, and a spec file so large will also have issues/mistakes.

I personally like the idea of automating this away, but then spec file mistakes still go undetected, and the automation is only as good as the automation tool, which brings us back to point (1) 😅

So yeah, tough one.
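
To make the validator idea above concrete, a rough sketch is shown below. Everything in it is hypothetical: `SpecValidator` is not an existing Cloud SDK or WireMock API, and the wrapper only illustrates how every wiremock-backed exchange could be checked against the spec as a side effect, without dedicated test cases:

```java
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical abstraction; an implementation could delegate to one of the
// OpenAPI data validators linked in the ADR.
interface SpecValidator {
  void validateRequest(HttpRequest request);
  void validateResponse(HttpResponse<String> response);
}

// Hypothetical decorator that could be registered once in the (wiremock) test setup,
// so request/response validation happens globally rather than per test case.
final class ValidatingHttpExchange {
  private final HttpClient client;
  private final SpecValidator validator;

  ValidatingHttpExchange(HttpClient client, SpecValidator validator) {
    this.client = client;
    this.validator = validator;
  }

  HttpResponse<String> send(HttpRequest request) throws Exception {
    validator.validateRequest(request);
    var response = client.send(request, HttpResponse.BodyHandlers.ofString());
    validator.validateResponse(response);
    return response;
  }
}
```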


## Integration Tests

We use the integration tests to continuously test against real systems.
Integration tests are in a separate module and are run only nightly / on-demand.

Contributor:

> Integration tests are in a separate module

Are you sure we want to follow up on this? In Cloud SDK v3 we had severe issues keeping this dedicated module clean of unit tests. When looking for tests, we would've found unit-tests in some module and integration-tests in some other module. Maintenance felt higher because of that.

I don't want to object, just wondering whether you've seen this working better/well in another project(?)

Member Author:

Agreed, but the difference here is that integration tests won't run with normal builds, only nightly / together with e2e tests, and they would require additional setup, i.e. a service key.

To me that sounds like it should disincentivize abusing it enough 😄


These tests are more expensive and less reliable, so we limit how much we cover with them:

- Reading configurations and deployments
- Performing chat completions and embeddings (and potentially other features) for at least one foundation model per vendor
- Performing requests via orchestration service, covering every orchestration module available
- Querying all available models and verifying that all are known to the SDK

Note that orchestration and foundation model tests should be tuned to produce small input/output sizes to reduce costs and keep test execution fast.
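
A minimal sketch of how such an integration test could be guarded so that it only runs nightly / on demand when a service key is available (JUnit 5; the `AICORE_SERVICE_KEY` variable name and the `integration` tag are assumptions for illustration):

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable;

@Tag("integration")
// Skipped unless a service key is provided, so the test never runs in a normal build.
@EnabledIfEnvironmentVariable(named = "AICORE_SERVICE_KEY", matches = ".+")
class DeploymentsIntegrationTest {

  @Test
  void readDeployments() {
    var serviceKey = System.getenv("AICORE_SERVICE_KEY");

    // Placeholder assertion; a real test would build an SDK client from the service key
    // and read configurations and deployments from the real system.
    assertThat(serviceKey).isNotBlank();
  }
}
```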

## E2E Tests

Finally, E2E tests cover full application scenarios, e.g. using the SDK in a Spring and/or CAP-based application.
They ensure that the SDK works in a real-world scenario, including typical application dependencies and configuration.

E2E tests consist of two parts: the test/sample application code and the pipeline code.

The application code serves both as test code and as sample code for end users, showcasing how they could include the SDK in their applications.
Thus, we want to keep the application code as meaningful as possible.

Comment on lines +50 to +51

Contributor @newtork, Aug 8, 2024:

(Comment)

I wholeheartedly agree on the promise. But just to be clear, the current state is not perfect and requires some minor improvement still, e.g. Spring Boot best-practices and code quality.

I'm not sure about the term "meaningful" in your description. But I'm also fine with leaving it as is :)

Member Author:

With "meaningful" I was thinking of keeping the code clean from things that are irrelevant for AI SDK users. E.g. we should not put tests for pure Cloud SDK features in there.

+1 on the code quality. I don't have specifics in mind, but I created this small ticket. Maybe you can add some ideas for code quality improvements to it?


The actual testing is done via GitHub Actions, which builds and starts the applications and then invokes endpoints via `curl` or similar.
The E2E test actions are run nightly / on-demand, similarly to integration tests.
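
For illustration, a minimal sketch of an endpoint in such a test/sample application (Spring Boot; the route and the constant reply are made up, and a real endpoint would delegate to the AI SDK), which the pipeline can then probe via `curl`:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class SampleApplication {

  public static void main(String[] args) {
    SpringApplication.run(SampleApplication.class, args);
  }

  // Probed by the E2E pipeline, e.g. `curl http://localhost:8080/chat`.
  // A real sample would call the AI SDK (e.g. a chat completion) instead of a constant.
  @GetMapping("/chat")
  public String chat() {
    return "Hello from the sample application!";
  }
}
```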

## Performance Tests

Currently not planned.

## Test Landscapes

For integration and E2E tests, we test against two different systems:

- Staging - pre-release system
- Canary - released system

Implementing new features or adapting to changes happens on branches and is tested first against staging.