chore: [ADR] Testing Strategy #2

# Testing Strategy

Status: Accepted, except for the open TBD on the automated AI Core API [wiremock tests](#wiremock-unit-tests).

## Background

Like any product, the AI SDK needs to be tested.

## General Strategy

We classify our tests into regular unit, WireMock unit, integration, and E2E tests.
Overall, we aim for at least 80% test coverage.
The AI Core API module is excluded from this target, since it is completely generated and only selected functionality can be tested.

## Regular Unit Tests

Anything that can be covered by unit tests should be, following the [testing pyramid principle](https://martinfowler.com/articles/practical-test-pyramid.html).

## Wiremock Unit Tests

We use WireMock tests to verify (de-)serialization, HTTP headers, and other behavior on the HTTP layer.

We construct these tests only with data that has been obtained from real systems.
This guarantees that, at least at the time of creating the test, a real system behaved in exactly the same way the WireMock test stub does.

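As a rough sketch (not taken from the actual test suite), such a test could look like the following, using WireMock's JUnit 5 extension and AssertJ. The endpoint path, the `AI-Resource-Group` header, and the `deploymentList.json` stub file (recorded from a real system) are illustrative; a plain `HttpClient` stands in for the SDK client under test, which in practice would be pointed at the WireMock base URL.

```java
import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.equalTo;
import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.getRequestedFor;
import static com.github.tomakehurst.wiremock.client.WireMock.stubFor;
import static com.github.tomakehurst.wiremock.client.WireMock.urlPathEqualTo;
import static com.github.tomakehurst.wiremock.client.WireMock.verify;
import static org.assertj.core.api.Assertions.assertThat;

import com.github.tomakehurst.wiremock.junit5.WireMockRuntimeInfo;
import com.github.tomakehurst.wiremock.junit5.WireMockTest;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;

@WireMockTest
class DeploymentsWireMockTest {

  @Test
  void listDeploymentsStubBehavesLikeRealSystem(WireMockRuntimeInfo wm) throws Exception {
    // Stub body recorded from a real system (stored under src/test/resources/__files)
    stubFor(
        get(urlPathEqualTo("/v2/lm/deployments"))
            .willReturn(
                aResponse()
                    .withStatus(200)
                    .withHeader("Content-Type", "application/json")
                    .withBodyFile("deploymentList.json")));

    // In the real tests the SDK client under test would be pointed at wm.getHttpBaseUrl();
    // a plain HttpClient is used here only to keep the sketch self-contained.
    var request =
        HttpRequest.newBuilder(URI.create(wm.getHttpBaseUrl() + "/v2/lm/deployments"))
            .header("AI-Resource-Group", "default")
            .GET()
            .build();
    var response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

    // Assertions on the HTTP layer: status code and a (de-)serializable payload
    assertThat(response.statusCode()).isEqualTo(200);
    assertThat(response.body()).isNotBlank();

    // Verify the expected request headers were sent
    verify(
        getRequestedFor(urlPathEqualTo("/v2/lm/deployments"))
            .withHeader("AI-Resource-Group", equalTo("default")));
  }
}
```
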
TBD: How much of the generated code for the AI Core API do we want to test, and how do we want to test it?
There are various tools available to automate parts of this; see [this list](https://openapi.tools/#data-validators).

> **Review comment** (on the TBD above): This is not easy to decide. Of course we don't want to write tests only for checking OpenAPI Generator functionality: we expect this to happen in (1) the OpenAPI Generator and/or (2) the SAP Cloud SDK. However, there are additional things to consider:
>
> Some ideas:
>
> **Reply:** Agreed, in an ideal world testing a single CRUD example should be more than sufficient. Unfortunately, as you point out, (1) the OpenAPI generator still has some issues, and a spec file this large will also have issues/mistakes. I personally like the idea of automating this away, but then spec file mistakes still go undetected, and the automation is only as good as the automation tool, which brings us back to point (1) 😅 So yeah, tough one.

## Integration Tests

We use integration tests to continuously test against real systems.
Integration tests are in a separate module and are run only nightly / on-demand.

> **Review comment:** Are you sure we want to follow up on this? In Cloud SDK v3 we had severe issues keeping this dedicated module clean of unit tests. When looking for tests, we would have found unit tests in some module and integration tests in some other module. Maintenance felt higher because of that. I don't want to object, just wondering whether you've seen this working better/well in another project(?)
>
> **Reply:** Agreed, but the difference here is that integration tests won't run with normal builds, only nightly / together with E2E tests, and they would require additional setup, i.e. a service key. To me that sounds like it should disincentivize abusing it enough 😄

These tests are more expensive and less reliable, so we limit how much we cover with them:

- Reading configurations and deployments
- Performing chat completions and embeddings (and potentially other features) for at least one foundation model per vendor
- Performing requests via the orchestration service, covering every available orchestration module
- Querying all available models and verifying that all of them are known to the SDK

Note that orchestration and foundation model tests should be tuned to produce small input/output sizes to reduce costs and keep test execution fast.

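To make this concrete, a hedged sketch of such a test is shown below, using JUnit 5 and AssertJ. The `AiCoreClient` type, its methods, the model name, and the `AICORE_SERVICE_KEY` environment variable are illustrative assumptions, not the SDK's actual API; the structural pieces (`@Tag`, conditional enabling, separate module) reflect the strategy described above.

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable;

// Lives in the dedicated integration-test module; runs only nightly / on-demand,
// when credentials for the test landscape are available.
@Tag("integration")
@EnabledIfEnvironmentVariable(named = "AICORE_SERVICE_KEY", matches = ".+")
class AiCoreIntegrationTest {

  @Test
  void deploymentsCanBeReadFromRealSystem() {
    // Hypothetical SDK client, bound to the real system via the service key from the environment
    var client = AiCoreClient.fromEnvironment();

    var deployments = client.listDeployments("default");

    // Assert only coarse properties so the test stays robust against data changes in the landscape
    assertThat(deployments).isNotEmpty();
  }

  @Test
  void chatCompletionWorksWithTinyPayload() {
    // Keep input and output minimal to reduce cost and runtime, as noted above
    var reply =
        AiCoreClient.fromEnvironment()
            .chatCompletion("some-model") // hypothetical fluent API
            .withMaxTokens(5)
            .complete("Say pong.");

    assertThat(reply).isNotBlank();
  }
}
```
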
## E2E Tests

Finally, E2E tests cover full application scenarios, e.g. using the SDK in a Spring- and/or CAP-based application.
They ensure that the SDK works in a real-world scenario, including typical application dependencies and configuration.

E2E tests consist of two parts: the test/sample application code and the pipeline code.

The application code serves both as test code and as sample code for end users, showcasing how they could include the SDK in their applications.
Thus, we want to keep the application code as meaningful as possible.

> **Review comment** (on the two paragraphs above): I wholeheartedly agree on the premise. But just to be clear, the current state is not perfect and still requires some minor improvements, e.g. Spring Boot best practices and code quality. I'm not sure about the term "meaningful" in your description. But I'm also fine with leaving it as is :)
>
> **Reply:** With "meaningful" I was thinking of keeping the code clean from things that are irrelevant for AI SDK users. E.g. we should not put tests for pure Cloud SDK features in there. +1 on the code quality. I don't have specifics in mind, but I created this small ticket. Maybe you can add some ideas for code quality improvements to it?

The actual testing is done via GitHub Actions, which builds and starts the applications and then invokes endpoints via `curl` or similar.
The E2E test actions are run nightly / on-demand, similarly to integration tests.

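For illustration, the application side of such an E2E test could be a small Spring endpoint like the sketch below, which the pipeline then calls via `curl`. The controller, its route, and the `ChatClient` it delegates to are hypothetical placeholders, not the SDK's actual API.

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Part of the sample application: doubles as end-user sample code and as an E2E test target.
@RestController
class ChatCompletionController {

  // Invoked by the E2E pipeline, e.g.: curl "http://localhost:8080/chat?prompt=Hello"
  @GetMapping("/chat")
  String chat(@RequestParam("prompt") final String prompt) {
    // Hypothetical SDK call; keep prompt and response small to limit cost
    return ChatClient.forDefaultModel().complete(prompt);
  }
}
```

The pipeline then only needs to assert that the endpoint responds with HTTP 200 and a non-empty body.
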
## Performance Tests

Currently not planned.

## Test Landscapes

For integration and E2E tests we test against two different systems:

- Staging - the pre-release system
- Canary - the released system

Implementing new features or adapting to changes happens on branches and is tested first against staging.

> **Review comment:** I wonder whether we should generalize from the start...?
>
> **Reply:** We probably could, but the difference between e.g. AI Core and Orchestration is that AI Core is huge, while in Orchestration we can probably cover most of the functionality.