Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test fixed #1705

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Flaky test fixed #1705

wants to merge 1 commit into from

Conversation

Derrick2000
Copy link

@Derrick2000 Derrick2000 commented Nov 17, 2024

Issue: https://issues.redhat.com/browse/UNDERTOW-2528

Dear Maintainer,

I noticed that one of the WebSocket tests in the repository is exhibiting flaky behavior due to Non-Order-Determinism (NOD) issues. Below is a detailed explanation of the problem and a proposed fix to ensure the test behaves reliably.

Occasionally, the test fails at the line:

with the error:
expected:<[MESSAGE-ECHO-hi]> but was:<[CLOSE]>

The issue stems from Non-Order-Determinism (NOD) in the asynchronous WebSocket communication. Specifically:

  1. Race Condition During Reconnection: After sending the "close" message, the test immediately proceeds to send "hi" without ensuring that the WebSocket connection has been fully reopened.
    The "CLOSE" message from the previous interaction might still be processed or pending when "hi" is sent, causing the test to retrieve "CLOSE" instead of "MESSAGE-ECHO-hi".

  2. Session State Assumption: The test assumes that the Session is immediately ready to send and receive messages after receiving an "OPEN" message, which may not be the case in all scenarios.

  3. Unreliable Message Processing Order: WebSocket communication is asynchronous, and the test does not synchronize properly with the server's message processing, leading to unexpected order of messages.

Proposed Fix
To resolve the issue, I propose the following change: Verify session state before sending messages by adding a check to ensure the Session is open and ready for communication. This change ensure the test behaves reliably by addressing the Non-Order-Determinism issue caused by race conditions in the asynchronous WebSocket communication.

Please let me know if you’d like further clarification or assistance with implementing these changes.

Best regards,

@baranowb
Copy link
Contributor

Hi @Derrick2000 . Ive created ticket for this enchancement: https://issues.redhat.com/browse/UNDERTOW-2528 . Since you already analyzed the problem, would you like to contribute said fix?

@Derrick2000
Copy link
Author

Hi @baranowb, our Nondex tool detected flakiness in the test io.undertow.websockets.jsr.test.reconnect.ClientEndpointReconnectTestCase.testAnnotatedClientEndpoint. The fix I implemented in this PR addresses and resolves this issue.

@baranowb
Copy link
Contributor

baranowb commented Dec 5, 2024

@Derrick2000 Oh, given the description I thought there is more to it.

@baranowb
Copy link
Contributor

baranowb commented Dec 5, 2024

@Derrick2000 - could you please update commit message to include prefix like - [UNDERTOW-2528], also, commit message should inform which test it addresses.

@Derrick2000
Copy link
Author

@baranowb commit message updated.

@Derrick2000 Oh, given the description I thought there is more to it.

There could be future PRs addressing other flaky tests in this repo because I was only assigned to this one test. Sorry about the whole paragraph, just trying to provide more context to the issue :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants