Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty fails on Windows 2022/2019 using Buildkite #15562

Closed
dliappis opened this issue Nov 10, 2023 · 3 comments
Assignees

Comments

@dliappis
Copy link
Contributor

dliappis commented Nov 10, 2023

While transitioning JDK matrix jobs from Jenkins to Buildkite we've spotted a strange failure with testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty. It happens on Windows 2022, 2019 and occasionally on 2019 agents. Interestingly enough it doesn't fail using the same VM image / user / CSP account / instance type when running it in an interactive UI session over RDP.

Example failure:

org.logstash.common.io.DeadLetterQueueWriterAgeRetentionTest > testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty FAILED
--
  | org.awaitility.core.ConditionTimeoutException: Condition was evaluated in 986 milliseconds which is earlier than expected minimum timeout 1 seconds
  | at app//org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:172)
  | at app//org.awaitility.core.CallableCondition.await(CallableCondition.java:78)
  | at app//org.awaitility.core.CallableCondition.await(CallableCondition.java:26)
  | at app//org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
  | at app//org.awaitility.core.ConditionFactory.until(ConditionFactory.java:954)
  | at app//org.logstash.common.io.DeadLetterQueueWriterAgeRetentionTest.testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty(DeadLetterQueueWriterAgeRetentionTest.java:345)

By looking at the test, we tested the following scenarios:

  1. Increase flushInterval from 1s to 2s

This results in success in Buildkite.

  1. Introduce a delay between writing a DLQ entry and starting the Awaitility test i.e. like so:
diff --git a/logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterAgeRetentionTest.java b/logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterAgeRetentionTest.java
index 6c9bb5a02..69cfe3f1a 100644
--- a/logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterAgeRetentionTest.java
+++ b/logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterAgeRetentionTest.java
@@ -337,6 +337,7 @@ public class DeadLetterQueueWriterAgeRetentionTest {

             DLQEntry entry = new DLQEntry(event, "", "", "00001", DeadLetterQueueReaderTest.constantSerializationLengthTimestamp(fakeClock));
             writeManager.writeEntry(entry);
+            Thread.sleep(50);

             // wait the flush interval so that the current head segment is sealed
             Awaitility.await("After the flush interval head segment is sealed and a fresh empty head is created")

makes the test fail, on an interactive Windows 2019 session where the default test always succeeds.

This indicates that, if there is a delay (e.g. due to an operating system context switch) between starting to write the DLQ file and the Awaitility test, it may delay starting the test and its timer such that the condition that the file doesn't become available <1s fails. Therefore, there is potential for this test to fail, regardless of the OS, if cycles get stolen between the two actions.

dliappis added a commit to dliappis/logstash that referenced this issue Nov 10, 2023
This commit adds JDK matrix Buildkite pipelines for
Windows 2022, 2019 and 2016.

It also makes the groups easier to read (on both Linux and Windows
pipelines) by removing the os-jdk prefix from the job labels.

`testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty`
fails on Windows Buildkite agents and it's a test issue tracked in
elastic#15562.

Relates:

- elastic#15539
- elastic/ingest-dev#1725
-
@andsel
Copy link
Contributor

andsel commented Nov 10, 2023

This is greatly due to the fact that we are testing about time and this make the test fragile.
At first we should remove the invocations of atLeast, the original intent of it was to give the scheduled task inside the DQL writer enough time to run its sealer job.

Add a TODO to refactor the scheduler inside DLQ writer so that it can abstract from time, and respond to events like "schedule period elapsed". In this way the test could easily test conditions like "after a schedule period is elapsed, and after the sealer jab executed verify that the set of segment files is what we expect".


This is addressed by #15594

dliappis added a commit that referenced this issue Nov 10, 2023
This commit adds JDK matrix Buildkite pipelines for
Windows 2022, 2019 and 2016.

It also makes the groups easier to read (on both Linux and Windows
pipelines) by removing the os-jdk prefix from the job labels.

`testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty`
fails on Windows Buildkite agents and it's a test issue tracked in
#15562.

Relates:

- #15539
- elastic/ingest-dev#1725
github-actions bot pushed a commit that referenced this issue Nov 10, 2023
This commit adds JDK matrix Buildkite pipelines for
Windows 2022, 2019 and 2016.

It also makes the groups easier to read (on both Linux and Windows
pipelines) by removing the os-jdk prefix from the job labels.

`testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty`
fails on Windows Buildkite agents and it's a test issue tracked in
#15562.

Relates:

- #15539
- elastic/ingest-dev#1725

(cherry picked from commit 0ede19a)
github-actions bot pushed a commit that referenced this issue Nov 10, 2023
This commit adds JDK matrix Buildkite pipelines for
Windows 2022, 2019 and 2016.

It also makes the groups easier to read (on both Linux and Windows
pipelines) by removing the os-jdk prefix from the job labels.

`testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty`
fails on Windows Buildkite agents and it's a test issue tracked in
#15562.

Relates:

- #15539
- elastic/ingest-dev#1725

(cherry picked from commit 0ede19a)
github-actions bot pushed a commit that referenced this issue Nov 10, 2023
This commit adds JDK matrix Buildkite pipelines for
Windows 2022, 2019 and 2016.

It also makes the groups easier to read (on both Linux and Windows
pipelines) by removing the os-jdk prefix from the job labels.

`testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty`
fails on Windows Buildkite agents and it's a test issue tracked in
#15562.

Relates:

- #15539
- elastic/ingest-dev#1725

(cherry picked from commit 0ede19a)
dliappis added a commit that referenced this issue Nov 13, 2023
This commit adds JDK matrix Buildkite pipelines for
Windows 2022, 2019 and 2016.

It also makes the groups easier to read (on both Linux and Windows
pipelines) by removing the os-jdk prefix from the job labels.

`testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty`
fails on Windows Buildkite agents and it's a test issue tracked in
#15562.

Relates:

- #15539
- elastic/ingest-dev#1725

(cherry picked from commit 0ede19a)

Co-authored-by: Dimitrios Liappis <[email protected]>
dliappis added a commit that referenced this issue Nov 13, 2023
This commit adds JDK matrix Buildkite pipelines for
Windows 2022, 2019 and 2016.

It also makes the groups easier to read (on both Linux and Windows
pipelines) by removing the os-jdk prefix from the job labels.

`testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty`
fails on Windows Buildkite agents and it's a test issue tracked in
#15562.

Relates:

- #15539
- elastic/ingest-dev#1725

(cherry picked from commit 0ede19a)

Co-authored-by: Dimitrios Liappis <[email protected]>
dliappis added a commit that referenced this issue Nov 13, 2023
This commit adds JDK matrix Buildkite pipelines for
Windows 2022, 2019 and 2016.

It also makes the groups easier to read (on both Linux and Windows
pipelines) by removing the os-jdk prefix from the job labels.

`testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty`
fails on Windows Buildkite agents and it's a test issue tracked in
#15562.

Relates:

- #15539
- elastic/ingest-dev#1725

(cherry picked from commit 0ede19a)

Co-authored-by: Dimitrios Liappis <[email protected]>
@andsel
Copy link
Contributor

andsel commented Jan 9, 2024

Fixed by #15680

@andsel andsel closed this as completed Jan 9, 2024
@andsel
Copy link
Contributor

andsel commented Jan 9, 2024

Follow up issue for BK failure not related to the way we used Awaitility in this context: #15767

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants