PipelineBusV2 deadlock proofing #16671

Merged
merged 2 commits into from
Nov 18, 2024
@@ -141,16 +141,20 @@ public AddressState.ReadOnly mutate(final String address,

             consumer.accept(addressState);

-            // If this addressState has a listener, ensure that any waiting
+            return addressState.isEmpty() ? null : addressState;
+        });
+
+        if (result == null) {
+            return null;
+        } else {
+            // If the resulting addressState had a listener, ensure that any waiting
             // threads get notified so that they can resume immediately
-            final PipelineInput currentInput = addressState.getInput();
+            final PipelineInput currentInput = result.getInput();
             if (currentInput != null) {
                 synchronized (currentInput) { currentInput.notifyAll(); }
             }
-
-            return addressState.isEmpty() ? null : addressState;
-        });
-        return result == null ? null : result.getReadOnlyView();
+            return result.getReadOnlyView();
+        }
     }

     private AddressState.ReadOnly get(final String address) {
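
The hunk above moves the `notifyAll()` call out of the lambda and runs it on the returned `result` instead. A minimal sketch of that pattern follows, assuming the lambda is passed to something like `ConcurrentHashMap.compute` (the call itself is outside the hunk, so this is a guess). `NotifyOutsideCompute`, `State`, and `listener` are hypothetical names for illustration, not the real Logstash classes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical stand-ins for illustration; not the Logstash classes.
final class NotifyOutsideCompute {
    static final class State {
        private volatile Object listener;   // monitor that waiting threads block on
        void setListener(final Object l) { listener = l; }
        Object getListener() { return listener; }
        boolean isEmpty() { return listener == null; }
    }

    private final ConcurrentHashMap<String, State> mapping = new ConcurrentHashMap<>();

    State mutate(final String address, final Consumer<State> consumer) {
        // Only the map mutation runs while the map's internal lock is held.
        final State result = mapping.compute(address, (a, state) -> {
            if (state == null) {
                state = new State();
            }
            consumer.accept(state);
            // Returning null removes the entry from the map.
            return state.isEmpty() ? null : state;
        });

        if (result == null) {
            return null;
        }
        // The listener's monitor is taken only after compute() has returned,
        // so this thread never holds the map lock and the monitor together.
        final Object listener = result.getListener();
        if (listener != null) {
            synchronized (listener) { listener.notifyAll(); }
        }
        return result;
    }

    int size() { return mapping.size(); }
}
```

With this ordering, a thread that enters `mutate` while already synchronized on the listener can no longer meet another thread that holds the map's lock and is waiting for that same monitor.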
@@ -23,15 +23,24 @@
 import org.junit.Test;

 import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Assertions.assertThatCode;

 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.logstash.RubyUtil;
 import org.logstash.ext.JrubyEventExtLibrary;

 import java.time.Duration;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.LongAdder;
 import java.util.stream.Stream;
@@ -307,6 +316,56 @@ public void whenInBlockingModeInputsShutdownLast() throws InterruptedException {
         assertThat(bus.getAddressState(address)).isNotPresent();
     }

+    @Test
+    public void blockingShutdownDeadlock() throws InterruptedException {
+        final ExecutorService executor = Executors.newFixedThreadPool(10);
+        try {
+            for (int i = 0; i < 100; i++) {
+                bus.registerSender(output, addresses);
+                bus.listen(input, address);
+                bus.setBlockOnUnlisten(true);
+
+                // we use a CountDownLatch to increase the likelihood
+                // of simultaneous execution
Comment on lines +328 to +329
Member Author

@yaauie yaauie Nov 13, 2024

With the countdown latch in place, this test reliably finds the preexisting deadlock in <5 iterations (very usually in <2), and without it we can reliably find it in <40 iterations (very usually in <10). The test runs 100 iterations just to be safe.

Member

Brilliant! I was curious if there was anything else for getting threads to wait on a particular point and found CyclicBarrier https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/CyclicBarrier.html

I tried running several times with this diff:

diff --git a/logstash-core/src/test/java/org/logstash/plugins/pipeline/PipelineBusTest.java b/logstash-core/src/test/java/org/logstash/plugins/pipeline/PipelineBusTest.java
index 268ed8d09..1dbaddaab 100644
--- a/logstash-core/src/test/java/org/logstash/plugins/pipeline/PipelineBusTest.java
+++ b/logstash-core/src/test/java/org/logstash/plugins/pipeline/PipelineBusTest.java
@@ -39,6 +39,7 @@ import java.util.List;
 import java.util.Set;
 import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.CyclicBarrier;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.TimeUnit;
@@ -325,17 +326,14 @@ public class PipelineBusTest {
                 bus.listen(input, address);
                 bus.setBlockOnUnlisten(true);

-                // we use a CountDownLatch to increase the likelihood
-                // of simultaneous execution
-                final CountDownLatch startLatch = new CountDownLatch(2);
+                // Synchronize when unlisten/unregisterSender start
+                final CyclicBarrier startBarrier = new CyclicBarrier(2);
                 final CompletableFuture<Void> unlistenFuture = CompletableFuture.runAsync(asRunnable(() -> {
-                    startLatch.countDown();
-                    startLatch.await();
+                    startBarrier.await();
                     bus.unlisten(input, address);
                 }), executor);
                 final CompletableFuture<Void> unregisterFuture = CompletableFuture.runAsync(asRunnable(() -> {
-                    startLatch.countDown();
-                    startLatch.await();
+                    startBarrier.await();
                     bus.unregisterSender(output, addresses);
                 }), executor);

Whether I used CountDownLatch or CyclicBarrier, I could not see a meaningful difference in performance, which was very good with either option.
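
For comparison, here is a self-contained sketch of the two start gates discussed in this thread, outside the test class. `StartGateDemo`, `Gate`, and `runPair` are hypothetical names for illustration; only the `java.util.concurrent` calls are real API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class StartGateDemo {
    @FunctionalInterface
    interface Gate { void pass() throws Exception; }

    /** Two tasks meet at a CountDownLatch; returns how many got through. */
    static int runWithLatch() {
        final CountDownLatch latch = new CountDownLatch(2);
        return runPair(() -> {
            latch.countDown();   // announce arrival
            latch.await();       // block until both tasks have arrived
        });
    }

    /** Two tasks meet at a CyclicBarrier; returns how many got through. */
    static int runWithBarrier() {
        final CyclicBarrier barrier = new CyclicBarrier(2);
        return runPair(barrier::await);   // arriving and waiting are one call
    }

    private static int runPair(final Gate gate) {
        // Two pool threads, so both tasks can reach the gate at the same time.
        final ExecutorService executor = Executors.newFixedThreadPool(2);
        final AtomicInteger passed = new AtomicInteger();
        try {
            final Runnable task = () -> {
                try {
                    gate.pass();
                    passed.incrementAndGet();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            };
            CompletableFuture.allOf(
                    CompletableFuture.runAsync(task, executor),
                    CompletableFuture.runAsync(task, executor))
                .get(5, TimeUnit.SECONDS);
            return passed.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A `CountDownLatch` is one-shot, while a `CyclicBarrier` resets after each trip and can be reused. For a gate that is created fresh on every iteration, as in this test, either one works.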

+                final CountDownLatch startLatch = new CountDownLatch(2);
+                final CompletableFuture<Void> unlistenFuture = CompletableFuture.runAsync(asRunnable(() -> {
+                    startLatch.countDown();
+                    startLatch.await();
+                    bus.unlisten(input, address);
+                }), executor);
+                final CompletableFuture<Void> unregisterFuture = CompletableFuture.runAsync(asRunnable(() -> {
+                    startLatch.countDown();
+                    startLatch.await();
+                    bus.unregisterSender(output, addresses);
+                }), executor);
+
+                // ensure that our tasks all exit successfully, quickly
+                assertThatCode(() -> CompletableFuture.allOf(unlistenFuture, unregisterFuture).get(1, TimeUnit.SECONDS))
+                        .withThreadDumpOnError()
+                        .withFailMessage("Expected unlisten and unregisterSender to not deadlock, but they did not return in a reasonable amount of time in the <%s>th iteration", i)
+                        .doesNotThrowAnyException();
+            }
+        } finally {
+            executor.shutdownNow();
+        }
+    }
+
+    @FunctionalInterface
+    interface ExceptionalRunnable<E extends Throwable> {
+        void run() throws E;
+    }
+
+    private Runnable asRunnable(final ExceptionalRunnable<?> exceptionalRunnable) {
+        return () -> {
+            try {
+                exceptionalRunnable.run();
+            } catch (Throwable e) {
+                throw new RuntimeException(e);
+            }
+        };
+    }
+
+
     @Test
     public void whenInputFailsOutputRetryOnlyNotYetDelivered() throws InterruptedException {
         bus.registerSender(output, addresses);
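
The assertion in `blockingShutdownDeadlock` treats a timeout on `CompletableFuture.allOf(...).get(1, TimeUnit.SECONDS)` as evidence of a deadlock. Below is a rough sketch of that detection idea without JUnit or AssertJ, using a deliberately inverted lock order as the failing case. `DeadlockProbe` and its methods are hypothetical names, and the 500 ms timeout is an arbitrary choice.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class DeadlockProbe {
    /** Returns true if both tasks finish within the timeout, false if they appear stuck. */
    static boolean finishesWithin(final long timeout, final TimeUnit unit,
                                  final Runnable first, final Runnable second) {
        final ExecutorService executor = Executors.newFixedThreadPool(2, r -> {
            final Thread t = new Thread(r);
            t.setDaemon(true);   // stuck threads must not keep the JVM alive
            return t;
        });
        try {
            CompletableFuture.allOf(
                    CompletableFuture.runAsync(first, executor),
                    CompletableFuture.runAsync(second, executor))
                .get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            return false;        // at least one task did not return in time
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    /** Demo: two tasks take the same two monitors in opposite order. */
    static boolean invertedLockOrderFinishes() {
        final Object a = new Object();
        final Object b = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        return finishesWithin(500, TimeUnit.MILLISECONDS,
                () -> lockInOrder(a, b, bothHoldFirst),
                () -> lockInOrder(b, a, bothHoldFirst));
    }

    private static void lockInOrder(final Object first, final Object second,
                                    final CountDownLatch bothHoldFirst) {
        synchronized (first) {
            bothHoldFirst.countDown();
            try {
                bothHoldFirst.await();   // wait until the other task holds its first monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: each task waits for the monitor the other holds
            }
        }
    }
}
```

A timeout only shows that the tasks did not return in time, so it cannot tell a deadlock from a slow run. That is presumably why the real test also requests a thread dump on failure.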