
Conversation

albertzaharovits
Contributor

Fixes #125639
Relates #120869

@albertzaharovits albertzaharovits added >test :Distributed Indexing/Engine labels Apr 1, 2025
@albertzaharovits albertzaharovits self-assigned this Apr 1, 2025
@elasticsearchmachine elasticsearchmachine added Team:Distributed Indexing v9.1.0 labels Apr 1, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@albertzaharovits
Contributor Author

FWIW here's the stack trace for the failure:

"TEST-ThreadPoolMergeSchedulerTests.testMergeSourceWithFollowUpMergesRunSequentially-seed#[43ADDD8E872EA775]" ID=3471 WAITING on java.util.concurrent.CountDownLatch$Sync@6e886e21
    at java.base@24/jdk.internal.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.CountDownLatch$Sync@6e886e21
    at java.base@24/java.util.concurrent.locks.LockSupport.park(LockSupport.java:223)
    at java.base@24/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:789)
    at java.base@24/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1138)
    at java.base@24/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
    at app//org.elasticsearch.index.engine.ThreadPoolMergeScheduler.close(ThreadPoolMergeScheduler.java:483)
    at app//org.elasticsearch.index.engine.ThreadPoolMergeSchedulerTests.testMergeSourceWithFollowUpMergesRunSequentially(ThreadPoolMergeSchedulerTests.java:187)
    at java.base@24/java.lang.invoke.LambdaForm$DMH/0x000000000a000c00.invokeVirtual(LambdaForm$DMH)
    at java.base@24/java.lang.invoke.LambdaForm$MH/0x000000000a120800.invoke(LambdaForm$MH)
    at java.base@24/java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder)
    at java.base@24/jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:154)
    at java.base@24/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base@24/java.lang.reflect.Method.invoke(Method.java:565)
    at app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
    at app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
    at app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
    at app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
    at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
    at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
    at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
    at app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
    at app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
    at app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
    at app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
    at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at app//org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
    at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$$Lambda/0x000000000a39d5b0.run(Unknown Source)
    at java.base@24/java.lang.Thread.runWith(Thread.java:1460)
    at java.base@24/java.lang.Thread.run(Thread.java:1447)

"elasticsearch[test][merge][T#1]" ID=3476 WAITING on java.util.concurrent.Semaphore$NonfairSync@29f76e49
    at java.base@24/jdk.internal.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.Semaphore$NonfairSync@29f76e49
    at java.base@24/java.util.concurrent.locks.LockSupport.park(LockSupport.java:223)
    at java.base@24/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:789)
    at java.base@24/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1138)
    at java.base@24/java.util.concurrent.Semaphore.acquire(Semaphore.java:318)
    at app//org.elasticsearch.index.engine.ThreadPoolMergeSchedulerTests.lambda$testMergeSourceWithFollowUpMergesRunSequentially$1(ThreadPoolMergeSchedulerTests.java:228)
    at app//org.elasticsearch.index.engine.ThreadPoolMergeSchedulerTests$$Lambda/0x000000000ad4c208.answer(Unknown Source)
    at app//org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:42)
    at app//org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:103)
    at app//org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
    at app//org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:34)
    at app//org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:82)
    at app//org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:56)
    at app//org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptAbstract(MockMethodInterceptor.java:161)
    at app//org.apache.lucene.index.MergeScheduler$MergeSource$MockitoMock$KxPeFHhn.merge(Unknown Source)
    at app//org.elasticsearch.index.engine.ThreadPoolMergeScheduler.doMerge(ThreadPoolMergeScheduler.java:267)
    at app//org.elasticsearch.index.engine.ThreadPoolMergeScheduler$MergeTask.run(ThreadPoolMergeScheduler.java:363)
    at app//org.elasticsearch.index.engine.ThreadPoolMergeExecutorService.runMergeTask(ThreadPoolMergeExecutorService.java:195)
    at app//org.elasticsearch.index.engine.ThreadPoolMergeExecutorService.lambda$enqueueMergeTaskExecution$3(ThreadPoolMergeExecutorService.java:167)
    at app//org.elasticsearch.index.engine.ThreadPoolMergeExecutorService$$Lambda/0x000000000ad490c8.run(Unknown Source)
    at app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:977)
    at java.base@24/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
    at java.base@24/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
    at java.base@24/java.lang.Thread.runWith(Thread.java:1460)
    at java.base@24/java.lang.Thread.run(Thread.java:1447)

    Locked synchronizers:
    - java.util.concurrent.ThreadPoolExecutor$Worker@70409d3f

Contributor

@henningandersen henningandersen left a comment


LGTM.

Not entirely certain how it fixes the specific failure but looks good regardless.

@albertzaharovits
Contributor Author

> Not entirely certain how it fixes the specific failure but looks good regardless.

Yeah, it was a head-scratcher.
There was a race between the test thread and the merge thread(s). The merge thread uses the runMergeIdx variable to verify that the follow-up merges are executed in the expected order, and the test thread uses that same variable to know when all merges finished executing. But runMergeIdx can mean either the index of the merge currently running or of the one that just finished, depending on when it's checked.
The test deadlocks when the test thread concludes that all merges are done while the last one still needs to run.
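
To illustrate the kind of race described above, here is a minimal Java sketch. It is not the actual test code and not necessarily the change made in this PR; the class `MergeCompletionSketch`, the method `runSequentially`, and the names `startedIdx` and `allFinished` are hypothetical. It assumes the counter is incremented when a merge starts, so a test thread that compared the counter against the total could see "all done" while the last merge is still running. The sketch keeps the counter for order checking only and signals completion through a separate `CountDownLatch`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; none of these names come from the Elasticsearch test.
final class MergeCompletionSketch {

    // Runs mergeCount fake merges on a single merge thread and returns the
    // order in which they were observed to start.
    static List<Integer> runSequentially(int mergeCount) {
        // Incremented when a merge STARTS, so it cannot tell "running" from "finished".
        AtomicInteger startedIdx = new AtomicInteger();
        // Separate, unambiguous completion signal: one count per finished merge.
        CountDownLatch allFinished = new CountDownLatch(mergeCount);
        List<Integer> observedOrder = Collections.synchronizedList(new ArrayList<>());
        ExecutorService mergeThread = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < mergeCount; i++) {
                mergeThread.execute(() -> {
                    // Order check: each merge records the index it started with.
                    observedOrder.add(startedIdx.getAndIncrement());
                    // ... the merge body would run here ...
                    // Only now is the merge really done.
                    allFinished.countDown();
                });
            }
            // Wait on the latch instead of polling startedIdx.
            if (!allFinished.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("merges did not finish in time");
            }
            return new ArrayList<>(observedOrder);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for merges", e);
        } finally {
            mergeThread.shutdownNow();
        }
    }
}
```

Calling `MergeCompletionSketch.runSequentially(5)` returns only after every fake merge has counted down the latch, which is the property a check on the shared index alone cannot guarantee.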

@albertzaharovits albertzaharovits merged commit e934600 into elastic:main Apr 2, 2025
17 checks passed
@albertzaharovits albertzaharovits deleted the fix-125639 branch April 2, 2025 14:13
andreidan pushed a commit to andreidan/elasticsearch that referenced this pull request Apr 9, 2025
albertzaharovits added a commit to albertzaharovits/elasticsearch that referenced this pull request Jun 9, 2025

Labels

:Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard.)
Team:Distributed Indexing (Meta label for Distributed Indexing team)
>test (Issues or PRs that are addressing/adding tests)
v9.1.0

3 participants