Defer unpromotable shard refreshes until index refresh blocks are cleared #120642

fcofdez · 2025-01-22T15:56:33Z

This update postpones unpromotable refreshes for indices with an active INDEX_REFRESH_BLOCK until the block is cleared.
This ensures refresh operations proceed only when the index is no longer blocked.

To avoid indefinite delays, the maximum wait time is governed by the bulk request timeout, whereas explicit refreshes
rely on the block eventually being removed.

Closes ES-10134

…ared This update postpones unpromotable refreshes for indices with an active INDEX_REFRESH_BLOCK until the block is cleared. This ensures refresh operations proceed only when the index is no longer blocked. To avoid indefinite delays, the maximum wait time is governed by the stateless.indices.refresh.max_wait_time_for_unblock setting. Closes ES-10134

elasticsearchmachine · 2025-01-22T15:56:57Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

elasticsearchmachine · 2025-01-22T15:57:21Z

Hi @fcofdez, I've created a changelog YAML for you.

fcofdez · 2025-01-22T15:57:54Z

.../org/elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshAction.java

+
+ @Override
+ public void onTimeout(TimeValue timeout) {
+ listener.onFailure(indexLevelBlockException);


I was wondering if we should just rely on the fact that the blocks would be cleared eventually instead of failing the refresh request, wdyt?

I could see us do both. But in the case of bulks it would be nice to use the timeout from the bulk instead.

Also, I think throwing the refresh block exception is confusing, I'd prefer to throw a timeout exception instead.

But we could also forgo it, expecting the refresh unblock to happen automatically after a separate timeout.

For refresh triggered by writes we can maybe reuse the postWriteRefreshTimeout and for explicit refreshes reuse the timeout from the original request, and then rely on the block to be cleared automatically in case we wait indefinitely? And get rid of the new setting?

That makes more sense, I used the postWriteRefreshTimeout instead of the setting in 9a22ad6

henningandersen

Left a few smaller initial comments.

henningandersen · 2025-01-22T16:54:33Z

.../org/elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshAction.java

+ return;
+ }
+
+ clusterStateObserver.waitForNextChange(new ClusterStateObserver.Listener() {


I think we'd normally pass the predicate here instead, avoiding the explicit retries here.

I think that also removes the need for the runnable?

That's a good simplification, done in 9a22ad6

henningandersen · 2025-01-22T22:39:33Z

.../org/elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshAction.java

+
+ @Override
+ public void onTimeout(TimeValue timeout) {
+ listener.onFailure(indexLevelBlockException);


I could see us do both. But in the case of bulks it would be nice to use the timeout from the bulk instead.

Also, I think throwing the refresh block exception is confusing, I'd prefer to throw a timeout exception instead.

But we could also forgo it, expecting the refresh unblock to happen automatically after a separate timeout.

henningandersen · 2025-01-22T22:41:21Z

...lasticsearch/action/support/broadcast/unpromotable/TransportBroadcastUnpromotableAction.java

+ }
+
+ protected void beforeDispatchingRequestToUnpromotableShards(Request request, ActionListener<Void> listener) {
+ ActionListener.completeWith(listener, () -> null);


Is this not just listener.onResponse(null)?

I just got used to always using the safe completeWith util, but it's true that listener.onResponse(null) should be enough. Tackled in 9a22ad6.
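To illustrate why the two forms are interchangeable here, the following is a minimal sketch using a hypothetical, simplified listener interface (SketchListener and this completeWith are stand-ins written for this example, not the Elasticsearch classes). It assumes the helper's job is to route a supplier's exception to onFailure; with a supplier that always returns null there is nothing to catch, so the direct call produces the same result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical, simplified stand-in for a response listener (not the Elasticsearch interface).
interface SketchListener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

final class CompleteWithSketch {
    // Runs the supplier and routes either its result or its exception to the listener.
    static <T> void completeWith(SketchListener<T> listener, Callable<T> supplier) {
        T result;
        try {
            result = supplier.call();
        } catch (Exception e) {
            listener.onFailure(e);
            return;
        }
        listener.onResponse(result);
    }

    // Records what the listener observes for each completion style.
    static List<String> record(boolean useHelper) {
        final List<String> events = new ArrayList<>();
        SketchListener<Void> listener = new SketchListener<Void>() {
            @Override
            public void onResponse(Void response) {
                events.add("response");
            }

            @Override
            public void onFailure(Exception e) {
                events.add("failure: " + e.getMessage());
            }
        };
        if (useHelper) {
            completeWith(listener, () -> null); // supplier cannot throw
        } else {
            listener.onResponse(null); // direct completion
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(record(true));
        System.out.println(record(false));
    }
}
```

The helper only earns its keep when the supplier can actually throw.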

tlrx

Looks nice, I made a quick review and left some comments.

tlrx · 2025-01-23T09:11:19Z

...lasticsearch/action/support/broadcast/unpromotable/TransportBroadcastUnpromotableAction.java


 @Override
 protected void doExecute(Task task, Request request, ActionListener<Response> listener) {
+ beforeDispatchingRequestToUnpromotableShards(


nit: maybe add this in TransportUnpromotableShardRefreshAction directly, unless it is needed for tests?

Is it required by tests after all?

tlrx · 2025-01-23T09:24:00Z

.../org/elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshAction.java

+
+ @Override
+ public void onTimeout(TimeValue timeout) {
+ listener.onFailure(indexLevelBlockException);


For refresh triggered by writes we can maybe reuse the postWriteRefreshTimeout and for explicit refreshes reuse the timeout from the original request, and then rely on the block to be cleared automatically in case we wait indefinitely? And get rid of the new setting?

henningandersen

Looks good, I hope Tanguy will do a more thorough review.

henningandersen · 2025-01-23T10:34:10Z

.../org/elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshAction.java

+ }
+
+ var clusterStateObserver = new ClusterStateObserver(clusterService, request.getTimeout(), logger, threadPool.getThreadContext());
+


It is slightly annoying that the observer does not support it but we need to have something like:

if (isIndexBlockedForRefresh(request.shardId().getIndexName(), clusterStateObserver.setAndGetObservedState()) == false) {
    listener.onResponse(null);
    return;
}

That's right, and it's why I slightly prefer the Runnable approach. Fixed in 179074d

I still find this simpler than the runnable way, which takes over more responsibility from the observer. One can trick the observer by passing in initial version -1, but it is a bit ugly. We should probably add a method to it to simplify this very common case.

Yes, I was planning on doing that. I'll open a PR shortly
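The pattern discussed in this thread (check the currently observed state first, then wait for the next matching change) can be sketched in isolation. This is a single-threaded illustration with hypothetical names: MiniObserver and RefreshGateSketch are written for this example and only mimic the idea of an observer that evaluates its predicate on future states, not the real ClusterStateObserver. The set of blocked index names stands in for the cluster state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical mini observer (single-threaded, illustration only).
// Waiters are notified on the NEXT published state that matches their predicate.
final class MiniObserver {
    private static final class Waiter {
        final Predicate<Set<String>> predicate;
        final Runnable callback;

        Waiter(Predicate<Set<String>> predicate, Runnable callback) {
            this.predicate = predicate;
            this.callback = callback;
        }
    }

    private Set<String> blockedIndices;
    private final List<Waiter> waiters = new ArrayList<>();

    MiniObserver(Set<String> initiallyBlocked) {
        this.blockedIndices = initiallyBlocked;
    }

    Set<String> observedState() {
        return blockedIndices;
    }

    // Deliberately does NOT evaluate the predicate against the current state.
    void waitForNextChange(Predicate<Set<String>> predicate, Runnable callback) {
        waiters.add(new Waiter(predicate, callback));
    }

    // Publishes a new state and fires (once) every waiter whose predicate matches it.
    void publish(Set<String> newBlockedIndices) {
        blockedIndices = newBlockedIndices;
        List<Waiter> ready = new ArrayList<>();
        for (Waiter waiter : waiters) {
            if (waiter.predicate.test(newBlockedIndices)) {
                ready.add(waiter);
            }
        }
        waiters.removeAll(ready);
        for (Waiter waiter : ready) {
            waiter.callback.run();
        }
    }
}

final class RefreshGateSketch {
    static boolean isBlocked(String index, Set<String> blockedIndices) {
        return blockedIndices.contains(index);
    }

    // Runs the refresh immediately when the index is not blocked; otherwise defers it.
    static void runWhenUnblocked(MiniObserver observer, String index, Runnable refresh) {
        if (isBlocked(index, observer.observedState()) == false) {
            refresh.run();
            return;
        }
        observer.waitForNextChange(state -> isBlocked(index, state) == false, refresh);
    }

    public static void main(String[] args) {
        MiniObserver observer = new MiniObserver(Collections.singleton("logs"));
        runWhenUnblocked(observer, "logs", () -> System.out.println("refresh ran after unblock"));
        observer.publish(Collections.<String>emptySet());
    }
}
```

Without the up-front check, a request arriving after the block was already removed would wait for a state change that may not come before the timeout.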

tlrx

LGTM, I left only minor comments that you can feel free to address or not.

tlrx · 2025-01-24T08:19:05Z

.../org/elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshAction.java

+ }, clusterState -> isIndexBlockedForRefresh(request.shardId().getIndexName(), clusterState) == false);
+ }
+
+ private boolean isIndexBlockedForRefresh(String index, ClusterState state) {


nit: can be static

tlrx · 2025-01-24T08:26:08Z

.../org/elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshAction.java

+
+ @Override
+ public void onTimeout(TimeValue timeout) {
+ listener.onFailure(new ElasticsearchTimeoutException("index refresh blocked, waiting for shard(s) to be started"));


I think "waiting for shard to be started" is implicit; the timeout is caused by the block not being removed within the expected delay.

Maybe something like this?

listener.onFailure(
    new ElasticsearchTimeoutException(
        "shard refresh timed out waiting for index block to be removed",
        new ClusterBlockException(Map.of(request.shardId().getIndexName(), Set.of(INDEX_REFRESH_BLOCK)))
    )
);
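The shape of this suggestion (a timeout exception whose cause carries the block) can be shown with plain JDK exceptions. The two exception classes below are hypothetical stand-ins defined only for this sketch, and the block message format is an assumption. The point is that the caller sees a timeout first and can still reach the block details through getCause().

```java
// Hypothetical stand-ins for the exception types discussed above (not the Elasticsearch classes).
final class BlockedSketchException extends RuntimeException {
    BlockedSketchException(String index) {
        super("index [" + index + "] has a refresh block"); // message format is an assumption
    }
}

final class TimeoutSketchException extends RuntimeException {
    TimeoutSketchException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class TimeoutCauseSketch {
    // Builds the failure reported when waiting for the block removal times out.
    static RuntimeException onTimeout(String index) {
        return new TimeoutSketchException(
            "shard refresh timed out waiting for index block to be removed",
            new BlockedSketchException(index)
        );
    }

    public static void main(String[] args) {
        RuntimeException failure = onTimeout("logs");
        System.out.println(failure.getMessage());
        System.out.println("caused by: " + failure.getCause().getMessage());
    }
}
```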

tlrx · 2025-01-24T08:28:00Z

...lasticsearch/action/support/broadcast/unpromotable/TransportBroadcastUnpromotableAction.java


 @Override
 protected void doExecute(Task task, Request request, ActionListener<Response> listener) {
+ beforeDispatchingRequestToUnpromotableShards(


Is it required by tests after all?

tlrx · 2025-01-24T08:32:23Z

server/src/main/java/org/elasticsearch/cluster/routing/IndexShardRoutingTable.java

 List<ShardRouting> assignedShards = new ArrayList<>();
 List<ShardRouting> unpromotableShards = new ArrayList<>();
 List<ShardRouting> allInitializingShards = new ArrayList<>();
+ List<ShardRouting> allUnpromotableShards = new ArrayList<>();


I wonder if allUnpromotableShards should be named unpromotableShards and the existing unpromotableShards renamed to assignedUnpromotableShards?

(can be a follow up though)

tlrx · 2025-01-24T08:33:59Z

server/src/main/java/org/elasticsearch/cluster/routing/IndexShardRoutingTable.java

 this.activeShards = CollectionUtils.wrapUnmodifiableOrEmptySingleton(activeShards);
 this.assignedShards = CollectionUtils.wrapUnmodifiableOrEmptySingleton(assignedShards);
 this.unpromotableShards = CollectionUtils.wrapUnmodifiableOrEmptySingleton(unpromotableShards);
+ this.allUnpromotableShards = CollectionUtils.wrapUnmodifiableOrEmptySingleton(allUnpromotableShards);


Should we assert that all unpromotableShards are contained in allUnpromotableShards? Can be costly though.
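One way to keep such a containment check cheap is sketched below, using plain strings in place of shard routings. ShardListInvariantSketch and its method name are hypothetical. Building a HashSet first avoids the quadratic cost of List.containsAll on two lists, and placing the call inside an assert statement means it only runs when assertions are enabled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

final class ShardListInvariantSketch {
    // Returns true when every element of subset also appears in superset.
    // Hashing the superset first keeps the check roughly linear in the two list sizes.
    static <T> boolean isSubset(List<T> subset, List<T> superset) {
        return new HashSet<>(superset).containsAll(subset);
    }

    public static void main(String[] args) {
        List<String> allUnpromotable = Arrays.asList("search-0", "search-1", "search-2");
        List<String> assignedUnpromotable = Arrays.asList("search-0", "search-2");
        // Only evaluated when the JVM runs with -ea.
        assert isSubset(assignedUnpromotable, allUnpromotable) : "assigned shards missing from full list";
        System.out.println(isSubset(assignedUnpromotable, allUnpromotable));
    }
}
```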

tlrx · 2025-01-24T08:50:19Z

server/src/main/java/org/elasticsearch/action/support/replication/PostWriteRefresh.java

 });
 case IMMEDIATE -> immediate(indexShard, listener.delegateFailureAndWrap((l, r) -> {
- if (indexShard.getReplicationGroup().getRoutingTable().unpromotableShards().size() > 0) {
+ if (indexShard.getReplicationGroup().getRoutingTable().allUnpromotableShards().size() > 0) {


tlrx · 2025-01-24T08:52:17Z

...elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshActionTests.java

 final var shardId = new ShardId(new Index(randomIdentifier(), randomUUID()), between(0, 3));
- final var shardRouting = TestShardRouting.newShardRouting(shardId, randomUUID(), true, ShardRoutingState.STARTED);
- final var indexShardRoutingTable = new IndexShardRoutingTable.Builder(shardId).addShard(shardRouting).build();
+ final var indexShardRoutingTable = createShardRoutingTableWithPrimaryAndSearchShards(shardId);


Now it always tests with a search shard; maybe we should randomize that (i.e., randomly add a search shard or not)?

tlrx · 2025-01-24T08:58:26Z

...elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshActionTests.java

+ UnpromotableShardRefreshRequest request,
+ ActionListener<ActionResponse.Empty> responseListener
+ ) {
+ assert false : "Unexpected call";


maybe also throw an AssertionError?
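A small sketch of why the extra throw can matter: a Java assert statement is skipped when the JVM runs without -ea, so assert false alone may silently do nothing. The class and method names here are hypothetical. Following the assert with an explicit throw makes the method fail in both modes.

```java
final class UnexpectedCallSketch {
    // Fails whether or not assertions are enabled.
    static void unexpectedCall() {
        assert false : "Unexpected call";            // only evaluated when running with -ea
        throw new AssertionError("Unexpected call"); // reached when assertions are disabled
    }

    // Returns true when unexpectedCall() failed with the expected error.
    static boolean failsLoudly() {
        try {
            unexpectedCall();
            return false;
        } catch (AssertionError e) {
            return "Unexpected call".equals(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(failsLoudly());
    }
}
```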

tlrx · 2025-01-24T09:01:20Z

...elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshActionTests.java

+
+ if (randomBoolean()) {
+ setState(clusterService, ClusterState.builder(clusterService.state()).version(clusterService.state().version() + 1));
+ assertThat(countDownLatch.getCount(), is(equalTo(1L)));


I wonder if we should remove this? If CI is super slow, we're at risk that the transport message times out before this instruction is executed?

fcofdez · 2025-01-27T16:14:40Z

@elasticmachine update branch

fcofdez · 2025-01-28T08:21:34Z

@elasticmachine update branch

fcofdez added the >enhancement, :Distributed Indexing/Engine, and Team:Distributed Indexing labels Jan 22, 2025

fcofdez requested review from henningandersen and tlrx January 22, 2025 15:56

fcofdez requested a review from a team as a code owner January 22, 2025 15:56

elasticsearchmachine added the v9.0.0 label Jan 22, 2025

Update docs/changelog/120642.yaml

3fb0599

fcofdez commented Jan 22, 2025

fcofdez added 2 commits January 22, 2025 17:03

Use unpromotable shard list

1604afd

Merge branch 'wait-for-refresh-block-clearing' of github.com:fcofdez/…

c0dde3a

…elasticsearch into wait-for-refresh-block-clearing

henningandersen reviewed Jan 22, 2025

elasticsearchmachine added the serverless-linked label Jan 23, 2025

tlrx reviewed Jan 23, 2025

fcofdez added 2 commits January 23, 2025 10:52

Review comments

9a22ad6

Merge remote-tracking branch 'origin/main' into wait-for-refresh-bloc…

a922b75

…k-clearing

henningandersen reviewed Jan 23, 2025

fcofdez added 3 commits January 23, 2025 11:47

Check for blocks first

179074d

Fix test

76d842a

Merge remote-tracking branch 'origin/main' into wait-for-refresh-bloc…

a22a69e

…k-clearing

fcofdez requested a review from tlrx January 23, 2025 14:24

tlrx approved these changes Jan 24, 2025

fcofdez added 3 commits January 27, 2025 11:13

Merge remote-tracking branch 'origin/main' into wait-for-refresh-bloc…

3470214

…k-clearing

Review comments

7d31f27

Merge remote-tracking branch 'origin/main' into wait-for-refresh-bloc…

1fdfb44

…k-clearing

Merge branch 'main' into wait-for-refresh-block-clearing

1b2a2b8

fcofdez added 4 commits January 27, 2025 19:58

Revert renaming

3c7b1ca

Merge branch 'wait-for-refresh-block-clearing' of github.com:fcofdez/…

39431e1

…elasticsearch into wait-for-refresh-block-clearing

Merge remote-tracking branch 'origin/main' into wait-for-refresh-bloc…

cda1dfa

…k-clearing

Merge remote-tracking branch 'origin/main' into wait-for-refresh-bloc…

bf1ed0d

…k-clearing

Merge branch 'main' into wait-for-refresh-block-clearing

a3b09a0

fcofdez merged commit 2ebbad4 into elastic:main Jan 28, 2025
16 checks passed

