Skip to content

Conversation

@loupipalien
Copy link
Contributor

Usually, several indices are written even the cluster has a lot of indices, so we can cache modified time of translog writer file which already not be written to reduce cost time of nodes and indices stats

… cost time of nodes and indices stats Change-Id: I74ddbce3cdd8f4c3263a8cd51fa8b76bd097ae17
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.8.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Apr 10, 2023
@loupipalien loupipalien changed the title Cache mtime of translog writer which hasn't write operation to reduce… Cache mtime of translog writer file which not be written anymore Apr 10, 2023
@astefan astefan added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed needs:triage Requires assignment of a team area label labels Apr 11, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Apr 11, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Change-Id: I241b669b6913a94a7d41ca08dade693663494ca6
@loupipalien
Copy link
Contributor Author

@astefan can you help to invite a reviewer for this pr

@loupipalien
Copy link
Contributor Author

loupipalien commented Apr 19, 2023

we has a large cluster which more than 4k shards per node, as this cluster use network file system, getting mtime of a file from fs cost avg about 0.7ms ~ 0.8ms
image
after this optimize, only a few of translog writer file that is writting need to get mtime from fs,other translog writer files get mtime from memory cache cost avg just 0.01 ~ 0.02ms
image

Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! Left a few comments to consider. But overall makes sense to me to cache this value.


@Override
protected boolean needsRefresh() {
LastModifiedTime cached = getNoRefresh();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the modification time is connected to the written file, I wonder if it would be better to judge the need for refresh based on the getWrittenOffset() value (which is directly got from the file channel that the translog writes to) rather than on the combination of operationCounter and lastSyncedCheckpoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kingherc thanks for your time and patient. I think it wouldn't be better if we judge the need for refresh base on getWrittenOffset, because getWrittenOffset also invokes a syscall every time that is same as BaseTranslogReader#getLastModifiedTime

return lastModifiedTime.getOrRefresh().lastModifiedTime;
} catch (UncheckedIOException e) {
// wrapped in the cache and unwrap here
throw e.getCause();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than using a SingleObjectCache, for which you basically do not use the time refresh functionality, I wonder if simply extending the LastModifiedTime record with a similar function .getOrRefresh() would be more straightforward and would also enable throwing the involved IOException directly rather than doing this juggling with the UncheckedIOException.

org.elasticsearch.index.translog.TranslogReader#getLastModifiedTime also has such a simple caching (rather than using the SingleObjectCache).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, fixed

@kingherc kingherc self-assigned this Apr 20, 2023
@kingherc kingherc requested a review from Tim-Brooks April 20, 2023 09:37
…SingleObjectCache Change-Id: I723a19f282ff261ffd3bf1124f4fd91d571e6320
@loupipalien loupipalien requested a review from kingherc April 23, 2023 08:51
Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some final comments.

Could you also add some testing around the new last modified time behavior? E.g., in server/src/test/java/org/elasticsearch/index/translog/TranslogTests.java there are some tests around TranslogWriter and you could assert that the last modified time is modified once there's a sync of the translog or some operations added.

Thanks again!


@Override
public long getLastModifiedTime() throws IOException {
if (lastModifiedTime.totalOffset() != totalOffset || lastModifiedTime.syncedOffset() != lastSyncedCheckpoint.offset) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delved a bit more into the code. I think this is correct apart from one weird edge case I see: org.elasticsearch.index.translog.TranslogWriter#readBytes() seems to be writing buffered ops into the channel file without touching totalOffset nor lastSyncedCheckpoint. Would it make sense to reset the lastModifiedTime (e.g., set its totalOffset to -1 in readBytes() in that if statement?

Copy link
Contributor Author

@loupipalien loupipalien Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find usage of org.elasticsearch.index.translog.TranslogWriter#getLastModifiedTime(),the last mtime of writer file is only used to find earliest last modified age of translog files to construct translog stats,so it seems not need an exact value
If only judge the need for refresh based on lastSyncedCheckpoint, the cached last mtime maybe lag too much when translog durability is set async and sync_interval is set too large, so adding totalOffset condition to get mtime more frequent to decrease the lag
I think writing buffered ops into channel may not change mtime, so not need set totalOffset to -1 in readBytes() to keep cache logic simple

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I see it's only used in stats, so I agree there is no need to make it more complicated for this rather unusual edge case.

@kingherc
Copy link
Contributor

@elasticmachine test this please

@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
@loupipalien loupipalien reopened this Apr 26, 2023
Change-Id: I26330eee653ba13588669c24b55e354e33d28f60
@loupipalien
Copy link
Contributor Author

loupipalien commented Apr 26, 2023

Could you also add some testing around the new last modified time behavior? E.g., in server/src/test/java/org/elasticsearch/index/translog/TranslogTests.java there are some tests around TranslogWriter and you could assert that the last modified time is modified once there's a sync of the translog or some operations added.

@kingherc I add a test, thanks again for you time and patient ❤️

@loupipalien loupipalien requested a review from kingherc April 27, 2023 10:15
@kingherc
Copy link
Contributor

@elasticmachine test this please

Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Feel free to merge it.

@loupipalien loupipalien changed the title Cache mtime of translog writer file which not be written anymore Cache modification time of translog writer file Apr 27, 2023
Change-Id: I99c0b28a2da7ab69d0e100485fed406ef698156d
…/elasticsearch into cache_translog_stats Change-Id: I52d20ad9242b4265404442670554545a826bb057
@kingherc
Copy link
Contributor

@elasticmachine test this please

@kingherc kingherc merged commit 12447ce into elastic:main Apr 28, 2023
@kingherc
Copy link
Contributor

@loupipalien I merged this. Feel free to tell me if there was anything else remaining and we can do another PR. Thanks for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.9.0

5 participants