Fix excessive read latency during and after shard splits #11059

snaury · 2024-10-29T13:25:36Z

Changelog entry

Fixed excessive read latency during and after some shard splits.

Changelog category

Bugfix

Additional information

Reads were observed to sometimes take seconds during frequent shard splits. It turned out that shards kept replying with an OVERLOADED status even after the split had already finished. This caused KQP to retry reads repeatedly with exponential backoff until, after multiple seconds, a guard condition finally made the read actor re-resolve the table. Replying with the correct NOT_FOUND status (which indicates the table no longer exists) fixes this problem.
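To illustrate why the status code matters, here is a minimal, self-contained sketch of the retry behaviour described above. It is not the actual KQP read actor or datashard code: `ReadStatus`, `RetryOutcome`, `SimulateRead`, and all delay and guard values are hypothetical names and numbers chosen for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical status codes. The names mirror the statuses mentioned in this
// description; this is not the real YDB enum.
enum class ReadStatus { Success, Overloaded, NotFound };

struct RetryOutcome {
    int attempts = 0;                      // reads sent to the stale shard
    std::chrono::milliseconds waited{0};   // total simulated backoff delay
    bool reresolved = false;               // whether the reader re-resolved the table
};

// Simulates a reader talking to a shard that has already been split and that
// answers every read with `staleReply`. All parameters are assumptions.
RetryOutcome SimulateRead(ReadStatus staleReply,
                          std::chrono::milliseconds initialDelay,
                          std::chrono::milliseconds maxDelay,
                          std::chrono::milliseconds guardTimeout) {
    assert(initialDelay.count() > 0);  // a zero delay would never reach the guard
    RetryOutcome out;
    auto delay = initialDelay;
    while (true) {
        ++out.attempts;
        if (staleReply == ReadStatus::Success) {
            return out;  // nothing to retry
        }
        if (staleReply == ReadStatus::NotFound) {
            // The shard is gone: re-resolve immediately, no backoff.
            out.reresolved = true;
            return out;
        }
        // Overloaded: keep retrying until the guard condition trips.
        if (out.waited >= guardTimeout) {
            out.reresolved = true;
            return out;
        }
        out.waited += delay;                    // simulated sleep
        delay = std::min(delay * 2, maxDelay);  // exponential backoff with a cap
    }
}
```

With an assumed 10 ms initial delay, a 1 s cap, and a 3 s guard, the OVERLOADED path in this model makes ten attempts and accumulates 3270 ms of backoff before re-resolving, while the NOT_FOUND path re-resolves on the first reply with no delay at all.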

Fixes #11036.

github-actions · 2024-10-29T13:27:58Z

⚪ 2024-10-29 13:27:57 UTC Pre-commit check linux-x86_64-relwithdebinfo for dfa7e01 has started.
⚪ 2024-10-29 13:28:33 UTC Artifacts will be uploaded here
⚪ 2024-10-29 13:32:01 UTC ya make is running...
🟡 2024-10-29 14:37:41 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED
15285	13784	0	2	1396	103

⚪ 2024-10-29 14:38:58 UTC ya make is running... (failed tests rerun, try 2)
🟢 2024-10-29 14:50:31 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED
106 (only retried tests)	13	0	0	0	93

🟢 2024-10-29 14:50:38 UTC Build successful.
🟡 2024-10-29 14:50:57 UTC ydbd size 2.8 GiB changed* by +1.2 MiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash	main: `103d800`	merge: `dfa7e01`	diff	diff %
ydbd size	3 034 031 192 Bytes	3 035 250 456 Bytes	+1.2 MiB	+0.040%
ydbd stripped size	480 686 232 Bytes	480 862 936 Bytes	+172.6 KiB	+0.037%

*Please be aware that the difference is based on comparing your commit and the last completed build from the post-commit; check the comparison.

github-actions · 2024-10-29T13:29:41Z

⚪ 2024-10-29 13:29:41 UTC Pre-commit check linux-x86_64-release-asan for dfa7e01 has started.
⚪ 2024-10-29 13:29:52 UTC Artifacts will be uploaded here
⚪ 2024-10-29 13:32:52 UTC ya make is running...
🟡 2024-10-29 15:02:54 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED
9229	9158	0	24	13	34

🟢 2024-10-29 15:03:42 UTC Build successful.
🟢 2024-10-29 15:04:16 UTC ydbd size 5.7 GiB changed* by -2.4 KiB, which is <= 0 Bytes vs main: OK

ydbd size dash	main: `06b8cb8`	merge: `dfa7e01`	diff	diff %
ydbd size	6 142 267 184 Bytes	6 142 264 688 Bytes	-2.4 KiB	-0.000%
ydbd stripped size	1 532 866 544 Bytes	1 532 866 736 Bytes	+192 Bytes	+0.000%

*Please be aware that the difference is based on comparing your commit and the last completed build from the post-commit; check the comparison.

ydb/core/tx/datashard/datashard_ut_read_iterator.cpp

ydb/core/tx/datashard/datashard__read_iterator.cpp

Fix excessive read latency during and after shard splits

4f17a98

snaury requested a review from azevaykin October 29, 2024 13:25

github-actions bot added bugfix and removed bugfix labels Oct 29, 2024

snaury marked this pull request as ready for review October 29, 2024 13:26

github-actions bot added bugfix and removed bugfix labels Oct 29, 2024

snaury self-assigned this Oct 29, 2024

azevaykin reviewed Oct 29, 2024

ydb/core/tx/datashard/datashard_ut_read_iterator.cpp

azevaykin reviewed Oct 29, 2024

ydb/core/tx/datashard/datashard__read_iterator.cpp

azevaykin approved these changes Oct 29, 2024

snaury merged commit eee456c into ydb-platform:main Oct 29, 2024
13 checks passed

snaury deleted the bugfix-11036-slow-read-split branch October 29, 2024 15:14

This was referenced Nov 1, 2024

oidc bad cookie #11200

Open

oidc add cookie logs #11367

Merged

shnikd mentioned this pull request Nov 7, 2024

Support QueryMeta and diagnostics #11371

Merged

niksaveliev mentioned this pull request Nov 8, 2024

Fix read session correct close ut #11407

Merged

GrigoriyPA mentioned this pull request Nov 13, 2024

YQ WM fixed race with actor context #11576

Merged

niksaveliev mentioned this pull request Nov 14, 2024

Commit for autopartitioned topics #11629

Merged
