Skip to content

Conversation

DaveCTurner
Copy link
Contributor

The default 10s TLS handshake timeout may be too short if there is some
bug causing event-loop latency, and this has more serious consequences
than the underlying performance issue (e.g. it prevents the cluster from
scaling up to work around the problem). With this commit we expose a
setting that allows the timeout to be configured, providing a workaround
in such cases.

The default 10s TLS handshake timeout may be too short if there is some bug causing event-loop latency, and this has more serious consequences than the underlying performance issue (e.g. it prevents the cluster from scaling up to work around the problem). With this commit we expose a setting that allows the timeout to be configured, providing a workaround in such cases.
@DaveCTurner DaveCTurner requested a review from mhl-b July 9, 2025 12:02
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Network Http and internode communication implementations v9.2.0 labels Jul 9, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Jul 9, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Copy link
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

Copy link
Contributor

github-actions bot commented Jul 9, 2025

Copy link
Contributor

@mhl-b mhl-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@DaveCTurner DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jul 9, 2025
Comment on lines 122 to 126
private static final Setting<TimeValue> TRANSPORT_TLS_HANDSHAKE_TIMEOUT_SETTING = Setting.positiveTimeSetting(
"xpack.security.transport.ssl.handshake_timeout",
TimeValue.timeValueSeconds(10),
Setting.Property.NodeScope
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this will affect more than transport connections. It should at least also apply to RCS 2.0 remote cluster client and likely security realms that initiate outbound TLS connections, e.g. OIDC realm.

Most existing SSL settings are affix settings that apply to different contexts. The transport is one of the contexts. Defining these settings is a somewhat involved process via SSLConfigurationSettings to support contexts.

I think we should either:

  1. Support this new setting for different contexts as well.
  2. Dropping the transport part from the setting name, i.e. xpack.security.ssl.handshake_timeout, as well as updating the docs to indicate it applies more broadly.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it only affects transport connections, i.e. those which go via SecurityNetty4Transport. That does indeed include remote-cluster connections, but not other outbound TLS connections like the HTTPS ones involved in OIDC. I hadn't noticed that we count RCS2.0 transport connections as distinct from other transport connections in terms of this kind of configuration.

It's a bit tricky tho, I don't really want to have to add support for this setting to all the different contexts in which we do TLS handshakes. At least not today: progress over perfection and all that. If we called it xpack.security.ssl.handshake_timeout then that'd imply it worked everywhere. I'd rather keep it transport-specific, but I think I can see a way to add this to the RCS2.0 settings too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok see dc3a9ac

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you are right about this does not apply to realms.

Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner DaveCTurner merged commit e57a0d0 into elastic:main Jul 10, 2025
34 checks passed
@DaveCTurner DaveCTurner deleted the 2025/07/09/tls-handshake-timeout-setting branch July 10, 2025 15:50
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
The default 10s TLS handshake timeout may be too short if there is some bug causing event-loop latency, and this has more serious consequences than the underlying performance issue (e.g. it prevents the cluster from scaling up to work around the problem). With this commit we expose a setting that allows the timeout to be configured, providing a workaround in such cases.
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
The default 10s TLS handshake timeout may be too short if there is some bug causing event-loop latency, and this has more serious consequences than the underlying performance issue (e.g. it prevents the cluster from scaling up to work around the problem). With this commit we expose a setting that allows the timeout to be configured, providing a workaround in such cases.
@shainaraskas shainaraskas added the docs-missing-applies-tags PRs that are missing docs applies_to tags for an upcoming release. label Jul 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Network Http and internode communication implementations docs-missing-applies-tags PRs that are missing docs applies_to tags for an upcoming release. >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team v9.2.0

5 participants