- Notifications
You must be signed in to change notification settings - Fork 25.5k
Allow adjustment of transport TLS handshake timeout #130909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow adjustment of transport TLS handshake timeout #130909
Conversation
The default 10s TLS handshake timeout may be too short if there is some bug causing event-loop latency, and this has more serious consequences than the underlying performance issue (e.g. it prevents the cluster from scaling up to work around the problem). With this commit we expose a setting that allows the timeout to be configured, providing a workaround in such cases.
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
Hi @DaveCTurner, I've created a changelog YAML for you. |
🔍 Preview links for changed docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
...ain/java/org/elasticsearch/xpack/core/security/transport/netty4/SecurityNetty4Transport.java Outdated Show resolved Hide resolved
private static final Setting<TimeValue> TRANSPORT_TLS_HANDSHAKE_TIMEOUT_SETTING = Setting.positiveTimeSetting( | ||
"xpack.security.transport.ssl.handshake_timeout", | ||
TimeValue.timeValueSeconds(10), | ||
Setting.Property.NodeScope | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this will affect more than transport
connections. It should at least also apply to RCS 2.0 remote cluster client and likely security realms that initiate outbound TLS connections, e.g. OIDC realm.
Most existing SSL settings are affix settings that apply to different contexts. The transport
is one of the contexts. Defining these settings is a somewhat involved process via SSLConfigurationSettings to support contexts.
I think we should either:
- Support this new setting for different contexts as well.
- Dropping the
transport
part from the setting name, i.e.xpack.security.ssl.handshake_timeout
, as well as updating the docs to indicate it applies more broadly.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it only affects transport connections, i.e. those which go via SecurityNetty4Transport
. That does indeed include remote-cluster connections, but not other outbound TLS connections like the HTTPS ones involved in OIDC. I hadn't noticed that we count RCS2.0 transport connections as distinct from other transport connections in terms of this kind of configuration.
It's a bit tricky tho, I don't really want to have to add support for this setting to all the different contexts in which we do TLS handshakes. At least not today: progress over perfection and all that. If we called it xpack.security.ssl.handshake_timeout
then that'd imply it worked everywhere. I'd rather keep it transport-specific, but I think I can see a way to add this to the RCS2.0 settings too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok see dc3a9ac
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you are right about this does not apply to realms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The default 10s TLS handshake timeout may be too short if there is some bug causing event-loop latency, and this has more serious consequences than the underlying performance issue (e.g. it prevents the cluster from scaling up to work around the problem). With this commit we expose a setting that allows the timeout to be configured, providing a workaround in such cases.
The default 10s TLS handshake timeout may be too short if there is some bug causing event-loop latency, and this has more serious consequences than the underlying performance issue (e.g. it prevents the cluster from scaling up to work around the problem). With this commit we expose a setting that allows the timeout to be configured, providing a workaround in such cases.
The default 10s TLS handshake timeout may be too short if there is some
bug causing event-loop latency, and this has more serious consequences
than the underlying performance issue (e.g. it prevents the cluster from
scaling up to work around the problem). With this commit we expose a
setting that allows the timeout to be configured, providing a workaround
in such cases.