Skip to content

Conversation

@jfreden
Copy link
Contributor

@jfreden jfreden commented Oct 9, 2025

This is a follow up to #134137, #134893, #135674 and #134604.

This PR pulls the DN out of the provided signature leaf certificate and uses it to match against the configured certificate_identity pattern on the cross cluster api key.

This PR does not deal with audit logging, that will be done in a followup PR.

assertCrossClusterSearchSuccessfulWithResult();

// Change the CA to the default trust store to make sure untrusted signature fails auth even if it's not required
updateClusterSettingsFulfillingCluster(Settings.builder().putNull("cluster.remote.signing.certificate_authorities").build());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious, why fail if a certificate is invalid but not required? Is it to protect against misconfiguration?

// Always validate a signature if provided
if (signature != null && verifySignature(signature, crossClusterAccessHeaders, listener) == false) {
// TODO audit logging
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With signature verification happening before API key validation, if the certificate is invalid but the API key is also expired, the user sees 'invalid certificate' and never learns about the expired key. Is this the intended priority?

I do think this way makes sense; if trust can't be established, then there isn't much point in validating the API key itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order right now is:

  1. Validate that the API key is valid, this happens first because of authenticateHeaders
  2. Validate that the provided certificate identity is ok also happens in authenticateHeaders
  3. Validate signature

With the current implementation (1) and (2) would actually not generate an audit log from what I can see. Since (1) is the current implementation I think it's a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the intended priority?

My thinking is that validating the signature on every request, even failing ones is potentially expensive, so doing it in auth headers is probably not right. The point of auth headers is to short circuit the auth flow.

Doing the signature verification first in the auth flow decouples it from the rest of the api key service, if we don't do it there we require the api key credentials to contain all the signable payload + signature info and to pass it along to the api key service where we do validation. Not sure if that's a good enough reason, but I don't think it matters if we do it first or last, so keeping decoupling and doing it early is probably acceptable.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realizing now I responded to you on Slack but not here. This argument makes sense to me, I don't ultimately see a huge difference between checking it first or last, and architecturally it's easier to do here than trying to couple it into the rest of the auth flow.

assertCrossClusterSearchSuccessfulWithResult();

// Change the CA to the default trust store to make sure untrusted signature fails auth even if it's not required
updateClusterSettingsFulfillingCluster(Settings.builder().putNull("cluster.remote.signing.certificate_authorities").build());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious, why fail if a certificate is invalid but not required? Is it to catch misconfigurations early?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if a signature is provided I think we should try to validate it and fail if it's not correct since it's unexpected. Better to be less lenient in this case I think.

Copy link
Contributor

@tvernum tvernum Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also gives customers a way to progressively switch over to signing. If they configure signing on the origin cluster, and it doesn't break, then they can rely on the fact that it's probably working correctly. Then they can turn on enforcement on the linked cluster.

And if that first step does break, then they can just turn off signing on the origin, without needing to reconfigure both sides.

@gmjehovich
Copy link
Contributor

Thanks for working on this! The approach looks good. Moving cert validation into the authentication flow should fix the double audit logging I was seeing. I just left some questions to help with my understanding

@jfreden jfreden force-pushed the validate_signature_identity branch from 79d328b to d1293d5 Compare October 10, 2025 11:46
@jfreden jfreden changed the title Validate signature identity Validate certificate identity from cross cluster creds Oct 10, 2025
@jfreden jfreden force-pushed the validate_signature_identity branch 3 times, most recently from 93eef5b to cc2e580 Compare October 14, 2025 14:40
}
}

private ApiKeyService createApiKeyService(Settings baseSettings, FeatureService customFeatureService) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was unused so I removed it.

Copy link
Contributor

@gmjehovich gmjehovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked some into the test failures from the CI. The serverless ones seem unrelated, but the failing test on the ci test is interesting.

It looks like that case in testInvalidHeaders never triggers the callback passed to the call to authenticationService.authenticate as [this line] (

final CrossClusterAccessSubjectInfo subjectInfo = crossClusterAccessHeaders.getCleanAndValidatedSubjectInfo();
) with the call to crossClusterAccessHeaders.getCleanAndValidatedSubjectInfo() should be triggering that exception I think

// Always validate a signature if provided
if (signature != null && verifySignature(signature, crossClusterAccessHeaders, listener) == false) {
// TODO audit logging
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realizing now I responded to you on Slack but not here. This argument makes sense to me, I don't ultimately see a huge difference between checking it first or last, and architecturally it's easier to do here than trying to couple it into the rest of the auth flow.

@jfreden jfreden force-pushed the validate_signature_identity branch from 87edbc9 to 2618ac9 Compare October 15, 2025 08:02
@jfreden jfreden force-pushed the validate_signature_identity branch from 8cce705 to 99fac10 Compare October 15, 2025 08:57
@jfreden jfreden added :Security/Security Security issues without another label >enhancement labels Oct 15, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @jfreden, I've created a changelog YAML for you.

@jfreden jfreden changed the title Validate certificate identity from cross cluster creds Validate Certificate Identity provided in Cross Cluster API Key Certificate Oct 15, 2025
@jfreden jfreden marked this pull request as ready for review October 15, 2025 09:05
@jfreden jfreden requested a review from tvernum October 15, 2025 09:05
@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label Oct 15, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@jfreden jfreden requested a review from a team October 15, 2025 14:27
@jfreden
Copy link
Contributor Author

jfreden commented Oct 15, 2025

Ended up adding a pattern cache. Still need to add some testing for it but the basics are there. Will continue tomorrow.

@jfreden jfreden force-pushed the validate_signature_identity branch from f39c655 to ed4093a Compare October 17, 2025 11:29
@jfreden jfreden added the cloud-deploy Publish cloud docker image for Cloud-First-Testing label Oct 17, 2025
private final boolean enabled;
private final Settings settings;
private final InactiveApiKeysRemover inactiveApiKeysRemover;
private final Cache<String, Pattern> certificateIdentityPatternCache;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the motivation for certificateIdentityPatternCache? Is regex compilation measurably slow relative to the surrounding changes (e.g. signature verification)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on the fence about this initially and didn't add a cache.

We accept arbitrary patterns, so measuring if it's slow relative the surrounding changes is difficult. Another consideration is that the Pattern object can be large in memory, so a cache could make things worse. My guess is that most of the time the patterns won't be that expensive and like you point out, the pattern compilation is probably not that slow.

What convinced me to eventually add it was that we check the pattern twice per request, so if it's a complex pattern it could be expensive. This is also something that's very easy to cache and if we add a signature cache in the future it could be the next bottleneck.

);
}
authTrustManager.checkClientTrusted(signature.certificates(), signature.certificates()[0].getPublicKey().getAlgorithm());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small suggestion: consider using leaf rather than signature.certificates()[0] since the intermediate var is defined above in method scope. Relatedly, it might be worthwhile to introduce a leafCerificate method (or similar name) on X509CertificateSignature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea! I've added a leafCerificate() method and updated to use the leaf variable instead.

);
}
authTrustManager.checkClientTrusted(signature.certificates(), signature.certificates()[0].getPublicKey().getAlgorithm());
for (var certificate : signature.certificates()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this to make sure we don't accept expired certs.

Copy link
Contributor Author

@jfreden jfreden Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my testing it looks like if I add an expired cert to the trust store, we trust it even if it's expired unless we do this explicit check.

@jfreden jfreden requested a review from tvernum October 21, 2025 14:38
@jfreden jfreden force-pushed the validate_signature_identity branch from f39ddba to 566770e Compare October 22, 2025 13:38
@jfreden jfreden added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 23, 2025
@jfreden
Copy link
Contributor Author

jfreden commented Oct 23, 2025

@jfreden jfreden merged commit f44b8b2 into elastic:main Oct 23, 2025
40 of 41 checks passed
@jfreden jfreden deleted the validate_signature_identity branch October 23, 2025 11:10
fzowl pushed a commit to voyage-ai/elasticsearch that referenced this pull request Nov 3, 2025
…ficate (elastic#136299) * Validate certificate identity from cross cluster creds Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> Co-authored-by: Tim Vernum <tim@adjective.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) cloud-deploy Publish cloud docker image for Cloud-First-Testing >enhancement :Security/Security Security issues without another label Team:Security Meta label for security team v9.3.0

5 participants