
slobodanadamovic
Contributor

@slobodanadamovic slobodanadamovic commented Jan 31, 2025

This PR fixes SecuritySingleNodeTestCase and ProfileIntegTests tests.

  • The security single node test failures are solved by ensuring every test starts with the security index created and available, which gives every test a consistent state. With the changes introduced in PR #120323, only the first test would execute with the .security index being created asynchronously. Subsequent tests would execute without security index creation because the whole cluster is wiped after each test. This caused flakiness only for the first test, because there was no mechanism in place to ensure that the .security index is active before test execution.
  • The profile integration test failures are solved by introducing an anonymous role which doesn't have application privileges. The application privileges are resolved from the .security index and assigned to all users, including the es_test_root user which is used during cluster wiping. Due to the asynchronous nature of cluster setup and .security index creation, this now causes flakiness. The main problem is that wiping is done asynchronously and uses es_test_root, which had been assigned the anonymous rac_role, which in turn depends on the .security index being available for search in order to resolve application privileges. The application privilege resolution is done in buildRoleFromDescriptors, which currently does not wait for security index availability (this can be improved, but it still wouldn't fix internal cluster tests). This wasn't a problem before simply because we return empty results when the .security index does not exist. There is some complexity in making internal clusters wait for availability of security shards before the test, so I think this solution is acceptable given that these tests are not required to have an anonymous role with application privileges.
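To illustrate the first point, here is a minimal, self-contained sketch of the "create the index, then wait until it is active before each test" pattern. It is not the code from this PR and does not use the Elasticsearch test framework: `FakeCluster`, `createSecurityIndexAsync`, `awaitCondition`, and `setUpTest` are hypothetical names invented for the illustration, and the delays and timeouts are arbitrary.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class SecurityIndexReadiness {

    /** Hypothetical stand-in for a cluster whose security index is created asynchronously. */
    static final class FakeCluster {
        private final AtomicBoolean securityIndexActive = new AtomicBoolean(false);

        /** Starts index creation in the background; the index becomes active after the delay. */
        CompletableFuture<Void> createSecurityIndexAsync(long delayMillis) {
            return CompletableFuture.runAsync(
                () -> securityIndexActive.set(true),
                CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
        }

        boolean isSecurityIndexActive() {
            return securityIndexActive.get();
        }

        /** Mimics the cluster wipe that runs after each test and removes the index. */
        void wipe() {
            securityIndexActive.set(false);
        }
    }

    /** Polls the condition until it holds or the timeout elapses; returns whether it held. */
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean();
    }

    /** Per-test setup: trigger index creation and block until the index is active. */
    static void setUpTest(FakeCluster cluster) {
        cluster.createSecurityIndexAsync(50);
        if (!awaitCondition(cluster::isSecurityIndexActive, 5_000)) {
            throw new IllegalStateException("security index did not become active in time");
        }
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        for (int test = 1; test <= 3; test++) {
            setUpTest(cluster); // every test, not only the first, starts from the same state
            System.out.println("test " + test + " index active: " + cluster.isSecurityIndexActive());
            cluster.wipe();     // the wipe removes the index again
        }
    }
}
```

The point of the sketch is that the setup runs before every test and blocks on readiness, so no test depends on how far a background creation happened to get.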

Resolves #121022
Resolves #121096
Resolves #121101
Resolves #120988
Resolves #121108
Resolves #120983
Resolves #120987
Resolves #121179
Resolves #121183
Resolves #121346
Resolves #121151
Resolves #120985
Resolves #121039
Resolves #121483
Resolves #121116
Resolves #121258
Resolves #121486

@slobodanadamovic slobodanadamovic changed the title from "Fix internal cluster tests" to "Fix internal cluster and single node security tests" Feb 4, 2025
@slobodanadamovic slobodanadamovic added the labels >test (Issues or PRs that are addressing/adding tests), :Security/Security (Security issues without another label), Team:Security (Meta label for security team), auto-backport (Automatically create backport pull requests when merged), v9.0.0, v8.18.1, v9.0.1 Feb 4, 2025
@slobodanadamovic slobodanadamovic requested a review from a team February 4, 2025 09:29
@slobodanadamovic slobodanadamovic merged commit 369c641 into elastic:main Feb 16, 2025
22 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.18 Commit could not be cherrypicked due to conflicts
9.0 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 121466

slobodanadamovic added a commit to slobodanadamovic/elasticsearch that referenced this pull request Feb 17, 2025
(cherry picked from commit 369c641)
# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java
slobodanadamovic added a commit to slobodanadamovic/elasticsearch that referenced this pull request Feb 17, 2025
(cherry picked from commit 369c641)
# Conflicts:
#	muted-tests.yml
@slobodanadamovic
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x
9.0
8.18

Questions ?

Please refer to the Backport tool documentation

slobodanadamovic added a commit to slobodanadamovic/elasticsearch that referenced this pull request Feb 17, 2025
(cherry picked from commit 369c641)
# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java
elasticsearchmachine pushed a commit that referenced this pull request Feb 17, 2025
(cherry picked from commit 369c641)
# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Feb 17, 2025
…122732)
* Fix internal cluster and single node security tests (#121466)
(cherry picked from commit 369c641)
# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java
* fix compilation error
elasticsearchmachine pushed a commit that referenced this pull request Feb 17, 2025
…122734)
* Fix internal cluster and single node security tests (#121466)
(cherry picked from commit 369c641)
# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java
* fix compilation error
ankit--sethi added a commit to ankit--sethi/elasticsearch that referenced this pull request Jul 8, 2025
ankit--sethi added a commit to ankit--sethi/elasticsearch that referenced this pull request Jul 11, 2025
szybia added a commit to szybia/elasticsearch that referenced this pull request Jul 14, 2025
…king * upstream/main: (33 commits) Allow both WithEntitlementsOnTestCode and EntitledTestPackages together (elastic#130826) Move streams status actions to cluster:monitor group (elastic#131015) Update JDK base image for OIDC fixture (elastic#131176) Mute org.elasticsearch.xpack.esql.ccq.MultiClustersIT testLookupJoinAliases elastic#131166 Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDiskSpaceTests testEnqueuedMergeTasksAreUnblockedWhenEstimatedMergeSizeChanges elastic#131165 Mute org.elasticsearch.xpack.esql.ccq.MultiClustersIT testNotLikeListKeyword elastic#131155 Mute org.elasticsearch.xpack.esql.qa.multi_node.GenerativeIT test elastic#131154 Check file entitlements on the Lucene FilterFileSystem in tests (elastic#130825) Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT test {lookup-join.MvJoinKeyOnFromAfterStats ASYNC} elastic#131148 Move FrequencyCappedAction to common package (elastic#131060) Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocal elastic#121672 Remove nesting from multi allocation decision (elastic#130844) Disable async search rest tests in release builds (elastic#131132) Fix testStopQueryLocal (elastic#131130) Fixes based on resharding disruption tests (elastic#130870) Remove inactive logger (elastic#131121) Add wait for remote start for the test (elastic#131124) Add existing shards allocator settings to failure store allowed list. (elastic#131056) Don't allow field caps to use semantic queries as index filters (elastic#131111) issue should be already fixed by elastic#121466 (elastic#130860) ...
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025

Labels

auto-backport (Automatically create backport pull requests when merged), :Security/Security (Security issues without another label), Team:Security (Meta label for security team), >test (Issues or PRs that are addressing/adding tests), v8.18.1, v9.0.1, v9.1.0

3 participants