Reuse FieldPermissionsCache in Role parsing #94931

tvernum · 2023-03-31T07:32:31Z

When parsing role descriptors, we ensure that the FieldPermissions ("field_security":{ "grant":[ ... ], "except":[ ... ] }) are valid - that is that any patterns compile correctly, and the "except" is a subset of the "grant".

However, the previous implementation would not use the FieldPermissionsCache for this, so it would compile (union, intersect & minimize) automatons every time a role was parsed.

This was particularly an issue when parsing roles (from the security index) in the GET /_security/role/ endpoint. If there were a large number of roles with field level security the automaton parsing could have significant impact on the performance of this API.

When parsing role descriptors, we ensure that the FieldPermissions (`"field_security":{ "grant":[ ... ], "except":[ ... ] }`) are valid - that is that any patterns compile correctly, and the "except" is a subset of the "grant". However, the previous implementation would not use the FieldPermissionsCache for this, so it would compile (union, intersect & minimize) automatons every time a role was parsed. This was particularly an issue when parsing roles (from the security index) in the GET /_security/role/ endpoint. If there were a large number of roles with field level security the automaton parsing could have significant impact on the performance of this API.

elasticsearchmachine · 2023-03-31T07:32:56Z

Pinging @elastic/es-security (Team:Security)

elasticsearchmachine · 2023-03-31T07:52:04Z

Hi @tvernum, I've created a changelog YAML for you.

ywangd · 2023-04-03T00:53:17Z

.../main/java/org/elasticsearch/xpack/core/security/authz/permission/FieldPermissionsCache.java

+ if (cache == null) {
+ return new FieldPermissions(definition);


We don't allow this cache to be disabled. So this should never be null. We should either assert that or throw error. Also this constructor is currently only used in tests because that's where cache may not exist. With this new method, it is possible to replace the constructor usages with init and this method and drop this constructor entirely and I think that would be a good thing.

ywangd · 2023-04-03T01:39:39Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/Security.java

 final AuthorizationService authzService = new AuthorizationService(
 settings,
 allRolesStore,
+ fieldPermissionsCache,


Do we want to make this change? The cache used by RBACEngine is techinically cross different index groups and the combination depends on the requested indices patterns, e.g. if we have

"indices": [ { "name": ["logs-*"], "privileges": ["read"], "field_security": {"grant": ["xxx"] } }, { "name": ["*-staging"], "privileges": ["read"], "field_security": {"grant": ["yyy"] } }, ]

If the requested index pattern is logs-nginx-staging, both groups will match and the cache key will be

new FieldPermissionsDefinition( Set.of( new FieldGrantExcludeGroup(["xxx"], null), new FieldGrantExcludeGroup(["yyy"], null) ) )

While the cache used in CompositeRolesStore is most likely keyed by a single index group. With the above example, the two groups will be cached separately. If we combine the two caches, it is possible that more entries would be competing for cache usages.

Along the same line of reasoning, the cache used by RoleDescriptor is purely for a single index definition. Strictly speaking, it can also be different from the cache usage by CompositeRolesStore which can combine index groups if they have exactly the same names definition.

In summary, the cache is keyed slightly different with the 3 use cases:

RoleDescriptor - keyed strictly by a single index group's FLS definition

CompositeRolesStore - keyed by FLS combination of multiple index groups that share the same names definition

RBACEngine - keyed by FLS combination of mulitple index groups that have names matching the requested index.

That said, I do think having a single cache for them is reasonable because we don't really want to differentiate clearly between the different type of cache keys. My only concern is whether combining them could add extra noticeable pressure on this single cache.

ywangd · 2023-04-03T01:47:29Z

.../main/java/org/elasticsearch/xpack/core/security/authz/permission/FieldPermissionsCache.java

+ final FieldPermissionsCache cache = new FieldPermissionsCache(settings);
+ INSTANCE.set(cache);
+ return cache;


It feels a bit odd to have both a staticly shared instance and a public constructor.

tvernum · 2023-04-03T03:13:07Z

@ywangd I'll respond to your comments collectively because I think they're centred around 2 issues:

How much refactoring do we want to do in this PR?
To what extent should we share caches?

On point 1 we have the issues of:

FieldPermissionsCache.resolve works correctly even if the static cache has not been configured.
The FieldPermissionsCache constructor is public.

In both of those cases the reason is that a large number of tests assume the old behaviour of either (1) constructing a cache inside the test or (2) being able to parse a role without configuring a cache.
I can fix them - it's a fairly mechanical issue, but it would change this PR from being 9 modified files (as it is right now) to potentially hundreds. Do we want to do that in this PR? My feeling is that we'd prefer to backport a smaller PR to 7.17, and follow up the PR with a separate cleanup.
However, if you're happy to review a very large change then I can make it here.

To be honest, I hate the static cache, but RoleDescriptor.parse is static, and passing a cache into that method would also be a massive change.
I toyed with the idea of having a RoleDescriptorParser that had the cache as an instance field. That would have been much nicer, but would have been an enormous PR that I don't think we'd be comfortable backporting to 7.17

Perhaps we need to chat about which big change we'd prefer (tighten up the static cache, or remove the need for a static cache) before we go much further.

On the question of sharing caches, my thoughts are:

The FieldPermissionsCache is really just an automaton cache like the one in Automatons, but it knows some special behaviours around meta fields, and grant/except. It's possible to merge it with the Automatons cache (in fact the earlier patch you saw tried to do that). That might also be a reasonable direction to take (do we really need both caches?) though it is complicated by the "subset" checking in FieldPermissionsCache (see more discussion below)
Because FieldPermissionsCache is just automaton caches, I think having a single instance is better, but I accept there are tradeoffs. I'd reasonably expect for some clusters the current 2 cache model means the same automatons are being constructed multiple times. If I've followed the code correctly, in cases where users have 1 role that has a single index group, the 2 caches would end up with the same entries (unnecessarily).
We have a setting for FieldPermissionsCache (authz.store.roles.field_permissions.cache.max_size_in_bytes). It's is strange therefore that we have more than 1 instance of the cache, each doing its own accounting. In current releases it's technically possible for the field permissions cache(s) to use twice as much RAM as configured, because there are 2 of them. I definitely don't want to have three of them,
The FieldPermissionsCache doesn't ever expire entries (unless full) which is probably a mistake, but it also seems like an argument for having only 1 instance. If we're worried about introducing pressure on the cache, that implies its using a lot of RAM, and the 2 (now 3) different usages would compete. While it's possible that could impact an existing cluster negatively, it still seems like the right behaviour. There should be 1 block of RAM allocated to this, and it should evict based on a global LRU rather than an LRU per caller (RBACEngine vs CompositeRolesStore)

Aside: Subset Checking: I find the existing subset checking to be a bit weird. The design is that you can have

{ "grant": "*", "except": "secret.*" }

but not

{ "grant": "public.*", "except": "private.*" }

However, because it's a subset check rather than an intersection check, you also can't say

{ "grant": "shared.*", "except": "*.secret" }

It is what it is, and I don't want to argue to change it, but it means that I don't think we'd want to push this exact check into Automatons - it's a specific behaviour for FieldPermissions, not a general "allow/deny" check.

ywangd · 2023-04-03T06:07:22Z

For the cache sharing, I think you have some good points on having a single cache. It is more intuitive (a single setting configures a single cache). The existing 2-cache structure also has memory duplication issue. So the change here actually fixes it with a different tradeoff. I think it is reasonable to assume that the memory duplication issue (single index section matching the requested index) is more likely to happen than the issue with cache sharing. Based on these, I am on board with the approach of a single shared cache.

As for how much changes we want for this PR, I do think we want to keep it as small as possible for the sake of 7.17 backport. But I also find it is rather non-ideal that it is possible to call new FieldPermissions(...) multiple times and get separate underlying caches. What do you think about the following changes:

private static final AtomicReference<Cache<FieldPermissionsDefinition, FieldPermissions>> CACHE_REF = new AtomicReference<>(); public FieldPermissionsCache(Settings settings) { CACHE_REF.updateAndGet(c -> { if (c == null) { return CacheBuilder.<FieldPermissionsDefinition, FieldPermissions>builder() .setMaximumWeight(CACHE_SIZE_SETTING.get(settings)) .weigher((key, fieldPermissions) -> fieldPermissions.ramBytesUsed()) .build(); } else { return c; } }); } public FieldPermissions getFieldPermissions(FieldPermissionsDefinition fieldPermissionsDefinition) { return CACHE_REF.get().computeIfAbsent(...); }

It helps getting rid of the static init method and the INSTANCE variable. It only generates a single underlying cache on mulitple constructor calls. It should also help fixing the current test failures with SetOnce. It is fairly small change as well. What do you think?

…into reuse-field-permissions-cache

tvernum · 2023-04-06T04:30:01Z

@ywangd I've updated this based on our conversation. I think it ended up being a much simpler change, though we acknowledge there's some debt to refactor later.

ywangd

LGTM

I like how small the changeset is!

ywangd · 2023-04-06T04:40:23Z

...ugin/core/src/test/java/org/elasticsearch/xpack/core/security/authz/RoleDescriptorTests.java

 }

 public void testParseIndicesPrivilegesFailsWhenExceptFieldsAreNotSubsetOfGrantedFields() {
+ resetFieldPermssionsCache();


Just a question: Is this reset necessary or nice to have just in case?

It's no longer necessary. It was needed in an earlier version where sometimes tests ran without a cache, because the behaviour with/without the cache was subtly different.
Now it's just there because it makes sense to have a known clean base for the test (and not have it be at risk of flakiness due to what's existing in the cache)

elasticsearchmachine · 2023-04-06T05:00:13Z

Backport

Status	Branch	Result
✅	7.17
✅	8.7

When parsing role descriptors, we ensure that the FieldPermissions (`"field_security":{ "grant":[ ... ], "except":[ ... ] }`) are valid - that is that any patterns compile correctly, and the "except" is a subset of the "grant". However, the previous implementation would not use the FieldPermissionsCache for this, so it would compile (union, intersect & minimize) automatons every time a role was parsed. This was particularly an issue when parsing roles (from the security index) in the GET /_security/role/ endpoint. If there were a large number of roles with field level security the automaton parsing could have significant impact on the performance of this API.

When parsing role descriptors, we ensure that the FieldPermissions (`"field_security":{ "grant":[ ... ], "except":[ ... ] }`) are valid - that is that any patterns compile correctly, and the "except" is a subset of the "grant". However, the previous implementation would not use the FieldPermissionsCache for this, so it would compile (union, intersect & minimize) automatons every time a role was parsed. This was particularly an issue when parsing roles (from the security index) in the GET /_security/role/ endpoint. If there were a large number of roles with field level security the automaton parsing could have significant impact on the performance of this API. Backport of: elastic#94931

* Reuse FieldPermissionsCache in Role parsing (#94931) When parsing role descriptors, we ensure that the FieldPermissions (`"field_security":{ "grant":[ ... ], "except":[ ... ] }`) are valid - that is that any patterns compile correctly, and the "except" is a subset of the "grant". However, the previous implementation would not use the FieldPermissionsCache for this, so it would compile (union, intersect & minimize) automatons every time a role was parsed. This was particularly an issue when parsing roles (from the security index) in the GET /_security/role/ endpoint. If there were a large number of roles with field level security the automaton parsing could have significant impact on the performance of this API. * Fix spotless and imports

When parsing role descriptors, we ensure that the FieldPermissions (`"field_security":{ "grant":[ ... ], "except":[ ... ] }`) are valid - that is that any patterns compile correctly, and the "except" is a subset of the "grant". However, the previous implementation would not use the FieldPermissionsCache for this, so it would compile (union, intersect & minimize) automatons every time a role was parsed. This was particularly an issue when parsing roles (from the security index) in the GET /_security/role/ endpoint. If there were a large number of roles with field level security the automaton parsing could have significant impact on the performance of this API. Backport of: #94931

tvernum added :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC auto-backport-and-merge v7.17.10 v8.7.1 v8.8.0 labels Mar 31, 2023

tvernum requested a review from ywangd March 31, 2023 07:32

elasticsearchmachine added the Team:Security Meta label for security team label Mar 31, 2023

tvernum added the >enhancement label Mar 31, 2023

Update docs/changelog/94931.yaml

6e08ce7

ywangd reviewed Apr 3, 2023

View reviewed changes

tvernum added 5 commits April 3, 2023 12:11

Fix exception handling

06fe78a

Fix initialisation in IdP tests

3bcc7e4

Merge branch 'main' into reuse-field-permissions-cache

cccf6b9

Fix SecurityTests

f0d7487

Remove + deprecate methods on cache

9349591

tvernum added 4 commits April 4, 2023 13:31

Merge branch 'main' into reuse-field-permissions-cache

320c472

Merge remote-tracking branch 'tvernum/reuse-field-permissions-cache' …

c3e11e1

…into reuse-field-permissions-cache

Merge branch 'main' into reuse-field-permissions-cache

c0e0feb

Move static field to 'RoleDescriptor'

b8727e5

tvernum requested a review from ywangd April 6, 2023 04:28

ywangd approved these changes Apr 6, 2023

View reviewed changes

tvernum merged commit 3ff085c into elastic:main Apr 6, 2023

tvernum mentioned this pull request Apr 6, 2023

[8.7] Reuse FieldPermissionsCache in Role parsing (#94931) #95063

Merged

elasticsearchmachine added the backport pending label Apr 6, 2023

tvernum mentioned this pull request Apr 6, 2023

[7.17] Reuse FieldPermissionsCache in Role parsing #95065

Merged

gmarouli removed the backport pending label May 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Reuse FieldPermissionsCache in Role parsing #94931

Reuse FieldPermissionsCache in Role parsing #94931

Uh oh!

tvernum commented Mar 31, 2023

elasticsearchmachine commented Mar 31, 2023

elasticsearchmachine commented Mar 31, 2023

ywangd Apr 3, 2023

ywangd Apr 3, 2023

ywangd Apr 3, 2023

tvernum commented Apr 3, 2023

ywangd commented Apr 3, 2023

tvernum commented Apr 6, 2023

ywangd left a comment

ywangd Apr 6, 2023

tvernum Apr 6, 2023

elasticsearchmachine commented Apr 6, 2023 •

edited by tvernum

Loading

Labels

4 participants

		if (cache == null) {
		return new FieldPermissions(definition);

Uh oh!

Reuse FieldPermissionsCache in Role parsing #94931

Reuse FieldPermissionsCache in Role parsing #94931

Uh oh!

Conversation

tvernum commented Mar 31, 2023

elasticsearchmachine commented Mar 31, 2023

elasticsearchmachine commented Mar 31, 2023

ywangd Apr 3, 2023

Choose a reason for hiding this comment

ywangd Apr 3, 2023

Choose a reason for hiding this comment

ywangd Apr 3, 2023

Choose a reason for hiding this comment

tvernum commented Apr 3, 2023

ywangd commented Apr 3, 2023

tvernum commented Apr 6, 2023

ywangd left a comment

Choose a reason for hiding this comment

ywangd Apr 6, 2023

Choose a reason for hiding this comment

tvernum Apr 6, 2023

Choose a reason for hiding this comment

elasticsearchmachine commented Apr 6, 2023 • edited by tvernum Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Backport

Labels

4 participants

elasticsearchmachine commented Apr 6, 2023 •

edited by tvernum

Loading