Skip to content

Conversation

@masseyke
Copy link
Member

@masseyke masseyke commented Apr 24, 2023

This adds a new index privilege called manage_dlm. The manage_dlm has permission to perform all DLM actions on an index, including put and delete. It also adds the ability to call DLM get and explain to the view_index_metadata existing index privilege.

@elasticsearchmachine
Copy link
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@masseyke masseyke changed the title Adding manage_dlm role Adding manage_dlm privilege Apr 24, 2023
@masseyke masseyke marked this pull request as ready for review April 24, 2023 21:59
@masseyke masseyke requested a review from andreidan April 24, 2023 22:00
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Apr 24, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@masseyke masseyke removed the request for review from andreidan April 25, 2023 00:08
@masseyke masseyke marked this pull request as draft April 25, 2023 00:08
@masseyke masseyke marked this pull request as ready for review April 25, 2023 15:20
@masseyke masseyke requested a review from andreidan April 25, 2023 15:21
Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Keith, it's looking great so far.

Just a few questions:

Do we need to update ClusterPrivilegeResolver too ?

Do we need documentation update? i.e. get-builtin-privileges.asciidoc

I think the docs for all DLM related APIs need to have this security instruction:

* If the {es} {security-features} are enabled, you must have the `manage_dlm` or `read_ilm` or both cluster privileges to use this API. For more information, see <<security-privileges>>. 

I'd say we should give managed_dlm to the relevant builtin roles defined in ReservedRolesStore ? (at least kibanaSystemRoleDescriptor ?)

containsString("failed to parse role [my_role]. unexpected field [remote_indices]"),
containsString("remote indices not supported for API keys")
containsString("remote indices not supported for API keys"),
containsString("unknown index privilege")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ... expect an unknown index privilege message error after introducing a new privilege?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one surprised me, too. I'll spend a little more time trying to understand why this is. I couldn't find any way to tie a privilege to a version though (like we do for example with xcontent). So I don't think there's anything I can do about this if this is our behavior.

Copy link
Contributor

@n1v0lg n1v0lg Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test here pertains to backwards compatibility, specifically for API keys. For API keys, we serialize the role descriptors (as XContent) associated with it (which includes index privileges), and store those as a doc in the security index. The reason the test is failing here is because the test runs on an old version that doesn't have manage_dlm defined as a privilege yet. Further up in the test, we are creating a random role descriptor with a set of random index privileges, that sometimes include manage_dlm leading to the unknown index privilege failure here.

Just wanted to leave this as context, to save a bit of digging.

In terms of addressing this, for the context of this PR, I think it's all good to keep this as is. I might follow this up next week to make things more robust in a mixed-cluster scenario but it's too much of a side quest for the current PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class (and the thinking around the failure) has evolved. In short, the current behavior is correct.

However, to correctly address it within this test, we should add manage_dlm to the excluded privileges here: https://github.com/elastic/elasticsearch/blob/main/x-pack/qa/rolling-upgrade/src/test/java/org/elasticsearch/upgrades/ApiKeyBackwardsCompatibilityIT.java#L351

Instead of tolerating containsString("unknown index privilege").

);
private static final Automaton MANAGE_LEADER_INDEX_AUTOMATON = patterns(ForgetFollowerAction.NAME + "*");
private static final Automaton MANAGE_ILM_AUTOMATON = patterns("indices:admin/ilm/*");
private static final Automaton MANAGE_DLM_AUTOMATON = patterns("indices:admin/dlm/*", "indices:admin/data_lifecycle/*");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shows we have a bit of an inconsistency in our action names. I'll bring the explain API into the data_lifecycle namespace (this comment is just a remark in general :) nothing about this PR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #95555 and actually brought all actions under the dlm namespace.
If data_lifecycle is preferable we can go that way ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I had intended to do the same as a follow-up. I'll fix this once that's in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Keith, I merged #95555 so let's update this to just use the dlm namespace

@andreidan andreidan requested a review from jakelandis April 25, 2023 16:26
@andreidan andreidan mentioned this pull request Apr 25, 2023
19 tasks
@andreidan
Copy link
Contributor

Relates to #93596

@n1v0lg
Copy link
Contributor

n1v0lg commented Apr 27, 2023

I believe that this is derived from the action names. Since all of the DLM ones begin with indices:admin (and are actions on indices or data streams) we use index-level privileges.

This part is a little trickier: for all actions we first check if they are cluster actions here which under the hood checks if the action name is part of the ALL_CLUSTER_PATTERN. This means that we can have an action of the form indices:... that is still treated and authorized as a cluster action. We will likely want to do this for the data lifecycle actions so they get authorized against cluster privileges (pending more work on the security model design to say this for sure).

@andreidan
Copy link
Contributor

andreidan commented Apr 27, 2023

@n1v0lg thanks for the update. We're happy to let you folks drive the design and decisions on the security privileges/model. Please let us know what's needed on our side.
If I understand correctly we don't need to change anything to the existing actions (they remain indices:...) and you will come back to this PR with an update (or take over the PR?) ?

@n1v0lg
Copy link
Contributor

n1v0lg commented Apr 27, 2023

@andreidan appreciated!

Please let us know what's needed on our side.

Will do! As soon as I have a better grasp on the final approach I will give feedback directly in this PR and we'll figure out together based on the size of the change what works best, me taking over or just requesting changes. No need to change anything for now. I will also put together a small design doc on the overall approach and keep you in the loop.

@masseyke
Copy link
Member Author

One note before putting this PR aside -- the latest test failure appears to have turned up a real bug (https://gradle-enterprise.elastic.co/s/6kb67wslyurna/tests/:x-pack:qa:rolling-upgrade:v7.17.10%23oldClusterTest/org.elasticsearch.upgrades.ApiKeyBackwardsCompatibilityIT/testCreatingAndUpdatingApiKeys?expanded-stacktrace=WyIwIl0&top-execution=1). I've been talking with @slobodanadamovic about it. If you are running in a mixed mode cluster, and if your newer nodes have a new index privilege, and if you create an API key with this new privilege and use it to make a request that goes to an old cluster, then your request fails. I think Slobodan is going to write up a ticket and look into this. I just thought I'd mention it here so that no one spends too much time on that test failure -- it's not a bug in this PR.

@n1v0lg
Copy link
Contributor

n1v0lg commented Apr 27, 2023

@masseyke thanks for raising this!

I alluded (quite vaguely) to this issue in my comment here:

I might follow this up next week to make things more robust in a mixed-cluster scenario but it's too much of a side quest for the current PR.

I was only thinking about the new privilege we're adding here, but @slobodanadamovic is correct in that it doesn't just apply to the new privilege, but more generally to other privileges we've added in the past, so it makes sense to track it as a bug beyond the scope of this current PR. Good catch 👍

@slobodanadamovic
Copy link
Contributor

@n1v0lg @masseyke

I took some time to look into the issue with BWC and it turns out that it's not only affecting API Keys but also Users.

Creating a role with the newly introduced index privilege will cause problems in a mixed cluster. The same problem can happen when adding a new cluster privileges. I think we need to come up with a generic way of handling BWC for index and cluster privileges.

I've reported a bug here: #95771

@masseyke masseyke closed this May 5, 2023
@n1v0lg
Copy link
Contributor

n1v0lg commented May 25, 2023

Reviving this, since it's the final approach we settled on after iterating on design.

I'll leave a review with a few small comments and nits. @masseyke I'm more than happy to push those changes to this PR, or let you address them. Let me know which you prefer! Thanks for working on this.

@n1v0lg n1v0lg reopened this May 25, 2023
Copy link
Contributor

@n1v0lg n1v0lg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Just a few smaller comments. I can address these myself too, if you're swamped. Let me know what you prefer.

String index = (String) ((List<Map<String, Object>>) nodes.get(0).get("indices")).get(0).get("index_name");

Request explainLifecycleRequest = new Request("POST", "/" + index + "/_lifecycle/explain");
Request getLifecycleRequest = new Request("GET", "_data_stream/" + dataStreamName + "/_lifecycle");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: would randomize here: randomFrom("_all", "*", dataStreamName) for this and the other calls below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've only granted permission on dlm-* though right? So we'd expect it to potentially fail for others. Or are you suggesting I change the test to make sure that it does fail for others?

Copy link
Contributor

@n1v0lg n1v0lg May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No sorry, I'm suggesting it will work for these patterns as well: the way that index name resolution works is that _all and "*" will resolve into only the names the users has access for (so essentially dataStreamName, i.e., dlm-test)

So if we have a user with access to dlm-*, there are two data streams other-test and dlm-test, and the user runs PUT .../_all/lifecycle it will apply the changes to all data stream the user has access to, i.e., dlm-test, and will skip the other ones, i.e., other-test. Same for DELETE and GET -- this is something we do more generally for index patterns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't realize that. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing! Only learned this recently also.

containsString("failed to parse role [my_role]. unexpected field [remote_indices]"),
containsString("remote indices not supported for API keys")
containsString("remote indices not supported for API keys"),
containsString("unknown index privilege")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class (and the thinking around the failure) has evolved. In short, the current behavior is correct.

However, to correctly address it within this test, we should add manage_dlm to the excluded privileges here: https://github.com/elastic/elasticsearch/blob/main/x-pack/qa/rolling-upgrade/src/test/java/org/elasticsearch/upgrades/ApiKeyBackwardsCompatibilityIT.java#L351

Instead of tolerating containsString("unknown index privilege").

@masseyke
Copy link
Member Author

Reviving this, since it's the final approach we settled on after iterating on design.

I'll leave a review with a few small comments and nits. @masseyke I'm more than happy to push those changes to this PR, or let you address them. Let me know which you prefer! Thanks for working on this.

I've got time to address them today. Thanks.

@masseyke masseyke requested a review from n1v0lg May 25, 2023 20:17
Copy link
Contributor

@n1v0lg n1v0lg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Few smaller things, but no need for another review from my end.

pr: 95512
summary: Adding `manage_dlm` index privilege and expanding `view_index_metadata` for access to data lifecycle APIs
area: DLM
type: enhancement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we really want a changelog here: the privilege is behind a feature flag, and so is the DLM feature itself. I don't know if we've been including changelogs for DLM so far but it might look weird to users to see changelogs for something that isn't available outside a feature flag yet. We could instead mark this non-issue and skip the changelog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO advertising experimental features is ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion on my end!

@masseyke masseyke added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 26, 2023
@elasticsearchmachine elasticsearchmachine merged commit 3228058 into elastic:main May 26, 2023
@masseyke masseyke deleted the feature/manage_dlm-role branch May 26, 2023 15:17
elasticsearchmachine pushed a commit that referenced this pull request May 31, 2023
Porting the DLM permissions REST test to the new style of cluster tests. This has the usual nice perks, and also allows us to remove the separate `qa/with-security` package. No functional or test logic changes. I didn't suggest this as part of the PR review for #95512 so as not to block that PR further, and also because I wasn't sure about the overhead of making this change (it did end up taking some battling with gradle).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Data Management Meta label for data/management team Team:Security Meta label for security team v8.9.0

6 participants