Skip to content

Conversation

@martijnvg
Copy link
Member

@martijnvg martijnvg commented Apr 28, 2020

Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.

This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:

  • Auto creation of data streams can't occur on the coordinating node.
    Based on the index template (v2) either a regular index or a data stream should be created.
    However if the coordinating node is slow in processing cluster state updates then it may be
    unaware of the existence of certain index templates, which then can load to the
    TransportBulkAction creating an index instead of a data stream. Therefor the coordination of
    creating an index or data stream should occur on the master node. See Auto create data streams using index templates v2 #55377
  • From a security perspective it is useful to know whether index creation originates from the
    create index api or from auto creating a new index via the bulk or index api. For example
    a user would be allowed to auto create an index, but not to use the create index api. The
    auto create action will allow security to distinguish these two different patterns of
    index creation.

This change adds the following new transport actions:

  • AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via Auto create data streams using index templates v2 #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.

The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.

Currently the TransportBulkAction detects whether an index is missing and then decides whether it should be auto created. The coordination of the index creation also happens in the TransportBulkAction on the coordinating node. This change adds a new transport action that the TransportBulkAction delegates to if missing indices need to be created. The reasons for this change: * Auto creation of data streams can't occur on the coordinating node. Based on the index template (v2) either a regular index or a data stream should be created. However if the coordinating node is unaware of certain index templates then the TransportBulkAction could create an index instead of a data stream. Therefor the decision of whether an index or data stream should be created should happen on the master node. See elastic#55377 * From a security perspective it is useful to know whether index creation originates from the create index api or from auto creating a new index via the bulk or index api. For example a user would be allowed to auto create an index, but not to use the create index api. The auto create action will allow security to distinguish these two different patterns of index creation.
@martijnvg martijnvg added >non-issue :Data Management/Indices APIs APIs to create and manage indices and templates v8.0.0 v7.8.0 labels Apr 28, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Indices APIs)

…IndexAction. This is needed, because AutoCreateAction will make the decision whether to auto create an data stream or index, and AutoCreateIndexAction really auto creates the index. Future change will add AutoCreateDataStream action. If this logic is in a single action then we can't distingues whether a data stream or index is being auto created, which is important because a user may be able to auto create a data stream, but not an index or visa versa.
@martijnvg
Copy link
Member Author

@elasticmachine run elasticsearch-ci/1

@martijnvg
Copy link
Member Author

@elasticmachine run elasticsearch-ci/docs

@martijnvg martijnvg marked this pull request as ready for review April 29, 2020 13:09
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @martijnvg , I did an initial review and added a few comments, primarily the one we already discussed on collapsing the two transport actions into one.

Exception suppressed = null;
for (Map.Entry<String, Exception> entry : response.getFailureByNames().entrySet()) {
Exception e = entry.getValue();
if (e == null || ExceptionsHelper.unwrapCause(e) instanceof ResourceAlreadyExistsException) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer to handle the ResourceAlreadyExistsException in the auto-create action.

createIndexResponse -> {
// Maybe a bit overkill to ensure visibility of results map across threads...
synchronized (results) {
results.put(name, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map is in the response named "failureByNames". I think we should either not add successful indices into it, rename the map or separate the two parts in the response.

createIndexRequest.cause(request.getCause());
createIndexRequest.masterNodeTimeout(request.masterNodeTimeout());
createIndexRequest.preferV2Templates(request.getPreferV2Templates());
client.execute(AutoCreateIndexAction.INSTANCE, createIndexRequest, ActionListener.wrap(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By delegating to a separate transport action, we risk that the retry on no longer master happens inside that transport action, making any decision we make on cluster state here invalid (or at least potentially stale). I would much prefer to do the cluster state update here.

That way we could also create all the indices and streams in one go, reducing the number of cluster states published.

I think auto-create does not have to be specific to data stream or index. It goes together with the document level privileges which are also agnostic of whether the data ends up in a data stream or an index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think auto-create does not have to be specific to data stream or index. It goes together with the document level privileges which are also agnostic of whether the data ends up in a data stream or an index.

Yes and this is another reason to keep all the auto creation logic in a single action.

@martijnvg martijnvg marked this pull request as draft May 3, 2020 20:15
martijnvg added 3 commits May 3, 2020 22:28
* One action for auto creating indices and in the future also auto create data streams * The TransportBulkAction just executes the auto create action like the create index action. * Avoid the complexity of auto creating indices in a single request / cluster state update. This adds quite some complexity while the benefits are likely only noticeable in edge cases (if many indices get auto created)) Also on the security side, authorization of the auto create indices would become much more complex compared to authorization of auto creating an index one at a time.
@martijnvg
Copy link
Member Author

I've changed this PR, so that there is now a single auto create action that auto creates an index. This can then later be used to also auto create data streams. The auto create action also now handle only a single index at a time instead of multiple indices. The additional complexity isn't worth the benefit from auto creating multiple indices in a single remote call / cluster state update. Also this would require additional logic in security to deny / allow specific indices from being auto created.

@martijnvg martijnvg marked this pull request as ready for review May 4, 2020 08:38
@martijnvg martijnvg requested a review from henningandersen May 4, 2020 08:38
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@martijnvg martijnvg merged commit 0a4428c into elastic:master May 4, 2020
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 4, 2020
Backport of elastic#55858 to 7.x branch. Currently the TransportBulkAction detects whether an index is missing and then decides whether it should be auto created. The coordination of the index creation also happens in the TransportBulkAction on the coordinating node. This change adds a new transport action that the TransportBulkAction delegates to if missing indices need to be created. The reasons for this change: * Auto creation of data streams can't occur on the coordinating node. Based on the index template (v2) either a regular index or a data stream should be created. However if the coordinating node is slow in processing cluster state updates then it may be unaware of the existence of certain index templates, which then can load to the TransportBulkAction creating an index instead of a data stream. Therefor the coordination of creating an index or data stream should occur on the master node. See elastic#55377 * From a security perspective it is useful to know whether index creation originates from the create index api or from auto creating a new index via the bulk or index api. For example a user would be allowed to auto create an index, but not to use the create index api. The auto create action will allow security to distinguish these two different patterns of index creation. This change adds the following new transport actions: AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via elastic#55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created. The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege. Relates to elastic#53100
martijnvg added a commit that referenced this pull request May 4, 2020
Backport of #55858 to 7.x branch. Currently the TransportBulkAction detects whether an index is missing and then decides whether it should be auto created. The coordination of the index creation also happens in the TransportBulkAction on the coordinating node. This change adds a new transport action that the TransportBulkAction delegates to if missing indices need to be created. The reasons for this change: * Auto creation of data streams can't occur on the coordinating node. Based on the index template (v2) either a regular index or a data stream should be created. However if the coordinating node is slow in processing cluster state updates then it may be unaware of the existence of certain index templates, which then can load to the TransportBulkAction creating an index instead of a data stream. Therefor the coordination of creating an index or data stream should occur on the master node. See #55377 * From a security perspective it is useful to know whether index creation originates from the create index api or from auto creating a new index via the bulk or index api. For example a user would be allowed to auto create an index, but not to use the create index api. The auto create action will allow security to distinguish these two different patterns of index creation. This change adds the following new transport actions: AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created. The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege. Relates to #53100
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Jun 3, 2020
The create index action name (`indices:admin/create`) can no longer be used to grant privileges to auto create indices and instead the `create_index` builtin privilege should be used.
@martijnvg
Copy link
Member Author

I've added the breaking label, because the privilege indices:admin/create can no longer be used to allow auto creation of indices and create_index should be used instead.

The indices:admin/create is a privilege based on an internal transport action and we don't provide backwards compatibility guarantees on transport action names being used as privilege and instead the named privileges should be used.

However there is usage of indices:admin/create privilege and therefor I'm marking this pr as breaking.

martijnvg added a commit that referenced this pull request Jun 4, 2020
The create index action name (`indices:admin/create`) can no longer be used to grant privileges to auto create indices and instead the `create_index` builtin privilege should be used. Relates to #55858 Co-authored-by: Jake Landis <jake.landis@elastic.co>
martijnvg added a commit that referenced this pull request Jun 4, 2020
The create index action name (`indices:admin/create`) can no longer be used to grant privileges to auto create indices and instead the `create_index` builtin privilege should be used. Relates to #55858 Co-authored-by: Jake Landis <jake.landis@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

4 participants