
Conversation

@zmoog
Contributor

@zmoog zmoog commented Sep 26, 2022

What does this PR do?

Expand Azure Logs integration docs to make it easier for users to set it up.

It addresses issues 2 and 5 from #4169.

Disclaimer: this is an early draft to share the change with the cloud monitoring team members and docs wizards.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Related issues

Screenshots

[Screenshot: CleanShot 2022-09-30 at 13 14 41@2x]

I expanded this section, adding a list of the required components to set up an Azure Logs integration. The section also adds some details about each component and the sub-elements the user needs to know later in the Setup section.
@zmoog zmoog self-assigned this Sep 26, 2022
@zmoog zmoog added enhancement New feature or request Integration:azure Azure Logs Team:Cloud-Monitoring Label for the Cloud Monitoring team labels Sep 26, 2022
@elasticmachine

elasticmachine commented Sep 26, 2022

🚀 Benchmarks report

Package azure 👍(1) 💚(2) 💔(6)

| Data stream | Previous EPS | New EPS | Diff (%) | Result |
|---|---|---|---|---|
| activitylogs | 1153.4 | 652.32 | -501.08 (-43.44%) | 💔 |
| auditlogs | 2272.73 | 1438.85 | -833.88 (-36.69%) | 💔 |
| identity_protection | 2341.92 | 1828.15 | -513.77 (-21.94%) | 💔 |
| platformlogs | 3076.92 | 2109.7 | -967.22 (-31.43%) | 💔 |
| signinlogs | 1600 | 948.77 | -651.23 (-40.7%) | 💔 |
| springcloudlogs | 3597.12 | 2061.86 | -1535.26 (-42.68%) | 💔 |

To see the full report, comment with `/test benchmark fullreport`.

@elasticmachine

elasticmachine commented Sep 26, 2022

💚 Build Succeeded



Build stats

  • Start Time: 2022-10-03T08:57:42.515+0000

  • Duration: 14 min 39 sec

Test stats 🧪

| Test Results | |
|---|---|
| Failed | 0 |
| Passed | 119 |
| Skipped | 0 |
| Total | 119 |

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine

elasticmachine commented Sep 26, 2022

🌐 Coverage report

| Name | Metrics % (covered/total) | Diff |
|---|---|---|
| Packages | 100.0% (9/9) | 💚 |
| Files | 85.0% (17/20) | 👎 -12.208 |
| Classes | 85.0% (17/20) | 👎 -12.208 |
| Methods | 82.143% (138/168) | 👎 -7.535 |
| Lines | 85.578% (2504/2926) | 👎 -5.957 |
| Conditionals | 100.0% (0/0) | 💚 |
Highlighting the kind of data that flows between the components is probably helpful.
@colleenmcginnis
Contributor

👋 @zmoog I like the direction you're taking here!

One thing that isn't clear (yet) in this draft is the relationship between this Azure Logs integration and the individual Active Directory, Activity logs, Firewall logs, Platform logs, and Spring Cloud logs integrations. Are they intended to be used together? Or should a user choose either the Azure Logs integration or one or more of the other integrations for collecting logs? In what situations should you choose each approach?

@zmoog
Contributor Author

zmoog commented Sep 27, 2022

> 👋 @zmoog I like the direction you're taking here!

That's good! So I'll move ahead exploring how to tackle the Setup section.

> One thing that isn't clear (yet) in this draft is the relationship between this Azure Logs integration and the individual Active Directory, Activity logs, Firewall logs, Platform logs, and Spring Cloud logs integrations. Are they intended to be used together? Or should a user choose either the Azure Logs integration or one or more of the other integrations for collecting logs? In what situations should you choose each approach?

I haven't touched the documentation of the individual integrations yet. The idea is to replicate the AWS doc approach: general and shared information in the main README.md file, and the specific information and references (fields and sample logs) in the individual .md files.

I moved some of the content from the Requirements section to the Setup section. There were too many details.
@zmoog
Contributor Author

zmoog commented Sep 27, 2022

Hey @colleenmcginnis, I expanded the Setup section by drafting the kind of content users probably need to set up Azure Logs. I also moved some details from the Requirements section to the Setup one. I updated the screenshots.

Let me know what you think about this approach!

The next step is revising the Settings section, working on the individual integration pages, and moving the references from the README.

Refine event hub setup information a little
Logs reference is probably more useful in the individual integration page.
@zmoog
Contributor Author

zmoog commented Sep 28, 2022

@colleenmcginnis, I reduced the scope of this PR a little; I plan a series of smaller, more focused PRs next.

In this PR, I focus on addressing issues 2 and 5 from #4169 (the missing information about event hubs and the ambiguity between event hub namespace and event hub).

I also removed the field details and sample events from the Reference section on the main page, similar to the AWS integration doc.

@zmoog zmoog marked this pull request as ready for review September 28, 2022 21:11
@zmoog zmoog requested a review from a team as a code owner September 28, 2022 21:11
Contributor

@alaudazzi alaudazzi left a comment

Left a few editing suggestions, otherwise LGTM.

@alaudazzi
Contributor

@zmoog
Great PR 🥇 Thank you for reworking these instructions!
As we discussed yesterday, reducing the scope of this PR and addressing the remaining items from #4169 makes total sense.

@zmoog zmoog force-pushed the zmoog/azure-logs-doc-update branch from 08800f7 to aee15cc Compare September 29, 2022 10:29
@zmoog zmoog force-pushed the zmoog/azure-logs-doc-update branch from aee15cc to 0858939 Compare September 29, 2022 10:34
Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>
#### How many event hubs?

Examples:
Elastic recommends creating one event hub for each Azure service you collect data from. For example, if you plan to collect Azure Active Directory (Azure AD) logs and Activity logs, create two event hubs: one for Azure AD and one for Activity logs.
Contributor

Still, the configuration of an integration allows specifying only one event hub, but the user can enable the processing of multiple types of events. So it seems that, as the Azure integration, we somehow assume an event hub can contain more than one type of events/logs.

Contributor Author

Some integrations are designed to filter incoming logs and drop those that are not on the supported log categories list.

For example, both sign-in and audit logs have a processor in their ingest pipelines that drops logs that don't belong to the supported log categories. So you can enable all the Azure AD logs, and each one will ingest only the right log messages. The price is some inefficiency: all data streams receive the same messages, and dropping a message is not completely free.
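
For illustration, a category-based drop processor looks roughly like the sketch below. This is not the integration's actual pipeline: the pipeline id, the field name, and the category value are placeholders.

```python
# Illustrative sketch only: create an ingest pipeline with a category-based drop
# processor, similar in spirit to what the integration's pipelines do.
# The pipeline id, field, and category below are placeholders, not the real ones.
import requests

pipeline = {
    "description": "Example: drop events outside the supported log categories",
    "processors": [
        {
            "drop": {
                # Painless condition: drop anything whose category is not SignInLogs
                "if": "ctx.azure?.category == null || ctx.azure.category != 'SignInLogs'"
            }
        }
    ],
}

resp = requests.put(
    "http://localhost:9200/_ingest/pipeline/example-drop-unsupported-categories",
    json=pipeline,
    auth=("elastic", "changeme"),  # placeholder credentials
)
resp.raise_for_status()
```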

Other integrations, like the generic Event Hub integration, can ingest any log category, but they index only the common fields. So enabling a generic Event Hub integration alongside others is not recommended and can lead to indexing the same log multiple times, in different data streams, and with different field mappings.

@lucabelluccini, do you think we need to elaborate a little instead of just stating "Elastic recommends creating one event hub"?

Contributor

> For example, both sign-in and audit logs have a processor in their ingest pipelines that drops logs that don't belong to the supported log categories.

If the drop happens in the ingest pipeline on the Elasticsearch side, leaving all the types of logs enabled in the integration settings is a waste of network and ingest pipeline resources on the ingest nodes, as we're basically multiplying the data N times (N = the number of log types/data streams left enabled).

E.g. I collect only the AD logs and route them to one event hub. If I configure the Azure Logs integration with the defaults (all options enabled) as below, the amount of data transferred to ES (only to be thrown away, except for the AD logs) is huge...

[Screenshot: the Azure Logs integration settings with all data stream options enabled]

Contributor Author

Yep. That's why I want to update this document so badly.

I also want to open a different PR to change the default value from enabled to disabled (the new data streams added to Azure AD logs are disabled by default).

The Azure module for Filebeat also recommends one event hub per log type:

> It is recommended to use a separate eventhub for each log type as the field mappings of each log type are different.

The reason is different but still valid.
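
For context, the 1:1 mapping between an Azure service and an event hub is established through that service's diagnostic settings. Below is a rough, assumption-laden sketch with the Azure SDK for Python; every resource name, ID, and category is a placeholder, not something defined by the integration.

```python
# Assumption-laden sketch: route one Azure service's logs to a dedicated event hub by
# creating a diagnostic setting on that resource.
# Requires azure-identity and azure-mgmt-monitor; all names/IDs are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# The resource whose logs we want to collect (placeholder Key Vault).
resource_uri = (
    f"/subscriptions/{subscription_id}/resourceGroups/rg-logs"
    "/providers/Microsoft.KeyVault/vaults/kv-example"
)
# Authorization rule of the event hub namespace that will receive the logs.
auth_rule_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/rg-logs"
    "/providers/Microsoft.EventHub/namespaces/ns-logs"
    "/authorizationRules/RootManageSharedAccessKey"
)

client.diagnostic_settings.create_or_update(
    resource_uri=resource_uri,
    name="platform-logs-to-eventhub",
    parameters={
        "event_hub_authorization_rule_id": auth_rule_id,
        "event_hub_name": "platformlogs",  # dedicated event hub for this service
        "logs": [{"category": "AuditEvent", "enabled": True}],
    },
)
```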

Contributor Author

We agreed to make the following changes:

  • Add a disclaimer at the top of the README.md document suggesting that users install the "individual" integrations.
  • We "strongly recommend"
This setting can also be used to define your own endpoints, like for hybrid cloud models.
It is not recommended to use the same event hub for multiple integrations.

For high-volume deployments, we recommend one event hub for each data stream.
Contributor

We introduce the concept of Elasticsearch data stream, but the user doesn't see the word data stream when configuring the Azure Logs Integration.
What is the objective of this statement?

Contributor Author

The goal was to introduce the idea that high-volume data streams may require additional work.

A good trade-off for most users is one event hub for all Azure AD logs (made of four data streams now), but if you have a substantial Active Directory deployment, you may consider moving to one event hub for each data stream.

Like all other clients, Elastic Agent should specify a consumer group to access the event hub.

## Logs reference
A Consumer Group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple agents to each have a separate view of the event stream, and to read the logs independently at their own pace and with their own offsets.
Contributor

This might be misleading - at least to my novice eyes.

If the same consumer group is used across multiple Elastic Agents with Azure Logs configured identically, it allows them to read the logs concurrently and without duplicates.
Different consumer groups allow having a separate view of the event stream and reading the logs independently at their own pace and with their own offsets.

Contributor Author

We need to go one level deeper to understand what's going on.

The current integration structure forces the same event hub and consumer group to be shared across all enabled data streams.

Every enabled data stream spawns an `azureeventhub` input that connects to the same event hub and uses the same consumer group name:

[Diagram: one `adlogs` event hub feeds three Filebeat `azureeventhub` inputs (signin, audit, activity), all using the `$Default` consumer group; each input stores its consumer group info (state, position, or offset) in its own blob in the storage account container.]

But, since each input stores the consumer group info (state, position, or offset) in a different blob, it's as if each one uses a dedicated consumer group, even though they all use the same consumer group name.

That's why each data stream connected to the same event hub receives a copy of each message.
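
To make the interaction between a consumer group and the checkpoint blobs concrete, here is a minimal sketch using the Event Hubs SDK for Python. The agent input is not implemented with this SDK, and all connection strings and names below are placeholders; the sketch only shows how a consumer reads with a consumer group and persists its position to a blob.

```python
# Minimal sketch (not how the agent input is implemented): read from an event hub with
# the $Default consumer group and store the position (checkpoint) in a blob container.
# Requires azure-eventhub and azure-eventhub-checkpointstoreblob; names are placeholders.
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

# The checkpoint store keeps the consumer group state (position/offset) in blobs, so
# consumers that share it can split partitions instead of re-reading the same events.
checkpoint_store = BlobCheckpointStore.from_connection_string(
    "<storage-account-connection-string>",
    container_name="logs-checkpoints",
)

client = EventHubConsumerClient.from_connection_string(
    "<event-hub-namespace-connection-string>",
    consumer_group="$Default",
    eventhub_name="adlogs",
    checkpoint_store=checkpoint_store,
)

def on_event(partition_context, event):
    print(f"partition {partition_context.partition_id}: {event.body_as_str()}")
    partition_context.update_checkpoint(event)  # persist the new offset to the blob

with client:
    client.receive(on_event=on_event, starting_position="-1")  # "-1" = from the start
```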

Contributor Author

If you assign the same agent policy to multiple Elastic Agents, they end up using the same blob and meet the goal of sharing the load:

[Diagram: the same `adlogs` event hub feeds two Filebeat instances, each running a `signin` input with the `$Default` consumer group; both inputs store their consumer group info (state, position, or offset) in the same `signin` blob in the storage account container.]

In this example, both Filebeat instances will use the same blob. The blob name follows the pattern `filebeat-signinlogs-{{eventhub}}`.
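
If you want to verify which checkpoint blobs the inputs created (and therefore which data stream/event hub pairs are active), a minimal sketch with the Blob Storage SDK for Python follows; the connection string and container name are placeholders.

```python
# Minimal sketch: list the checkpoint blobs in the storage account container.
# Requires azure-storage-blob; the connection string and container name are placeholders.
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<storage-account-connection-string>",
    container_name="filebeat-checkpoints",
)
for blob in container.list_blobs():
    print(blob.name)  # expect one blob per data stream / event hub pair
```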

Contributor Author

@lucabelluccini, what do you think is the best way to describe the role of the consumer group to a user who wants to set up an integration?

Contributor Author

We agreed to make the following changes:

  • add a paragraph that describes that consumer groups allow Elastic Agents to collaborate and scale the throughput
  • underline that having an event hub per log type is a good thing
A Consumer Group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple agents to each have a separate view of the event stream, and to read the logs independently at their own pace and with their own offsets.

### Activity logs
In most cases, you can use the default value of `$Default`.
Contributor

Can we mention an example of when this should be configured?

Contributor Author

Yep. I am not an expert here, but I think having a consumer group named after the log type it serves would help. I will update the doc accordingly.

zmoog and others added 5 commits September 29, 2022 17:20
Co-authored-by: Luca Belluccini <luca.belluccini@elastic.co>
Co-authored-by: Luca Belluccini <luca.belluccini@elastic.co>
  • put more emphasis on the "Azure service" concept; we want to make it a first-class citizen of this doc to leverage it when we discuss the recommended 1:1 mapping between service and event hub.
  • recommend installing the individual integrations vs. the collective one
  • clarify the role of consumer group and storage account container as enablers of shared logs processing.
  • minor stuff (add more links and supporting diagrams)
@zmoog zmoog force-pushed the zmoog/azure-logs-doc-update branch from d930a4b to 69153c6 Compare September 30, 2022 10:05
@zmoog
Contributor Author

zmoog commented Sep 30, 2022

Hey @alaudazzi @lucabelluccini, I pushed an update that addresses the topic we discussed earlier today.

Let me know what you think! I'm more than happy to clarify, expand or fix errors.


A Consumer Group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple agents to each have a separate view of the event stream, and to read the logs independently at their own pace and with their own offsets.

Consumer groups allow the Elastic Agents assigned to the same agent policy to work together on log processing to increase ingestion throughput, if required.


same as above :)


I also find that mentioning the concept of "horizontal scaling" might increase clarity, but it could be a subjective opinion.

Contributor Author

Yeah, I agree, and I'm also stealing the "horizontal scaling" thing 😇

@zmoog zmoog force-pushed the zmoog/azure-logs-doc-update branch from 1208a48 to 25c75c4 Compare September 30, 2022 11:16
@zmoog zmoog merged commit e4a3193 into elastic:main Oct 3, 2022
@zmoog zmoog deleted the zmoog/azure-logs-doc-update branch October 3, 2022 09:12

Labels

enhancement New feature or request Integration:azure Azure Logs Team:Cloud-Monitoring Label for the Cloud Monitoring team

7 participants