Skip to content

Conversation

@kaiyan-sheng
Copy link
Contributor

@kaiyan-sheng kaiyan-sheng commented Aug 12, 2025

Proposed commit message

This PR removed all regex usage in the awsfirehose ingest pipeline. The script now uses standard Painless string methods (.contains(), .split(), .length(), etc.) to identify log types in order to lower the risk of high CPU consumption.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally

elastic-package stack down elastic-package stack up -d --services=elasticsearch $(elastic-package stack shellinit) cd integrations/packages/awsfirehose elastic-package test pipeline --data-streams logs 

Benchmark results

Both results are ran 4 times and calculated an average with command:

elastic-package benchmark pipeline --data-streams logs 

With the change in this PR

╭─────────────────────────╮ │ parameters │ ├──────────────────┬──────┤ │ source_doc_count │ 16 │ │ doc_count │ 1000 │ ╰──────────────────┴──────╯ ╭───────────────────────────╮ │ pipeline_performance │ ├─────────────────┬─────────┤ │ processing_time │ 0.08s │ │ eps │ 12440.54│ ╰─────────────────┴─────────╯ ╭────────────────────────────────────╮ │ procs_by_total_time │ ├───────────────────────────┬────────┤ │ script @ default.yml:10 │ 78.63% │ │ reroute @ default.yml:205 │ 1.24% │ │ reroute @ default.yml:189 │ 1.24% │ │ set @ default.yml:7 │ 1.24% │ ╰───────────────────────────┴────────╯ ╭─────────────────────────────────────╮ │ procs_by_avg_time_per_doc │ ├───────────────────────────┬─────────┤ │ script @ default.yml:10 │ 65µs │ │ reroute @ default.yml:205 │ 5.291µs │ │ reroute @ default.yml:189 │ 5.291µs │ │ set @ default.yml:7 │ 1µs │ ╰───────────────────────────┴─────────╯ 

Without the change in this PR

╭─────────────────────────╮ │ parameters │ ├──────────────────┬──────┤ │ source_doc_count │ 16 │ │ doc_count │ 1000 │ ╰──────────────────┴──────╯ ╭────────────────────────────╮ │ pipeline_performance │ ├─────────────────┬──────────┤ │ processing_time │ 0.11s │ │ eps │ 9007.51 │ ╰─────────────────┴──────────╯ ╭───────────────────────────────────╮ │ procs_by_total_time │ ├──────────────────────────┬────────┤ │ script @ default.yml:10 │ 79.52% │ │ reroute @ default.yml:78 │ 0.90% │ │ reroute @ default.yml:62 │ 0.90% │ │ set @ default.yml:7 │ 0.90% │ ╰──────────────────────────┴────────╯ ╭────────────────────────────────────╮ │ procs_by_avg_time_per_doc │ ├──────────────────────────┬─────────┤ │ script @ default.yml:10 │ 89.5µs │ │ reroute @ default.yml:78 │ 5.291µs │ │ reroute @ default.yml:62 │ 5.291µs │ │ set @ default.yml:7 │ 1µs │ ╰──────────────────────────┴─────────╯ 
@kaiyan-sheng kaiyan-sheng requested a review from a team as a code owner August 12, 2025 19:51
@andrewkroh andrewkroh added Integration:awsfirehose Amazon Data Firehose Team:obs-ds-hosted-services Observability Hosted Services team [elastic/obs-ds-hosted-services] labels Aug 12, 2025
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

Copy link
Contributor

@Kavindu-Dodan Kavindu-Dodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on it. Per shown results it's almost ~40% performance increase

- version: "1.8.2"
changes:
- description: Remove regex from ingest pipeline
type: bugfix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be an enhancement and 1.9.0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I dont have a strong preference as this is an improvement of previous main change. I will let Kaiyan decide on this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated on this too... But I think this is more of a bugfix on the 1.8.0 version instead of a new feature. Thats why I kept it a bugfix

// AWS CloudFront Logs
if (tokenCount==33 && ctx.message =~ /^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s[a-zA-Z0-9-]+\s\d+\s(\d+\.\d+\.\d+\.\d+|[a-fA-F0-9:]+)/) {
ctx.event.dataset = 'aws.cloudfront_logs';
// AWS ELB Logs - Updated logic to handle multiple log types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

…eline/default.yml Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
@elasticmachine
Copy link

💚 Build Succeeded

History

cc @kaiyan-sheng

@kaiyan-sheng kaiyan-sheng merged commit e0c5a36 into elastic:main Aug 13, 2025
9 checks passed
@kaiyan-sheng kaiyan-sheng deleted the firehose_pipeline branch August 13, 2025 14:33
@elastic-vault-github-plugin-prod

Package awsfirehose - 1.8.2 containing this change is available at https://epr.elastic.co/package/awsfirehose/1.8.2/

robester0403 pushed a commit to robester0403/integrations that referenced this pull request Aug 14, 2025
@andrewkroh andrewkroh added the bugfix Pull request that fixes a bug issue label Sep 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Pull request that fixes a bug issue Integration:awsfirehose Amazon Data Firehose Team:obs-ds-hosted-services Observability Hosted Services team [elastic/obs-ds-hosted-services]

6 participants