-   Notifications  
You must be signed in to change notification settings  - Fork 509
 
Remove regex from ingest pipeline #14914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
 🚀 Benchmarks reportTo see the full report comment with   |  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on it. Per shown results it's almost ~40% performance increase
| - version: "1.8.2" | ||
| changes: | ||
| - description: Remove regex from ingest pipeline | ||
| type: bugfix | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be an enhancement and 1.9.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I dont have a strong preference as this is an improvement of previous main change. I will let Kaiyan decide on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debated on this too... But I think this is more of a bugfix on the 1.8.0 version instead of a new feature. Thats why I kept it a bugfix
   packages/awsfirehose/data_stream/logs/elasticsearch/ingest_pipeline/default.yml  Outdated   Show resolved Hide resolved  
 | // AWS CloudFront Logs | ||
| if (tokenCount==33 && ctx.message =~ /^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s[a-zA-Z0-9-]+\s\d+\s(\d+\.\d+\.\d+\.\d+|[a-fA-F0-9:]+)/) { | ||
| ctx.event.dataset = 'aws.cloudfront_logs'; | ||
| // AWS ELB Logs - Updated logic to handle multiple log types | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
…eline/default.yml Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
 💚 Build Succeeded
 History
  |  
   |  
|   Package awsfirehose - 1.8.2 containing this change is available at https://epr.elastic.co/package/awsfirehose/1.8.2/  |  




Proposed commit message
This PR removed all regex usage in the
awsfirehoseingest pipeline. The script now uses standard Painless string methods (.contains(), .split(), .length(), etc.) to identify log types in order to lower the risk of high CPU consumption.Checklist
changelog.ymlfile.How to test this PR locally
Benchmark results
Both results are ran 4 times and calculated an average with command:
With the change in this PR
Without the change in this PR