- Notifications
You must be signed in to change notification settings - Fork 25.5k
Initial implementation of random sampling #135660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, i left some small suggestions to improve readability
server/src/main/java/org/elasticsearch/ingest/SamplingService.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/ingest/SamplingService.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/ingest/SamplingService.java Outdated Show resolved Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements the core random sampling functionality for Elasticsearch behind a feature flag, focusing on the sampling logic and statistics collection without REST API endpoints.
- Adds complete sampling implementation with support for rates, conditional scripts, max samples, and statistics
- Implements memory-efficient storage using SoftReferences to prevent memory issues
- Provides comprehensive test coverage for all sampling scenarios and edge cases
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
SamplingService.java | Core implementation with sampling logic, statistics tracking, and data structures |
SamplingServiceTests.java | Comprehensive unit tests covering sampling scenarios, conditions, and limits |
SamplingServiceSampleStatsTests.java | Tests for statistics serialization and combination logic |
SamplingServiceRawDocumentTests.java | Tests for raw document serialization and mutation |
NodeConstruction.java | Updates service construction to include required dependencies |
IngestService.java | Simplifies sampling integration by removing unnecessary IndexRequest copying |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
server/src/main/java/org/elasticsearch/ingest/SamplingService.java Outdated Show resolved Hide resolved
server/src/test/java/org/elasticsearch/ingest/SamplingServiceSampleStatsTests.java Outdated Show resolved Hide resolved
server/src/test/java/org/elasticsearch/ingest/SamplingServiceRawDocumentTests.java Outdated Show resolved Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This builds on #134766 and #134585, adding the logic to perform random sampling. For now, it is only tested via unit tests because there is no REST API yet to create a sampling configuration, or to retrieve a sample. Those are coming up next.
This implements the basics of sampling behind a feature flag, including:
It does not yet support the following, which will come in subsequent PRs: