refactor(query): new hyperloglog and ndv #14585
I found that the join order is not very accurate (due to inaccurate selectivity); we need to support #14587 later.
      case_name: &str,
      snapshot_count: u32,
-     table_statistic_count: u32,
+     _table_statistic_count: u32,
Can the arg be removed?
    // assert_eq!(
    //     ts_count, table_statistic_count,
    //     "case [{}], check snapshot statistics count",
    //     case_name
    // );
    assert_eq!(
useless?
The segment file may grow too much. Converted to draft now.
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
- New HyperLogLog implementation
  simple_hll is a simple HyperLogLog implementation in Rust. It is designed to be simple to use and compact to store (using Sparse HyperLogLog).
- Adaptive sparse serialization of HyperLogLog
  The serialized size can range from 1 byte to 1<<14 = 16k bytes with P=14.
- Refactor the table function fuse_statistic and test that the NDV results fall within the expected error rate range
- Additional parameter to set the error rate in the approx_count_distinct function: approx_count_distinct(0.1)(number)
- Refactor column stats to use HyperLogLog with P = 12 and max size = 4k (1<<12)

Acknowledgements
Some code and tests are borrowed from and inspired by:
Reference papers:
Thanks to @jimexist and @crepererum for the initial code, and to the paper author, Otmar Ertl.
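To make the precision figures in the summary concrete (P = 14 gives 1<<14 registers, P = 12 gives 1<<12), here is a minimal dense HyperLogLog sketch in Rust. It is an illustration under stated assumptions, not the simple_hll code from this PR: the names DenseHll and precision_for_error are hypothetical, it uses the standard library's DefaultHasher rather than whatever hash the PR uses, it stores one byte per register with no sparse encoding, and it uses the classic raw estimator with a linear-counting correction rather than the estimator from the referenced papers. How Databend maps the error-rate argument of approx_count_distinct to a precision is not stated here; precision_for_error shows only one plausible mapping based on the usual 1.04 / sqrt(m) standard error.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Dense HyperLogLog with 2^P one-byte registers.
/// Hypothetical type for illustration; assumes 7 <= P <= 18.
pub struct DenseHll<const P: u32> {
    registers: Vec<u8>,
}

impl<const P: u32> DenseHll<P> {
    pub fn new() -> Self {
        Self { registers: vec![0u8; 1usize << P] }
    }

    /// Expected relative standard error of the estimate: 1.04 / sqrt(m).
    pub fn standard_error() -> f64 {
        1.04 / ((1u64 << P) as f64).sqrt()
    }

    pub fn add<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let hash = hasher.finish();
        // The top P bits select the register.
        let index = (hash >> (64 - P)) as usize;
        // The remaining 64 - P bits give the rank: leading zeros plus one.
        let rest = hash << P;
        let rank = (rest.leading_zeros().min(64 - P) + 1) as u8;
        if rank > self.registers[index] {
            self.registers[index] = rank;
        }
    }

    pub fn count(&self) -> f64 {
        let m = self.registers.len() as f64;
        // Bias-correction constant, valid for m >= 128.
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self
            .registers
            .iter()
            .map(|&r| 2f64.powi(-(r as i32)))
            .sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            // Small-range correction: fall back to linear counting.
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}

/// Smallest precision in 7..=18 whose standard error is at most `target`.
/// Hypothetical helper; not necessarily how approx_count_distinct picks P.
pub fn precision_for_error(target: f64) -> u32 {
    (7..=18)
        .find(|&p| 1.04 / ((1u64 << p) as f64).sqrt() <= target)
        .unwrap_or(18)
}
```

Under these assumptions, DenseHll::<14> holds 16384 registers with a standard error of about 0.81%, and DenseHll::<12> holds 4096 registers with a standard error of about 1.6%, which is the space/accuracy trade-off behind choosing a smaller P for column stats.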
Others
Tests
Type of change