- Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: Support log for Decimal128 and Decimal256 #17023
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: theirix <theirix@gmail.com>
| Numeric(1), | ||
| Numeric(2), | ||
| Exact(vec![DataType::Float32]), | ||
| Exact(vec![DataType::Float64]), | ||
| Exact(vec![DataType::Float32, DataType::Float32]), | ||
| Exact(vec![DataType::Float64, DataType::Float64]), | ||
| Exact(vec![ | ||
| DataType::Int64, | ||
| DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0), | ||
| ]), | ||
| Exact(vec![ | ||
| DataType::Float32, | ||
| DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0), | ||
| ]), | ||
| Exact(vec![ | ||
| DataType::Float64, | ||
| DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0), | ||
| ]), | ||
| Exact(vec![ | ||
| DataType::Int64, | ||
| DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0), | ||
| ]), | ||
| Exact(vec![ | ||
| DataType::Float32, | ||
| DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0), | ||
| ]), | ||
| Exact(vec![ | ||
| DataType::Float64, | ||
| DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0), | ||
| ]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the Exact(...) signatures superseded by the Numeric(1) and Numeric(2)? Considering integers, floats and decimals are considered numeric types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Numeric is a class of types including decimals, floats and decimals.
The reason I included the third signature onwards is inability of Numeric(1) to support a null value, and we need to support select log(null); as per SLT and postgres feature parity.
If you have an idea how to make it work with nulls, I'd appreciate help. I don't like the current list of signatures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good point. It also seems to make queries like this fail:
1. query failed: DataFusion error: Error during planning: Internal error: Function 'log' failed to match any signature, errors: Error during planning: Function 'log' expects 1 arguments but received 2,Error during planning: For function 'log' Decimal128(3, 1) and Decimal256(76, 0) are not coercible to a common numeric type. This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues No function matches the given name and argument types 'log(Decimal128(3, 1), Decimal256(76, 0))'. You might need to add explicit type casts. Candidate functions: log(Numeric(1)) log(Numeric(2)) [SQL] select log(10.0, 100000000000000000000000000000000000::decimal(76,0)); at /Users/jeffrey/Code/datafusion/datafusion/sqllogictest/test_files/decimal.slt:836 I'll try look at the coercion code a bit and try to understand what's going on here 🤔
(I'm assuming Numeric(2) actually means coerce to a common type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a null coercion for numerics could help, because it worked with float32 in main. Null handling is hard because we have to conform with PostgreSQL and/or standard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can safely remove these signatures:
Exact(vec![DataType::Float32]), Exact(vec![DataType::Float64]),As they are indeed superseded by Numeric(1); Numeric(2) will need to remain with the rest of the signatures as it only guarantees coercing to a common numeric type across all the arguments which I don't think we want for all invocations of log. This is the part I misunderstood, I'll look into raising a PR to clarify some of the documentation later perhaps.
Refactoring decimal128 conversions Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
We can raise a separate issue for 256-bit support (the values that can't be casted to 128-bit)
| Created #17555 just so we have something to track the NotYetImplemented item |
| Thank you, @Jefffrey ! |
Which issue does this PR close?
Rationale for this change
Add Decimal128 and Decimal256 support for
logUDF.It's a most generic function, allowing for specifying a logarithm base, but by default it is
log10, which makes it a good candidate for long decimals. Thecalculate_binary_mathhelper could simplify a lot of UDFs (a subject of future PRs).Since decimals only support integer logarithms, the result is rounded and then converted back to a float, but it is still done on the
i128/i256level, not by rounding input parameters as before.Also, if numbers are parsed as floats, there is precision lost due to floating-point handling. So the majority of tests are targeting
parse_float_as_decimal=trueas in #14612. Otherwise, the log result could differ by one or two due to rounding – see regression SLT.Notably, we still miss math for 256-bits. Arrow's i256 type uses the [BigInt implementation], which could provide log10 at least, but we can extend decimal arithmetic in Arrow as well.
What changes are included in this PR?
ScalarValue::{new_one,new_zero,new_ten,distance}support forDecimal128andDecimal256#16831 for zero scalePlease note, there are more specialised functions
log2,log10,ln(), which are not handled byLogFunc. They could be migrated to this UDF later, providing a base value explicitly.Are these changes tested?
Are there any user-facing changes?
No, except the precision of log calculation can increase for float inputs (since f64 is used). For decimals, results could become more accurate since no float downcasting is involved.