Skip to content

Conversation

@theirix
Copy link
Contributor

@theirix theirix commented Aug 3, 2025

Which issue does this PR close?

Rationale for this change

Add Decimal128 and Decimal256 support for log UDF.
It's a most generic function, allowing for specifying a logarithm base, but by default it is log10, which makes it a good candidate for long decimals. The calculate_binary_math helper could simplify a lot of UDFs (a subject of future PRs).

Since decimals only support integer logarithms, the result is rounded and then converted back to a float, but it is still done on the i128/i256 level, not by rounding input parameters as before.

Also, if numbers are parsed as floats, there is precision lost due to floating-point handling. So the majority of tests are targeting parse_float_as_decimal=true as in #14612. Otherwise, the log result could differ by one or two due to rounding – see regression SLT.

Notably, we still miss math for 256-bits. Arrow's i256 type uses the [BigInt implementation], which could provide log10 at least, but we can extend decimal arithmetic in Arrow as well.

What changes are included in this PR?

Please note, there are more specialised functions log2, log10, ln (), which are not handled by LogFunc. They could be migrated to this UDF later, providing a base value explicitly.

Are these changes tested?

  • Unit tests
  • Regression SLT tests
  • Manual invocation of datafusion-cli
  • Manual comparison of results to other large decimal math implementations (Python, WolframAlpha)

Are there any user-facing changes?

No, except the precision of log calculation can increase for float inputs (since f64 is used). For decimals, results could become more accurate since no float downcasting is involved.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) common Related to common crate functions Changes to functions implementation labels Aug 3, 2025
@theirix theirix marked this pull request as ready for review August 10, 2025 07:23
Comment on lines 70 to 99
Numeric(1),
Numeric(2),
Exact(vec![DataType::Float32]),
Exact(vec![DataType::Float64]),
Exact(vec![DataType::Float32, DataType::Float32]),
Exact(vec![DataType::Float64, DataType::Float64]),
Exact(vec![
DataType::Int64,
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float32,
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float64,
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Int64,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float32,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float64,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the Exact(...) signatures superseded by the Numeric(1) and Numeric(2)? Considering integers, floats and decimals are considered numeric types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Numeric is a class of types including decimals, floats and decimals.

The reason I included the third signature onwards is inability of Numeric(1) to support a null value, and we need to support select log(null); as per SLT and postgres feature parity.

If you have an idea how to make it work with nulls, I'd appreciate help. I don't like the current list of signatures

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. It also seems to make queries like this fail:

1. query failed: DataFusion error: Error during planning: Internal error: Function 'log' failed to match any signature, errors: Error during planning: Function 'log' expects 1 arguments but received 2,Error during planning: For function 'log' Decimal128(3, 1) and Decimal256(76, 0) are not coercible to a common numeric type. This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues No function matches the given name and argument types 'log(Decimal128(3, 1), Decimal256(76, 0))'. You might need to add explicit type casts. Candidate functions: log(Numeric(1)) log(Numeric(2)) [SQL] select log(10.0, 100000000000000000000000000000000000::decimal(76,0)); at /Users/jeffrey/Code/datafusion/datafusion/sqllogictest/test_files/decimal.slt:836 

I'll try look at the coercion code a bit and try to understand what's going on here 🤔

(I'm assuming Numeric(2) actually means coerce to a common type)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a null coercion for numerics could help, because it worked with float32 in main. Null handling is hard because we have to conform with PostgreSQL and/or standard

Copy link
Contributor

@Jefffrey Jefffrey Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can safely remove these signatures:

Exact(vec![DataType::Float32]), Exact(vec![DataType::Float64]),

As they are indeed superseded by Numeric(1); Numeric(2) will need to remain with the rest of the signatures as it only guarantees coercing to a common numeric type across all the arguments which I don't think we want for all invocations of log. This is the part I misunderstood, I'll look into raising a PR to clarify some of the documentation later perhaps.

theirix and others added 3 commits September 8, 2025 23:59
 Refactoring decimal128 conversions Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

We can raise a separate issue for 256-bit support (the values that can't be casted to 128-bit)

@Jefffrey Jefffrey merged commit 572c204 into apache:main Sep 14, 2025
30 checks passed
@Jefffrey
Copy link
Contributor

Created #17555 just so we have something to track the NotYetImplemented item

@theirix
Copy link
Contributor Author

theirix commented Sep 14, 2025

Thank you, @Jefffrey !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

2 participants