Add gmean #259

jmh530 · 2020-05-26T00:11:29Z

Adds gmean.

I was not able to get this working with complex numbers: ^^ does not compile because pow is not defined for them. There is a sqrt function in std.complex, but nothing more general.

Same issue as before wrt not being able to run UTs locally.

9il · 2020-05-26T07:11:40Z

source/mir/math/stat.d

+ ///
+ F gmean(F = T)() @property
+ {
+ return (cast(F) prodAccumulator.prod) ^^ (cast(F) 1 / cast(F) count);


Please replace ^^ with mir.math.common: pow everywhere in the PR. ^^ is slow and depends on std.math.

9il · 2020-05-26T07:59:43Z

> Same issue as before wrt not being able to run UTs locally.

To test locally you may want to strip version(mir_test) for selected tests and patch dub.sdl to run only patched tests.

jmh530 · 2020-05-26T10:58:57Z

> > Same issue as before wrt not being able to run UTs locally.

> To test locally you may want to strip version(mir_test) for selected tests and patch dub.sdl to run only patched tests.

Thanks, this made fixing some bugs much easier. However, I still got the following error in the linking stage. All I did was create new versions for mir_test_gmean and then change the dub.sdl version for unittest.

Linking...
lld-link: error: could not open 'libcmt.lib': no such file or directory
lld-link: error: could not open 'OLDNAMES.lib': no such file or directory
Error: linker exited with status 1
dmd failed with exit code 1.

9il · 2020-05-26T12:47:42Z

Is this related only to this package, or to others as well?

9il · 2020-05-26T12:48:31Z

This looks like a problem with the LDC setup.

jmh530 · 2020-05-26T15:07:42Z

@9il On Windows, lld is called if a Microsoft linker is not found. I think I need to install Visual Studio Community (or call dub with ldc2).

codecov-commenter · 2020-05-27T01:38:46Z

Codecov Report

Merging #259 into master will increase coverage by 0.11%.
The diff coverage is 92.10%.

@@            Coverage Diff             @@
##           master     #259      +/-   ##
==========================================
+ Coverage   90.06%   90.17%   +0.11%
==========================================
  Files          52       52
  Lines       10617    10677      +60
==========================================
+ Hits         9562     9628      +66
+ Misses       1055     1049       -6

| Impacted Files | Coverage Δ |
| --- | --- |
| source/mir/math/numeric.d | `89.36% <78.57%> (+5.04%)` ⬆️ |
| source/mir/interpolate/polynomial.d | `95.83% <100.00%> (ø)` |
| source/mir/math/stat.d | `99.65% <100.00%> (+0.10%)` ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bed9830...89bf503.

jmh530 · 2020-05-27T01:39:53Z

@9il I was able to fix my issue with running the unittests locally after getting the linker configured properly on this machine.

9il · 2020-05-27T04:04:15Z

source/mir/math/stat.d

+private
+auto pow(T, U)(in T x, in U power) {
+ static if (is(U : int)) {
+ import mir.math.common: powi;


Why do we need powi for gmean?

9il · 2020-05-27T04:05:19Z

source/mir/math/stat.d

+ return pow(x, power);
+ } else {
+ import std.math: pow;
+ return pow(x, power);


Why can't we use mir.math.common: pow?

jmh530 · 2020-05-27

@9il Yes, I had realized that wasn't going to work once I was finally able to run the UTs locally, but I hadn't gotten around to changing it. I instead created an nthroot function that can also calculate integer roots. I was getting some out-of-memory errors for uint.max.repeat(3) on my machine, but it works here. I'm happy to adjust the algorithm if you know of a way to reduce the risk of this issue, or to just remove it and keep pow by itself.

9il · 2020-05-27

I nthroot(I, J)(in I x, in J n) if (isIntegral!(I) && isIntegral!J)

How can it return an integer? It seems gmean should never return an integer, but rather a floating-point number, even for integer input. The default FP type can be set to double. The same is true for the median and mean functions.

jmh530 · 2020-05-27

For gmean, I was thinking of something like gmean([3, 9, 64]), which would have an integer return of 12.

I was thinking about consistency with other functions in mir.math.stat. mean([1, 2]) will return an integer, and I believe median([1, 2]) will as well. However, this is not true for hmean([1, 2]), because any integer except 1 is inverted to a fraction, and any fractional input must already be a float when it is passed in (I think there is a bug in the return type there that I will need to fix).

The only way to guarantee that mean([1, 2]) always produces 1.5 is to force the user to put a type in, as in mean!float([1, 2]), and probably disallow mean!int([1, 2]). I have no issue doing this, but it wasn't what we had done for the others.

9il · 2020-05-27

I am sorry for ignoring this issue before. Let's go with the FP return type for all similar functions, or just for gmean in this PR and fix the others later.

jmh530 · 2020-05-27

No problem.

I think it will be important to still allow something like mean([1, 2]) (and maybe even mean!int([1, 2])) but have the result be a floating point type. The problem is that if you try to just prevent the use of integral types, such as by placing template constraints on F with isFloatingPoint!F, then you also can no longer call it with user-defined types or complex types. However, I can instead adjust gmeanType with special handling for when it is not a floating point type.

jmh530 · 2020-05-27

Hmm, this may be more of an issue for std.traits.isFloatingPoint than the mir version.
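To make the return-type question above concrete, here is a minimal sketch, written in Python rather than D so that it runs standalone. gmean_float is a hypothetical helper that uses the log-sum form, not the nthroot approach from this PR. It illustrates that an exact integer result such as 12 for [3, 9, 64] is a special case, while most integer input (for example [1, 2]) has an irrational geometric mean, which is the argument for a floating-point return type by default.

```python
import math

def gmean_float(values):
    """Geometric mean as exp(mean(log(v))); always returns a float."""
    logs = [math.log(v) for v in values]  # requires every v > 0
    return math.exp(sum(logs) / len(logs))

print(gmean_float([3, 9, 64]))  # close to 12, since 3 * 9 * 64 == 12 ** 3
print(gmean_float([1, 2]))      # close to sqrt(2), which no integer type can hold
```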

9il · 2020-05-28T14:11:54Z

source/mir/math/stat.d

+{
+ import mir.math.common: sqrt, pow;
+
+ assert(x > 0, "nthroot: Can only take nth root of positive numbers");


sqrt and pow are already defined for such inputs and return NaN, so let's let them return it.

9il · 2020-05-28T14:24:59Z

source/mir/math/stat.d

+ F gmean(F = T)() @property
+ if (isFloatingPoint!F)
+ {
+ return nthroot(cast(F) prodAccumulator.prod, count);


prodAccumulator.prod may overflow its exponent (in the case of separate exponent accumulation), while the gmean itself is still well defined.

In the case of the exponent overflow, we can define gmean as

nthroot(cast(F) prodAccumulator.mantissa, count) * exp2(cast(F) prodAccumulator.exp / count)
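The identity behind this suggestion is that a product stored as m * 2^e has the n-th root m^(1/n) * 2^(e/n), so the exponent never has to be folded back into a single floating-point value. A minimal sketch of the idea follows, in Python rather than D so that it runs standalone. gmean_split is a hypothetical name, math.frexp stands in for the accumulator's mantissa/exponent split, and this is not the ProdAccumulator implementation. It assumes non-empty, strictly positive input.

```python
import math

def gmean_split(values):
    """Geometric mean with mantissa and exponent accumulated separately."""
    mantissa = 1.0
    exponent = 0
    count = 0
    for v in values:
        m, e = math.frexp(v)  # v == m * 2**e with 0.5 <= m < 1 for v > 0
        mantissa *= m
        exponent += e
        # Renormalize so the running mantissa cannot underflow.
        mantissa, shift = math.frexp(mantissa)
        exponent += shift
        count += 1
    # (m * 2**e) ** (1/n) == m ** (1/n) * 2 ** (e/n)
    return mantissa ** (1.0 / count) * 2.0 ** (exponent / count)

# Plain accumulation for comparison: the running product overflows.
values = [1e200, 1e200, 1e200]
naive = 1.0
for v in values:
    naive *= v

print(naive)                # infinite: the product exceeds the double range
print(gmean_split(values))  # finite, close to 1e200
```

With three inputs of 1e200, the plain running product overflows to infinity, while the split form keeps the exponent as an integer and only converts it back after dividing by the count.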

jmh530 · 2020-05-28

I need to add some stuff to prod or disable the naive algorithm for gmean. I will most likely not have time to work on this until later today.

9il · 2020-05-28

We can remove the algorithm selection completely from ProdAccumulator and use only floating-point accumulation.

Later we can replace the hardware floating-point accumulation with extended software floating-point accumulation. I am working on it as part of Ion:
https://github.com/libmir/mir-ion/blob/master/source/mir/bignum/fp.d

It would make the API simpler and improve the precision a lot.

jmh530 · 2020-05-28

The fix I was thinking about for ProdAccumulator was actually pretty simple. I was going to add exp and mantissa member functions that would also work when the algorithm is naive and isFloatingPoint!T is true. Since nthroot requires the type to be floating point, it would not be an issue.

I have no issue with future changes to the algorithm so that it uses what you are doing for mir.ion on the floating point side of things. However, naive was added so that prod can handle complex types and user-defined types. Removing that would eliminate the ability to call prod for those types, which some people may want. It also makes it easy to implement a prod function that only takes integer types and returns an integer type (ideally this would have its own implementation, but I considered that something for future work). Regardless, I think that has some uses. Precision isn't the concern for integer types, overflow is. And, where overflow is a concern, people should be using longs or std.bigint.

9il · 2020-05-29

We need Prod only for statistics. User-defined and complex types can use reduce for the product, and they can't use gmean anyway.

jmh530 · 2020-05-29

@9il Ok, I should have something later today or tomorrow fixing this.

It occurs to me that one of my issues with reduce is the same as the reason why phobos has both std.algorithm.iteration.reduce and std.algorithm.iteration.fold. reduce is not easily used in UFCS chains because the slice/array/range is the second parameter.

Perhaps I can add a separate fold function to mir.algorithm?

9il · 2020-05-29

> Perhaps I can add a separate fold function to mir.algorithm?

LGTM

9il · 2020-05-29T12:33:29Z

source/mir/math/numeric.d

+ }
+
+ } else static if (prodAlgo == ProdAlgo.naive && 
+ isFloatingPoint!T) {


Please remove naive completely (including the enum itself) and simplify the structure. Integers should be accumulated as double by default.
Later, Ion's Fp will replace double.

9il reviewed May 26, 2020

Add gmean

7238ae5

jmh530 force-pushed the jmh530-gmean branch from 20ce9d8 to 7238ae5 Compare May 27, 2020 01:34

9il reviewed May 27, 2020

jmh530 added 4 commits May 27, 2020 08:16

Add nthroot

407a8c2

Remove commented code

c6ba9d5

Fix FP in gmean

3169fc0

Resolve gmean integer/double issue

2fa4ec0

jmh530 changed the title from "WIP: Add gmean" to "Add gmean" May 28, 2020

9il reviewed May 28, 2020

jmh530 added 3 commits May 28, 2020 10:25

Remove nthroot assert

1522469

Address GMeanAccumulator.gmean overflow concerns

7c7d16e

Add x/exp/mantissa to ProdAccumulator

98cac69

9il reviewed May 29, 2020

Simplify prod

89bf503

9il merged commit e823e67 into libmir:master May 30, 2020

jmh530 deleted the jmh530-gmean branch June 1, 2020 10:41
