- Notifications
You must be signed in to change notification settings - Fork 36
Add gmean #259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add gmean #259
Conversation
source/mir/math/stat.d Outdated
| /// | ||
| F gmean(F = T)() @property | ||
| { | ||
| return (cast(F) prodAccumulator.prod) ^^ (cast(F) 1 / cast(F) count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace ^^ with mir.math.common: pow everywhere in the PR. ^^ is slow and depends on std.math.
To test locally you may want to strip |
Thanks, this made fixing some bugs much easier. However, I still got the following error in the linking stage. All I did was create new versions for |
| Does this related only to this package or others as well? |
| This looks like problem of LDC setup. |
| @9il On Windows, lld is called if it cannot find a Microsoft linker is not found. I think I need to find install Visual Studio Community (or call dub with ldc2). |
Codecov Report
@@ Coverage Diff @@ ## master #259 +/- ## ========================================== + Coverage 90.06% 90.17% +0.11% ========================================== Files 52 52 Lines 10617 10677 +60 ========================================== + Hits 9562 9628 +66 + Misses 1055 1049 -6
Continue to review full report at Codecov.
|
| @9il I was able to fix my issue with running the unittests locally after getting the linker configured properly on this machine. |
source/mir/math/stat.d Outdated
| private | ||
| auto pow(T, U)(in T x, in U power) { | ||
| static if (is(U : int)) { | ||
| import mir.math.common: powi; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can we need powi for gmean?
source/mir/math/stat.d Outdated
| return pow(x, power); | ||
| } else { | ||
| import std.math: pow; | ||
| return pow(x, power); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we use mir.math.common: pow ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@9il Yes I had realized that wasn't going to work when I was finally able to run the UTs locally, but hadn't gotten around to changing it. I instead created an nthroot function that can also calculate integer roots. I was getting some out of memory errors for uint.max.repeat(3) on my machine, but it works here. I'm happy to adjust the algorithm if you know of a way to reduce the risk of this issue. Or just remove it and keep pow by itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I nthroot(I, J)(in I x, in J n) if (isIntegral!(I) && isIntegral!J) How it can return an integer? gmean seems should never return an integer, but floating-point number even for integer input. The default FP type can be set to double. The same true for the median and mean functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For gmean, I was thinking of something like gmean([3, 9, 64]) that would have an integer return of 12.
I was thinking about consistency with other functions in mir.math.stat. mean([1, 2]) will return a integer, I believe median([1, 2]) will as well. However, this is not true for hmean([1, 2]) because any integer except 1 should be inversed to a fraction and any fractional input should be a float when passing it in (I think there is a bug in the return type there that I will need to fix).
The only way to guarantee that mean([1, 2]) always produces 1.5 is to force the user to put a type in, as in mean!float([1, 2]), and probably disallow mean!int([1, 2]). I have no issue doing this, but it wasn't what we had done for the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry for ignoring this issue before. Let's go with the FP return type for all similar functions, or just for gmean for this PR and fix others later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem.
I think it will be important to still allow something like mean([1, 2]) (and maybe even mean!int([1, 2])) but have the result be a floating point type. The problem is that if you try to just prevent the use of integral types, such as by placing template constraints on F with isFloatingPoint!F, then you also can no longer call it with user-defined types or complex types. However, I can instead adjust gmeanType with special handling for when it is not a floating point type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this may be more of an issue for std.traits.isFloatingPoint than the mir version.
source/mir/math/stat.d Outdated
| { | ||
| import mir.math.common: sqrt, pow; | ||
| | ||
| assert(x > 0, "nthroot: Can only take nth root of positive numbers"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqrt and pow are defined and return NaN, let's give them to return it.
source/mir/math/stat.d Outdated
| F gmean(F = T)() @property | ||
| if (isFloatingPoint!F) | ||
| { | ||
| return nthroot(cast(F) prodAccumulator.prod, count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prodAccumulator.prod may have exponent overflow (in case of separate exponent accumulation), while the gmean itself can be defined.
In the case of the exponent overflow, we can define gmean as
nthroot(cast(F) prodAccumulator.mantissa, count) * exp2(cast(F) prodAccumulator.exp / count)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to add some stuff to prod or disable the naive algorithm for gmean. I will not have time to work on this until later today most likely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove the algorithm selection completely from ProdAccumulator and use only floating-point accumulation.
Later we can replace the hardware floating-point accumulation with extended software floating-point accumulation. I am working for it as part of Ion,
https://github.com/libmir/mir-ion/blob/master/source/mir/bignum/fp.d
It would make API simpler and the precision would be improved a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix I was thinking about for ProdAccumulator was actually pretty simple. I was going to add exp and manitssa member functions that would also work when the algorithm is naive and isFloatingPoint!T is true. Since nthroot requires the type to be floating point, then it would not be an issue.
I have no issue with future changes to the algorithm so that it uses what you are doing for mir.ion on the floating point side of things. However, naive was added so that prod can handle complex types and user-defined types. Removing that would eliminate the ability to call prod for those types, which some people may want. It also makes it easy to implement a prod function that only takes integer types and returns an integer type (ideally this would have its own implementation, but I considered that something for future work). Regardless, I think that has some uses. Precision isn't the concern for integer types, overflow is. And, where overflow is a concern, people should be using longs or std.bigint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need Prod only for statistics. User-defined and complex types can use reduce for product and they can't use gmean anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@9il Ok, I should have something later today or tomorrow fixing this.
It occurs to me that one of my issues with reduce is the same as the reason why phobos has both std.algorithm.iteration.reduce and std.algorithm.iteration.fold. reduce is not easily used in UFCS chains because the slice/array/range is the second parameter.
Perhaps I can add a separate fold function to mir.algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I can add a separate fold function to mir.algorithm?
LGTM
source/mir/math/numeric.d Outdated
| } | ||
| | ||
| } else static if (prodAlgo == ProdAlgo.naive && | ||
| isFloatingPoint!T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove naive (complitely and the enum itself) and simplify the structure. Integers should be accumulated like a double by default.
Later Ion's Fp will replace double.
Adds gmean.
I was not able to get this working with complex numbers because
^^would not compile properly becausepowis not defined in this case. There is asqrtfunction instd.complex, but nothing more general.Same issue as before wrt to not being able to run UTs locally.