Skip to content

Conversation

@jmh530
Copy link
Contributor

@jmh530 jmh530 commented May 26, 2020

Adds gmean.

I was not able to get this working with complex numbers because ^^ would not compile properly because pow is not defined in this case. There is a sqrt function in std.complex, but nothing more general.

Same issue as before wrt to not being able to run UTs locally.

///
F gmean(F = T)() @property
{
return (cast(F) prodAccumulator.prod) ^^ (cast(F) 1 / cast(F) count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace ^^ with mir.math.common: pow everywhere in the PR. ^^ is slow and depends on std.math.

@9il
Copy link
Member

9il commented May 26, 2020

Same issue as before wrt to not being able to run UTs locally.

To test locally you may want to strip version(mir_test) for selected tests and patch dub.sdl to run only patched tests.

@jmh530
Copy link
Contributor Author

jmh530 commented May 26, 2020

Same issue as before wrt to not being able to run UTs locally.

To test locally you may want to strip version(mir_test) for selected tests and patch dub.sdl to run only patched tests.

Thanks, this made fixing some bugs much easier. However, I still got the following error in the linking stage. All I did was create new versions for mir_test_gmean and then changed dub.sdl version for unittest.

Linking... lld-link: error: could not open 'libcmt.lib': no such file or directory lld-link: error: could not open 'OLDNAMES.lib': no such file or directory Error: linker exited with status 1 dmd failed with exit code 1. 
@9il
Copy link
Member

9il commented May 26, 2020

Does this related only to this package or others as well?

@9il
Copy link
Member

9il commented May 26, 2020

This looks like problem of LDC setup.

@jmh530
Copy link
Contributor Author

jmh530 commented May 26, 2020

@9il On Windows, lld is called if it cannot find a Microsoft linker is not found. I think I need to find install Visual Studio Community (or call dub with ldc2).

@codecov-commenter
Copy link

codecov-commenter commented May 27, 2020

Codecov Report

Merging #259 into master will increase coverage by 0.11%.
The diff coverage is 92.10%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #259 +/- ## ========================================== + Coverage 90.06% 90.17% +0.11%  ========================================== Files 52 52 Lines 10617 10677 +60 ========================================== + Hits 9562 9628 +66  + Misses 1055 1049 -6 
Impacted Files Coverage Δ
source/mir/math/numeric.d 89.36% <78.57%> (+5.04%) ⬆️
source/mir/interpolate/polynomial.d 95.83% <100.00%> (ø)
source/mir/math/stat.d 99.65% <100.00%> (+0.10%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bed9830...89bf503. Read the comment docs.

@jmh530
Copy link
Contributor Author

jmh530 commented May 27, 2020

@9il I was able to fix my issue with running the unittests locally after getting the linker configured properly on this machine.

private
auto pow(T, U)(in T x, in U power) {
static if (is(U : int)) {
import mir.math.common: powi;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we need powi for gmean?

return pow(x, power);
} else {
import std.math: pow;
return pow(x, power);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we use mir.math.common: pow ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@9il Yes I had realized that wasn't going to work when I was finally able to run the UTs locally, but hadn't gotten around to changing it. I instead created an nthroot function that can also calculate integer roots. I was getting some out of memory errors for uint.max.repeat(3) on my machine, but it works here. I'm happy to adjust the algorithm if you know of a way to reduce the risk of this issue. Or just remove it and keep pow by itself.

Copy link
Member

@9il 9il May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I nthroot(I, J)(in I x, in J n) if (isIntegral!(I) && isIntegral!J) 

How it can return an integer? gmean seems should never return an integer, but floating-point number even for integer input. The default FP type can be set to double. The same true for the median and mean functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For gmean, I was thinking of something like gmean([3, 9, 64]) that would have an integer return of 12.

I was thinking about consistency with other functions in mir.math.stat. mean([1, 2]) will return a integer, I believe median([1, 2]) will as well. However, this is not true for hmean([1, 2]) because any integer except 1 should be inversed to a fraction and any fractional input should be a float when passing it in (I think there is a bug in the return type there that I will need to fix).

The only way to guarantee that mean([1, 2]) always produces 1.5 is to force the user to put a type in, as in mean!float([1, 2]), and probably disallow mean!int([1, 2]). I have no issue doing this, but it wasn't what we had done for the others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry for ignoring this issue before. Let's go with the FP return type for all similar functions, or just for gmean for this PR and fix others later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem.

I think it will be important to still allow something like mean([1, 2]) (and maybe even mean!int([1, 2])) but have the result be a floating point type. The problem is that if you try to just prevent the use of integral types, such as by placing template constraints on F with isFloatingPoint!F, then you also can no longer call it with user-defined types or complex types. However, I can instead adjust gmeanType with special handling for when it is not a floating point type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this may be more of an issue for std.traits.isFloatingPoint than the mir version.

@jmh530 jmh530 changed the title WIP: Add gmean Add gmean May 28, 2020
{
import mir.math.common: sqrt, pow;

assert(x > 0, "nthroot: Can only take nth root of positive numbers");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqrt and pow are defined and return NaN, let's give them to return it.

F gmean(F = T)() @property
if (isFloatingPoint!F)
{
return nthroot(cast(F) prodAccumulator.prod, count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prodAccumulator.prod may have exponent overflow (in case of separate exponent accumulation), while the gmean itself can be defined.

In the case of the exponent overflow, we can define gmean as

nthroot(cast(F) prodAccumulator.mantissa, count) * exp2(cast(F) prodAccumulator.exp / count)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to add some stuff to prod or disable the naive algorithm for gmean. I will not have time to work on this until later today most likely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove the algorithm selection completely from ProdAccumulator and use only floating-point accumulation.

Later we can replace the hardware floating-point accumulation with extended software floating-point accumulation. I am working for it as part of Ion,
https://github.com/libmir/mir-ion/blob/master/source/mir/bignum/fp.d

It would make API simpler and the precision would be improved a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix I was thinking about for ProdAccumulator was actually pretty simple. I was going to add exp and manitssa member functions that would also work when the algorithm is naive and isFloatingPoint!T is true. Since nthroot requires the type to be floating point, then it would not be an issue.

I have no issue with future changes to the algorithm so that it uses what you are doing for mir.ion on the floating point side of things. However, naive was added so that prod can handle complex types and user-defined types. Removing that would eliminate the ability to call prod for those types, which some people may want. It also makes it easy to implement a prod function that only takes integer types and returns an integer type (ideally this would have its own implementation, but I considered that something for future work). Regardless, I think that has some uses. Precision isn't the concern for integer types, overflow is. And, where overflow is a concern, people should be using longs or std.bigint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need Prod only for statistics. User-defined and complex types can use reduce for product and they can't use gmean anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@9il Ok, I should have something later today or tomorrow fixing this.

It occurs to me that one of my issues with reduce is the same as the reason why phobos has both std.algorithm.iteration.reduce and std.algorithm.iteration.fold. reduce is not easily used in UFCS chains because the slice/array/range is the second parameter.

Perhaps I can add a separate fold function to mir.algorithm?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I can add a separate fold function to mir.algorithm?

LGTM

}

} else static if (prodAlgo == ProdAlgo.naive &&
isFloatingPoint!T) {
Copy link
Member

@9il 9il May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove naive (complitely and the enum itself) and simplify the structure. Integers should be accumulated like a double by default.
Later Ion's Fp will replace double.

@9il 9il merged commit e823e67 into libmir:master May 30, 2020
@jmh530 jmh530 deleted the jmh530-gmean branch June 1, 2020 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants