| Copyright | (c) Edward Kmett 2013-2015 |
|---|---|
| License | BSD3 |
| Maintainer | Edward Kmett <ekmett@gmail.com> |
| Stability | experimental |
| Portability | non-portable |
| Safe Haskell | Safe-Inferred |
| Language | Haskell2010 |
Data.HyperLogLog
Contents
Description
See the original paper for details: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
Synopsis
- data HyperLogLog s p
- class HasHyperLogLog a s p | a -> s p where
- hyperLogLog :: Lens' a (HyperLogLog s p)
- size :: Reifies p Integer => HyperLogLog s p -> Approximate Int64
- intersectionSize :: Reifies p Integer => [HyperLogLog s p] -> Approximate Int64
- insert :: forall s p a. (Reifies s SipKey, Reifies p Integer, Serial a) => a -> HyperLogLog s p -> HyperLogLog s p
- insertHash :: Reifies p Integer => Word32 -> HyperLogLog s p -> HyperLogLog s p
- cast :: forall p q s. (Reifies p Integer, Reifies q Integer) => HyperLogLog s p -> Maybe (HyperLogLog s q)
- coerceConfig :: forall p q r s. (Reifies p Integer, Reifies q Integer, Reifies r SipKey, Reifies s SipKey) => Maybe (Coercion (HyperLogLog r p) (HyperLogLog s q))
HyperLogLog
data HyperLogLog s p Source #
Initialize a new counter:
>>>runHyperLogLog (mempty :: DefaultHyperLogLog 3) == V.fromList [0,0,0,0,0,0,0,0]True
Please note how you specify a counter size with the n invocation. Sizes of up to 16 are valid, with 7 being a likely good minimum for decent accuracy.
Let's count a list of unique items and get the latest estimate:
>>>size (foldr insert mempty [1..10] :: DefaultHyperLogLog 4)Approximate {_confidence = 0.9972, _lo = 2, _estimate = 9, _hi = 17}
Note how insert can be used to add new observations to the approximate counter.
The s type parameter configures the SipKey that is passed to the hash function when inserting a new value. Note that if cryptographic security is a primary consideration, it is recommended that you create HyperLogLog values using generateHyperLogLog so that the SipKey is randomly generated using system entropy. In contrast, the HyperLogLog data constructor and the mempty method allow constructing HyperLogLog values with fixed SipKeys, which can result in exponentially inaccurate estimates if exploited by an adversary. (See https://eprint.iacr.org/2021/1139.)
Instances
class HasHyperLogLog a s p | a -> s p where Source #
Methods
hyperLogLog :: Lens' a (HyperLogLog s p) Source #
Instances
| HasHyperLogLog (HyperLogLog s p) (s :: k1) (p :: k2) Source # | |
Defined in Data.HyperLogLog.Type Methods hyperLogLog :: Lens' (HyperLogLog s p) (HyperLogLog s p) Source # | |
size :: Reifies p Integer => HyperLogLog s p -> Approximate Int64 Source #
Approximate size of our set
intersectionSize :: Reifies p Integer => [HyperLogLog s p] -> Approximate Int64 Source #
insert :: forall s p a. (Reifies s SipKey, Reifies p Integer, Serial a) => a -> HyperLogLog s p -> HyperLogLog s p Source #
insertHash :: Reifies p Integer => Word32 -> HyperLogLog s p -> HyperLogLog s p Source #
Insert a value that has already been hashed by whatever user defined hash function you want.
cast :: forall p q s. (Reifies p Integer, Reifies q Integer) => HyperLogLog s p -> Maybe (HyperLogLog s q) Source #
coerceConfig :: forall p q r s. (Reifies p Integer, Reifies q Integer, Reifies r SipKey, Reifies s SipKey) => Maybe (Coercion (HyperLogLog r p) (HyperLogLog s q)) Source #
If the two types p and q reify the same configuration, and if the two types r and s reify the same SipKey, then we can coerce between and HyperLogLog r p. We do this by building a hole in the HyperLogLog s qnominal role for the configuration parameter.