First off, I am very embarrassed to have made such an obvious mistake in the opening paragraph. It has been corrected; please forgive me, gods of bits and bytes...
I also suck at titles. So those of you who pointed out that it should be 'How to estimate the cardinality of...' are correct, no argument from me there.
Re Bloom filters. You can use a Bloom filter to estimate the number of occurrences of a given key, but you can't use one to get the total cardinality of the set.
One commenter suggested using Cassandra. We do use Cassandra for counting (accurately) the number of times a URL has been shared, but we do not use it for our primary analytics engine. Also, I'm not sure exactly what you are proposing here. Store all IDs in C* and then count the number of keys? What if I want to bucket the data by day, hour, week, month, etc.? Would I have to store the key n times, once for each bucket? Coincidentally, Jonathan Ellis, of Cassandra fame, was the inspiration for our Bloom filter implementation. Details are referenced in our GitHub project.
The Bloom filter could be used in a similar way: given X hash functions and Y total bits, a Bloom filter with Z bits set lets you anticipate that roughly N unique items were hashed. Probably not as good as HyperLogLog in terms of memory space for a given % error, but there you go.
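To make that concrete, here is a minimal sketch of the idea, not the implementation from our project. It uses the standard fill-ratio estimate N ≈ -(Y / X) * ln(1 - Z / Y), with X, Y and Z as in the paragraph above. The `BloomFilter` class, its method names, and the SHA-256 double-hashing scheme are all assumptions made up for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter (hypothetical, for illustration only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits        # Y: total bits in the filter
        self.num_hashes = num_hashes    # X: hash functions per item
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Assumption: derive X positions from one SHA-256 digest via
        # double hashing (h1 + i * h2), rather than X separate hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def bits_set(self):
        # Z: number of bits currently set
        return sum(self.bits)

    def estimate_cardinality(self):
        # N ~= -(Y / X) * ln(1 - Z / Y)
        z = self.bits_set()
        if z == 0:
            return 0.0
        if z >= self.num_bits:
            return float("inf")  # filter saturated, no usable estimate
        return -(self.num_bits / self.num_hashes) * math.log(1 - z / self.num_bits)


if __name__ == "__main__":
    bf = BloomFilter(num_bits=16384, num_hashes=4)
    for i in range(1000):
        bf.add(f"user-{i}")
        bf.add(f"user-{i}")  # duplicates set no new bits
    print(bf.bits_set(), round(bf.estimate_cardinality()))
```

The estimate gets worse as the filter fills up and is undefined once every bit is set, so Y has to be sized for the largest cardinality you expect. That sizing requirement is where HyperLogLog should win on memory.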