Ever stumble upon an algorithm that's so mind-blowing you can't believe it isn't all over the place? Well, get ready for a wild ride with CVM, a magical formula unveiled in early 2023 that lets you count the number of unique elements in a stream using less memory than the number of elements themselves!
Picture this: You're trying to figure out how many unique words there are in Shakespeare's Hamlet. But here's the catch—you've only got a whiteboard that fits 100 words. 🤯 How do you do it?
Step 1: Start jotting down every unique word you see, ignoring repeats, until your board is full with 100 unique words.
Step 2: Flip a coin for each word. Heads, keep it. Tails, erase it. You'll end up with about 50 words. This is round 0.
Step 3: Repeat the process, but now, for every repeat word, flip a coin to decide if it stays. Heads, it stays. Tails, it's gone. Do another round of coin flips for the words on the board to decide if they stay at the end of the round. About 50 will remain. This is round 1.
Step 4: Continue this pattern, making it progressively harder to keep words as you go. By round 3, you need 3 heads in a row to keep a word. Round 4, 4 heads in a row. And so on.
By the end, each word's chance of sticking around is (1/2)^k. Say you finish Hamlet with 62 words left after 6 rounds. Divide 62 by (1/2)^6 and voilà—3,968 unique words. Hamlet actually has 3,967 unique words! 🎉
Incredible, right? With just a tiny memory of 100 words, you can estimate a count 40 times larger!
Imagine platforms like TikTok, Twitch, and YouTube using this to count live stream viewers. The memory savings would be astronomical!
It's incredible that in the 2020s, we can still discover new ways to count things. And the most mind-boggling fact is that, it takes probably like less than 50 lines of codes to implement the whole thing! This just make my day!
You can check out the paper here:
https://lnkd.in/gSzBSDu5
#Algorithms #DataScience #Efficiency
Can't wait!