Quantifying Attention: How We Know What Your Soul Doesn't Want to See
We came across a big problem in our data analysis: platforms record what you like, but few record what you dislike. Here's how we solve the missing half of the equation.
Most platforms record what you like, but few record what you dislike, and fewer still make that data exportable via an API or dataset. On top of that, users find it tedious to mark things as "not interested," so even where the signal exists, it's sparse.
This means that when creating an Onairos Persona, one that truly understands who you are and what you like and dislike (maybe even better than you do), we are missing half of the equation.
The Problem
My dataset from X, ChatGPT, YouTube, and Pinterest contained 126 dislikes and over a thousand likes. From an ML perspective this is a heavily imbalanced dataset and not good to use as-is: any model trained purely on this sentiment data would skew toward predicting likes, because likes are almost all it has seen. That skew produces false positives. Bad.
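To make the imbalance concrete, here is a minimal sketch using made-up counts matching the ratios above. Inverse-frequency class weights are one standard way to keep a model from defaulting to the majority "like" class; this is an illustration, not the approach the Persona pipeline necessarily uses.

```python
# Counts roughly matching the dataset described above (assumed numbers).
n_likes, n_dislikes = 1000, 126
total = n_likes + n_dislikes

# Weight each class by the inverse of its frequency, so the rare
# "dislike" class contributes as much to the training loss as "like".
weight_like = total / (2 * n_likes)
weight_dislike = total / (2 * n_dislikes)

print(f"like weight:    {weight_like:.3f}")
print(f"dislike weight: {weight_dislike:.3f}")
```

The dislike weight comes out roughly eight times the like weight, which is exactly the imbalance ratio the model would otherwise exploit.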
Inferring Dislikes
So how do we get dislikes then? Well, we can use what we already have plus a traditional ML technique: clustering.
First, the baseline algorithm retrieves all your likes with timestamps, plus your feed/timeline. (For how we actually get this metadata, or how we get your feed/timeline, stay tuned for our next breakdown.)
The basic concept: find the highest-density windows of likes; each one forms a cluster. Then, because you have timestamps for both your likes and your feed, you can cross-reference the two and mark everything within that cluster that wasn't liked (marked as 1) as an implicit dislike (0).
Implicit dislikes are marked as 0 here, but you can experiment with softer values (close to 0, but not exactly 0).
The Logic
Each cluster is a time period (or maps to one) of maximum attention. Within that period, if the density of likes is high enough, we assume you saw every post, so anything you didn't like we can say you "disliked." You can see now why we mark it as a soft dislike.
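The cross-referencing step above can be sketched as follows, on made-up data. It assumes you already have a high-attention window from the clustering step; the timestamps, post IDs, and the 0.1 soft-dislike value are all illustrative assumptions, not the production values.

```python
# Soft dislike label, softer than a hard 0 as noted above (assumed value).
SOFT_DISLIKE = 0.1

# (timestamp, post_id) for everything that appeared in the feed (made up).
feed = [(100, "a"), (105, "b"), (110, "c"), (300, "d")]
liked_ids = {"a", "c"}   # posts you explicitly liked
window = (95, 115)       # one dense-likes attention window from clustering

labels = {}
for ts, post_id in feed:
    if post_id in liked_ids:
        labels[post_id] = 1.0           # explicit like
    elif window[0] <= ts <= window[1]:
        labels[post_id] = SOFT_DISLIKE  # seen during high attention, not liked
    # items outside any attention window stay unlabeled: no evidence you saw them

print(labels)
```

Note that post "d" gets no label at all: it fell outside the attention window, so we can't claim you saw it, let alone disliked it.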
The Limitations
This is inherently lossy. Not just because all ML algorithms are (especially clustering over an incomplete set of items), but also because:
- You might have only been looking at the posts you liked, with your attention elsewhere the rest of the time
- You didn't actually dislike the post; you felt neutral about it
- You had already liked so many similar posts that it got repetitive and you skipped it
- And so on.
But it is the best approximation you can get, and as we will show you can get pretty damn close.
Why Naive Clustering Fails
If you just use a naive n-cluster algorithm (fixing the number of clusters up front), however, you fail spectacularly. Because you:
- Miss granularity/precision: if the data contains more clusters than you allowed for, you miss them
- Add unnecessary items within a cluster: noise that shouldn't be there
Also, what determines a good cluster? The naive answer is the number of clusters, or even the number of items in a cluster. But a cluster is a density approximation, and density is our approximation of attention.
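One simple way to cluster by density rather than by a fixed count is a gap rule over sorted timestamps: likes separated by less than some threshold belong to the same burst of attention, and the number of clusters falls out of the data. This is a hedged sketch on made-up timestamps; the 60-second MAX_GAP is an assumed threshold, and in 1-D this behaves roughly like DBSCAN with a small minimum-points setting.

```python
MAX_GAP = 60  # assumed threshold: likes within 60s are the same burst

like_times = [10, 25, 40, 500, 520, 2000]  # sorted like timestamps (made up)

clusters = [[like_times[0]]]
for t in like_times[1:]:
    if t - clusters[-1][-1] <= MAX_GAP:
        clusters[-1].append(t)  # dense enough: same burst of attention
    else:
        clusters.append([t])    # big gap: a new attention window starts

# Each cluster's span is a candidate high-attention window.
windows = [(c[0], c[-1]) for c in clusters]
print(windows)
```

Notice that no cluster count was specified anywhere: three windows emerge because the data has three dense regions, which is exactly the property the fixed-k approach lacks.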
Density ≈ Attention ≈ Understanding
Stay tuned for our next deep dive into how we retrieve your feed timeline and make this all work in practice.
Author
Zion Darko
Founder & CEO
Inventor and Dreamer.