How divepool works
The pipeline behind divepool.social.
Divepool helps you discover Bluesky accounts and posts by topic: semantic search across the AT Protocol network, powered by embeddings, keyword analysis, and clustering. This page explains how it works.
How it works
We continuously pull posts from across the AT Protocol network — the open social system Bluesky runs on — and store them all. The ones long enough to carry meaning (at least 120 characters, in English or German) get embedded: each post becomes a numeric vector that captures what it’s about, so that posts on similar topics end up close together in vector space.
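A minimal sketch of the eligibility rule and of what "close together" can mean. It assumes each post carries BCP-47 language tags (as Bluesky's `langs` field does), that length is counted in Unicode code points, and that closeness is measured with cosine similarity. The function names are hypothetical, and the embedding model itself is out of scope.

```python
import math

# Thresholds taken from the description above.
MIN_CHARS = 120
EMBEDDED_LANGS = {"en", "de"}  # English and German


def is_embeddable(text: str, langs: list[str]) -> bool:
    """True if a post is long enough and tagged English or German.

    Assumption: a post qualifies if any of its language tags has the
    primary subtag "en" or "de" (so "en-US" and "de-DE" count too).
    """
    primary = {tag.split("-")[0].lower() for tag in langs}
    return len(text) >= MIN_CHARS and bool(primary & EMBEDDED_LANGS)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(is_embeddable("x" * 120, ["en-US"]))  # True
    print(is_embeddable("x" * 119, ["de"]))     # False: too short
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for comparing text embeddings.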
For each account, we group its posts into clusters based on what they’re about. Accounts with fewer than 50 long posts skip clustering. Every cluster then gets two summaries: a medoid — the real post closest to the cluster’s centre — and a short list of keywords, pulled out by comparing word use inside the cluster against the rest of the corpus. The medoid is the representative post; the keywords are the topic label.
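The two cluster summaries can be sketched as follows, taking the clusters themselves as given (the page doesn't say which clustering algorithm is used). Following the definition above, `medoid_index` picks the real member closest to the cluster centroid; using squared Euclidean distance for "closest" is an assumption. For `cluster_keywords`, the comparison of word use inside the cluster against the rest of the corpus is illustrated with an add-one-smoothed log ratio of relative frequencies. That scoring rule, the tokenizer, and all function names are hypothetical stand-ins, not divepool's actual method.

```python
import math
import re
from collections import Counter

MIN_LONG_POSTS = 50  # accounts below this skip clustering (from the text above)


def should_cluster(long_post_count: int) -> bool:
    return long_post_count >= MIN_LONG_POSTS


def medoid_index(vectors: list[list[float]]) -> int:
    """Index of the real post whose vector lies closest to the cluster centroid."""
    n, dim = len(vectors), len(vectors[0])
    centre = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return min(
        range(n),
        key=lambda i: sum((x - c) ** 2 for x, c in zip(vectors[i], centre)),
    )


def tokenize(text: str) -> list[str]:
    # Crude word splitter covering English and German letters.
    return re.findall(r"[a-zäöüß]+", text.lower())


def cluster_keywords(cluster_texts: list[str], rest_texts: list[str],
                     top_n: int = 5, min_count: int = 2) -> list[str]:
    """Words over-represented in the cluster relative to the rest of the corpus."""
    inside = Counter(w for t in cluster_texts for w in tokenize(t))
    outside = Counter(w for t in rest_texts for w in tokenize(t))
    n_in, n_out = sum(inside.values()), sum(outside.values())
    vocab = len(inside.keys() | outside.keys())

    def score(word: str) -> float:
        # Add-one smoothing keeps words unseen outside the cluster finite.
        p_in = (inside[word] + 1) / (n_in + vocab)
        p_out = (outside[word] + 1) / (n_out + vocab)
        return math.log(p_in / p_out)

    # Ignore one-off words; ties in score are broken alphabetically.
    candidates = [w for w, c in inside.items() if c >= min_count]
    return sorted(candidates, key=lambda w: (-score(w), w))[:top_n]


if __name__ == "__main__":
    cluster = ["diving the reef and the coral diving", "coral reef diving trip"]
    rest = ["the cat sat", "the dog ran", "the cat ran"]
    print(cluster_keywords(cluster, rest, top_n=3))  # ['diving', 'coral', 'reef']
    print(medoid_index([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]))  # 1
```

In the example, "the" appears twice inside the cluster but is even more common outside it, so it scores below zero and drops out of the top three. Because the medoid is an actual post rather than an averaged vector, it can always be shown to a reader as-is.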
All the surfaces are built on the same machinery. Type a topic and we turn your query into a vector, find the nearest posts across the network, then cluster and label them, so you get themed groups rather than a flat list. Type a handle instead and you see that account’s own clusters with their keyword labels. The personalised feed turns those clusters into a stream of new posts on the same themes, filtering out accounts you block or mute as well as posts we’ve already shown you.
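A sketch of the nearest-neighbour step and the feed filter, assuming cosine similarity and a brute-force scan over an in-memory list. A production system would more likely use an approximate-nearest-neighbour index, and the page doesn't say which. `Post`, `nearest_posts` and `filter_feed` are hypothetical names, and the query vector is assumed to come from the same embedding model as the posts.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Post:
    uri: str            # AT URI of the post
    author_did: str     # DID of the posting account
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def nearest_posts(query_vec: list[float], posts: list[Post], k: int = 10) -> list[Post]:
    """Brute-force top-k by cosine similarity, most similar first."""
    return heapq.nlargest(k, posts, key=lambda p: cosine(query_vec, p.vector))


def filter_feed(candidates: list[Post], blocked_or_muted: set[str],
                already_shown: set[str]) -> list[Post]:
    """Drop posts from blocked or muted accounts and posts already shown."""
    return [
        p for p in candidates
        if p.author_did not in blocked_or_muted and p.uri not in already_shown
    ]


if __name__ == "__main__":
    posts = [
        Post("at://did:plc:a/app.bsky.feed.post/1", "did:plc:a", [1.0, 0.0]),
        Post("at://did:plc:b/app.bsky.feed.post/1", "did:plc:b", [0.0, 1.0]),
        Post("at://did:plc:c/app.bsky.feed.post/1", "did:plc:c", [0.9, 0.1]),
    ]
    hits = nearest_posts([1.0, 0.0], posts, k=2)
    print([p.author_did for p in hits])  # ['did:plc:a', 'did:plc:c']
    feed = filter_feed(posts, {"did:plc:b"}, {"at://did:plc:a/app.bsky.feed.post/1"})
    print([p.author_did for p in feed])  # ['did:plc:c']
```

Filtering by author DID rather than handle keeps blocks and mutes working even if an account changes its handle.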
Try it
Type a Bluesky handle to see what an account consistently posts about, or a topic to find posts on that subject across the network. Either way you get themed clusters with keyword labels. There’s also a discovery feed on Bluesky — the public, pre-baked version.
For developers, parts of divepool are available through a public API. For tinkerers, the experimental flags page has a handful of settings you can play with.
Caveats
- English and German only. Other languages are stored but not embedded yet.
- Text similarity, not engagement. We rank on what posts say, not on likes or reposts.
- Recency is a tiebreaker, not a driver. A great post from last week can beat a mediocre post from an hour ago.
- Coverage is partial. Not every account is synced, and posts under the 120-character floor are stored but not embedded.
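The two ranking caveats amount to a sort order: text similarity first, recency second, engagement not at all. A minimal sketch, assuming a plain lexicographic sort. The `ndigits` rounding that turns near-equal scores into ties is a hypothetical knob, not something this page describes, and divepool may combine the two signals differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Result:
    uri: str
    similarity: float   # text similarity to the query, higher is better
    created_at: datetime


def rank(results: list[Result], ndigits: int = 2) -> list[Result]:
    """Order by text similarity; recency only breaks ties.

    Assumption: similarities are rounded to `ndigits` decimals so that
    near-identical scores count as ties. Likes and reposts play no part.
    """
    return sorted(
        results,
        key=lambda r: (-round(r.similarity, ndigits), -r.created_at.timestamp()),
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    results = [
        Result("mediocre-but-fresh", 0.61, now - timedelta(hours=1)),
        Result("great-but-older", 0.92, now - timedelta(days=7)),
        Result("tie-newer", 0.801, now - timedelta(days=1)),
        Result("tie-older", 0.799, now - timedelta(days=3)),
    ]
    print([r.uri for r in rank(results)])
    # ['great-but-older', 'tie-newer', 'tie-older', 'mediocre-but-fresh']
```

The week-old post with the higher similarity comes first, while the two posts that round to the same score fall back to newest-first.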
About
Divepool is built by Jasper Rädisch, who also makes PingArthur (AT Protocol uptime monitoring) and Is Bluesky Dying (network-wide churn signal).
Spot a problem? DMs are open: @divepool.social.