Hey, I was coding and preparing for ATmosphereConf until the last second, so I did not have time to write a blog post.
Please just check out divepool.social or the divepool feed on Bluesky, or contact me with any kind of inquiry or feedback. Deeply appreciated!

You can also read the following slides to get a feel for the tech I am using, or for the nerd I am.

ATmosphereConf 2026, Vancouver

Keywords vs. Embeddings

by Jasper @raedisch.net / @divepool.social

Disclaimers

  • My level: ELI12 (lateral mover)
  • Respecting !no-unauthenticated
  • No LLM training (yet)

TL;DR - the stack

  • Hardware: Hetzner and "old" Mac Studio M1 Ultras
  • Network: Tailscale, Caddy
  • Metrics: Prometheus
  • DB: PostgreSQL with pgvector AND pgvectorscale (DiskANN!)
  • DB interface: pgx, tern, sqlc
  • Languages: Go, JavaScript, Python, Rust, SQL
  • Compression: zstd
  • Language detection: lingua-rs
  • Embedder: EmbeddingGemma (Matryoshka!) via MLX
  • Algorithms: UMAP, HDBSCAN, TF-IDF
  • ATproto: indigo
  • Frontend: Rollup, TailwindCSS, Font Awesome, and minidenticons
  • Misc: Ansible, Overmind, SmartGit, Sketch, Ubuntu

What I like(d) about Bluesky (Twitter)

  1. "Do we match?" (platonically) answered in seconds.
  2. Quantity over quality
  3. Long tail discovery
  4. Lateral navigation
  5. Data access

How to boost these functions?

Follower graph


TF-IDF

Keyword extraction by comparing how often a term occurs in one document with how common it is across all documents
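
As a rough illustration of that comparison, here is a minimal pure-Python sketch. The smoothed IDF formula, the function names `tf_idf` and `top_keywords`, and the toy documents are assumptions made for this example, not the pipeline behind divepool.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score every term of every tokenized document with TF-IDF."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(doc)) * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        })
    return scores


def top_keywords(doc_scores: dict[str, float], k: int = 3) -> list[str]:
    """Return the k highest-scoring terms of one document."""
    return sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]


# Toy corpus (hypothetical): only the last word differs between documents.
docs = [
    "the feed shows new posts about diving".split(),
    "the feed shows new posts about cooking".split(),
    "the feed shows new posts about cycling".split(),
]
scores = tf_idf(docs)
print(top_keywords(scores[0], 1))
```

Terms that appear in every document get a score of zero with this smoothing, so only the distinguishing word of each document survives as a keyword.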

TF-IDF

Highly individual tokenizing pipeline (is that normal?)

  • Language detection (disagrees with Bluesky's own language tags for about 3% of posts)
  • Cleaning, hashtag splitting, stemming, ngrams…
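
A few of the steps above (cleaning, hashtag splitting, n-grams) can be sketched as follows. This is only one possible shape for such a pipeline: the regular expressions, the function names, and the choice to drop URLs and mentions are assumptions, the hashtag splitter only understands ASCII CamelCase, and language detection and stemming are left out entirely.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@[\w.-]+")
# Pieces of a CamelCase hashtag: acronyms, capitalized or lowercase words, digits.
# ASCII only, so umlauts and other non-ASCII letters are not handled here.
CAMEL_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_hashtag(tag: str) -> list[str]:
    """Split '#OpenSource' into ['open', 'source']."""
    return [part.lower() for part in CAMEL_RE.findall(tag.lstrip("#"))]


def tokenize(text: str, max_n: int = 2) -> list[str]:
    """Clean a post, split hashtags, and append n-grams up to max_n words."""
    # Cleaning: drop URLs and mentions (an assumption about what counts as noise).
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)

    words: list[str] = []
    for raw in re.findall(r"#?\w+", text):
        if raw.startswith("#"):
            words.extend(split_hashtag(raw))
        else:
            words.append(raw.lower())

    # Unigrams first, then every run of n consecutive words joined by a space.
    tokens = list(words)
    for n in range(2, max_n + 1):
        tokens += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return tokens


print(tokenize("Check #OpenSource tools https://example.com @divepool.social"))
```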

(This is where Claude Code took over.)

Embeddings

  • Magic
  • Embed all the posts (that are English or German and longer than 120 chars)!

Naive approach

You are one (average of all your posts).
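
To make the averaging concrete, here is a small sketch with two-dimensional toy vectors standing in for real post embeddings. The helper names and the numbers are made up for illustration.

```python
import math

Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    """Component-wise average of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical account that posts about two unrelated topics.
alice_posts = [[1.0, 0.0], [0.0, 1.0]]
alice = mean_vector(alice_posts)

# The average sits between both topics and matches neither of them well.
print(alice, cosine(alice, [1.0, 0.0]))
```

The example also hints at the weakness of this approach: an account with several distinct interests collapses into a single point that represents none of them faithfully.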

More nuanced approach (used by divepool discovery feed)

You contain multitudes (of post clusters).
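
One way to read this: keep one representative vector per cluster of an account's posts and match against the best-fitting cluster. The sketch below assumes cluster labels already exist (with -1 marking noise, as HDBSCAN implementations do) and uses a plain centroid per cluster. The function names and the max-similarity matching rule are assumptions, not a description of the divepool feed.

```python
import math
from collections import defaultdict

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise average of equally long vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def account_profile(post_vectors: list[Vector], labels: list[int]) -> dict[int, Vector]:
    """One centroid per cluster; posts labeled -1 (noise) are ignored."""
    groups: dict[int, list[Vector]] = defaultdict(list)
    for vector, label in zip(post_vectors, labels):
        if label != -1:
            groups[label].append(vector)
    return {label: centroid(vectors) for label, vectors in groups.items()}


def best_match(profile: dict[int, Vector], query: Vector) -> float:
    """Similarity of the query to the closest cluster of the account."""
    return max((cosine(c, query) for c in profile.values()), default=0.0)


# Toy data: two interests plus one post that fits neither.
posts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
labels = [0, 0, 1, 1, -1]
profile = account_profile(posts, labels)
print(profile, best_match(profile, [1.0, 0.0]))
```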

How divepool.social works right now

Start from a random spot in the embedding space, retrieve the closest 1000 posts (1 per account), cluster them, show medoid per cluster, repeat.
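
Two of these steps, the neighbor lookup with one post per account and the medoid, can be sketched in a few lines. This is a brute-force toy version under stated assumptions: Euclidean distance, posts given as `(account, vector)` pairs, and a random existing post as the starting spot. The clustering step itself is not shown, and a real setup would use an index in the database rather than sorting everything.

```python
import math
import random

Vector = list[float]
Post = tuple[str, Vector]  # (account, embedding); a hypothetical shape


def nearest_one_per_account(query: Vector, posts: list[Post], limit: int) -> list[Post]:
    """Closest posts to the query, keeping at most one post per account."""
    seen: set[str] = set()
    result: list[Post] = []
    for account, vector in sorted(posts, key=lambda p: math.dist(query, p[1])):
        if len(result) >= limit:
            break
        if account in seen:
            continue  # this account already contributed a closer post
        seen.add(account)
        result.append((account, vector))
    return result


def medoid(vectors: list[Vector]) -> Vector:
    """The member with the smallest total distance to all other members."""
    return min(vectors, key=lambda v: sum(math.dist(v, w) for w in vectors))


posts = [
    ("a", [0.0, 0.0]),
    ("a", [0.1, 0.0]),
    ("b", [1.0, 0.0]),
    ("c", [5.0, 5.0]),
]
start = random.choice(posts)[1]  # assumption: start from a random existing post
neighbors = nearest_one_per_account(start, posts, limit=2)
print(neighbors, medoid([vector for _, vector in neighbors]))
```

Unlike a centroid, the medoid is always an actual post, which is what makes it usable as the item to display for a cluster.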

Topic labels?

-> TF-IDF on embedding clusters
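
A common way to do this is to treat every cluster as one big document and run TF-IDF across clusters, so that a term scores high when it is frequent inside one cluster and rare in the others. The sketch below follows that idea. The smoothed IDF, the function name `label_clusters`, and the toy tokens are assumptions for illustration.

```python
import math
from collections import Counter


def label_clusters(clusters: dict[int, list[list[str]]], k: int = 2) -> dict[int, list[str]]:
    """Pick the k most distinctive terms per cluster of tokenized posts."""
    # Merge all posts of a cluster into a single bag of words.
    bags = {
        cid: Counter(token for post in posts for token in post)
        for cid, posts in clusters.items()
    }
    n_clusters = len(bags)
    # In how many clusters does each term occur at least once?
    cluster_freq = Counter(term for bag in bags.values() for term in bag)

    labels = {}
    for cid, bag in bags.items():
        total = sum(bag.values())
        score = {
            term: (count / total) * math.log((1 + n_clusters) / (1 + cluster_freq[term]))
            for term, count in bag.items()
        }
        labels[cid] = sorted(score, key=score.get, reverse=True)[:k]
    return labels


# Toy clusters (hypothetical tokens); "photo" appears in both and scores zero.
clusters = {
    0: [["reef", "dive", "photo"], ["dive", "mask"]],
    1: [["sourdough", "bread", "photo"], ["bread", "oven"]],
}
labels = label_clusters(clusters)
print(labels)
```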

Slightly better labels?

-> Embed and rerank terms
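
One plausible reading of this step: embed each candidate term with the same model as the posts, then sort the candidates by similarity to the cluster they should describe. The sketch below assumes the cluster is summarized by a single vector and takes the embedder as a parameter. `rerank_terms` and the toy lookup table that stands in for a real embedding model are hypothetical.

```python
import math
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank_terms(
    terms: list[str],
    cluster_vector: Vector,
    embed: Callable[[str], Vector],
) -> list[str]:
    """Order candidate terms by how close their embedding is to the cluster."""
    return sorted(terms, key=lambda t: cosine(embed(t), cluster_vector), reverse=True)


# Toy stand-in for an embedding model: a fixed lookup table.
toy_vectors = {
    "dive": [0.9, 0.1],
    "photo": [0.5, 0.5],
    "bread": [0.1, 0.9],
}
ranked = rerank_terms(["photo", "bread", "dive"], [1.0, 0.0], toy_vectors.__getitem__)
print(ranked)
```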

Time for demo?!


Actually useful labels?

Outlook: fine-tuned labeler?

(The smallest Qwen 3.5 already takes only 1 second for labeling.)

Further outlook

  1. Accessibility
    • improve UX
    • native apps (Expo?!)
    • API access?!
  2. Performance
    • embeddings are instant, indices are not (yet)
    • more stable setup (backups!!!)
  3. Business
    • is there a (B2C) product somewhere in here?

Thank you and please contact me:

@raedisch.net

(I am around on Monday.)
