
Smart Recommendations

Glean’s smart recommendation system is based on vector embedding technology. By learning your reading preferences, it calculates preference scores for articles to help you prioritize content you’re interested in.

When each article is ingested, the system automatically generates a vector representation (Embedding):

  1. Extract the article title and summary
  2. Generate a vector with the embedding model
  3. Store the vector in the Milvus vector database
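
The three ingestion steps above can be sketched as follows. This is a minimal illustration, not Glean's actual code: `Article`, `toy_embed`, and `ingest` are hypothetical names, the hash-based embedder is only a stand-in for a real embedding model, and a plain dict stands in for the Milvus collection.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Article:
    id: str
    title: str
    summary: str


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Stand-in for a real embedding model: hashes text into a unit vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    raw = [b / 255.0 - 0.5 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]


def ingest(article: Article, store: Dict[str, List[float]]) -> List[float]:
    # Step 1: extract the title and summary.
    text = f"{article.title}\n{article.summary}"
    # Step 2: generate the vector (assumed embedder, see toy_embed above).
    vector = toy_embed(text)
    # Step 3: store it; the dict is a placeholder for the vector database.
    store[article.id] = vector
    return vector
```

In a real deployment the embedder would be whichever provider is configured, and the vector length would match `EMBEDDING_DIMENSION`.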

The system learns preferences based on your feedback:

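| Action      | Signal Weight | Description                |
|-------------|---------------|----------------------------|
| 👍 Like     | +1.0          | Explicit positive feedback |
| ⭐ Bookmark | +0.7          | Implicit positive feedback |
| 👎 Dislike  | -1.0          | Explicit negative feedback |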
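
One way to represent these weights in code is a simple lookup table. The sketch below assumes that the sign decides which side of the preference model a signal feeds and that the magnitude is the weight applied there; `SIGNAL_WEIGHTS` and `route_signal` are illustrative names, not part of Glean's API.

```python
from typing import Tuple

# Weights taken from the table above.
SIGNAL_WEIGHTS = {
    "like": 1.0,
    "bookmark": 0.7,
    "dislike": -1.0,
}


def route_signal(action: str) -> Tuple[str, float]:
    """Return the side of the model an action feeds and its magnitude.

    Assumption: positive weights update the positive vector, negative
    weights update the negative vector using their absolute value.
    """
    weight = SIGNAL_WEIGHTS[action]
    side = "positive" if weight > 0 else "negative"
    return side, abs(weight)
```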

The system maintains a preference model for each user:

User Preference
├── Positive Vector (positive_embedding) # Aggregated vector of liked content
├── Positive Weight (positive_count) # Cumulative positive signals
├── Negative Vector (negative_embedding) # Aggregated vector of disliked content
├── Negative Weight (negative_count) # Cumulative negative signals
├── Source Affinity (source_affinity) # Positive/negative stats per feed
└── Author Affinity (author_affinity) # Positive/negative stats per author
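
As a rough sketch, the structure above could map onto a pair of dataclasses. The field names follow the tree; the types, the defaults, and the `AffinityStats` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AffinityStats:
    """Hypothetical per-feed or per-author counters."""
    positive: float = 0.0
    negative: float = 0.0


@dataclass
class UserPreference:
    # Aggregated vector of liked content; None until the first positive signal.
    positive_embedding: Optional[List[float]] = None
    positive_count: float = 0.0
    # Aggregated vector of disliked content; None until the first negative signal.
    negative_embedding: Optional[List[float]] = None
    negative_count: float = 0.0
    # Positive/negative stats keyed by feed and by author.
    source_affinity: Dict[str, AffinityStats] = field(default_factory=dict)
    author_affinity: Dict[str, AffinityStats] = field(default_factory=dict)
```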

Preference scores range from 0 to 100:

Score = (positive_similarity - negative_similarity + 1) / 2 × 100 × confidence + 50 × (1 - confidence)
  • Positive Similarity: Cosine similarity between article and positive preference vector
  • Negative Similarity: Cosine similarity between article and negative preference vector
  • Confidence: Derived from the feedback count; more feedback means higher confidence
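
The formula can be written out directly in Python. This is a sketch under stated assumptions: how confidence is derived from the feedback count is not specified here, so it is passed in as a value between 0 and 1, and the final clamp to 0-100 is an added safeguard because the raw expression can leave that range when the two similarities differ by more than 1.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def preference_score(
    positive_similarity: float,
    negative_similarity: float,
    confidence: float,
) -> float:
    """Blend the similarity-based score with the neutral score of 50."""
    raw = (positive_similarity - negative_similarity + 1) / 2 * 100
    score = raw * confidence + 50 * (1 - confidence)
    # Assumption: clamp so the result stays within the documented 0-100 range.
    return max(0.0, min(100.0, score))
```

For example, a positive similarity of 0.8, a negative similarity of 0.2, and a confidence of 0.5 give (0.8 - 0.2 + 1) / 2 × 100 = 80, blended halfway toward 50, for a score of 65. With a confidence of 0 the score is always 50.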

For new users with no feedback data:

  • All articles default to a score of 50
  • Articles are sorted by time
  • Results become more personalized as feedback accumulates

In smart recommendation view mode, articles are displayed in layers by preference score:

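| Layer               | Score Range | Display              | Sorting             |
|---------------------|-------------|----------------------|---------------------|
| 📌 Recommended      | ≥ 70        | Pinned at top        | By score descending |
| 📰 Normal           | 40 to < 70  | Normal display       | By time descending  |
| 🔽 May Not Interest | < 40        | Collapsed by default | By time descending  |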

You can adjust layering thresholds in settings:

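| Setting                        | Default | Description                                                         |
|--------------------------------|---------|---------------------------------------------------------------------|
| Recommendation Score Threshold | 70      | Articles scoring at or above this value appear in the Recommended layer |
| Not Interested Score Threshold | 40      | Articles scoring below this value appear in the collapsed layer      |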
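
The layering rules and their thresholds can be sketched as a small function. `ScoredArticle` and `layer_articles` are hypothetical names; the boundary handling (70 counts as Recommended, 40 counts as Normal) follows the score ranges listed above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ScoredArticle:
    title: str
    score: float
    published_at: datetime


def layer_articles(
    articles: List[ScoredArticle],
    recommend_threshold: float = 70.0,
    not_interested_threshold: float = 40.0,
) -> Dict[str, List[ScoredArticle]]:
    """Split articles into the three display layers and sort each one."""
    recommended = [a for a in articles if a.score >= recommend_threshold]
    collapsed = [a for a in articles if a.score < not_interested_threshold]
    normal = [
        a for a in articles
        if not_interested_threshold <= a.score < recommend_threshold
    ]
    # Recommended is ordered by score; the other layers by recency.
    recommended.sort(key=lambda a: a.score, reverse=True)
    normal.sort(key=lambda a: a.published_at, reverse=True)
    collapsed.sort(key=lambda a: a.published_at, reverse=True)
    return {"recommended": recommended, "normal": normal, "collapsed": collapsed}
```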

When you browse the article list, the system:

  1. Retrieves article vector representations
  2. Calculates similarity with your preference vectors
  3. Computes preference scores in real-time
  4. Displays articles in layers by score

After each piece of feedback, the preference vectors are updated using an incremental moving average:

# Simplified example
new_vector = (old_vector * old_weight + article_vector * signal_weight) / (old_weight + signal_weight)

This approach:

  • Avoids recalculating the entire feedback history
  • Gives new feedback greater impact
  • Lets old preferences gradually decay
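
A runnable version of the simplified update might look like this. It is a sketch, not the production code: `update_preference_vector` is a hypothetical name, vectors are plain lists, and `signal_weight` is assumed to be the positive magnitude of the signal (for example 0.7 for a bookmark, or 1.0 for a dislike applied to the negative vector).

```python
from typing import List, Optional, Tuple


def update_preference_vector(
    old_vector: Optional[List[float]],
    old_weight: float,
    article_vector: List[float],
    signal_weight: float,
) -> Tuple[List[float], float]:
    """Fold one article into the running weighted average.

    Returns the new vector and the new cumulative weight.
    """
    # First signal: the article vector becomes the preference vector.
    if old_vector is None or old_weight == 0:
        return list(article_vector), signal_weight
    total = old_weight + signal_weight
    new_vector = [
        (o * old_weight + a * signal_weight) / total
        for o, a in zip(old_vector, article_vector)
    ]
    return new_vector, total
```

Only the previous vector and its cumulative weight are needed, so earlier articles never have to be reloaded.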

Preference updates run asynchronously in background tasks, so they do not affect the response time of the main operation.
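
The general pattern can be illustrated with `asyncio`. The task mechanism Glean actually uses is not described here, so this is only a sketch of the idea: the handler schedules the update and returns immediately. All function names are hypothetical, and the short sleep stands in for the vector math and storage writes.

```python
import asyncio
from typing import List, Set, Tuple


async def update_preferences(
    user_id: str, article_id: str, action: str, log: List[Tuple[str, str, str]]
) -> None:
    """Hypothetical slow update; the sleep stands in for real work."""
    await asyncio.sleep(0.01)
    log.append((user_id, article_id, action))


async def handle_feedback(
    user_id: str,
    article_id: str,
    action: str,
    log: List[Tuple[str, str, str]],
    pending: Set[asyncio.Task],
) -> str:
    # Schedule the update without awaiting it, then respond right away.
    task = asyncio.create_task(update_preferences(user_id, article_id, action, log))
    pending.add(task)  # keep a reference so the task is not garbage collected
    task.add_done_callback(pending.discard)
    return "accepted"


async def main():
    log: List[Tuple[str, str, str]] = []
    pending: Set[asyncio.Task] = set()
    status = await handle_feedback("u1", "a1", "like", log, pending)
    responded_before_update = len(log) == 0
    await asyncio.gather(*pending)  # let background work finish before exiting
    return status, responded_before_update, log
```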

The system supports multiple embedding providers:

Use Sentence Transformers, which runs locally and requires no API key:

EMBEDDING_PROVIDER=sentence-transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384

Use OpenAI Embedding API:

EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSION=1536
EMBEDDING_API_KEY=sk-xxx

Use ByteDance’s Embedding service:

EMBEDDING_PROVIDER=volc-engine
EMBEDDING_MODEL=doubao-embedding
EMBEDDING_DIMENSION=1024
EMBEDDING_API_KEY=your-api-key
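
For illustration, an application could read and validate these variables as follows. The variable names come from the examples above; `EmbeddingConfig`, `load_embedding_config`, and the rule that only `sentence-transformers` may omit the API key are assumptions of this sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: providers in this set run locally and need no API key.
LOCAL_PROVIDERS = {"sentence-transformers"}


@dataclass(frozen=True)
class EmbeddingConfig:
    provider: str
    model: str
    dimension: int
    api_key: Optional[str]


def load_embedding_config(env: Optional[Mapping[str, str]] = None) -> EmbeddingConfig:
    """Read the EMBEDDING_* settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    provider = env["EMBEDDING_PROVIDER"]
    model = env["EMBEDDING_MODEL"]
    dimension = int(env["EMBEDDING_DIMENSION"])
    api_key = env.get("EMBEDDING_API_KEY")
    if dimension <= 0:
        raise ValueError("EMBEDDING_DIMENSION must be a positive integer")
    if provider not in LOCAL_PROVIDERS and not api_key:
        raise ValueError(f"EMBEDDING_API_KEY is required for provider {provider!r}")
    return EmbeddingConfig(provider, model, dimension, api_key)
```

The dimension has to match the output size of the chosen model, since stored vectors and preference vectors are compared directly.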

To get useful recommendations quickly:

  1. Actively give like/dislike feedback while browsing articles
  2. The first 10-20 pieces of feedback have the greatest impact on the model
  3. Bookmarks also count as positive signals

To keep recommendations accurate over time:

  • Provide feedback regularly to keep the model up to date
  • Mark content you dislike as well; it helps filter out noise
  • If recommendations are inaccurate, check whether you’ve given enough feedback