Alignment-Lab-AI
diff --git a/README.md b/README.md
Lines changed: 40 additions & 9 deletions
diff --git a/assets/images/pyversity_logo.png b/assets/images/pyversity_logo.png
1.68 MB
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 1 addition & 1 deletion
diff --git a/src/pyversity/__init__.py b/src/pyversity/__init__.py
Lines changed: 1 addition & 1 deletion
diff --git a/src/pyversity/core.py b/src/pyversity/pyversity.py
Renamed: src/pyversity/core.py to src/pyversity/pyversity.py
@@ -1,9 +1,21 @@
-# Pyversity — Diversified Re‑Ranking for Retrieval
 
-Pyversity is a small, fast library for diversifying retrieval results.
+<h2 align="center">
+ <img width="35%" alt="Pyversity logo" src="assets/images/pyversity_logo.png"><br/>
+ Fast Diversification for Retrieval
+</h2>
+
+<div align="center">
+
+[Quickstart](#quickstart) •
+[Supported Strategies](#supported-strategies) •
+[Motivation](#motivation)
+
+</div>
+
+Pyversity is a fast, lightweight library for diversifying retrieval results.
 Retrieval systems often return highly similar items. Pyversity efficiently re-ranks these results to encourage diversity, surfacing items that remain relevant but less redundant.
 
-It implements several popular strategies such as MMR, MSD, DPP, and Cover with a clear, unified API. More information about the supported strategies can be found in the [supported strategies section](#supported-strategies).
+It implements several popular diversification strategies such as MMR, MSD, DPP, and Cover with a clear, unified API. More information about the supported strategies can be found in the [supported strategies section](#supported-strategies). The only dependency is NumPy, making the package very lightweight.
 
 
 ## Quickstart
@@ -19,9 +31,9 @@ Diversify retrieval results:
 import numpy as np
 from pyversity import diversify, Strategy
 
-# Define embeddings and scores
-embeddings  = np.random.randn(100, 256).astype(np.float32)
-scores  = np.random.rand(100).astype(np.float32)
+# Define embeddings and scores (e.g. cosine similarities of a query result)
+embeddings = np.random.randn(100, 256)
+scores = np.random.rand(100)
 
 # Diversify with a chosen strategy (in this case MMR)
 diversified_result = diversify(
@@ -34,19 +46,34 @@ diversified_result = diversify(
 diversified_indices = diversified_result.indices
 ```
 
-
+The returned `DiversificationResult` can be used to access the diversified `indices`, as well as the `marginal gains` of the selected strategy and other useful info. The strategies are extremely fast and scalable: this example runs in 0.0001s.
 
 ## Supported Strategies
 
-The following table describes the supported strategies, how they work, their time complexity, and when to use them.
+The following table describes the supported strategies, how they work, their time complexity, and when to use them. The papers linked in the [references](#references) section provide more in-depth information on the strengths/weaknesses of the supported strategies.
 
 | Strategy | What It Does | Time Complexity | When to Use |
 | ------------------------------------- | ---------------------------------------------------------------------------------------------- | ------------------------- | ---------------------------------------------------------------------------------------------- |
-| **MMR** (Maximum Marginal Relevance) | Keeps the most relevant items while down-weighting those too similar to what’s already picked. | **O(k · n · d)** | Best **default**. Fast, simple, and works well when you just want to avoid near-duplicates. |
+| **MMR** (Maximum Marginal Relevance) | Keeps the most relevant items while down-weighting those too similar to what’s already picked. | **O(k · n · d)** | Good default. Fast, simple, and works well when you just want to avoid near-duplicates. |
 | **MSD** (Max Sum of Distances) | Prefers items that are both relevant and far from *all* previous selections. | **O(k · n · d)** | Use when you want stronger spread, i.e. results that cover a wider range of topics or styles. |
 | **DPP** (Determinantal Point Process) | Samples diverse yet relevant items using probabilistic “repulsion.” | **O(k · n · d + n · k²)** | Ideal when you want to eliminate redundancy or ensure diversity is built into selection. |
 | **COVER** (Facility-Location) | Ensures selected items collectively represent the full dataset’s structure. | **O(k · n²)** | Great for topic coverage or clustering scenarios, but slower for large `n`. |
 
+
+## Motivation
+
+Traditional retrieval systems rank results purely by relevance (how closely each item matches the query). While effective, this can lead to redundancy: top results often look nearly identical, which can create a poor user experience.
+
+Diversification techniques like MMR, MSD, COVER, and DPP help balance relevance and variety.
+Each new item is chosen not only because it’s relevant, but also because it adds new information that wasn’t already covered by earlier results.
+
+This improves exploration, user satisfaction, and coverage across many domains, for example:
+
+- E-commerce: Show different product styles, not multiple copies of the same black pants.
+- News search: Highlight articles from different outlets or viewpoints.
+- Academic retrieval: Surface papers from different subfields or methods.
+- RAG / LLM contexts: Avoid feeding the model near-duplicate passages.
+
 ## References
 
 The implementations in this package are based on the following research papers:
@@ -61,3 +88,7 @@ The implementations in this package are based on the following research papers:
 
 - **DPP (efficient greedy implementation)**: Chen, L., Zhang, G., & Zhou, H. (2018). Fast greedy MAP inference for determinantal point process to improve recommendation diversity.
 [Link](https://arxiv.org/pdf/1709.05135)
+
+## Author
+
+Thomas van Dongen
@@ -1,6 +1,6 @@
 [project]
 name = "pyversity"
-description = "Retrieval Result Diversification"
+description = "Fast Diversification for Retrieval"
 readme = { file = "README.md", content-type = "text/markdown" }
 license = { file = "LICENSE" }
 requires-python = ">=3.10"
 
@@ -1,5 +1,5 @@
-from pyversity.core import diversify
 from pyversity.datatypes import DiversificationResult, Metric, Strategy
+from pyversity.pyversity import diversify
 from pyversity.strategies import cover, dpp, mmr, msd
 from pyversity.version import __version__
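
For readers of this change, the MMR row in the README's strategy table ("keeps the most relevant items while down-weighting those too similar to what's already picked") can be illustrated with a minimal NumPy sketch. This is not Pyversity's implementation: the function name `mmr_sketch`, the `lambda_param` trade-off weight, and the choice of cosine similarity are assumptions made for illustration only.

```python
import numpy as np


def mmr_sketch(embeddings, scores, k, lambda_param=0.5):
    """Greedy MMR selection (hypothetical helper, not Pyversity's API)."""
    embeddings = np.asarray(embeddings, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n = embeddings.shape[0]
    k = min(k, n)

    # Normalize rows so that dot products are cosine similarities (assumed metric).
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    selected = []
    # Highest similarity of each candidate to anything selected so far.
    # Starting at zero means negative similarities earn neither penalty nor bonus.
    max_sim = np.zeros(n)
    available = np.ones(n, dtype=bool)

    for _ in range(k):
        # Trade off relevance against redundancy with the selected set.
        gain = lambda_param * scores - (1.0 - lambda_param) * max_sim
        gain[~available] = -np.inf
        best = int(np.argmax(gain))
        selected.append(best)
        available[best] = False
        # One O(n * d) update per pick, which gives O(k * n * d) overall.
        max_sim = np.maximum(max_sim, unit @ unit[best])

    return selected
```

With two identical embeddings and one orthogonal embedding, the sketch skips the duplicate in favor of the distinct item, while `lambda_param=1.0` reduces to plain relevance ranking. The single similarity update per selected item is what produces the O(k · n · d) cost listed in the table.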