Commit a33c18d

docs: Added more docs, added logo (Pringled#6)
* Added logo * Added logo * Updated docs * Updated docs * Updated docs * Updated docs * Updated docs * Updated docs * Updated docs * Updated docs * Updated docs * Updated docs
1 parent 189805f commit a33c18d

File tree

5 files changed

+42
-11
lines changed

README.md

Lines changed: 40 additions & 9 deletions
@@ -1,9 +1,21 @@
1-
# Pyversity — Diversified Re‑Ranking for Retrieval
21

3-
Pyversity is a small, fast library for diversifying retrieval results.
2+
<h2 align="center">
3+
<img width="35%" alt="Pyversity logo" src="assets/images/pyversity_logo.png"><br/>
4+
Fast Diversification for Retrieval
5+
</h2>
6+
7+
<div align="center">
8+
9+
[Quickstart](#quickstart)
10+
[Supported Strategies](#supported-strategies)
11+
[Motivation](#motivation)
12+
13+
</div>
14+
15+
Pyversity is a fast, lightweight library for diversifying retrieval results.
416
Retrieval systems often return highly similar items. Pyversity efficiently re-ranks these results to encourage diversity, surfacing items that remain relevant but less redundant.
517

6-
It implements several popular strategies such as MMR, MSD, DPP, and Cover with a clear, unified API. More information about the supported strategies can be found in the [supported strategies section](#supported-strategies).
18+
It implements several popular diversification strategies such as MMR, MSD, DPP, and Cover with a clear, unified API. More information about the supported strategies can be found in the [supported strategies section](#supported-strategies). The only dependency is NumPy, making the package very lightweight.
719

820

921
## Quickstart
@@ -19,9 +31,9 @@ Diversify retrieval results:
1931
import numpy as np
2032
from pyversity import diversify, Strategy
2133

22-
# Define embeddings and scores
23-
embeddings = np.random.randn(100, 256).astype(np.float32)
24-
scores = np.random.rand(100).astype(np.float32)
34+
# Define embeddings and scores (e.g. cosine similarities of a query result)
35+
embeddings = np.random.randn(100, 256)
36+
scores = np.random.rand(100)
2537

2638
# Diversify with a chosen strategy (in this case MMR)
2739
diversified_result = diversify(
@@ -34,19 +46,34 @@ diversified_result = diversify(
3446
diversified_indices = diversified_result.indices
3547
```
3648

37-
49+
The returned `DiversificationResult` can be used to access the diversified `indices`, as well as the `marginal gains` of the selected strategy and other useful info. The strategies are extremely fast and scalable: this example runs in 0.0001s.
3850

3951
## Supported Strategies
4052

41-
The following table describes the supported strategies, how they work, their time complexity, and when to use them.
53+
The following table describes the supported strategies, how they work, their time complexity, and when to use them. The papers linked in the [references](#references) section provide more in-depth information on the strengths/weaknesses of the supported strategies.
4254

4355
| Strategy | What It Does | Time Complexity | When to Use |
4456
| ------------------------------------- | ---------------------------------------------------------------------------------------------- | ------------------------- | ---------------------------------------------------------------------------------------------- |
45-
| **MMR** (Maximum Marginal Relevance) | Keeps the most relevant items while down-weighting those too similar to what’s already picked. | **O(k · n · d)** | Best **default**. Fast, simple, and works well when you just want to avoid near-duplicates. |
57+
| **MMR** (Maximum Marginal Relevance) | Keeps the most relevant items while down-weighting those too similar to what’s already picked. | **O(k · n · d)** | Good default. Fast, simple, and works well when you just want to avoid near-duplicates. |
4658
| **MSD** (Max Sum of Distances) | Prefers items that are both relevant and far from *all* previous selections. | **O(k · n · d)** | Use when you want stronger spread, i.e. results that cover a wider range of topics or styles. |
4759
| **DPP** (Determinantal Point Process) | Samples diverse yet relevant items using probabilistic “repulsion.” | **O(k · n · d + n · k²)** | Ideal when you want to eliminate redundancy or ensure diversity is built into selection. |
4860
| **COVER** (Facility-Location) | Ensures selected items collectively represent the full dataset’s structure. | **O(k · n²)** | Great for topic coverage or clustering scenarios, but slower for large `n`. |
4961

62+
63+
## Motivation
64+
65+
Traditional retrieval systems rank results purely by relevance (how closely each item matches the query). While effective, this can lead to redundancy: top results often look nearly identical, which can create a poor user experience.
66+
67+
Diversification techniques like MMR, MSD, COVER, and DPP help balance relevance and variety.
68+
Each new item is chosen not only because it’s relevant, but also because it adds new information that wasn’t already covered by earlier results.
69+
70+
This improves exploration, user satisfaction, and coverage across many domains, for example:
71+
72+
- E-commerce: Show different product styles, not multiple copies of the same black pants.
73+
- News search: Highlight articles from different outlets or viewpoints.
74+
- Academic retrieval: Surface papers from different subfields or methods.
75+
- RAG / LLM contexts: Avoid feeding the model near-duplicate passages.
76+
5077
## References
5178

5279
The implementations in this package are based on the following research papers:
@@ -61,3 +88,7 @@ The implementations in this package are based on the following research papers:
6188

6289
- **DPP (efficient greedy implementation)**: Chen, L., Zhang, G., & Zhou, H. (2018). Fast greedy MAP inference for determinantal point process to improve recommendation diversity.
6390
[Link](https://arxiv.org/pdf/1709.05135)
91+
92+
## Author
93+
94+
Thomas van Dongen

assets/images/pyversity_logo.png

1.68 MB

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
[project]
22
name = "pyversity"
3-
description = "Retrieval Result Diversification"
3+
description = "Fast Diversification for Retrieval"
44
readme = { file = "README.md", content-type = "text/markdown" }
55
license = { file = "LICENSE" }
66
requires-python = ">=3.10"

src/pyversity/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
1-
from pyversity.core import diversify
21
from pyversity.datatypes import DiversificationResult, Metric, Strategy
2+
from pyversity.pyversity import diversify
33
from pyversity.strategies import cover, dpp, mmr, msd
44
from pyversity.version import __version__
55

File renamed without changes.
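
The MMR row in the README's strategy table ("keeps the most relevant items while down-weighting those too similar to what's already picked") can be illustrated with a minimal NumPy sketch. This is not Pyversity's implementation: the function name `mmr_sketch` and its `diversity` parameter are hypothetical, chosen only for illustration, and the sketch assumes cosine similarity between embeddings with negative similarities treated as zero.

```python
import numpy as np


def mmr_sketch(embeddings, scores, k, diversity=0.5):
    """Greedy MMR sketch (hypothetical helper, not the library's API).

    Returns the indices of up to k items, trading relevance against
    similarity to items that were already selected.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    n = unit.shape[0]
    k = min(k, n)
    selected = []
    available = np.ones(n, dtype=bool)
    # Highest similarity of each candidate to any selected item so far
    # (starts at 0, so negative similarities are clamped to 0).
    max_sim = np.zeros(n)

    for _ in range(k):
        # Relevance term minus a redundancy penalty.
        gain = (1.0 - diversity) * scores - diversity * max_sim
        gain[~available] = -np.inf  # never re-pick an item
        best = int(np.argmax(gain))
        selected.append(best)
        available[best] = False
        # One matrix-vector product per step: O(n * d) work.
        max_sim = np.maximum(max_sim, unit @ unit[best])

    return selected


# Same shapes as the README quickstart.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((100, 256))
scores = rng.random(100)
indices = mmr_sketch(embeddings, scores, k=10, diversity=0.5)
```

Each of the `k` greedy steps costs one `n × d` matrix-vector product, which is consistent with the O(k · n · d) complexity listed in the table. With `diversity=0.0` the sketch reduces to plain top-`k` by score.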
