Commit 272b9ca

feat: Constrain diversity for DPP, add scale factor (Pringled#10)
* Added constraint for dpp
* Added constraint for dpp
* Added constraint for dpp
* Added constraint for dpp
* Added constraint for dpp
1 parent: fcd098b

3 files changed: +15 -6 lines changed

README.md

Lines changed: 5 additions & 4 deletions
@@ -35,14 +35,15 @@ from pyversity import diversify, Strategy
 embeddings = np.random.randn(100, 256)
 scores = np.random.rand(100)

-# Diversify with with a chosen strategy (in this case MMR) and a diversity of 0.5 (balanced)
+# Diversify the result
 diversified_result = diversify(
     embeddings=embeddings,
     scores=scores,
-    k=10,
-    strategy=Strategy.MMR,
-    diversity=0.5
+    k=10, # Number of items to select
+    strategy=Strategy.MMR, # Diversification strategy to use
+    diversity=0.5 # Diversity parameter (higher values prioritize diversity)
 )
+
 # Get the indicices of the diversified result
 diversified_indices = diversified_result.indices
 ```
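
As a rough illustration of what the new README comment on `diversity` means ("higher values prioritize diversity"), here is a generic, textbook-style greedy MMR sketch in plain Python. It is not pyversity's implementation: `cosine` and `mmr_select` are hypothetical helpers written for this note, and the exact weighting the library applies may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(embeddings, scores, k, diversity=0.5):
    """Hypothetical greedy MMR: trade relevance against similarity to picked items."""
    if not 0.0 <= diversity <= 1.0:
        raise ValueError("diversity must be in [0, 1]")
    selected = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:

        def gain(i):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return (1.0 - diversity) * scores[i] - diversity * redundancy

        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


# Items 0 and 1 are near-duplicates; item 2 points in a different direction.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
scores = [0.9, 0.8, 0.7]

relevance_only = mmr_select(embeddings, scores, k=2, diversity=0.0)  # [0, 1]
balanced = mmr_select(embeddings, scores, k=2, diversity=0.5)        # [0, 2]
```

With `diversity=0.0` the selection follows the scores alone, so the near-duplicate item 1 is picked second. At `diversity=0.5` the redundancy penalty pushes it below item 2.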

src/pyversity/pyversity.py

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ def diversify(
     :param k: The number of items to select for the diversified result.
     :param strategy: The diversification strategy to apply.
         Supported strategies are: 'mmr' (default), 'msd', 'cover', and 'dpp'.
-    :param diversity: Diversity parameter. Higher values prioritize diversity and lower values prioritize relevance.
+    :param diversity: Diversity parameter (range of [0, 1]). Higher values prioritize diversity and lower values prioritize relevance.
     :param **kwargs: Additional keyword arguments passed to the specific strategy function.
     :return: A DiversificationResult containing the selected item indices,
         their marginal gains, the strategy used, and the parameters.

src/pyversity/strategies/dpp.py

Lines changed: 9 additions & 1 deletion
@@ -17,6 +17,7 @@ def dpp(
     scores: np.ndarray,
     k: int,
     diversity: float = 0.5,
+    scale: float = 1.0,
 ) -> DiversificationResult:
     """
     Greedy determinantal point process (DPP) selection.
@@ -30,12 +31,17 @@ def dpp(
     :param k: Number of items to select.
     :param diversity: Controls the influence of relevance scores in the DPP kernel (inverse of beta parameter).
         Higher values increase the emphasis on diversity.
+    :param scale: Optional scaling factor for the beta parameter to adjust relevance influence.
     :return: A DiversificationResult containing the selected item indices,
         their marginal gains, the strategy used, and the parameters.
+    :raises ValueError: If diversity is not in [0, 1].
     """
+    if not (0.0 <= float(diversity) <= 1.0):
+        raise ValueError("diversity must be in [0, 1]")
+
     # Beta parameter to control relevance influence in DPP kernel.
     # This is the inverse of diversity to align with common notation.
-    beta = 1 - diversity
+    beta = (1 - diversity) * scale

     # Prepare inputs
     feature_matrix, relevance_scores, top_k, early_exit = prepare_inputs(embeddings, scores, k)
@@ -46,6 +52,7 @@ def dpp(
             marginal_gains=np.empty(0, np.float32),
             strategy=Strategy.DPP,
             diversity=diversity,
+            parameters={"scale": scale},
         )
     # Normalize feature vectors to unit length for cosine similarity
     feature_matrix = normalize_rows(feature_matrix)
@@ -102,4 +109,5 @@ def dpp(
         marginal_gains=marginal_gains[:step],
         strategy=Strategy.DPP,
         diversity=diversity,
+        parameters={"scale": scale},
     )
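
To make the effect of the new range check and `scale` factor concrete, the sketch below isolates those two changed lines in a standalone function. `relevance_beta` is a hypothetical name used only for this note. The check and the formula are copied from the diff above, but how `beta` then enters the DPP kernel is not shown in this commit and is not modelled here.

```python
def relevance_beta(diversity: float, scale: float = 1.0) -> float:
    """Hypothetical helper mirroring the validation and beta computation in the diff."""
    # Same guard as the diff; NaN also fails the chained comparison and raises.
    if not (0.0 <= float(diversity) <= 1.0):
        raise ValueError("diversity must be in [0, 1]")
    # With the default scale=1.0 this reduces to the previous `1 - diversity`.
    return (1 - diversity) * scale


print(relevance_beta(0.5))              # 0.5
print(relevance_beta(0.25, scale=2.0))  # 1.5
print(relevance_beta(1.0, scale=10.0))  # 0.0
```

Because `scale` multiplies `(1 - diversity)`, it has no effect at `diversity=1.0`, where `beta` is zero for any `scale`.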
