Optimal partition of $n$ points

Question

Given an integer $n$, and 3 real sequences $\{x_1, \dots, x_n\}, \{y_1, \dots, y_n\}$ and $\{w_1, \dots, w_n\}$ with $x_k, y_k, w_k > 0$, for all $k \in \{1, \dots, n\}$. For a fixed $p < n$ let $\{C_1, \dots, C_p\}$ be a partition of the set $\{1, \dots, n\}$ as in $C_1 \cup \dots \cup C_p$ = $\{1, \dots, n\}$, with pairwise disjoint.

We define for $ k \in \{1,\dots, n\} $, $C(k)$ as the class containing $ k $, it means there exists a unique $j \in \{1, \dots, p\}$ such that $ k \in C_j=C(k)$. I am looking to solve the following problem:

$$ \max_{P=(C_1, \dots, C_p)} \sum_{j=1}^n w_j \frac{\sum_{k \in C(j)} y_k}{\sum_{k \in C(j)} x_k} $$

I am looking for an algorithm to solve this problem with good complexity (ideally polynomial time) and a reference to the literature if this problem is already known.

One reformulation I have considered is to define the complex numbers $ z_1, \dots, z_n $ such that $z_k = x_k + i \times y_k$ for all $k \in \{1, \dots, n\}$, so that the problem becomes:

$$ \max_{P=(C_1, \dots, C_p)} \sum_{j=1}^n w_j \tan\left( \arg \left( \sum_{k \in C(j)} z_k \right) \right) $$

where for $z \in \mathbb{C}, \arg(z)$ is defined mod $2 \pi$ by $z=\vert z \vert e^{i \times \arg(z)}$.

I am not sure if this reformulation is helpful.

I have very little knowledge of operational research, so I am looking for literature to help solve this type of problem. I believe that with a monotonicity assumption, this could be reduced to a graph problem.

Perhaps the convexity of $\tan$ on $ [0, \frac{\pi}{2}[$ could also be useful?

Remark 1 : I can rewrite my problem as follows:

$$\max_{P=(C_1, \dots, C_p)}\sum_{j=1}^p \frac{\left(\sum_{k \in C_j}y_k \right) \left( \sum_{k \in C_j}w_k\right)}{\sum_{k \in C_j}x_k}$$

This is almost the same problem as in this post : Optimal partition search, except that in the latter, we have the particular case $y=w$.

Remark 2

My sequences $x=\{x_1,\dots, x_n\}, y=\{y_1, \dots, y_k \}$ and $w=\{w_1, \dots, w_n\}$ are not arbitrary but come from a physical problem, and in my dataset, there exists a constant $k_r > 0$ such that $x - y = k_r$.

Moreover, $w \ll x $. Finally, $x$, $y$, and $w$ are non-negative and always sorted in decreasing order: $x_1 > x_2 > \dots > x_n > 0$ ect ...

I am attaching some graphs to illustrate this:

Remark 3

Here a code to generate my data.

#Some constants k_r=6.8*10**(-3) k_d=2.99*10**(-4) tau_H=0.25 sigma_H=0.047 k_H=8.7*10**(-6) R=1.389*10**(-7) h=0.4 #Some functions def I(I_s,q,N): Z=[-h*(n-0.5)/N for n in range(1,N+1)] return [I_s*np.exp(1/h*np.log(1/q)*z) for z in Z] def alpha(i): return k_d*tau_H*(sigma_H*i)**2/(tau_H*sigma_H*i+1)+k_r def beta(i): return alpha(i)-k_r def gamma(i): return k_H*sigma_H*i/(tau_H*sigma_H*i+1)

$(I_S,q,n)$ are $3$ parametres, for exemple $n=50$. Besides, $I_s$ is between $ 50~\mu \text{mol}~\text{m}^{-2}~\text{s}^{-1}$ and $2500~\mu \text{mol}~\text{m}^{-2}~\text{s}^{-1}$. $q$ is between $0.001$ and $0.1$.

$x,y,w$ depend on $(I_s,q, n)$ and are got as following :

enter preformatted text here def generate_xyw(I_s,q,n): intensity=I(I_s,q,n) x=[alpha(i) for i in intensity] y=[beta(i) for i in intensity] w=[gamma(i) for i in intensity] return x,y,w

Finally, here the objective function :

def objective_function(partition, x, y, w): total = 0 for subset in partition: sum_x = sum(x[k] for k in subset) sum_y = sum(y[k] for k in subset) sum_w = sum(w[k] for k in subset) total += (sum_y * sum_w) / sum_x return total

Remark 4 :

Here some exemples of x,y,w :

Exemple 1, n=20, p=8 x=[0.031734386869619546, 0.028114773786248753, 0.02500095746124575, 0.022322881262651355, 0.02002026370143802, 0.018041238125995708, 0.016341180402791745, 0.014881698110517521, 0.013629758459185334, 0.01255693535474994, 0.011638758843708498, 0.010854152657549989, 0.010184947785304998, 0.009615461966816253, 0.009132136731727278, 0.008723225100294272, 0.008378524285632391, 0.008089148659956332, 0.007847338846610508, 0.007646303081056538] y=[0.024934386869619546, 0.021314773786248752, 0.01820095746124575, 0.015522881262651354, 0.013220263701438018, 0.011241238125995708, 0.009541180402791744, 0.00808169811051752, 0.006829758459185335, 0.00575693535474994, 0.004838758843708498, 0.004054152657549989, 0.0033849477853049982, 0.002815461966816253, 0.002332136731727279, 0.0019232251002942726, 0.0015785242856323914, 0.0012891486599563321, 0.0010473388466105083, 0.0008463030810565387] w=[3.327397189186497e-05, 3.303985856387542e-05, 3.277201810575901e-05, 3.246629920726947e-05, 3.2118264409167744e-05, 3.172324450401634e-05, 3.127642155639144e-05, 3.0772945937594106e-05, 3.0208091973507955e-05, 2.9577455028712766e-05, 2.8877189832180718e-05, 2.8104285419597507e-05, 2.7256866254243926e-05, 2.6334502245185015e-05, 2.5338503289491927e-05, 2.4272167867950054e-05, 2.314095172555498e-05, 2.1952523449580294e-05, 2.0716680114097696e-05, 1.9445108455108922e-05] Exemple 2, n=20, p=12 x=[0.018233001464936494, 0.015681110832323653, 0.013663795700816881, 0.012072649471329307, 0.010821569370018151, 0.009842064760569254, 0.009079506265419026, 0.00849013642970723, 0.00803870844174435, 0.007696651850488342, 0.007440681861171401, 0.007251772980111323, 0.007114414361670069, 0.007016062238600948, 0.006946712107442491, 0.006898531854224716, 0.006865521896777285, 0.00684319160597832, 0.006828256499488985, 0.006818366354441052] y=[0.011433001464936494, 0.008881110832323652, 0.006863795700816882, 0.005272649471329307, 0.0040215693700181515, 0.003042064760569255, 0.0022795062654190268, 0.0016901364297072308, 0.0012387084417443501, 0.0008966518504883421, 0.0006406818611714013, 0.0004517729801113233, 0.0003144143616700696, 0.0002160622386009484, 0.00014671210744249177, 9.85318542247168e-05, 6.552189677728505e-05, 4.319160597832019e-05, 2.8256499488985704e-05, 1.836635444105259e-05] w=[3.176657969002066e-05, 3.1065438454688126e-05, 3.022557323118029e-05, 2.9230692066097497e-05, 2.8067631248385212e-05, 2.6728750243508827e-05, 2.5214536422462323e-05, 2.353596233919086e-05, 2.171597309870185e-05, 1.978946469925486e-05, 1.7801337174030657e-05, 1.5802669254007467e-05, 1.3845625260178508e-05, 1.1978131294329622e-05, 1.0239438767534542e-05, 8.657387432177672e-06, 7.2476407624875955e-06, 6.014638781858573e-06, 4.953686937841454e-06, 4.0535278287472135e-06] Exemple 3, n=15 p=9 x=[0.017140958969020684, 0.013316273144252648, 0.010823466923717988, 0.009218592702555723, 0.00820604102194055, 0.007586078432295241, 0.00722129205528342, 0.007016486857528025, 0.006907036374494345, 0.0068512153687844005, 0.006823874780969982, 0.006810912349301935, 0.00680491679976062, 0.006802193132999357, 0.006800971451803378] y=[0.010340958969020684, 0.006516273144252648, 0.004023466923717988, 0.0024185927025557235, 0.0014060410219405512, 0.0007860784322952412, 0.0004212920552834204, 0.00021648685752802526, 0.00010703637449434571, 5.1215368784400875e-05, 2.3874780969982377e-05, 1.091234930193554e-05, 4.916799760620351e-06, 2.1931329993573473e-06, 9.714518033783481e-07] w=[3.150189498705775e-05, 3.004045308101965e-05, 2.8069777307216367e-05, 2.5534808403277594e-05, 2.2464625892882282e-05, 1.9006336467426224e-05, 1.5415128009394162e-05, 1.1987447998609264e-05, 8.969287062409137e-06, 6.494686185188362e-06, 4.5817205470108376e-06, 3.1691207976881912e-06, 2.1608986080024373e-06, 1.458640175517353e-06, 9.777679421225571e-07]

@RobPratt, thank you for your improvements to the body of the post! For the title, I believe that it is conventional to respect posters' choice not to use TeX in the title, as long as it does not significantly affect readability. — LSpice
– LSpice, Commented Oct 1, 2024 at 18:16
Do you know anything about the signs of $x_k,y_k,w_k$? How big is $n$? Do you have sample data you can share? — RobPratt
– RobPratt, Commented Oct 1, 2024 at 21:58
If by monotonicity, you mean sorting by $y_k w_k/x_k$ and then grouping only consecutive points, this does not guarantee optimality. — RobPratt
– RobPratt, Commented Oct 2, 2024 at 21:10
If you can share explicit data $x,y,w,p$ for $n=50$, I'll try a mixed integer linear programming formulation. — RobPratt
– RobPratt, Commented Oct 2, 2024 at 21:14
I have added code to get x,y,w. The goal is to get the best partition (so find also the best p). I need also to minimize my objective function. — Adam
– Adam, Commented Oct 4, 2024 at 11:42

RobPratt · Accepted Answer · 2024-10-04 20:46:05Z

You can formulate the problem as a mixed integer nonlinear programming problem as follows. Let binary decision variable $u_{jk}$ indicate whether point $j$ appears in class $k$. The problem is to maximize $$\sum_{k=1}^p \frac{\left(\sum_{j=1}^n y_j u_{jk}\right)\left(\sum_{j=1}^n w_j u_{jk}\right)}{\sum_{j=1}^n x_j u_{jk}}$$ subject to \begin{align} \sum_{k=1}^p u_{jk} &= 1 &&\text{for $j\in\{1,\dots,n\}$} \tag1\label1 \\ \sum_{j=1}^n u_{jk} &\ge 1 &&\text{for $k\in\{1,\dots,p\}$} \tag2\label2 \end{align} Constraint \eqref{1} assigns each point to exactly one class. Constraint \eqref{2} assigns at least one point to each class.

This problem can be linearized, and I used a mixed integer linear programming solver to find optimal solutions.

For your Example 1, here is an optimal solution, with objective value 0.0003034162:

 1 2 3 4 5 6 7 8 1 0 0 1 0 0 0 0 0 2 1 0 0 0 0 0 0 0 3 0 0 0 0 1 0 0 0 4 0 0 0 0 0 1 0 0 5 0 1 0 0 0 0 0 0 6 0 0 0 0 0 0 1 0 7 0 0 0 1 0 0 0 0 8 0 0 0 0 0 0 0 1 9 0 0 0 0 0 1 0 0 10 0 0 0 0 0 0 1 0 11 1 0 0 0 0 0 0 0 12 0 1 0 0 0 0 0 0 13 1 0 0 0 0 0 0 0 14 0 0 0 0 1 0 0 0 15 0 0 0 0 0 1 0 0 16 0 0 0 0 1 0 0 0 17 0 0 1 0 0 0 0 0 18 1 0 0 0 0 0 0 0 19 0 0 1 0 0 0 0 0 20 0 0 1 0 0 0 0 0

For your Example 2, here is an optimal solution, with objective value 0.0001091495:

 1 2 3 4 5 6 7 8 9 10 11 12 1 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 3 0 0 1 0 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 0 0 0 0 0 0 5 1 0 0 0 0 0 0 0 0 0 0 0 6 1 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 1 0 0 0 0 8 0 0 0 0 1 0 0 0 0 0 0 0 9 0 0 0 0 1 0 0 0 0 0 0 0 10 0 0 0 0 1 0 0 0 0 0 0 0 11 0 0 1 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 1 0 13 0 1 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 1 0 0 15 0 0 0 0 0 1 0 0 0 0 0 0 16 0 0 0 1 0 0 0 0 0 0 0 0 17 0 0 0 0 0 0 1 0 0 0 0 0 18 0 0 0 1 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 1 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 1

For your Example 3, here is an optimal solution, with objective value 0.000059952 :

 1 2 3 4 5 6 7 8 9 1 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 1 3 1 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 1 0 0 6 1 0 0 0 0 0 0 0 0 7 0 1 0 0 0 0 0 0 0 8 0 0 0 0 0 0 1 0 0 9 0 0 0 1 0 0 0 0 0 10 0 0 0 0 1 0 0 0 0 11 0 0 0 0 0 0 0 1 0 12 0 1 0 0 0 0 0 0 0 13 0 0 1 0 0 0 0 0 0 14 0 0 0 1 0 0 0 0 0 15 0 0 0 0 0 1 0 0 0

One way to linearize the objective is to introduce a new decision variable $v_k$ to represent the summand so that the objective function becomes $\sum_{k=1}^p v_k$. Because $x_j,y_j,w_j>0$, we have lower bound $$\underline{v}_k = \frac{\left(\min_{j=1}^n y_j\right)\left(\min_{j=1}^n w_j\right)}{\sum_{j=1}^n x_j}$$ and upper bound $$\overline{v}_k = \frac{\left(\sum_{j=1}^n y_j\right)\left(\sum_{j=1}^n w_j \right)}{\min_{j=1}^n x_j}$$ Now we want to enforce $$v_k = \frac{\left(\sum_{j=1}^n y_j u_{jk}\right)\left(\sum_{j=1}^n w_j u_{jk}\right)}{\sum_{j=1}^n x_j u_{jk}},$$ equivalently, $$v_k \sum_{j=1}^n x_j u_{jk} = \left(\sum_{j=1}^n y_j u_{jk}\right)\left(\sum_{j=1}^n w_j u_{jk}\right) \tag3\label3$$ Nonlinear constraint \eqref{3} involves binary-bounded products $u_{jk} v_k$ and binary-binary products $u_{jk}u_{\ell m}$, both of which can be linearized. See, for example, https://go.documentation.sas.com/doc/en/pgmsascdc/v_055/casmopt/casmopt_optmodel_details57.htm

$\begingroup$ Thank you very much. How do you linearize this problem ? $\endgroup$

Adam
– Adam

2024-10-04 19:19:11 +00:00
Commented Oct 4, 2024 at 19:19 — Adam
– Adam, Commented Oct 4, 2024 at 19:19

Stack Exchange Network

Optimal partition of $n$ points

1 Answer 1

You must log in to answer this question.

Linked

Optimal partition of $n$ points

1 Answer 1

You must log in to answer this question.

Linked

Related