4
$\begingroup$

Given an integer $n$, and 3 real sequences $\{x_1, \dots, x_n\}, \{y_1, \dots, y_n\}$ and $\{w_1, \dots, w_n\}$ with $x_k, y_k, w_k > 0$, for all $k \in \{1, \dots, n\}$. For a fixed $p < n$ let $\{C_1, \dots, C_p\}$ be a partition of the set $\{1, \dots, n\}$ as in $C_1 \cup \dots \cup C_p$ = $\{1, \dots, n\}$, with pairwise disjoint.

We define for $ k \in \{1,\dots, n\} $, $C(k)$ as the class containing $ k $, it means there exists a unique $j \in \{1, \dots, p\}$ such that $ k \in C_j=C(k)$. I am looking to solve the following problem:

$$ \max_{P=(C_1, \dots, C_p)} \sum_{j=1}^n w_j \frac{\sum_{k \in C(j)} y_k}{\sum_{k \in C(j)} x_k} $$

I am looking for an algorithm to solve this problem with good complexity (ideally polynomial time) and a reference to the literature if this problem is already known.

One reformulation I have considered is to define the complex numbers $ z_1, \dots, z_n $ such that $z_k = x_k + i \times y_k$ for all $k \in \{1, \dots, n\}$, so that the problem becomes:

$$ \max_{P=(C_1, \dots, C_p)} \sum_{j=1}^n w_j \tan\left( \arg \left( \sum_{k \in C(j)} z_k \right) \right) $$

where for $z \in \mathbb{C}, \arg(z)$ is defined mod $2 \pi$ by $z=\vert z \vert e^{i \times \arg(z)}$.

I am not sure if this reformulation is helpful.

I have very little knowledge of operational research, so I am looking for literature to help solve this type of problem. I believe that with a monotonicity assumption, this could be reduced to a graph problem.

Perhaps the convexity of $\tan$ on $ [0, \frac{\pi}{2}[$ could also be useful?

Remark 1 : I can rewrite my problem as follows:

$$\max_{P=(C_1, \dots, C_p)}\sum_{j=1}^p \frac{\left(\sum_{k \in C_j}y_k \right) \left( \sum_{k \in C_j}w_k\right)}{\sum_{k \in C_j}x_k}$$

This is almost the same problem as in this post : Optimal partition search, except that in the latter, we have the particular case $y=w$.

Remark 2

My sequences $x=\{x_1,\dots, x_n\}, y=\{y_1, \dots, y_k \}$ and $w=\{w_1, \dots, w_n\}$ are not arbitrary but come from a physical problem, and in my dataset, there exists a constant $k_r > 0$ such that $x - y = k_r$.

Moreover, $w \ll x $. Finally, $x$, $y$, and $w$ are non-negative and always sorted in decreasing order: $x_1 > x_2 > \dots > x_n > 0$ ect ...

I am attaching some graphs to illustrate this:

enter image description here

enter image description here

enter image description here

enter image description here

enter image description here

Remark 3

Here a code to generate my data.

#Some constants k_r=6.8*10**(-3) k_d=2.99*10**(-4) tau_H=0.25 sigma_H=0.047 k_H=8.7*10**(-6) R=1.389*10**(-7) h=0.4 #Some functions def I(I_s,q,N): Z=[-h*(n-0.5)/N for n in range(1,N+1)] return [I_s*np.exp(1/h*np.log(1/q)*z) for z in Z] def alpha(i): return k_d*tau_H*(sigma_H*i)**2/(tau_H*sigma_H*i+1)+k_r def beta(i): return alpha(i)-k_r def gamma(i): return k_H*sigma_H*i/(tau_H*sigma_H*i+1) 

$(I_S,q,n)$ are $3$ parametres, for exemple $n=50$. Besides, $I_s$ is between $ 50~\mu \text{mol}~\text{m}^{-2}~\text{s}^{-1}$ and $2500~\mu \text{mol}~\text{m}^{-2}~\text{s}^{-1}$. $q$ is between $0.001$ and $0.1$.

$x,y,w$ depend on $(I_s,q, n)$ and are got as following :

enter preformatted text here def generate_xyw(I_s,q,n): intensity=I(I_s,q,n) x=[alpha(i) for i in intensity] y=[beta(i) for i in intensity] w=[gamma(i) for i in intensity] return x,y,w 

Finally, here the objective function :

def objective_function(partition, x, y, w): total = 0 for subset in partition: sum_x = sum(x[k] for k in subset) sum_y = sum(y[k] for k in subset) sum_w = sum(w[k] for k in subset) total += (sum_y * sum_w) / sum_x return total 

Remark 4 :

Here some exemples of x,y,w :

Exemple 1, n=20, p=8 x=[0.031734386869619546, 0.028114773786248753, 0.02500095746124575, 0.022322881262651355, 0.02002026370143802, 0.018041238125995708, 0.016341180402791745, 0.014881698110517521, 0.013629758459185334, 0.01255693535474994, 0.011638758843708498, 0.010854152657549989, 0.010184947785304998, 0.009615461966816253, 0.009132136731727278, 0.008723225100294272, 0.008378524285632391, 0.008089148659956332, 0.007847338846610508, 0.007646303081056538] y=[0.024934386869619546, 0.021314773786248752, 0.01820095746124575, 0.015522881262651354, 0.013220263701438018, 0.011241238125995708, 0.009541180402791744, 0.00808169811051752, 0.006829758459185335, 0.00575693535474994, 0.004838758843708498, 0.004054152657549989, 0.0033849477853049982, 0.002815461966816253, 0.002332136731727279, 0.0019232251002942726, 0.0015785242856323914, 0.0012891486599563321, 0.0010473388466105083, 0.0008463030810565387] w=[3.327397189186497e-05, 3.303985856387542e-05, 3.277201810575901e-05, 3.246629920726947e-05, 3.2118264409167744e-05, 3.172324450401634e-05, 3.127642155639144e-05, 3.0772945937594106e-05, 3.0208091973507955e-05, 2.9577455028712766e-05, 2.8877189832180718e-05, 2.8104285419597507e-05, 2.7256866254243926e-05, 2.6334502245185015e-05, 2.5338503289491927e-05, 2.4272167867950054e-05, 2.314095172555498e-05, 2.1952523449580294e-05, 2.0716680114097696e-05, 1.9445108455108922e-05] Exemple 2, n=20, p=12 x=[0.018233001464936494, 0.015681110832323653, 0.013663795700816881, 0.012072649471329307, 0.010821569370018151, 0.009842064760569254, 0.009079506265419026, 0.00849013642970723, 0.00803870844174435, 0.007696651850488342, 0.007440681861171401, 0.007251772980111323, 0.007114414361670069, 0.007016062238600948, 0.006946712107442491, 0.006898531854224716, 0.006865521896777285, 0.00684319160597832, 0.006828256499488985, 0.006818366354441052] y=[0.011433001464936494, 0.008881110832323652, 0.006863795700816882, 0.005272649471329307, 0.0040215693700181515, 0.003042064760569255, 0.0022795062654190268, 0.0016901364297072308, 0.0012387084417443501, 0.0008966518504883421, 0.0006406818611714013, 0.0004517729801113233, 0.0003144143616700696, 0.0002160622386009484, 0.00014671210744249177, 9.85318542247168e-05, 6.552189677728505e-05, 4.319160597832019e-05, 2.8256499488985704e-05, 1.836635444105259e-05] w=[3.176657969002066e-05, 3.1065438454688126e-05, 3.022557323118029e-05, 2.9230692066097497e-05, 2.8067631248385212e-05, 2.6728750243508827e-05, 2.5214536422462323e-05, 2.353596233919086e-05, 2.171597309870185e-05, 1.978946469925486e-05, 1.7801337174030657e-05, 1.5802669254007467e-05, 1.3845625260178508e-05, 1.1978131294329622e-05, 1.0239438767534542e-05, 8.657387432177672e-06, 7.2476407624875955e-06, 6.014638781858573e-06, 4.953686937841454e-06, 4.0535278287472135e-06] Exemple 3, n=15 p=9 x=[0.017140958969020684, 0.013316273144252648, 0.010823466923717988, 0.009218592702555723, 0.00820604102194055, 0.007586078432295241, 0.00722129205528342, 0.007016486857528025, 0.006907036374494345, 0.0068512153687844005, 0.006823874780969982, 0.006810912349301935, 0.00680491679976062, 0.006802193132999357, 0.006800971451803378] y=[0.010340958969020684, 0.006516273144252648, 0.004023466923717988, 0.0024185927025557235, 0.0014060410219405512, 0.0007860784322952412, 0.0004212920552834204, 0.00021648685752802526, 0.00010703637449434571, 5.1215368784400875e-05, 2.3874780969982377e-05, 1.091234930193554e-05, 4.916799760620351e-06, 2.1931329993573473e-06, 9.714518033783481e-07] w=[3.150189498705775e-05, 3.004045308101965e-05, 2.8069777307216367e-05, 2.5534808403277594e-05, 2.2464625892882282e-05, 1.9006336467426224e-05, 1.5415128009394162e-05, 1.1987447998609264e-05, 8.969287062409137e-06, 6.494686185188362e-06, 4.5817205470108376e-06, 3.1691207976881912e-06, 2.1608986080024373e-06, 1.458640175517353e-06, 9.777679421225571e-07] 
$\endgroup$
7
  • $\begingroup$ @RobPratt, thank you for your improvements to the body of the post! For the title, I believe that it is conventional to respect posters' choice not to use TeX in the title, as long as it does not significantly affect readability. $\endgroup$ Commented Oct 1, 2024 at 18:16
  • $\begingroup$ Do you know anything about the signs of $x_k,y_k,w_k$? How big is $n$? Do you have sample data you can share? $\endgroup$ Commented Oct 1, 2024 at 21:58
  • $\begingroup$ If by monotonicity, you mean sorting by $y_k w_k/x_k$ and then grouping only consecutive points, this does not guarantee optimality. $\endgroup$ Commented Oct 2, 2024 at 21:10
  • $\begingroup$ If you can share explicit data $x,y,w,p$ for $n=50$, I'll try a mixed integer linear programming formulation. $\endgroup$ Commented Oct 2, 2024 at 21:14
  • $\begingroup$ I have added code to get x,y,w. The goal is to get the best partition (so find also the best p). I need also to minimize my objective function. $\endgroup$ Commented Oct 4, 2024 at 11:42

1 Answer 1

2
$\begingroup$

You can formulate the problem as a mixed integer nonlinear programming problem as follows. Let binary decision variable $u_{jk}$ indicate whether point $j$ appears in class $k$. The problem is to maximize $$\sum_{k=1}^p \frac{\left(\sum_{j=1}^n y_j u_{jk}\right)\left(\sum_{j=1}^n w_j u_{jk}\right)}{\sum_{j=1}^n x_j u_{jk}}$$ subject to \begin{align} \sum_{k=1}^p u_{jk} &= 1 &&\text{for $j\in\{1,\dots,n\}$} \tag1\label1 \\ \sum_{j=1}^n u_{jk} &\ge 1 &&\text{for $k\in\{1,\dots,p\}$} \tag2\label2 \end{align} Constraint \eqref{1} assigns each point to exactly one class. Constraint \eqref{2} assigns at least one point to each class.

This problem can be linearized, and I used a mixed integer linear programming solver to find optimal solutions.

For your Example 1, here is an optimal solution, with objective value 0.0003034162:

 1 2 3 4 5 6 7 8 1 0 0 1 0 0 0 0 0 2 1 0 0 0 0 0 0 0 3 0 0 0 0 1 0 0 0 4 0 0 0 0 0 1 0 0 5 0 1 0 0 0 0 0 0 6 0 0 0 0 0 0 1 0 7 0 0 0 1 0 0 0 0 8 0 0 0 0 0 0 0 1 9 0 0 0 0 0 1 0 0 10 0 0 0 0 0 0 1 0 11 1 0 0 0 0 0 0 0 12 0 1 0 0 0 0 0 0 13 1 0 0 0 0 0 0 0 14 0 0 0 0 1 0 0 0 15 0 0 0 0 0 1 0 0 16 0 0 0 0 1 0 0 0 17 0 0 1 0 0 0 0 0 18 1 0 0 0 0 0 0 0 19 0 0 1 0 0 0 0 0 20 0 0 1 0 0 0 0 0 

For your Example 2, here is an optimal solution, with objective value 0.0001091495:

 1 2 3 4 5 6 7 8 9 10 11 12 1 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 3 0 0 1 0 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 0 0 0 0 0 0 5 1 0 0 0 0 0 0 0 0 0 0 0 6 1 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 1 0 0 0 0 8 0 0 0 0 1 0 0 0 0 0 0 0 9 0 0 0 0 1 0 0 0 0 0 0 0 10 0 0 0 0 1 0 0 0 0 0 0 0 11 0 0 1 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 1 0 13 0 1 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 1 0 0 15 0 0 0 0 0 1 0 0 0 0 0 0 16 0 0 0 1 0 0 0 0 0 0 0 0 17 0 0 0 0 0 0 1 0 0 0 0 0 18 0 0 0 1 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 1 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 1 

For your Example 3, here is an optimal solution, with objective value 0.000059952 :

 1 2 3 4 5 6 7 8 9 1 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 1 3 1 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 1 0 0 6 1 0 0 0 0 0 0 0 0 7 0 1 0 0 0 0 0 0 0 8 0 0 0 0 0 0 1 0 0 9 0 0 0 1 0 0 0 0 0 10 0 0 0 0 1 0 0 0 0 11 0 0 0 0 0 0 0 1 0 12 0 1 0 0 0 0 0 0 0 13 0 0 1 0 0 0 0 0 0 14 0 0 0 1 0 0 0 0 0 15 0 0 0 0 0 1 0 0 0 

One way to linearize the objective is to introduce a new decision variable $v_k$ to represent the summand so that the objective function becomes $\sum_{k=1}^p v_k$. Because $x_j,y_j,w_j>0$, we have lower bound $$\underline{v}_k = \frac{\left(\min_{j=1}^n y_j\right)\left(\min_{j=1}^n w_j\right)}{\sum_{j=1}^n x_j}$$ and upper bound $$\overline{v}_k = \frac{\left(\sum_{j=1}^n y_j\right)\left(\sum_{j=1}^n w_j \right)}{\min_{j=1}^n x_j}$$ Now we want to enforce $$v_k = \frac{\left(\sum_{j=1}^n y_j u_{jk}\right)\left(\sum_{j=1}^n w_j u_{jk}\right)}{\sum_{j=1}^n x_j u_{jk}},$$ equivalently, $$v_k \sum_{j=1}^n x_j u_{jk} = \left(\sum_{j=1}^n y_j u_{jk}\right)\left(\sum_{j=1}^n w_j u_{jk}\right) \tag3\label3$$ Nonlinear constraint \eqref{3} involves binary-bounded products $u_{jk} v_k$ and binary-binary products $u_{jk}u_{\ell m}$, both of which can be linearized. See, for example, https://go.documentation.sas.com/doc/en/pgmsascdc/v_055/casmopt/casmopt_optmodel_details57.htm

$\endgroup$
1
  • $\begingroup$ Thank you very much. How do you linearize this problem ? $\endgroup$ Commented Oct 4, 2024 at 19:19

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.