Given a dataset $X,$ having $p$ features, organize the units $x_i \in X $ into fixed number of clusters $g,$ with fixed cluster size $B.$
Clustering policy: minimize the sum of a linear combination of the average of the features inside the clusters. That is $x_i \in R^p $ we must find $[X_1,X_2,\ldots,X_g]$ st.: $$\min\sum_{s \in {1,\ldots,g}}\sum_{x_i \in X_s}\sum^p_{j=1}w^T_j\frac{\sum^B_{k=1}x_{kj}}{B}$$ where w is the vector of the weight of the linear combination on the averages of the feature inside the clusters.
I'm wondering how to solve such a problem, keeping in mind that my dataset is very big (50k units) and also lots of clusters must be found. In particular: $n = 50000, p=3, g=500$.
I tried approaching the problem as suggested here https://www.quora.com/Which-clustering-algorithms-can-be-run-on-a-dataset-of-3-million-samples, so I wrote my implementation of online k-means. However still for a single iterations it takes more than 20 minutes.
What is a good approach to the problem?