Python Forum
Create homogeneous groups with Kmeans ?
Hello everyone,

I recently started studying automatic classification with the K-Means method, which interests me greatly. As an example, I have a dataset that lists cheeses along with their nutritional components (calories, lipids, etc.), in this form: https://zupimages.net/viewer.php?id=20/36/imce.png

I want to create 4 groups with the best homogeneity, that is, the lowest average distance between observations and the center of their respective class, and the highest dispersion (average distance between classes). I know that statistical software such as Sphinx can report these numbers (example output here: https://zupimages.net/viewer.php?id=20/36/khlr.png).
My plan is to generate several candidate groupings with KMeans and then keep only the one that meets these conditions. Unfortunately, despite my research, I could not find how to extract this homogeneity and this dispersion.

However, my research did let me put together a reproducible script:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
from sklearn import cluster, metrics

data = pd.DataFrame({
    "fromage": [f"fromage{i}" for i in range(1, 22)],
    "calories": np.random.uniform(low=100, high=450, size=(21,)),
    "sodium": np.random.uniform(low=20, high=450, size=(21,)),
    "calcium": np.random.uniform(low=70, high=250, size=(21,)),
    "lipides": np.random.uniform(low=20, high=30, size=(21,)),
    "retinol": np.random.uniform(low=50, high=120, size=(21,)),
    "folates": np.random.uniform(low=1, high=30, size=(21,)),
    "proteines": np.random.uniform(low=7, high=20, size=(21,)),
    "cholesterol": np.random.uniform(low=100, high=450, size=(21,)),
})

# Use the cheese name as the index
data = data.set_index("fromage")

# Build the groups
kmeans = cluster.KMeans(n_clusters=4, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeans.fit(data)

# Row indices sorted by group label
idk = np.argsort(kmeans.labels_)

# Mean of each variable
m = data.mean()

# Total sum of squares (TSS) per variable
TSS = data.shape[0] * data.var(ddof=0)

# Data grouped by cluster label
gb = data.groupby(kmeans.labels_)

# Group sizes
nk = gb.size()

# Mean of each variable within each group
mk = gb.mean()

# Squared deviation of each group mean from the overall mean, per variable
EMk = (mk - m) ** 2

# Weighted by group size
EM = EMk.multiply(nk, axis=0)

# Sum over groups gives the between-group sum of squares (BSS)
BSS = np.sum(EM, axis=0)

# Share of each variable's variance explained by group membership
R2 = BSS / TSS
Is it possible to extract these numbers with one of the libraries that I used?
Thank you.
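
For reference, here is a minimal sketch of how the two quantities described above might be computed with scikit-learn and SciPy, and how several candidate groupings could be compared. It assumes that "homogeneity" means the mean Euclidean distance of each observation to its own cluster center and that "dispersion" means the mean pairwise Euclidean distance between cluster centers; other tools may define them differently. The helper name `cluster_quality`, the random demo data, and the choice to rank groupings by the ratio dispersion / homogeneity are all illustrative assumptions, not something taken from Sphinx or from the script above.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans


def cluster_quality(data, n_clusters=4, random_state=0):
    """Fit KMeans and return (homogeneity, dispersion, labels).

    homogeneity: mean distance of each observation to its own center (assumed definition)
    dispersion:  mean pairwise distance between cluster centers (assumed definition)
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    km.fit(data)
    labels = km.labels_
    # transform() returns the distance from every row to every cluster center
    dist_to_centers = km.transform(data)
    # Keep, for each row, only the distance to the center it was assigned to
    own = dist_to_centers[np.arange(len(labels)), labels]
    homogeneity = own.mean()
    # pdist() returns the condensed vector of pairwise distances between centers
    dispersion = pdist(km.cluster_centers_).mean()
    return homogeneity, dispersion, labels


# Hypothetical demo data: 21 rows, 3 numeric variables
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "calories": rng.uniform(100, 450, size=21),
    "sodium": rng.uniform(20, 450, size=21),
    "calcium": rng.uniform(70, 250, size=21),
})

# Try several seeds and keep the grouping with the highest
# dispersion / homogeneity ratio (an assumed selection rule)
results = []
for seed in range(20):
    h, d, _ = cluster_quality(data, random_state=seed)
    results.append((d / h, seed, h, d))

best_ratio, best_seed, best_h, best_d = max(results)
print(f"seed={best_seed} homogeneity={best_h:.2f} dispersion={best_d:.2f}")
```

The fitted model also exposes `inertia_`, the within-cluster sum of squared distances, which is related to the homogeneity above but uses squared rather than plain distances. Since the variables are on very different scales, standardizing them before clustering may change which grouping comes out best.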