pyclustering is a Python and C++ data mining library (clustering algorithms, oscillatory networks, neural networks). The library provides a Python implementation of every algorithm and model and a C++ implementation (the C++ pyclustering library) of most of them. The C++ pyclustering library is part of pyclustering and is supported on Linux, Windows and macOS.
Version: 0.11.dev
License: The 3-Clause BSD License
E-Mail: pyclustering@yandex.ru
Documentation: https://pyclustering.github.io/docs/0.10.1/html/
Homepage: https://pyclustering.github.io/
PyClustering Wiki: https://github.com/annoviko/pyclustering/wiki
Required packages: scipy, matplotlib, numpy, Pillow
Python version: >=3.6 (32-bit, 64-bit)
C++ version: >= 14 (32-bit, 64-bit)
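The requirements above can be checked before installing the library. The following is a minimal sketch using only the standard library; the helper name `check_environment` and the two constants are illustrative and not part of pyclustering. Note that Pillow is imported under the module name `PIL`.

```python
import sys
from importlib.util import find_spec

# Distribution name -> import name. Pillow is imported as "PIL".
REQUIRED_PACKAGES = {
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "Pillow": "PIL",
}
MINIMUM_PYTHON = (3, 6)


def check_environment():
    """Return (python_ok, missing), where missing lists the absent packages."""
    python_ok = sys.version_info[:2] >= MINIMUM_PYTHON
    missing = [name for name, module in REQUIRED_PACKAGES.items()
               if find_spec(module) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python version OK:", python_ok)
    print("Missing packages:", ", ".join(missing) or "none")
```

The check only tests whether each package can be found; it does not verify package versions.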
Each algorithm is implemented in Python, and most are also implemented in C/C++. If your platform is not supported, the Python implementation is used; otherwise the C/C++ implementation is used. The implementation can be chosen with the ccore flag (by default it is 'True', which means that C/C++ is used), for example:

    # By default the C/C++ part of the library is used
    xmeans_instance_1 = xmeans(data_points, start_centers, 20, ccore=True)

    # The same: the C/C++ part of the library is used by default
    xmeans_instance_2 = xmeans(data_points, start_centers, 20)

    # Switch off the core: Python is used
    xmeans_instance_3 = xmeans(data_points, start_centers, 20, ccore=False)

Installation using pip3 tool:

    $ pip3 install pyclustering

Manual installation from official repository using Makefile:

    # get sources of the pyclustering library, for example, from repository
    $ mkdir pyclustering
    $ cd pyclustering/
    $ git clone https://github.com/annoviko/pyclustering.git .

    # compile CCORE library (core of the pyclustering library).
    $ cd ccore/
    $ make ccore_64bit      # build for 64-bit OS

    # $ make ccore_32bit    # build for 32-bit OS

    # return to parent folder of the pyclustering library
    $ cd ../

    # install pyclustering library
    $ python3 setup.py install

    # optionally - test the library
    $ python3 setup.py test

Manual installation using CMake:

    # get sources of the pyclustering library, for example, from repository
    $ mkdir pyclustering
    $ cd pyclustering/
    $ git clone https://github.com/annoviko/pyclustering.git .

    # generate build files in a separate build folder.
    $ mkdir build
    $ cd build
    $ cmake ..

    # build pyclustering-shared target depending on what was generated (Makefile or MSVC solution)
    # if Makefile has been generated then
    $ make pyclustering-shared

    # return to parent folder of the pyclustering library
    $ cd ../

    # install pyclustering library
    $ python3 setup.py install

    # optionally - test the library
    $ python3 setup.py test

Manual installation using Microsoft Visual Studio solution:
- Clone the repository from https://github.com/annoviko/pyclustering.git
- Open the folder pyclustering/ccore
- Open the Visual Studio solution ccore.sln
- Select the solution platform: x86 or x64
- Build the pyclustering-shared project
- Add the pyclustering folder to the Python path or install it using setup.py:

      # install pyclustering library
      $ python3 setup.py install

      # optionally - test the library
      $ python3 setup.py test

If you have questions, proposals or bug reports related to pyclustering, please write to pyclustering@yandex.ru or create an issue in the GitHub repository (https://github.com/annoviko/pyclustering).
If you use the pyclustering library in a scientific paper, please cite it:
Novikov, A., 2019. PyClustering: Data Mining Library. Journal of Open Source Software, 4(36), p.1230. Available at: http://dx.doi.org/10.21105/joss.01230.
BibTeX entry:

    @article{Novikov2019,
        doi       = {10.21105/joss.01230},
        url       = {https://doi.org/10.21105/joss.01230},
        year      = 2019,
        month     = {apr},
        publisher = {The Open Journal},
        volume    = {4},
        number    = {36},
        pages     = {1230},
        author    = {Andrei Novikov},
        title     = {{PyClustering}: Data Mining Library},
        journal   = {Journal of Open Source Software}
    }

Clustering algorithms and methods (module pyclustering.cluster):
| Algorithm | Python | C++ |
|---|---|---|
| Agglomerative | ✓ | ✓ |
| BANG | ✓ | |
| BIRCH | ✓ | |
| BSAS | ✓ | ✓ |
| CLARANS | ✓ | |
| CLIQUE | ✓ | ✓ |
| CURE | ✓ | ✓ |
| DBSCAN | ✓ | ✓ |
| Elbow | ✓ | ✓ |
| EMA | ✓ | |
| Fuzzy C-Means | ✓ | ✓ |
| GA (Genetic Algorithm) | ✓ | ✓ |
| G-Means | ✓ | ✓ |
| HSyncNet | ✓ | ✓ |
| K-Means | ✓ | ✓ |
| K-Means++ | ✓ | ✓ |
| K-Medians | ✓ | ✓ |
| K-Medoids | ✓ | ✓ |
| MBSAS | ✓ | ✓ |
| OPTICS | ✓ | ✓ |
| ROCK | ✓ | ✓ |
| Silhouette | ✓ | ✓ |
| SOM-SC | ✓ | ✓ |
| SyncNet | ✓ | ✓ |
| Sync-SOM | ✓ | |
| TTSAS | ✓ | ✓ |
| X-Means | ✓ | ✓ |
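The table can be combined with the ccore flag when deciding which implementation to request. The sketch below is an illustration, not part of the library: `PYTHON_ONLY`, `ccore_available` and `choose_ccore` are hypothetical names, and the set is copied from the rows above that have no C++ mark. It assumes that `ccore_library.workable()` from `pyclustering.core.wrapper` reports whether the compiled core can be loaded; verify this against your installed version.

```python
# Algorithms listed in the table above without a C++ implementation.
PYTHON_ONLY = {"BANG", "BIRCH", "CLARANS", "EMA", "Sync-SOM"}


def ccore_available():
    """Return True if the compiled C++ core can be loaded on this platform."""
    try:
        # Assumption: this module and class exist in the installed version.
        from pyclustering.core.wrapper import ccore_library
    except ImportError:
        return False  # pyclustering (or one of its dependencies) is missing
    try:
        return bool(ccore_library.workable())
    except OSError:
        return False  # the shared library exists but cannot be loaded


def choose_ccore(algorithm_name):
    """Suggest a value for the ccore flag (hypothetical helper)."""
    if algorithm_name in PYTHON_ONLY:
        return False
    return ccore_available()


if __name__ == "__main__":
    for name in ("K-Means", "BIRCH"):
        print(name, "-> ccore =", choose_ccore(name))
```

The suggested value is only useful for constructors that accept a ccore argument; Python-only algorithms may not take that argument at all.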
Oscillatory networks and neural networks (module pyclustering.nnet):
| Model | Python | C++ |
|---|---|---|
| CNN (Chaotic Neural Network) | ✓ | |
| fSync (Oscillatory network based on Landau-Stuart equation and Kuramoto model) | ✓ | |
| HHN (Oscillatory network based on Hodgkin-Huxley model) | ✓ | ✓ |
| Hysteresis Oscillatory Network | ✓ | |
| LEGION (Local Excitatory Global Inhibitory Oscillatory Network) | ✓ | ✓ |
| PCNN (Pulse-Coupled Neural Network) | ✓ | ✓ |
| SOM (Self-Organized Map) | ✓ | ✓ |
| Sync (Oscillatory network based on Kuramoto model) | ✓ | ✓ |
| SyncPR (Oscillatory network for pattern recognition) | ✓ | ✓ |
| SyncSegm (Oscillatory network for image segmentation) | ✓ | ✓ |
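Several models in this table are based on the Kuramoto model of coupled phase oscillators. As a rough illustration of the underlying idea, and not of the library's implementation, the sketch below integrates identical, fully connected oscillators with the Euler method and measures synchrony with the order parameter. All names and parameter values are assumptions chosen for the example, and the natural frequency is taken as zero (a rotating frame).

```python
import cmath
import math
import random


def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full synchrony."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def simulate_kuramoto(phases, coupling=1.0, step=0.1, steps=500):
    """Euler integration of identical, fully connected Kuramoto oscillators."""
    phases = list(phases)
    n = len(phases)
    for _ in range(steps):
        # d(theta_i)/dt = (K / N) * sum_j sin(theta_j - theta_i)
        rates = [
            coupling / n * sum(math.sin(other - phase) for other in phases)
            for phase in phases
        ]
        phases = [phase + step * rate for phase, rate in zip(phases, rates)]
    return phases


if __name__ == "__main__":
    rng = random.Random(0)
    initial = [rng.uniform(0.0, math.pi) for _ in range(10)]
    final = simulate_kuramoto(initial)
    print("order parameter before:", round(order_parameter(initial), 3))
    print("order parameter after: ", round(order_parameter(final), 3))
```

With positive coupling and initial phases confined to half a circle, the phases pull together and the order parameter approaches 1. Clustering networks such as SyncNet rely on the same effect with connections restricted to nearby points, so that groups synchronize separately.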
Graph Coloring Algorithms (module pyclustering.gcolor):
| Algorithm | Python | C++ |
|---|---|---|
| DSatur | ✓ | |
| Hysteresis | ✓ | |
| GColorSync | ✓ | |
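For readers unfamiliar with DSatur, the following is a minimal standalone sketch of the heuristic, written for illustration only; it does not use or reproduce the pyclustering.gcolor API, and the function name and graph format are assumptions. At each step it colors the vertex whose neighbors already use the most distinct colors (ties broken by degree) with the smallest color that is still free.

```python
def dsatur_coloring(adjacency):
    """Color a graph given as {vertex: set_of_neighbors}; returns {vertex: color}."""
    colors = {}

    def neighbor_colors(vertex):
        # Distinct colors already assigned to the neighbors of a vertex.
        return {colors[n] for n in adjacency[vertex] if n in colors}

    while len(colors) < len(adjacency):
        uncolored = [v for v in adjacency if v not in colors]
        # Highest saturation first, then highest degree.
        vertex = max(uncolored,
                     key=lambda v: (len(neighbor_colors(v)), len(adjacency[v])))
        used = neighbor_colors(vertex)
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


if __name__ == "__main__":
    # A cycle of five vertices cannot be colored with fewer than three colors.
    cycle = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
    print(dsatur_coloring(cycle))
```

DSatur is a greedy heuristic, so it does not guarantee the minimum number of colors on arbitrary graphs.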
Containers (module pyclustering.container):
| Container | Python | C++ |
|---|---|---|
| KD Tree | ✓ | ✓ |
| CF Tree | ✓ | |
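To show what a KD tree is used for, here is a small standalone sketch of building one and running a nearest-neighbor query. It is an illustration under simple assumptions (points as tuples, Euclidean distance, nested dictionaries as nodes) and is not the container implemented in pyclustering.container; all names are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_kdtree(points, depth=0):
    """Build a balanced k-d tree as nested dicts; returns None for no points."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    middle = len(ordered) // 2
    return {
        "point": ordered[middle],
        "axis": axis,
        "left": build_kdtree(ordered[:middle], depth + 1),
        "right": build_kdtree(ordered[middle + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    if best is None or distance(node["point"], target) < distance(best, target):
        best = node["point"]
    delta = target[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if delta < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(delta) < distance(best, target):
        best = nearest(far, target, best)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    points = [(rng.random(), rng.random()) for _ in range(200)]
    tree = build_kdtree(points)
    print("nearest to (0.5, 0.5):", nearest(tree, (0.5, 0.5)))
```

The pruning test is what makes the query faster than a linear scan: whole subtrees are skipped when they cannot contain a closer point.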
The library contains examples for each algorithm and oscillatory network model:
Clustering examples: pyclustering/cluster/examples
Graph coloring examples: pyclustering/gcolor/examples
Oscillatory network examples: pyclustering/nnet/examples
Data clustering by CURE algorithm

    from pyclustering.cluster import cluster_visualizer
    from pyclustering.cluster.cure import cure
    from pyclustering.utils import read_sample
    from pyclustering.samples.definitions import FCPS_SAMPLES

    # Input data in following format [ [0.1, 0.5], [0.3, 0.1], ... ].
    input_data = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

    # Allocate three clusters.
    cure_instance = cure(input_data, 3)
    cure_instance.process()
    clusters = cure_instance.get_clusters()

    # Visualize allocated clusters.
    visualizer = cluster_visualizer()
    visualizer.append_clusters(clusters, input_data)
    visualizer.show()

Data clustering by K-Means algorithm

    from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.samples.definitions import FCPS_SAMPLES
    from pyclustering.utils import read_sample

    # Load list of points for cluster analysis.
    sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)

    # Prepare initial centers using K-Means++ method.
    initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()

    # Create instance of K-Means algorithm with prepared centers.
    kmeans_instance = kmeans(sample, initial_centers)

    # Run cluster analysis and obtain results.
    kmeans_instance.process()
    clusters = kmeans_instance.get_clusters()
    final_centers = kmeans_instance.get_centers()

    # Visualize obtained results
    kmeans_visualizer.show_clusters(sample, clusters, final_centers)

Data clustering by OPTICS algorithm

    from pyclustering.cluster import cluster_visualizer
    from pyclustering.cluster.optics import optics, ordering_analyser, ordering_visualizer
    from pyclustering.samples.definitions import FCPS_SAMPLES
    from pyclustering.utils import read_sample

    # Read sample for clustering from some file
    sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

    # Run cluster analysis where connectivity radius is bigger than real
    radius = 2.0
    neighbors = 3
    amount_of_clusters = 3
    optics_instance = optics(sample, radius, neighbors, amount_of_clusters)

    # Performs cluster analysis
    optics_instance.process()

    # Obtain results of clustering
    clusters = optics_instance.get_clusters()
    noise = optics_instance.get_noise()
    ordering = optics_instance.get_ordering()

    # Visualize ordering diagram
    analyser = ordering_analyser(ordering)
    ordering_visualizer.show_ordering_diagram(analyser, amount_of_clusters)

    # Visualize clustering results
    visualizer = cluster_visualizer()
    visualizer.append_clusters(clusters, sample)
    visualizer.show()

Simulation of oscillatory network PCNN

    from pyclustering.nnet.pcnn import pcnn_network, pcnn_visualizer

    # Create Pulse-Coupled neural network with 10 oscillators.
    net = pcnn_network(10)

    # Perform simulation during 50 steps using binary external stimulus.
    dynamic = net.simulate(50, [1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

    # Allocate synchronous ensembles from the output dynamic.
    ensembles = dynamic.allocate_sync_ensembles()

    # Show output dynamic.
    pcnn_visualizer.show_output_dynamic(dynamic, ensembles)

Simulation of chaotic neural network CNN

    from pyclustering.cluster import cluster_visualizer
    from pyclustering.samples.definitions import SIMPLE_SAMPLES
    from pyclustering.utils import read_sample
    from pyclustering.nnet.cnn import cnn_network, cnn_visualizer

    # Load stimulus from file.
    stimulus = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)

    # Create chaotic neural network, amount of neurons should be equal to amount of stimulus.
    network_instance = cnn_network(len(stimulus))

    # Perform simulation during 100 steps.
    steps = 100
    output_dynamic = network_instance.simulate(steps, stimulus)

    # Display output dynamic of the network.
    cnn_visualizer.show_output_dynamic(output_dynamic)

    # Display dynamic matrix and observation matrix to show clustering phenomenon.
    cnn_visualizer.show_dynamic_matrix(output_dynamic)
    cnn_visualizer.show_observation_matrix(output_dynamic)

    # Visualize clustering results.
    clusters = output_dynamic.allocate_sync_ensembles(10)
    visualizer = cluster_visualizer()
    visualizer.append_clusters(clusters, stimulus)
    visualizer.show()

Cluster allocation on FCPS dataset collection by DBSCAN:
Cluster allocation by OPTICS using cluster-ordering diagram:
Partial synchronization (clustering) in Sync oscillatory network:
Cluster visualization by SOM (Self-Organized Feature Map)
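The DBSCAN illustration listed above has no accompanying code, so here is a hedged sketch of how such a run might look. It assumes the `dbscan(data, eps, neighbors)` constructor and the `process()`, `get_clusters()` and `get_noise()` methods of `pyclustering.cluster.dbscan`; the wrapper function, the parameter values and the nine points are made up for illustration and are not taken from the FCPS collection.

```python
def cluster_with_dbscan(points, radius=0.5, neighbors=2):
    """Run DBSCAN; returns (clusters, noise) or None if pyclustering is missing."""
    try:
        from pyclustering.cluster.dbscan import dbscan
    except ImportError:
        return None  # pyclustering is not installed
    instance = dbscan(points, radius, neighbors)
    instance.process()
    # Clusters and noise are reported as lists of point indexes.
    return instance.get_clusters(), instance.get_noise()


if __name__ == "__main__":
    # Two compact groups and one isolated point (made-up data).
    points = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [0.1, -0.1],
              [5.0, 5.0], [5.1, 5.1], [5.2, 5.0], [5.1, 4.9],
              [10.0, 0.0]]
    result = cluster_with_dbscan(points)
    if result is None:
        print("pyclustering is not installed")
    else:
        clusters, noise = result
        print("clusters (point indexes):", clusters)
        print("noise (point indexes):", noise)
```

To reproduce a picture like the one referenced above, the points could instead be loaded with read_sample from one of the FCPS_SAMPLES definitions and the result passed to cluster_visualizer, as in the CURE and OPTICS examples.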




