Commit 80947cd

Merge branch 'main' into add/avl-trees
2 parents 65e2aa4 + a9797cd commit 80947cd

8 files changed

+257
-1
lines changed

contrib/ds-algorithms/index.md

Lines changed: 2 additions & 1 deletion
@@ -16,4 +16,5 @@
 - [Two Pointer Technique](two-pointer-technique.md)
 - [Hashing through Linear Probing](hashing-linear-probing.md)
 - [Hashing through Chaining](hashing-chaining.md)
-- [AVL Trees](avl-trees.md)
+- [AVL Trees](avl-trees.md)
+- [Splay Trees](splay-trees.md)

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
# Splay Tree

A **Splay Tree** is a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, search, and deletion in O(log n) amortized time. This is achieved by a process called **splaying**, where the accessed node is moved to the root through a series of tree rotations.

## Points to be Remembered

- **Splaying**: Moving the accessed node to the root using rotations.
- **Rotations**: Left and right tree rotations move the accessed node up toward the root during splaying. Unlike AVL trees, splay trees do not enforce a strict balance condition.
- **Self-adjusting**: The tree adjusts itself with each access, keeping frequently accessed nodes near the root.

## Real Life Examples of Splay Trees

- **Cache Implementation**: Frequently accessed data is kept near the top of the tree, making repeated accesses faster.
- **Networking**: Routing tables in network switches can use splay trees to prioritize frequently accessed routes.

## Applications of Splay Trees

Splay trees are used in various applications in Computer Science:

- **Cache Implementations**
- **Garbage Collection Algorithms**
- **Data Compression Algorithms (e.g., LZ78)**

Understanding these applications is essential for Software Development.

## Operations in Splay Tree

Key operations include:

- **INSERT**: Insert a new element into the splay tree; the new element becomes the root.
- **SEARCH**: Find an element in the splay tree and splay it to the root.
- **DELETE**: Remove an element from the splay tree.

## Implementing Splay Tree in Python

```python
class SplayTreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class SplayTree:
    def __init__(self):
        self.root = None

    def insert(self, key):
        self.root = self.splay_insert(self.root, key)

    def search(self, key):
        # After splaying, the root holds the key if it exists; otherwise it
        # holds the last node visited, so callers should compare root.key to key.
        self.root = self.splay_search(self.root, key)
        return self.root

    def splay(self, root, key):
        # Bring the node with the given key (or the last node on the
        # search path) to the root of this subtree.
        if not root or root.key == key:
            return root

        if root.key > key:
            # Key lies in the left subtree
            if not root.left:
                return root
            if root.left.key > key:
                # Zig-zig (left-left)
                root.left.left = self.splay(root.left.left, key)
                root = self.rotateRight(root)
            elif root.left.key < key:
                # Zig-zag (left-right)
                root.left.right = self.splay(root.left.right, key)
                if root.left.right:
                    root.left = self.rotateLeft(root.left)
            return root if not root.left else self.rotateRight(root)

        else:
            # Key lies in the right subtree
            if not root.right:
                return root
            if root.right.key > key:
                # Zig-zag (right-left)
                root.right.left = self.splay(root.right.left, key)
                if root.right.left:
                    root.right = self.rotateRight(root.right)
            elif root.right.key < key:
                # Zig-zig (right-right)
                root.right.right = self.splay(root.right.right, key)
                root = self.rotateLeft(root)
            return root if not root.right else self.rotateLeft(root)

    def splay_insert(self, root, key):
        if not root:
            return SplayTreeNode(key)

        root = self.splay(root, key)

        # Key already present: nothing to insert
        if root.key == key:
            return root

        new_node = SplayTreeNode(key)

        # Split the splayed tree around the new key; the new node becomes the root
        if root.key > key:
            new_node.right = root
            new_node.left = root.left
            root.left = None
        else:
            new_node.left = root
            new_node.right = root.right
            root.right = None

        return new_node

    def splay_search(self, root, key):
        return self.splay(root, key)

    def rotateRight(self, node):
        temp = node.left
        node.left = temp.right
        temp.right = node
        return temp

    def rotateLeft(self, node):
        temp = node.right
        node.right = temp.left
        temp.left = node
        return temp

    def preOrder(self, root):
        if root:
            print(root.key, end=' ')
            self.preOrder(root.left)
            self.preOrder(root.right)


# Example usage:
splay_tree = SplayTree()
splay_tree.insert(50)
splay_tree.insert(30)
splay_tree.insert(20)
splay_tree.insert(40)
splay_tree.insert(70)
splay_tree.insert(60)
splay_tree.insert(80)

print("Preorder traversal of the Splay tree is:")
splay_tree.preOrder(splay_tree.root)

splay_tree.search(60)

print("\nSplay tree after search operation for key 60:")
splay_tree.preOrder(splay_tree.root)
```

## Output

Each insertion splays the new key to the root, so the last key inserted (80) is the root before the search, and 60 is the root after it.

```text
Preorder traversal of the Splay tree is:
80 70 60 50 40 30 20 
Splay tree after search operation for key 60:
60 50 40 30 20 70 80 
```
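The DELETE operation listed under "Operations in Splay Tree" is not part of the implementation above. The following is a minimal sketch of one common approach, not a definitive implementation: splay the key to the root, detach it, splay the largest key of the left subtree to the top of that subtree, and attach the right subtree to it. For brevity it uses a compact top-down splay, which is a different formulation from the recursive `splay` method above. The names `Node`, `splay`, `insert`, `delete`, and `inorder` are illustrative and do not come from the original class.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def splay(t, key):
    """Top-down splay: bring `key` (or the last node reached) to the root."""
    if t is None:
        return None
    header = Node(None)              # scratch node holding the two side trees
    left_max = right_min = header
    while True:
        if key < t.key:
            if t.left is None:
                break
            if key < t.left.key:     # zig-zig: rotate right first
                y = t.left
                t.left, y.right = y.right, t
                t = y
                if t.left is None:
                    break
            right_min.left = t       # link t into the tree of larger keys
            right_min = t
            t = t.left
        elif key > t.key:
            if t.right is None:
                break
            if key > t.right.key:    # zig-zig: rotate left first
                y = t.right
                t.right, y.left = y.left, t
                t = y
                if t.right is None:
                    break
            left_max.right = t       # link t into the tree of smaller keys
            left_max = t
            t = t.right
        else:
            break
    # Reassemble: hang the side trees under the new root
    left_max.right = t.left
    right_min.left = t.right
    t.left, t.right = header.right, header.left
    return t


def insert(root, key):
    if root is None:
        return Node(key)
    root = splay(root, key)
    if root.key == key:              # duplicate keys are ignored
        return root
    node = Node(key)
    if key < root.key:
        node.left, node.right, root.left = root.left, root, None
    else:
        node.right, node.left, root.right = root.right, root, None
    return node


def delete(root, key):
    if root is None:
        return None
    root = splay(root, key)
    if root.key != key:              # key not present: tree is only restructured
        return root
    if root.left is None:
        return root.right
    # `key` exceeds every key in the left subtree, so this splay surfaces
    # that subtree's maximum, which then has no right child.
    new_root = splay(root.left, key)
    new_root.right = root.right
    return new_root


def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []


demo_root = None
for k in [50, 30, 20, 40, 70, 60, 80]:
    demo_root = insert(demo_root, k)
demo_root = delete(demo_root, 30)
print(inorder(demo_root))  # [20, 40, 50, 60, 70, 80]
```

Like search and insert, this deletion costs one or two splays, so it shares the same amortized bound.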

## Complexity Analysis

The worst-case time complexities of a single operation in a Splay Tree are as follows:

- **Insertion**: O(n). A single insertion may take linear time if the tree is highly unbalanced.
- **Search**: O(n). A single search may take linear time if the tree is highly unbalanced.
- **Deletion**: O(n). A single deletion may take linear time if the tree is highly unbalanced.

A splay tree does not stay balanced at every step; inserting keys in sorted order, for example, produces a chain. However, splaying shortens the access path each time it runs, so the cost averaged over any sequence of operations is O(log n) per operation (the amortized bound).
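
The amortized bound can be stated a little more precisely. What follows is a brief sketch of the standard potential-function argument attributed to Sleator and Tarjan, not a full proof; the symbols $s(x)$, $r(x)$, and $\Phi$ are notation introduced here for illustration.

$$
s(x) = \text{number of nodes in the subtree rooted at } x, \qquad
r(x) = \log_2 s(x), \qquad
\Phi(T) = \sum_{x \in T} r(x)
$$

$$
\text{amortized cost of splaying } x \text{ in a tree with root } t
\;\le\; 3\,\bigl(r(t) - r(x)\bigr) + 1
\;\le\; 3 \log_2 n + 1
$$

Because $\Phi$ always lies between $0$ and $n \log_2 n$, any sequence of $m$ splay operations on a tree with $n$ nodes costs $O((m + n) \log n)$ in total.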

contrib/machine-learning/index.md

Lines changed: 1 addition & 0 deletions
@@ -19,4 +19,5 @@
 - [Hierarchical Clustering](hierarchical-clustering.md)
 - [Grid Search](grid-search.md)
 - [Transformers](transformers.md)
+- [K-Means](kmeans.md)
 - [K-nearest neighbor (KNN)](knn.md)

contrib/machine-learning/kmeans.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# K-Means Clustering
Unsupervised Learning Algorithm for Grouping Similar Data.

## Introduction
K-means clustering is a fundamental unsupervised machine learning algorithm that groups similar data points together. It is a popular choice due to its simplicity and efficiency in uncovering hidden patterns within unlabeled datasets.

## Unsupervised Learning
Unlike supervised learning algorithms that rely on labeled data for training, unsupervised algorithms, like K-means, operate solely on input data (without predefined categories). Their objective is to discover inherent structures or groupings within the data.

## The K-Means Objective
K-means organizes similar data points into clusters to reveal underlying patterns. Its objective is to minimize the total intra-cluster variance, that is, the sum of squared distances between each data point and the centroid of its cluster.

![image](assets/knm.png)

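As a small illustration of this objective, the sketch below computes the within-cluster sum of squares for a given assignment using NumPy. The function name `wcss` and the toy arrays are assumptions made for this example; they are not taken from the article or from any library.

```python
import numpy as np


def wcss(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = X - centroids[labels]      # row i: X[i] minus the centroid of its cluster
    return float(np.sum(diffs ** 2))


# Toy data (hypothetical): two clusters of two points each
X_demo = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels_demo = np.array([0, 0, 1, 1])
centroids_demo = np.array([[0.0, 1.0], [10.0, 1.0]])

print(wcss(X_demo, labels_demo, centroids_demo))  # 4.0
```

Every point here lies at distance 1 from its centroid, so the four squared distances sum to 4. A worse assignment, such as swapping the labels of two points, would give a larger value.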
## Clusters and Centroids
A cluster represents a collection of data points that share similar characteristics. K-means identifies a pre-determined number (k) of clusters within the dataset. Each cluster is represented by a centroid, which acts as its central point (imaginary or real).

## Minimizing In-Cluster Variation
The K-means algorithm assigns each data point to a cluster such that the total variation within each cluster (measured by the sum of squared distances between points and their centroid) is minimized. In simpler terms, K-means strives to create clusters where data points are close to their respective centroids.

## The Meaning Behind "K-Means"
The "means" in K-means refers to the averaging process used to compute the centroid, essentially finding the center of each cluster.

## K-Means Algorithm in Action
![image](assets/km_.png)
The K-means algorithm follows an iterative approach to optimize cluster formation:

1. **Initial Centroid Placement:** The process begins with randomly selecting k centroids to serve as initial reference points for each cluster.
2. **Data Point Assignment:** Each data point is assigned to the closest centroid, effectively creating a preliminary clustering.
3. **Centroid Repositioning:** Once data points are assigned, the centroids are recalculated by averaging the positions of the points within their respective clusters. These new centroids represent the refined centers of the clusters.
4. **Iteration Until Convergence:** Steps 2 and 3 are repeated iteratively until a stopping criterion is met. This criterion can be either:
   - **Centroid Stability:** No significant change occurs in the centroids' positions, indicating successful clustering.
   - **Reaching Maximum Iterations:** A predefined number of iterations is completed.

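The four steps above can be sketched from scratch with NumPy. This is a minimal illustration under stated assumptions, not the scikit-learn implementation: the function name `kmeans`, its parameters (`max_iters`, `tol`, `seed`), the choice of initial centroids from the data points, and the demo data are all hypothetical.

```python
import numpy as np


def assign(X, centroids):
    """Step 2: index of the nearest centroid for every point."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids (an assumption;
    # other initialization schemes exist).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):               # stopping criterion: maximum iterations
        labels = assign(X, centroids)
        # Step 3: move each centroid to the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:                      # stopping criterion: centroid stability
            break
    return centroids, assign(X, centroids)


# Demo data (hypothetical): two well-separated blobs of 20 points each
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([
    demo_rng.normal(0.0, 0.1, size=(20, 2)),
    demo_rng.normal(5.0, 0.1, size=(20, 2)),
])
centroids, labels = kmeans(X_demo, k=2)
print(centroids)
print(labels)
```

Which blob receives label 0 depends on the random initialization, so only the grouping is meaningful, not the label values themselves.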
## Code
The following is a simple implementation of K-Means using scikit-learn.

```python
# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Generate sample data: 100 points in (-2, 0] x (-2, 0] for cluster 1
X = -2 * np.random.rand(100, 2)
# 50 points in [1, 3) x [1, 3) for cluster 2
X1 = 1 + 2 * np.random.rand(50, 2)

# Overwrite the second half of X so both clusters hold 50 points
X[50:100, :] = X1

# Plot data points and display the plot
plt.scatter(X[:, 0], X[:, 1], s=50, c='b')
plt.show()

# K-Means Model Creation and Training
# Create KMeans object with 2 clusters (n_init set explicitly so the
# behaviour does not depend on the scikit-learn version's default)
kmeans = KMeans(n_clusters=2, n_init=10)
kmeans.fit(X)  # Train the model on the data

# Visualize Data Points with Centroids
centroids = kmeans.cluster_centers_  # Get centroids (cluster centers)

plt.scatter(X[:, 0], X[:, 1], s=50, c='b')  # Plot data points again
plt.scatter(centroids[0, 0], centroids[0, 1], s=200, c='g', marker='s')  # Plot centroid 1
plt.scatter(centroids[1, 0], centroids[1, 1], s=200, c='r', marker='s')  # Plot centroid 2
plt.show()  # Display the plot with centroids

# Predict Cluster Label for New Data Point
new_data = np.array([-3.0, -3.0])
new_data_reshaped = new_data.reshape(1, -1)  # predict expects a 2D array
predicted_cluster = kmeans.predict(new_data_reshaped)
print("Predicted cluster for new data:", predicted_cluster)
```

### Output:
Before Implementing K-Means Clustering
![Before Implementing K-Means Clustering](assets/km_2.png)

After Implementing K-Means Clustering
![After Implementing K-Means Clustering](assets/km_3.png)

Predicted cluster for new data: `[0]`

Cluster numbering is arbitrary and the data and initialization are random, so another run may print `[1]` for the same point.

## Conclusion
**K-Means** works best on numeric, continuous data with a relatively small number of dimensions, and it can be used to find groups that have not been explicitly labeled in the data. For example, it can be used for Document Classification, Delivery Store Optimization, or Customer Segmentation.
