Python Program to Find Duplicate Sets in a List of Sets17 Mar 2025 | 9 min read In this article, we'll examine a variety of Python programs that let us quickly spot duplicate sets among a list of sets. To complete this task, we will make use of Python's robust set operations and functional programming features. We will also go over several techniques and approaches for dealing with duplicates according to particular needs. In this article, we will discover effective methods for comparing sets for equality, identifying duplicate sets based on their elements, and removing or modifying duplicate sets from a list of sets. Introduction to the Problem: The given list of sets may contain sets with duplicate elements, and the task is to efficiently identify these duplicate sets based on their contents and take appropriate actions, such as removing or modifying them. The problem can be divided into the following essential steps:
Python program to remove duplicate sets from a list of sets: Uising frozenset() and set comprehensionWe can use the set() function, which removes duplicates from a list, to remove the duplicate sets from the Set_List, However, since sets are not hashable, we need to convert each set in Set_List to a frozenset object which is hashable. Output: Unique Sets: [frozenset({1, 2, 3}), frozenset({5, 6, 7}), frozenset({9, 2, 4, 7}), frozenset({4, 5, 6, 7})] Now let's get started and explore the Python programs that find duplicate sets in a list of sets. Method 1 - Using for-Loop and frozenset()In this method, we will iterate over each set in Set_List and find the duplicate sets using the set lookup operation. If we find any duplicate set, we will store it in a new list. Output: Duplicate sets found: [{1, 2, 3}, {5, 6, 7}] In this program, we create an empty set called seen_sets to keep track of sets we've seen before and duplicate_sets to store duplicate sets. We then iterate over each set s in Set_List. For each set, we create a frozenset object 'frozenset_s', which is hashable. We then check if frozenset_s is already in seen_sets. If it is, then s is a duplicate set, and we append it to duplicate_sets. If it isn't, we add frozenset_s to seen_sets to check for duplicates in future iterations. Finally, if duplicate_sets is not empty, we print out each duplicate set. Otherwise, we print out a message indicating no duplicate sets were found. Time Complexity: O(n) - The program uses a single for-loop to iterate through each set in Set_List. So, the time complexity of the loop is O(n), where n is the length of Set_List. Within the loop, there is a set lookup operation to check if a frozenset has already been seen, which takes O(1) time. Where n is the number of sets in the set_list. Space Complexity: O(m) - where m is the number of unique sets. Since each set is stored as a frozenset in seen_sets, the maximum size of seen_sets is equal to the number of unique sets in the input list. In the worst case, it can be O(n) when m becomes equal to n, and in the best case, it can be O(1) when m becomes equal to 1. Method 2 - Using Counter() from collectionsIn this method, we will use the Counter from collections to count the occurrences of each set. Then, we will filter out the set with an occurrence count of more than 1 and store them into the duplicate_sets as usual. Output: Duplicate sets found: [{1, 2, 3}, {9, 2, 4, 7}, {5, 6, 7}] In this method, we have used list comprehension to convert each set into a frozen sets. It is necessary because sets are not hashable and cannot be used as a key in the dictionary. Then we passed this list of frozen sets to the counter class, which returns a Counter object with each frozen set as a key and their count as the key's value. We have then filtered out the keys which have a count value greater than 1 and stored them in duplicate_sets after converting them again into a set. And at the end of the program, we printed the result. Time complexity: O(n) - The for-loop iterates over the set_list once and performs constant-time lookups and comparisons of frozen sets. Where n is the number of sets in the set_list. Space Complexity: O(m) - where m is the number of unique sets. The counter object stores the count of each set, and the space required by it depends on the number of unique sets. And the duplicate sets also require some space which is comparatively very small. So, the overall space complexity of the program becomes O(m). Method 3 - Using defaultdict() from collectionsIn this method, we will use defaultdict from collections to keep track of the number of occurrences of each set in set_list. After counting the frequency, we will use list comprehension to filter out duplicate sets. Output: Duplicate sets found: [{1, 12, 5}, {4, 5, 6, 7}] In the above program, we have created a defaultdict object, 'set_counts', a subclass of dict in Python, to keep track of the frequency of each set in set_list. We iterate over each set in set_list, and increase the frequency count by 1 using set_counts[frozenset(s)] += 1. After that, we filtered out the sets which have a frequency count greater than 1 indicating duplicate sets using list comprehension. And finally, we have printed the result. Time Complexity: O(n) - Here also, we have iterated over the set_list, which takes O(n) time, and we have again iterated over set_list in the list comprehension method, which also takes O(n) time. Overall, the time complexity becomes O(n) + O(n) = O(n), where n is the number of sets in set_list. Space Complexity: O(m) - where m is the number of unique sets in the set_list. The set_counts dictionary stores the frequency of unique sets in the set_list, and the duplicate_sets list stores the duplicate sets. In the worst case, if m @ n means all the sets are unique, the space complexity will rise to O(n), and in the best case, where all the sets are duplicates (m = 1), then the space complexity will become constant, O(1). Method 4 - Using Hashing with DictionaryHashing refers to the method of converting a non-hashable object into a hashable and immutable form. This is done so that we can use the sets in set_list as keys in a dictionary that maps each key to a different hash value using a hash function. Output: Duplicate sets found: [{11, 22, 23}, {4, 5, 6, 7}] In this method, we have used a dictionary to store the hash value of each set and its frequency. We first iterate over each set in set_list and get the hash value using the hash() function. Then we check whether the hash value is already in the dictionary or not. If yes, we increase the frequency of the set. Otherwise, we set the frequency as 1 for that set. We have again iterated over the sets in set_list to find the duplicate sets and check their frequencies. The set having a frequency greater than one means it is a duplicate set and needs to be added to the duplicate_sets list. And at the end of the program, we have printed the result. Time Complexity: O(n) - In the above program, we have iterated over the set_list twice O(n) + O(n) = O(n), where n is the number of sets in the set_list. Other operations are performed in constant time. Space Complexity: O(n) - The size of the freq_dict and duplicate_sets list is proportional to the size of the set_list, O(n) + O(n) = O(n), where n is the number of sets in the set_list. Next TopicAugmented Reality (AR) in Python |
What is HTTP? HTTP stands for HyperText Transfer Protocol which has some set of rules which determines the communication or data transfer into client-server architecture. The client is generally a browser, and the Server is the source where information is available already, and the Client requests the...
3 min read
If we are given a linked list with each node being a separate linked list and two pointers of the same type: One pointer points towards the node of the primary linked list (referred to as the 'prim' pointer in the program below). One pointer points towards...
11 min read
Do you consider how Netflix proposes motion pictures that adjust your inclinations to such an extent? Or, on the other hand, perhaps you need to fabricate a framework that can make such ideas to its clients as well? If your response was true, you've come to the...
11 min read
In this article, we will automate the operation of sending Instagram Messages using Python. First of all, let us look at what is Instagram. Instagram is a prominent social media platform that focuses on the photo and video sharing. It's been operating since 2010 and has maintained...
19 min read
Python Projects (Advanced) tutorial is for expanding and amplifying your ambition to become the most successful person in the world. Python is a programming language that, most of the time, can be quite simple to use in order to do tasks more quickly and effectively. Python...
23 min read
Introduction Python is a widely used high-level programming language with versatile applications in multiple fields, such as web development, data science, artificial intelligence, machine learning, and many more. Python has gained tremendous popularity in the scientific community for its simplicity, ease of use, and compatibility with multiple...
3 min read
The operator overloading in Python means provide extended meaning beyond their predefined operational meaning. Such as, we use the "+" operator for adding two integers as well as joining two strings or merging two lists. We can achieve this as the "+" operator is overloaded by...
5 min read
What is Box Plot? A Box plot is a way to visualize the distribution of the data by using a box and some vertical lines. It is known as the whisker plot. The data can be distributed between five key ranges, which are as follows: Minimum: Q1-1.5*IQR 1st quartile...
3 min read
In this tutorial, we will learn how we can implement and use the . In 1996, DBSCAN or Density-Based Spatial Clustering of Applications with Noise, a clustering algorithm, was first proposed, and it was awarded the 'Test of Time' award in the year 2014. The 'Test of...
9 min read
In this tutorial, we will discuss how we can print the Pascal triangle using the Python program. But first, let's understand what the Pascal triangle is. Introduction Pascal triangle is an exciting concept of mathematics where a triangular array is formed by summing adjacent elements in the preceding...
5 min read
We request you to subscribe our newsletter for upcoming updates.
We provides tutorials and interview questions of all technology like java tutorial, android, java frameworks
G-13, 2nd Floor, Sec-3, Noida, UP, 201301, India