Get Unique Values from a Column in Pandas DataFrame in Python

5 Jan 2025

Introduction

Pandas is one of the most powerful data manipulation libraries in Python, and it provides a wide range of functions for structured data. When working with DataFrames in particular, one often needs only the unique values of a certain column. This article examines several methods for obtaining them.

Understanding Pandas DataFrame

Before getting into the technical details of how to get unique values, let's quickly cover some basic facts about Pandas DataFrames. A DataFrame is a two-dimensional labeled data table with rows and columns. It is designed for data work and is built on top of NumPy. The examples in this article use a sample DataFrame, df, with the following contents:

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2    Alice   25       New York
3  Charlie   35    Los Angeles
4      Bob   30  San Francisco

Method 1: Using 'unique()' Method

The Pandas 'unique()' method is an efficient way to get the unique elements of a column. It returns an array containing only the unique values, in the order in which they first appear in the DataFrame.

Output:

Unique Names: ['Alice' 'Bob' 'Charlie']

Here, 'unique()' gets the unique values from the df['Name'] column. unique_names is an array of the distinct names in order of first appearance, and the print statement displays them.

Method 2: Using 'value_counts()' Method

In addition to giving the unique values, the 'value_counts()' method also counts their occurrences. It is very useful when you want to know how many times each unique element occurs in a given column.

Output:

Name Counts:
Bob        2
Alice      2
Charlie    1
Name: Name, dtype: int64

Here, the 'value_counts()' method extracts both the unique names and their counts from the 'Name' column. The result, name_counts, is a Pandas Series giving the frequency distribution of all unique names.
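The sample DataFrame and Methods 1 and 2 described above can be sketched as follows. This is a minimal sketch, assuming the data is built from a plain dictionary (the `data` variable is an assumption); the names `df`, `unique_names` and `name_counts` follow the text.

```python
import pandas as pd

# Sample data matching the table shown above (construction via a dict is an assumption)
data = {
    "Name": ["Alice", "Bob", "Alice", "Charlie", "Bob"],
    "Age": [25, 30, 25, 35, 30],
    "City": ["New York", "San Francisco", "New York", "Los Angeles", "San Francisco"],
}
df = pd.DataFrame(data)
print(df)

# Method 1: unique() returns the distinct values in order of first appearance
unique_names = df["Name"].unique()
print("Unique Names:", unique_names)

# Method 2: value_counts() returns a Series mapping each distinct value to its count
name_counts = df["Name"].value_counts()
print("Name Counts:")
print(name_counts)
```

Depending on the installed pandas version, the order of names with equal counts and the label printed on the last line of the value_counts() result may differ from the output shown above (pandas 2.0 and later label the resulting Series `count`).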
Method 3: Using 'drop_duplicates()' Method

Another way to get unique values is the 'drop_duplicates()' method. Unlike 'unique()', this method returns a new DataFrame containing no duplicates.

Output:

DataFrame with Unique Names:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
3  Charlie   35    Los Angeles

Here, 'drop_duplicates()' drops the rows that repeat a value in the 'Name' column and stores the result in unique_df. The resulting DataFrame retains only the first row for each unique name, along with its original index label.

Method 4: Applying a Set

By definition, Python's set stores only unique elements. If we convert a column to a set, finding all the distinct values is easy.

Output:

Unique Cities: {'San Francisco', 'Los Angeles', 'New York'}

This short piece of code turns the 'City' column into a set (unique_cities). Since sets contain only non-repeated elements, this finds the distinct city names in the DataFrame, which are then printed. Because sets are unordered, the order of the printed cities may vary.

Method 5: Using 'nunique()' Method

The 'nunique()' method returns the number of unique elements in a column. It is especially useful when you want a count of the unique values without having to enumerate them.

Output:

Number of Unique Names: 3

'nunique()' calculates the number of unique names in the column and returns a single numeric value (num_unique_names). The print statement shows this number.

Method 6: Custom Functions for Unique Values

In other cases, you will have to introduce custom logic of your own in order to determine unique values. This can involve a function that filters values based on certain criteria before uniqueness is checked.

Output:

Unique Names based on Custom Logic: []

A custom function ('custom_unique_check') is defined to check each value against a specific criterion, for example that the name has an even number of characters. This function is then applied to the 'Name' column using the 'apply()' method, and the result contains only the values meeting the custom condition. The output here is empty because none of the names in the sample data ('Alice', 'Bob', 'Charlie') has an even length.
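Methods 3 to 6 can be sketched together as follows. This is a minimal sketch under stated assumptions: the sample data is rebuilt from the table shown earlier, the `subset="Name"` argument and the boolean-mask filtering with `df.loc` are one plausible way to get the described results, and `unique_names_custom` is a hypothetical variable name. The names `unique_df`, `unique_cities`, `num_unique_names` and `custom_unique_check` follow the text.

```python
import pandas as pd

# Sample data matching the table shown earlier in the article
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Charlie", "Bob"],
    "Age": [25, 30, 25, 35, 30],
    "City": ["New York", "San Francisco", "New York", "Los Angeles", "San Francisco"],
})

# Method 3: keep the first row for each distinct value in 'Name'
unique_df = df.drop_duplicates(subset="Name")
print("DataFrame with Unique Names:")
print(unique_df)

# Method 4: a set keeps only distinct elements (printed order is not guaranteed)
unique_cities = set(df["City"])
print("Unique Cities:", unique_cities)

# Method 5: count distinct values without listing them
num_unique_names = df["Name"].nunique()
print("Number of Unique Names:", num_unique_names)

# Method 6: custom criterion, here "the name has an even number of characters"
def custom_unique_check(name):
    return len(name) % 2 == 0

# Build a boolean mask with apply(), filter the column, then take the distinct values
mask = df["Name"].apply(custom_unique_check)
unique_names_custom = df.loc[mask, "Name"].unique()  # hypothetical variable name
print("Unique Names based on Custom Logic:", unique_names_custom)
```

Passing `subset="Name"` restricts the duplicate check to that one column; without it, `drop_duplicates()` compares entire rows.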
The unique names that meet the criterion are then printed.

Conclusion

In this guide we went over how to extract unique values from a column in a Pandas DataFrame. Whether you use built-in methods such as 'unique()', 'value_counts()' and 'drop_duplicates()', or choose to write custom functions, Pandas offers a range of options for different needs. These skills are essential for data cleansing, preprocessing and analysis work, where knowing the distinct values in a column helps you understand your datasets. As you continue to work with Pandas DataFrames, these methods will make it easier to break down and extract information from your data.