Python - How to select rows that do not start with some str in pandas?

To select rows in a Pandas DataFrame where a specific column does not start with a particular string, you can use the ~ (tilde) operator along with the .str accessor to negate the condition. Here's how you can achieve this:

Example:

Suppose you have a DataFrame like this:

    import pandas as pd

    # Sample DataFrame
    data = {
        'Column1': ['apple', 'banana', 'orange', 'kiwi', 'grape'],
        'Column2': [10, 20, 15, 25, 30]
    }
    df = pd.DataFrame(data)

    print("Original DataFrame:")
    print(df)

Output:

      Column1  Column2
    0   apple       10
    1  banana       20
    2  orange       15
    3    kiwi       25
    4   grape       30

Select Rows where Column1 does not start with 'b'

To select rows where values in Column1 do not start with 'b', you can use the following approach:

    # Select rows where Column1 does not start with 'b'
    filtered_df = df[~df['Column1'].str.startswith('b')]

    print("\nFiltered DataFrame:")
    print(filtered_df)

Output:

      Column1  Column2
    0   apple       10
    2  orange       15
    3    kiwi       25
    4   grape       30

Explanation:

df['Column1'].str.startswith('b'): This condition creates a boolean mask where each element in Column1 is checked to see if it starts with 'b'.
~ (tilde operator): Negates the boolean mask, selecting rows where the condition is False.
df[~df['Column1'].str.startswith('b')]: Filters the DataFrame rows based on the negated condition.
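To make the three steps above concrete, the intermediate mask can be inspected before it is used for indexing. This is a minimal sketch that reuses the sample data from the example; the variable names mask and inverted are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': ['apple', 'banana', 'orange', 'kiwi', 'grape'],
    'Column2': [10, 20, 15, 25, 30],
})

# Step 1: True where the value starts with 'b'
mask = df['Column1'].str.startswith('b')

# Step 2: ~ flips every element of the boolean mask
inverted = ~mask

# Step 3: boolean indexing keeps the rows where the inverted mask is True
filtered_df = df[inverted]

print(mask.tolist())      # [False, True, False, False, False]
print(inverted.tolist())  # [True, False, True, True, True]
print(filtered_df)
```

Only the row at index 1 ('banana') has True in the original mask, so it is the only row dropped after negation.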

Notes:

Adjust 'b' in startswith('b') to any other string prefix as per your requirement.
This method requires the column to hold strings: the .str accessor raises an AttributeError on a non-string column such as Column2, and missing values produce NaN in the mask unless the na parameter of startswith() is set.
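If the prefix and column change from call to call, the filter can be wrapped in a small function. The sketch below is one possible shape, not part of pandas: the name rows_not_starting_with and its arguments are hypothetical, and it assumes that rows with missing values should be kept (na=False).

```python
import pandas as pd

def rows_not_starting_with(df, column, prefix):
    """Return the rows of df whose `column` value does not start with `prefix`.

    Hypothetical helper for illustration. Missing values are treated as
    "does not start with the prefix", so those rows are kept.
    """
    starts = df[column].str.startswith(prefix, na=False)
    return df[~starts]

df = pd.DataFrame({
    'Column1': ['apple', 'banana', 'orange', 'kiwi', 'grape'],
    'Column2': [10, 20, 15, 25, 30],
})

# Drop every row whose Column1 value starts with 'gr'
result = rows_not_starting_with(df, 'Column1', 'gr')
print(result)
```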

By using this approach, you can effectively filter rows in a Pandas DataFrame based on whether a specific column's string values do not start with a specified substring.

Examples

Select rows not starting with a specific string using str.startswith() and negation (~):

Description: This query keeps only the rows whose value in a given column does not start with a given string; rows that do start with it are excluded.

Code:

    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a'
    filtered_df = df[~df['col1'].str.startswith('a')]
    print(filtered_df)

Exclude rows based on multiple starting strings by passing a tuple to str.startswith():

Description: This query extends the previous one to exclude rows where a column starts with any of multiple specified strings. The prefixes are passed as a tuple, which str.startswith() accepts in pandas 1.5 and later.

Code:

    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a' or 'b'
    filtered_df = df[~df['col1'].str.startswith(('a', 'b'))]
    print(filtered_df)
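The same result can also be reached by building one mask per prefix and combining them with the | (OR) operator, which may be useful on pandas versions that do not accept a tuple. A minimal sketch on the same sample data; the names starts_a and starts_b are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['apple', 'banana', 'orange', 'grape'],
    'col2': [10, 15, 8, 12],
})

# One boolean mask per prefix
starts_a = df['col1'].str.startswith('a')
starts_b = df['col1'].str.startswith('b')

# | combines the masks element-wise; the parentheses are required
# so that ~ negates the combined mask rather than only the first one
filtered_df = df[~(starts_a | starts_b)]
print(filtered_df)
```

Without the parentheses, ~starts_a | starts_b would negate only the first mask and keep 'banana'.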

Case-insensitive exclusion by lowercasing with str.lower() before str.startswith():

Description: str.startswith() has no case parameter, so this query lowercases the column first and compares it against a lowercase prefix to perform a case-insensitive exclusion.

Code:

    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['Apple', 'banana', 'orange', 'Grape'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a' (case-insensitive)
    filtered_df = df[~df['col1'].str.lower().str.startswith('a')]
    print(filtered_df)
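Since str.startswith() itself takes no case argument, a regex-based sketch with str.match() is one way to get case-insensitive behavior in a single call: str.match() anchors the pattern at the start of each string and does accept case=False. The prefix variable is illustrative, and re.escape() is used on the assumption that the prefix should be matched literally.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'col1': ['Apple', 'banana', 'orange', 'Grape'],
    'col2': [10, 15, 8, 12],
})

prefix = 'a'

# str.match tests the pattern at the beginning of each value;
# case=False makes 'a' match both 'a' and 'A'
starts = df['col1'].str.match(re.escape(prefix), case=False)

filtered_df = df[~starts]
print(filtered_df)
```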

Handle NaN or missing values using the na parameter of str.startswith():

Description: When the column contains NaN, the mask returned by str.startswith() has a missing value in that position, and negating it with ~ can raise an error. The na parameter sets the mask value used for missing entries: na=True excludes those rows after negation, while na=False keeps them.

Code:

    import pandas as pd
    import numpy as np

    # Sample DataFrame with NaN values
    data = {'col1': ['apple', np.nan, 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a'; na=True also excludes the NaN row
    filtered_df = df[~df['col1'].str.startswith('a', na=True)]
    print(filtered_df)
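The effect of the na parameter on missing values is easiest to see by comparing the negated masks side by side. A minimal sketch on a Series with one NaN; the names keep_missing and drop_missing are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(['apple', np.nan, 'orange', 'grape'])

# na=False: NaN counts as "does not start with 'a'", so the row survives negation
keep_missing = ~s.str.startswith('a', na=False)

# na=True: NaN counts as "starts with 'a'", so the row is dropped after negation
drop_missing = ~s.str.startswith('a', na=True)

print(keep_missing.tolist())  # [False, True, True, True]
print(drop_missing.tolist())  # [False, False, True, True]
```

Which of the two is appropriate depends on whether rows with missing values should remain in the filtered result.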

Select rows where the column value does not start with a multi-character prefix:

Description: The prefix can be longer than a single character. This query excludes rows whose value starts with the whole word 'apple'.

Code:

    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple pie', 'banana split', 'orange juice', 'grape soda'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'apple'
    filtered_df = df[~df['col1'].str.startswith('apple')]
    print(filtered_df)

Exclude rows based on a regular expression pattern using str.contains() and negation (~):

Description: This query uses a regular expression anchored with ^ to match values that start with a given string, then negates the match with ~ to exclude them.

Code:

    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a'
    filtered_df = df[~df['col1'].str.contains(r'^a')]
    print(filtered_df)
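With the regex approach, a prefix that contains special characters such as . or + is interpreted as a pattern rather than literal text. The sketch below uses re.escape() to match the prefix literally; the sample values are made up for this illustration and are not from the examples above.

```python
import re
import pandas as pd

# Made-up sample values: 'a.' is the literal prefix to exclude
df = pd.DataFrame({
    'col1': ['a.b', 'axb', 'a.c', 'grape'],
    'col2': [10, 15, 8, 12],
})

prefix = 'a.'

# re.escape turns 'a.' into 'a\.', so the dot matches only a literal dot
pattern = '^' + re.escape(prefix)

filtered_df = df[~df['col1'].str.contains(pattern)]
print(filtered_df)
```

Without escaping, the pattern '^a.' would also match 'axb', because an unescaped dot matches any character. str.startswith() does not have this concern, since it always treats the prefix as literal text.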

Exclude rows where the string starts with whitespace characters using str.lstrip():

Description: This query excludes rows where a column value starts with whitespace by comparing each value with its left-stripped version; the two are equal only when there is no leading whitespace.

Code:

    import pandas as pd

    # Sample DataFrame
    data = {'col1': [' apple', 'banana', ' orange', 'grape'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with whitespace characters
    filtered_df = df[df['col1'] == df['col1'].str.lstrip()]
    print(filtered_df)

Exclude rows where the string is empty using str.len():

Description: Every string starts with the empty string, so a prefix test cannot single out empty values. This query instead excludes rows where the column value is an empty string by checking its length.

Code:

    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' is not an empty string
    filtered_df = df[~(df['col1'].str.len() == 0)]
    print(filtered_df)

Exclude rows based on a condition using a lambda function with apply():

Description: This query illustrates using a lambda function with apply() to exclude rows based on a custom condition.

Code:

    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a' using apply with a lambda function
    filtered_df = df[df.apply(lambda x: not x['col1'].startswith('a'), axis=1)]
    print(filtered_df)
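A custom condition is mostly useful when the column mixes strings with other values, where calling startswith() directly on each element would fail. The sketch below applies a named function to the single column with Series.map() instead of applying a lambda to whole rows; the function name not_starting_with_a and the mixed sample values are assumptions made for this illustration.

```python
import pandas as pd

# Made-up sample with a missing value and a number in the string column
df = pd.DataFrame({
    'col1': ['apple', None, 42, 'grape'],
    'col2': [10, 15, 8, 12],
})

def not_starting_with_a(value):
    # Non-string values cannot start with 'a', so they are kept
    return not (isinstance(value, str) and value.startswith('a'))

# map calls the function once per element of the column
mask = df['col1'].map(not_starting_with_a)

filtered_df = df[mask]
print(filtered_df)
```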

Exclude rows where the starting string matches a list of values using str.startswith() and ~:

Description: This query demonstrates excluding rows where a column starts with any value from a specified list.

Code:

    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with any value in the list
    exclude_list = ['a', 'o']
    filtered_df = df[~df['col1'].str.startswith(tuple(exclude_list))]
    print(filtered_df)
