To select rows in a Pandas DataFrame where a specific column does not start with a particular string, you can use the ~ (tilde) operator along with the .str accessor to negate the condition. Here's how you can achieve this:
Suppose you have a DataFrame like this:
    import pandas as pd

    # Sample DataFrame
    data = {
        'Column1': ['apple', 'banana', 'orange', 'kiwi', 'grape'],
        'Column2': [10, 20, 15, 25, 30]
    }
    df = pd.DataFrame(data)

    print("Original DataFrame:")
    print(df)

Output:

       Column1  Column2
    0    apple       10
    1   banana       20
    2   orange       15
    3     kiwi       25
    4    grape       30
To select rows where values in Column1 do not start with 'b', you can use the following approach:
    # Select rows where Column1 does not start with 'b'
    filtered_df = df[~df['Column1'].str.startswith('b')]

    print("\nFiltered DataFrame:")
    print(filtered_df)

Output:

       Column1  Column2
    0    apple       10
    2   orange       15
    3     kiwi       25
    4    grape       30
df['Column1'].str.startswith('b'): This condition creates a boolean mask where each element in Column1 is checked to see if it starts with 'b'.
~ (tilde operator): Negates the boolean mask, selecting rows where the condition is False.
df[~df['Column1'].str.startswith('b')]: Filters the DataFrame rows based on the negated condition.
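The three pieces described above can be seen more clearly when each step is stored in its own variable. The following is a minimal sketch using the same sample data; the variable names `mask` and `inverted` are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': ['apple', 'banana', 'orange', 'kiwi', 'grape'],
    'Column2': [10, 20, 15, 25, 30],
})

# Step 1: boolean mask, True where Column1 starts with 'b'
mask = df['Column1'].str.startswith('b')

# Step 2: negate the mask, True where Column1 does NOT start with 'b'
inverted = ~mask

# Step 3: keep only the rows where the negated mask is True
filtered_df = df[inverted]

print(mask.tolist())      # [False, True, False, False, False]
print(inverted.tolist())  # [True, False, True, True, True]
print(filtered_df)
```

Note that the filtered result keeps the original index labels (0, 2, 3, 4); call `reset_index(drop=True)` on it if a fresh 0-based index is wanted.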
Replace 'b' in startswith('b') with any other string prefix as needed.

This approach filters rows in a Pandas DataFrame based on whether a specific column's string values do not start with a specified prefix.
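Since only the prefix changes between uses, the filter can be wrapped in a small function. This is a sketch, not a pandas feature: `exclude_prefix` is a hypothetical helper name, and it assumes the column holds only strings (no missing values).

```python
import pandas as pd

def exclude_prefix(df, column, prefix):
    """Return the rows of df whose `column` value does not start with `prefix`.

    Hypothetical helper for illustration; assumes `column` contains only strings.
    """
    return df[~df[column].str.startswith(prefix)]

df = pd.DataFrame({
    'Column1': ['apple', 'banana', 'orange', 'kiwi', 'grape'],
    'Column2': [10, 20, 15, 25, 30],
})

# Same filter as above, then a different, longer prefix
print(exclude_prefix(df, 'Column1', 'b'))
print(exclude_prefix(df, 'Column1', 'gr'))
```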
Select rows not starting with a specific string using str.startswith() and negation (~):
    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a'
    filtered_df = df[~df['col1'].str.startswith('a')]
    print(filtered_df)

Exclude rows based on multiple starting strings by passing a tuple of prefixes to str.startswith():
    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' starts with neither 'a' nor 'b'
    filtered_df = df[~df['col1'].str.startswith(('a', 'b'))]
    print(filtered_df)

Case-insensitive exclusion with str.startswith():
    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['Apple', 'banana', 'orange', 'Grape'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a' (case-insensitive).
    # Series.str.startswith() has no case parameter, so lowercase the values first.
    filtered_df = df[~df['col1'].str.lower().str.startswith('a')]
    print(filtered_df)

Exclude rows where the value is NaN or missing:
    import pandas as pd
    import numpy as np

    # Sample DataFrame with NaN values
    data = {'col1': ['apple', np.nan, 'orange', 'grape'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' is not missing.
    # Testing for the literal prefix 'nan' does not match NaN values, so use notna().
    filtered_df = df[df['col1'].notna()]
    print(filtered_df)

Select rows where a multi-word string does not start with a given word:
    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple pie', 'banana split', 'orange juice', 'grape soda'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'apple'
    filtered_df = df[~df['col1'].str.startswith('apple')]
    print(filtered_df)

Exclude rows based on a regular expression pattern using str.contains() and negation (~):
    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a'.
    # The pattern '^a' matches values that start with 'a'; ~ inverts the match.
    filtered_df = df[~df['col1'].str.contains(r'^a')]
    print(filtered_df)

Exclude rows where the string starts with whitespace characters using str.lstrip():
    import pandas as pd

    # Sample DataFrame
    data = {'col1': [' apple', 'banana', ' orange', 'grape'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with whitespace:
    # a value with no leading whitespace is left unchanged by lstrip()
    filtered_df = df[df['col1'] == df['col1'].str.lstrip()]
    print(filtered_df)

Exclude rows where the string is empty using str.len():
    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['', 'banana', 'orange', 'grape'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' is not an empty string
    filtered_df = df[~(df['col1'].str.len() == 0)]
    print(filtered_df)

Exclude rows based on a condition using a lambda function with apply():
Use apply() to exclude rows based on a custom condition.

    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with 'a', using apply() with a row-wise lambda
    filtered_df = df[df.apply(lambda x: not x['col1'].startswith('a'), axis=1)]
    print(filtered_df)

Exclude rows where the string starts with any prefix in a list using str.startswith() and ~:
    import pandas as pd

    # Sample DataFrame
    data = {'col1': ['apple', 'banana', 'orange', 'grape'],
            'col2': [10, 15, 8, 12]}
    df = pd.DataFrame(data)

    # Select rows where 'col1' does not start with any value in the list
    # (str.startswith() expects a tuple, so convert the list first)
    exclude_list = ['a', 'o']
    filtered_df = df[~df['col1'].str.startswith(tuple(exclude_list))]
    print(filtered_df)
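The variations above can also be combined. The sketch below assumes a column that mixes upper and lower case and contains a missing value; it excludes several prefixes case-insensitively and shows two ways to treat NaN. The sample data and the names `lowered`, `starts`, and `strict_df` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['Apple', 'banana', np.nan, 'Orange', 'grape'],
    'col2': [10, 15, 8, 12, 7],
})

exclude_list = ['a', 'o']  # prefixes to exclude, written in lowercase

# Lowercase the values so the comparison ignores case; NaN stays NaN
lowered = df['col1'].str.lower()

# Accumulate one mask per prefix with OR.
# na=False treats a missing value as "does not start with the prefix".
starts = pd.Series(False, index=df.index)
for prefix in exclude_list:
    starts = starts | lowered.str.startswith(prefix, na=False)

# Rows that do not start with any excluded prefix (the NaN row is kept)
filtered_df = df[~starts]
print(filtered_df)

# Same filter, but also dropping rows where 'col1' is missing
strict_df = df[~starts & df['col1'].notna()]
print(strict_df)
```

Whether a missing value should count as "not starting with the prefix" depends on the data; the `na` argument and the extra `notna()` condition make that choice explicit instead of leaving it to the default.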