python - How to select rows that do not start with some str in pandas?

To select rows in a pandas DataFrame where a specific column does not start with a particular string, build a boolean mask with the .str.startswith() method and negate it with the ~ (tilde) operator. Here's how:

Example:

Suppose you have a DataFrame like this:

import pandas as pd
# Sample DataFrame
data = {
    'Column1': ['apple', 'banana', 'orange', 'kiwi', 'grape'],
    'Column2': [10, 20, 15, 25, 30]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

   Column1  Column2
0    apple       10
1   banana       20
2   orange       15
3     kiwi       25
4    grape       30

Select Rows where Column1 does not start with 'b'

To select rows where values in Column1 do not start with 'b', you can use the following approach:

# Select rows where Column1 does not start with 'b'
filtered_df = df[~df['Column1'].str.startswith('b')]
print("\nFiltered DataFrame:")
print(filtered_df)

Output:

   Column1  Column2
0    apple       10
2   orange       15
3     kiwi       25
4    grape       30

Explanation:

  • df['Column1'].str.startswith('b'): Returns a boolean Series (the mask) that is True for each value in Column1 that starts with 'b' and False otherwise.

  • ~ (tilde operator): Inverts the mask element-wise, so the rows that did not start with 'b' become True.

  • df[~df['Column1'].str.startswith('b')]: Keeps only the rows where the negated mask is True.
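
To make each step visible, the sketch below rebuilds the sample DataFrame, prints the mask and its negation, and then combines two startswith() masks with the | (OR) operator before negating. The second prefix 'k' is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': ['apple', 'banana', 'orange', 'kiwi', 'grape'],
    'Column2': [10, 20, 15, 25, 30],
})

# Boolean mask: True where the value starts with 'b'
mask = df['Column1'].str.startswith('b')
print(mask.tolist())     # [False, True, False, False, False]

# ~ inverts the mask, so rows that did not start with 'b' become True
print((~mask).tolist())  # [True, False, True, True, True]

# Two masks combined with | (OR); the parentheses matter because ~ binds
# more tightly than |, so ~a | b would negate only the first mask
filtered_df = df[~(df['Column1'].str.startswith('b') | df['Column1'].str.startswith('k'))]
print(filtered_df['Column1'].tolist())  # ['apple', 'orange', 'grape']
```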

Notes:

  • Replace 'b' in startswith('b') with any other prefix; the comparison is case-sensitive.
  • The check is vectorized over the whole column. If the column contains missing values (NaN), pass na=False or na=True to startswith() so the mask is purely boolean before you negate it.
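
For an object-dtype column with missing values, str.startswith() returns NaN for those entries unless told otherwise, and negating such a mask with ~ can raise an error or give unexpected results. The na argument sets the value used for missing entries. A minimal sketch with made-up data, contrasting the two choices:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value
df = pd.DataFrame({'col1': ['apple', np.nan, 'orange', 'grape'],
                   'col2': [10, 15, 8, 12]})

# na=False: a missing value counts as "does not start with 'a'",
# so the negated mask keeps that row
keep_missing = df[~df['col1'].str.startswith('a', na=False)]

# na=True: a missing value counts as a match, so the negated mask drops it
drop_missing = df[~df['col1'].str.startswith('a', na=True)]

print(keep_missing)
print(drop_missing)
```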

With this approach you can filter a pandas DataFrame on whether a column's string values do not start with a given prefix.

Examples

  1. Select rows not starting with a specific string using str.startswith() and negation (~):

    • Description: Keep only the rows where a column does not start with a given string, i.e. exclude the rows that do.
    • Code:
      import pandas as pd
      # Sample DataFrame
      data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' does not start with 'a'
      filtered_df = df[~df['col1'].str.startswith('a')]
      print(filtered_df)

  2. Exclude rows based on multiple starting strings by passing a tuple to str.startswith():

    • Description: This query extends the previous one to exclude rows where a column starts with any of multiple specified strings.
    • Code:
      import pandas as pd
      # Sample DataFrame
      data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' does not start with 'a' or 'b'
      filtered_df = df[~df['col1'].str.startswith(('a', 'b'))]
      print(filtered_df)

  3. Case-insensitive exclusion by lowercasing before str.startswith():

    • Description: Series.str.startswith() has no case argument, so lowercase the column first to make the comparison case-insensitive.
    • Code:
      import pandas as pd
      # Sample DataFrame
      data = {'col1': ['Apple', 'banana', 'orange', 'Grape'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' does not start with 'a' (case-insensitive):
      # lowercase first, because str.startswith() has no case parameter
      filtered_df = df[~df['col1'].str.lower().str.startswith('a')]
      print(filtered_df)

  4. Exclude a prefix and also drop rows where the value is NaN or missing:

    • Description: str.startswith() cannot test a missing value, so its na argument decides the result for it. With na=True, NaN rows count as matches and the negation removes them together with the rows that start with the prefix.
    • Code:
      import pandas as pd
      import numpy as np
      # Sample DataFrame with NaN values
      data = {'col1': ['apple', np.nan, 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' does not start with 'a' and is not NaN
      filtered_df = df[~df['col1'].str.startswith('a', na=True)]
      print(filtered_df)

  5. Exclude rows that start with a longer prefix, such as a whole word:

    • Description: The prefix can be any length; this example removes rows whose value begins with 'apple'.
    • Code:
      import pandas as pd
      # Sample DataFrame
      data = {'col1': ['apple pie', 'banana split', 'orange juice', 'grape soda'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' does not start with 'apple'
      filtered_df = df[~df['col1'].str.startswith('apple')]
      print(filtered_df)

  6. Exclude rows based on a regular expression pattern using str.contains() and negation (~):

    • Description: Anchor the pattern with ^ so it only matches at the start of the string, then negate str.contains() to drop the matching rows. A regular expression is useful when the prefix is itself a pattern, e.g. r'^[ab]'.
    • Code:
      import pandas as pd
      # Sample DataFrame
      data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' does not start with 'a':
      # '^a' matches values starting with 'a', and ~ removes those rows
      filtered_df = df[~df['col1'].str.contains(r'^a')]
      print(filtered_df)

  7. Exclude rows where the string starts with whitespace characters using str.lstrip():

    • Description: A value starts with whitespace exactly when str.lstrip() changes it, so keep only the rows where the stripped value equals the original.
    • Code:
      import pandas as pd
      # Sample DataFrame
      data = {'col1': [' apple', 'banana', ' orange', 'grape'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' does not start with whitespace:
      # lstrip() leaves such values unchanged
      filtered_df = df[df['col1'].str.lstrip() == df['col1']]
      print(filtered_df)

  8. Exclude rows whose value is an empty string using str.len():

    • Description: This example removes rows where the column holds an empty string ('').
    • Code:
      import pandas as pd
      # Sample DataFrame
      data = {'col1': ['', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' is not an empty string
      filtered_df = df[~(df['col1'].str.len() == 0)]
      print(filtered_df)

  9. Exclude rows based on a condition using a lambda function with apply():

    • Description: A lambda function with apply() can express conditions that the .str methods cannot. Row-wise apply(axis=1) runs Python code for every row, so on large DataFrames it is much slower than the vectorized .str methods.
    • Code:
      import pandas as pd
      # Sample DataFrame
      data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' does not start with 'a' using apply with a lambda function
      filtered_df = df[df.apply(lambda x: not x['col1'].startswith('a'), axis=1)]
      print(filtered_df)

  10. Exclude rows that start with any value from a list, using str.startswith() and ~:

    • Description: This query demonstrates excluding rows where a column starts with any value from a specified list.
    • Code:
      import pandas as pd
      # Sample DataFrame
      data = {'col1': ['apple', 'banana', 'orange', 'grape'], 'col2': [10, 15, 8, 12]}
      df = pd.DataFrame(data)
      # Select rows where 'col1' does not start with any value in the list;
      # startswith() takes a string or a tuple, so convert the list first
      exclude_list = ['a', 'o']
      filtered_df = df[~df['col1'].str.startswith(tuple(exclude_list))]
      print(filtered_df)
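
When the prefixes come from a list and you also need case-insensitive matching, one option is to join them into a single regular expression and use Series.str.match(), which matches at the start of each string and accepts a case argument. A minimal sketch under these assumptions: the prefixes are literal text (hence re.escape), and the data and the contents of exclude_list are only illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({'col1': ['Apple', 'banana', 'Orange', 'grape', 'a.b'],
                   'col2': [10, 15, 8, 12, 5]})

# Illustrative prefixes; re.escape makes '.' match a literal dot
exclude_list = ['a', 'o', 'g.']

# Build one alternation such as 'a|o|g\.'; str.match only matches at the start
pattern = '|'.join(re.escape(p) for p in exclude_list)

# case=False ignores case; na=False would keep any missing values after ~
filtered_df = df[~df['col1'].str.match(pattern, case=False, na=False)]
print(filtered_df)
```

Here 'grape' is kept because the escaped prefix 'g.' requires a literal dot after the 'g'; without re.escape, '.' would match any character and 'grape' would be excluded too.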
