Read XLSB File in Pandas Python

To read XLSB (Excel Binary Workbook) files in pandas, you can use the pyxlsb library, which is specifically designed to handle binary Excel files. Here's how you can do it:

  • Install the pyxlsb library:

You can install the pyxlsb library using pip:

    pip install pyxlsb

  • Read XLSB file using pandas and pyxlsb:

    import pandas as pd
    from pyxlsb import open_workbook

    # Specify the path to the XLSB file
    xlsb_path = 'path/to/your/file.xlsb'

    # Open the XLSB file
    with open_workbook(xlsb_path) as wb:
        # Specify the sheet name or 1-based index
        sheet_name = 'Sheet1'  # Change this to your sheet name or index

        # Get the specified sheet
        with wb.get_sheet(sheet_name) as sheet:
            # Read the sheet's rows into a list of lists of cell values
            rows = []
            for row in sheet.rows():
                rows.append([item.v for item in row])

    # Create a DataFrame from the list of rows
    df = pd.DataFrame(rows)

    # Print the DataFrame
    print(df)

Replace 'path/to/your/file.xlsb' with the actual path to your XLSB file, and set sheet_name to the name of the sheet you want to read or to its 1-based index.

The code uses the pyxlsb library to open the XLSB file, retrieve the specified sheet, and read its rows into a list of lists, one inner list of cell values per row. Finally, it converts that list into a pandas DataFrame. Because no header row is specified, the columns are numbered 0, 1, 2, and so on, and the sheet's first row appears as the first data row.
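Since the same read loop recurs every time a sheet is loaded this way, the row-to-DataFrame step can be wrapped in a small helper. The sketch below is one possible shape and not part of pyxlsb or pandas: the name rows_to_frame and its header parameter are hypothetical. It expects rows that have already been reduced to plain values (as with [item.v for item in row]) and optionally promotes the first row to column labels.

```python
import pandas as pd


def rows_to_frame(rows, header=True):
    """Build a DataFrame from an iterable of row-value lists.

    rows_to_frame and its `header` flag are illustrative names, not a
    pyxlsb or pandas API.
    """
    rows = [list(r) for r in rows]  # materialize, so generators work too
    if not rows:
        return pd.DataFrame()  # empty sheet: nothing to label
    if header:
        # First row becomes the column labels, the rest become data.
        return pd.DataFrame(rows[1:], columns=rows[0])
    return pd.DataFrame(rows)
```

With pyxlsb, the helper would be called while the sheet is still open, for example rows_to_frame([item.v for item in row] for row in sheet.rows()) inside the with wb.get_sheet(...) block.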

Note that pyxlsb is a read-only parser and is more limited than the readers available for regular Excel files (XLSX): it returns raw cell values, so dates, for example, come back as serial numbers rather than datetime objects. If your use case needs more than that, consider converting the XLSB file to XLSX format using Microsoft Excel or another tool, and then reading it with pandas using pd.read_excel().
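pandas can also drive pyxlsb directly: since pandas 1.0, pd.read_excel() accepts engine="pyxlsb", which avoids both the manual row loop and the conversion step. A minimal sketch, assuming pyxlsb is installed; the wrapper name read_xlsb is hypothetical, while sheet_name and engine are read_excel's own parameters.

```python
import pandas as pd


def read_xlsb(path, sheet_name=0):
    """Read one sheet of an XLSB workbook through pandas' pyxlsb engine.

    read_xlsb is a hypothetical wrapper; sheet_name follows read_excel's
    convention (0-based position or sheet name, None for all sheets).
    """
    return pd.read_excel(path, sheet_name=sheet_name, engine="pyxlsb")
```

For example, read_xlsb('path/to/your/file.xlsb', 'Sheet1') would return a DataFrame whose column labels come from the first row, which is read_excel's default. Keep in mind that read_excel counts sheets from 0, whereas wb.get_sheet() counts from 1.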

Examples

  1. Read XLSB File into Pandas DataFrame

    • This snippet demonstrates how to read an XLSB file into a Pandas DataFrame.
    !pip install pyxlsb

    import pandas as pd
    from pyxlsb import open_workbook

    # Open the XLSB file and read the specified sheet into a DataFrame
    with open_workbook("example.xlsb") as wb:
        with wb.get_sheet(1) as sheet:
            data = []
            for row in sheet.rows():
                data.append([item.v for item in row])

    df = pd.DataFrame(data[1:], columns=data[0])  # First row as columns
    print(df.head())

  2. Read Specific Sheet from XLSB File with Pandas

    • This snippet shows how to read a specific sheet from an XLSB file into a Pandas DataFrame.
    import pandas as pd
    from pyxlsb import open_workbook

    # Read the second sheet (1-based index)
    with open_workbook("example.xlsb") as wb:
        with wb.get_sheet(2) as sheet:
            data = []
            for row in sheet.rows():
                data.append([item.v for item in row])

    df = pd.DataFrame(data[1:], columns=data[0])  # Exclude the header row
    print("Data from Sheet 2:", df.head())

  3. Read XLSB Data with openpyxl and Pandas (After Converting to XLSX)

    • openpyxl cannot open XLSB files: load_workbook() rejects the .xlsb extension with an InvalidFileException. This snippet therefore assumes the workbook has first been saved as example.xlsx (for instance from Excel) and loads that converted copy into a Pandas DataFrame.
    import pandas as pd
    from openpyxl import load_workbook

    # Load the converted XLSX copy and read the data
    workbook = load_workbook("example.xlsx")
    sheet = workbook.active

    data = []
    for row in sheet.iter_rows(values_only=True):
        data.append(list(row))

    df = pd.DataFrame(data[1:], columns=data[0])  # Use first row as headers
    print("DataFrame:", df.head())

  4. Read XLSB File with pyxlsb and Handle Dates Correctly

    • This snippet shows how to handle dates properly when reading from an XLSB file.
    import pandas as pd
    from pyxlsb import open_workbook
    from datetime import datetime, timedelta

    # Read the XLSB file
    with open_workbook("example.xlsb") as wb:
        with wb.get_sheet(1) as sheet:
            data = []
            for row in sheet.rows():
                data.append([item.v for item in row])

    df = pd.DataFrame(data[1:], columns=data[0])

    # Convert Excel date to Python datetime (Excel date 0 is 1899-12-30)
    excel_date = df['DateColumn'][0]
    python_date = datetime(1899, 12, 30) + timedelta(days=excel_date)
    print("Converted date:", python_date)

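The conversion above handles a single cell. For a whole column, the same arithmetic can be expressed with pd.to_datetime and its unit and origin parameters. A sketch, assuming the column holds numeric Excel serial dates from the default 1900 date system; the helper name excel_serial_to_datetime is hypothetical.

```python
import pandas as pd


def excel_serial_to_datetime(serials):
    """Convert a Series of Excel serial dates to pandas datetimes.

    Assumes the 1900 date system, where serial 0 maps to 1899-12-30
    (valid for dates from 1900-03-01 onward).
    """
    # Text and empty cells become NaN here, and NaT after conversion.
    numeric = pd.to_numeric(serials, errors="coerce")
    return pd.to_datetime(numeric, unit="D", origin="1899-12-30")
```

Applied to the example, df['DateColumn'] = excel_serial_to_datetime(df['DateColumn']) would replace the serial numbers in place; the fractional part of a serial carries the time of day.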
  5. Read Multiple Sheets from XLSB into Pandas

    • This snippet demonstrates how to read multiple sheets from an XLSB file into separate Pandas DataFrames.
    import pandas as pd
    from pyxlsb import open_workbook

    sheet_names = ["Sheet1", "Sheet2"]
    dataframes = {}

    with open_workbook("example.xlsb") as wb:
        for sheet_name in sheet_names:
            # Look each sheet up by name so the dictionary keys always
            # match the data they label
            with wb.get_sheet(sheet_name) as sheet:
                data = []
                for row in sheet.rows():
                    data.append([item.v for item in row])
            df = pd.DataFrame(data[1:], columns=data[0])
            dataframes[sheet_name] = df

    # Display the data from each sheet
    for sheet_name, df in dataframes.items():
        print(f"Data from {sheet_name}:", df.head())

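When the sheet names are not known in advance, they can be taken from the workbook itself. The sketch below assumes the object returned by open_workbook exposes a sheets list of names alongside get_sheet, as pyxlsb's README describes; the helper name workbook_to_frames is hypothetical.

```python
import pandas as pd


def workbook_to_frames(wb):
    """Return {sheet name: DataFrame} for every sheet of an open workbook.

    `wb` is expected to behave like the object returned by
    pyxlsb.open_workbook: a `sheets` list of names and a `get_sheet`
    method. workbook_to_frames itself is a hypothetical helper.
    """
    frames = {}
    for name in wb.sheets:
        with wb.get_sheet(name) as sheet:
            rows = [[item.v for item in row] for row in sheet.rows()]
        if rows:
            # First row supplies the column labels.
            frames[name] = pd.DataFrame(rows[1:], columns=rows[0])
        else:
            frames[name] = pd.DataFrame()  # sheet with no rows
    return frames
```

It would be called inside the workbook's with block, for example dataframes = workbook_to_frames(wb) right after with open_workbook("example.xlsb") as wb:.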
  6. Read XLSB File and Filter Data with Pandas

    • This snippet demonstrates how to read an XLSB file and filter the DataFrame based on certain conditions.
    import pandas as pd
    from pyxlsb import open_workbook

    with open_workbook("example.xlsb") as wb:
        with wb.get_sheet(1) as sheet:
            data = []
            for row in sheet.rows():
                data.append([item.v for item in row])

    df = pd.DataFrame(data[1:], columns=data[0])

    # Filter rows where the value in "Column1" is greater than 50
    filtered_df = df[df["Column1"] > 50]
    print("Filtered DataFrame:", filtered_df.head())

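Because each cell is read individually, a column such as "Column1" can end up with object dtype when it mixes numbers with text, and the comparison above then raises a TypeError. One defensive option, sketched here with made-up sample data in place of a real workbook, is to coerce the column with pd.to_numeric before filtering. The helper name filter_greater_than is hypothetical.

```python
import pandas as pd


def filter_greater_than(df, column, threshold):
    """Return rows whose `column` value, read as a number, exceeds threshold."""
    # errors="coerce" turns text and empty cells into NaN; NaN > threshold
    # is False, so those rows are left out of the result.
    values = pd.to_numeric(df[column], errors="coerce")
    return df[values > threshold]


# Made-up rows standing in for data read from a sheet.
sample = pd.DataFrame(
    {"Column1": [10.0, "n/a", 75.0, None], "Column2": ["a", "b", "c", "d"]}
)
filtered = filter_greater_than(sample, "Column1", 50)
print(filtered)
```

The original column is left untouched; only the temporary numeric copy is used to build the mask.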
  7. Read XLSB File with Specified Columns

    • This snippet reads the whole sheet from an XLSB file and then keeps only specific columns in a new Pandas DataFrame.
    import pandas as pd
    from pyxlsb import open_workbook

    with open_workbook("example.xlsb") as wb:
        with wb.get_sheet(1) as sheet:
            data = []
            for row in sheet.rows():
                data.append([item.v for item in row])

    df = pd.DataFrame(data[1:], columns=data[0])

    # Select specific columns to create a new DataFrame
    selected_columns = ["Column1", "Column2"]
    df_selected = df[selected_columns]
    print("DataFrame with selected columns:", df_selected.head())

  8. Read XLSB File and Handle Missing Data in Pandas

    • This snippet demonstrates how to handle missing data when reading from an XLSB file.
    import pandas as pd
    from pyxlsb import open_workbook

    with open_workbook("example.xlsb") as wb:
        with wb.get_sheet(1) as sheet:
            data = []
            for row in sheet.rows():
                data.append([item.v for item in row])

    df = pd.DataFrame(data[1:], columns=data[0])

    # Fill NaN with default values
    df_filled = df.fillna(0)  # Replace NaN with 0
    print("DataFrame with NaN filled:", df_filled.head())

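fillna(0) suits numeric columns, but it also writes 0 into text columns. A sketch of two common refinements follows, using made-up data rather than a real workbook: dropping rows that are empty in every column, then filling each remaining column with its own default. The column names and default values are assumptions for illustration.

```python
import pandas as pd

# Made-up rows standing in for sheet data; None marks an empty cell.
df = pd.DataFrame(
    {
        "Column1": [1.0, None, None, 4.0],
        "Column2": ["a", None, "c", None],
    }
)

# Drop rows where every cell is empty.
df = df.dropna(how="all")

# Fill each column with a default that matches its type.
df_filled = df.fillna({"Column1": 0, "Column2": ""})
print(df_filled)
```

Passing a dictionary to fillna leaves any column not listed in it unchanged.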
  9. Read XLSB File and Pivot Data in Pandas

    • This snippet demonstrates how to create a pivot table in Pandas from data read from an XLSB file.
    import pandas as pd
    from pyxlsb import open_workbook

    with open_workbook("example.xlsb") as wb:
        with wb.get_sheet(1) as sheet:
            data = []
            for row in sheet.rows():
                data.append([item.v for item in row])

    df = pd.DataFrame(data[1:], columns=data[0])

    # Create a pivot table
    pivot_df = df.pivot_table(index="Column1", columns="Column2", values="Column3", aggfunc="sum")
    print("Pivot Table:", pivot_df)
