Python Forum
painfully slow runtime when handling data
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
painfully slow runtime when handling data
#1
Hi everyone,

I am new to Python, and just completed some codes as below that take forever to run. It seemed working if I only run the first 10 rows in the main DataFrame, I hope someone can assist me in improving the codes, so that it can run much faster. Many thanks

from datetime import timedelta import numpy as np import pandas as pd # Get data from the given excel file filepath = r'C:\Users\User\Desktop\Power BI\All.xlsx' df_CoC = pd.read_excel(filepath, usecols=['ID, 'Is there time loss?', 'MC Capacity Date From', 'MC Capacity Date To'], sheet_name='CoC') #18000 rows df_WRS = pd.read_excel(filepath, usecols=['ID', 'Date From', 'Date To'], sheet_name='WRS') #22000 rows df_Open = pd.read_excel(filepath, sheet_name='Open') #2500 rows Count_Open = df_Open['ID'].count() df_Open['CoC_MissingWRS'] = None df_Open['WRS_MissingCoC'] = None ####### Fill new 'CoC_MissingWRS' column in open query ######## for i in range(0, Count_Open): if i > 90: break imn = df_Open.loc[i, 'Injury Management: Injury Management Number'] #Fill WRS date array arrWRSDates = np.array([]) filtered_df = df_WRS[df_WRS['ID'] == imn] if not filtered_df.empty: Count_WRS = filtered_df['ID'].count()-1 for w in range(0, Count_WRS): #Date_From is always available Date_From = filtered_df.iloc[w]['Date From'].date() if filtered_df.iloc[w]['Date To'] is None: Date_To = Date_From else: Date_To = filtered_df.iloc[w]['Date To'].date() if Date_To < Date_From: Date_To = Date_From adate = Date_From while adate <= Date_To: if adate not in arrWRSDates: arrWRSDates = np.append(arrWRSDates, adate) adate += timedelta(days=1) # Fill Timeloss & all CoC date array arrTLCoCDates = np.array([]) arrAllCoCDates = np.array([]) filtered_df = df_CoC[df_CoC['ID'] == imn] if not filtered_df.empty: Count_CoC= filtered_df['ID'].count() - 1 for c in range(0, Count_CoC): Date_From = filtered_df.iloc[c]['MC Capacity Date From'].date() if filtered_df.iloc[c]['MC Capacity Date To'] is None: Date_To = Date_From else: Date_To = filtered_df.iloc[c]['MC Capacity Date To'].date() if Date_To < Date_From or Date_To is None: Date_To = Date_From adate = Date_From while adate <= Date_To: if df_CoC.iloc[c]['Is there time loss?'] == 'Yes': if adate not in arrTLCoCDates: arrTLCoCDates = np.append(arrTLCoCDates, adate) else: if adate not in arrAllCoCDates: arrAllCoCDatesCoCDates = np.append(arrAllCoCDates, adate) adate += timedelta(days=1) ##### Check if CoC missing WRS ##### bMissingWRS = 'No' if len(arrTLCoCDates) > 0: adate = min(arrTLCoCDates) while adate <= max(arrTLCoCDates) and bMissingWRS == 'No': if adate not in arrWRSDates and adate.weekday() < 5: bMissingWRS = 'Yes' df_Open.loc[i, 'CoC_MissingWRS'] = bMissingWRS ##### Check if WRS missing CoC ##### bMissingCoC = 'No' if len(arrWRSDates) > 0: adate = min(arrWRSDates) while adate <= max(arrWRSDates) and bMissingCoC == 'No': if adate not in arrAllCoCDates and adate.weekday() < 5: bMissingCoC = 'Yes' df_Open.loc[i, 'WRS_MissingCoC'] = bMissingCoC
Reply
#2
Where is it slow?
Suggest you put sections into cells in a Jupyter notebook and execute the code segments individually. That should narrow it down. Then we can more likely help
Reply
#3
Hi jefsummers,

Thank you for your advice, I tried to run the code in Power BI and PyCharm Community, I never got to see the result, unless I run it for the first 10 rows or something. I will look Jupyter notebook up, and will get back to you. Thank you again.
Reply
#4
The way you do it will be very slow,every time write a loop in Pandas it often the wrong approch.
The way Pands is build should try to use Vectorization it will be a lot faster,look at this blog post.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Extract args=, value that was passed to Multiprocessing.Proc object during runtime? haihal 1 1,089 Dec-08-2024, 07:04 AM
Last Post: Gribouillis
  change dataclass to frozen at runtime jpanico 2 2,299 Oct-06-2024, 09:38 AM
Last Post: snippsat
  class and runtime akbarza 4 2,390 Mar-16-2024, 01:32 PM
Last Post: deanhystad
  Big O runtime nested for loop and append yarinsh 4 3,886 Dec-31-2022, 11:50 PM
Last Post: stevendaprano
Star python exception handling handling .... with traceback mg24 3 4,364 Nov-09-2022, 07:29 PM
Last Post: Gribouillis
  Reducing runtime memory usage in Cpython interpreter david_the_graower 2 3,553 Oct-18-2021, 09:56 PM
Last Post: david_the_graower
  Object reference in Dict - not resolved at runtime benthomson 2 3,122 Apr-02-2020, 08:50 AM
Last Post: benthomson
  PyCharm asking for VC++ runtime 14.0 whereas I have already installed VC++ 19.0 SarmadiRizvi 1 2,912 Apr-02-2020, 06:17 AM
Last Post: snippsat
  EOFError: EOF when reading a line - Runtime Error RavCOder 6 12,798 Sep-27-2019, 12:22 PM
Last Post: RavCOder
  Different runtime programs mhvozdesky 1 66,968 Oct-24-2018, 02:50 PM
Last Post: ichabod801

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020
This forum uses Lukasz Tkacz MyBB addons.
Forum use Krzysztof "Supryk" Supryczynski addons.