Rounding errors in to_dataframe #444

@bemoody

The wfdb.Record.to_dataframe function generates a DataFrame from a Record object. The index of the resulting DataFrame is the elapsed or absolute time of each sample.

This code, however, will have significant rounding errors over a long record:

    if self.base_datetime is not None:
        index = pd.date_range(
            start=self.base_datetime,
            periods=self.sig_len,
            freq=pd.Timedelta(seconds=1 / self.fs),
        )
    else:
        index = pd.timedelta_range(
            start=pd.Timedelta(0),
            periods=self.sig_len,
            freq=pd.Timedelta(seconds=1 / self.fs),
        )

For example:

    $ python3
    >>> import wfdb
    >>> r = wfdb.rdrecord('81739927', pn_dir='mimic4wdb/0.1.0/waves/p100/p10014354/81739927')
    >>> str(r.base_datetime)
    '2148-08-16 09:00:17.566000'
    >>> r.fs
    62.4725
    >>> r.sig_len
    6661120
    >>> r.to_dataframe()
                                  I      II     III       V  aVR     Pleth       Resp
    2148-08-16 09:00:17.566000  NaN     NaN     NaN     NaN  NaN       NaN  -0.751374
    2148-08-16 09:00:17.582007  NaN     NaN     NaN     NaN  NaN       NaN  -0.751374
    2148-08-16 09:00:17.598014  NaN     NaN     NaN     NaN  NaN       NaN  -0.751374
    2148-08-16 09:00:17.614021  NaN     NaN     NaN     NaN  NaN       NaN  -0.751374
    2148-08-16 09:00:17.630028  NaN     NaN     NaN     NaN  NaN       NaN  -0.751374
    ...                          ..     ...     ...     ...  ...       ...        ...
    2148-08-17 14:37:22.033805  NaN  -0.220  -0.285  -0.025  NaN  0.404297   0.487477
    2148-08-17 14:37:22.049812  NaN  -0.030   0.005   0.025  NaN  0.396484   0.530238
    2148-08-17 14:37:22.065819  NaN  -0.065  -0.030  -0.015  NaN  0.386475   0.574832
    2148-08-17 14:37:22.081826  NaN  -0.265  -0.255  -0.125  NaN  0.375977   0.621258
    2148-08-17 14:37:22.097833  NaN  -0.550  -0.610  -0.355  NaN  0.366211   0.664020

    [6661120 rows x 7 columns]
    >>> str(r.get_absolute_time(6661119))
    '2148-08-17 14:37:22.384920'
    $ wfdbtime -r mimic4wdb/0.1.0/waves/p100/p10014354/81739927/ s6661119
           s6661119    29:37:04.819 [14:37:22.385 17/08/2148]

Here, get_absolute_time is correct to the nearest microsecond and the wfdbtime command is correct to the nearest millisecond. to_dataframe, however, is off by 0.287 seconds.
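
The size of the discrepancy can be reproduced with the standard library alone. In the output above, consecutive index values differ by exactly 16007 µs, whereas 1 / 62.4725 s is about 16007.0431 µs. The sketch below assumes the sample period is being rounded to whole microseconds before it is multiplied out; it uses `datetime.timedelta` (which has microsecond resolution) to model that rounding and is not a trace of what pandas does internally.

```python
from datetime import datetime, timedelta

# Values taken from the session above.
fs = 62.4725
sig_len = 6661120
base = datetime(2148, 8, 16, 9, 0, 17, 566000)
last = sig_len - 1

# Assumption: the period is rounded to whole microseconds (16007 us),
# so a small error is repeated once per sample.
step = timedelta(seconds=1 / fs)
accumulated = base + step * last

# Computing the offset of the last sample in one step rounds only once.
direct = base + timedelta(seconds=last / fs)

drift = direct - accumulated

print(step)                   # 0:00:00.016007
print(accumulated)            # 2148-08-17 14:37:22.097833
print(direct)                 # 2148-08-17 14:37:22.384920
print(drift.total_seconds())  # 0.287087
```

The per-sample error of roughly 0.0431 µs, repeated over 6661119 steps, accounts for the 0.287 s gap between the last `to_dataframe` index value and `get_absolute_time`.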

I think this would be avoided by using start and end arguments to date_range or timedelta_range, rather than using start and freq.
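
A minimal sketch of that idea, not a tested patch for `to_dataframe`: the variables below are stand-ins for `self.base_datetime`, `self.fs` and `self.sig_len`, and the end time is computed here with `datetime.timedelta` purely for illustration. Passing `start`, `end` and `periods` (and no `freq`) asks pandas for evenly spaced points between the two endpoints, so the rounding happens once on the endpoint rather than once per sample.

```python
from datetime import datetime, timedelta

import pandas as pd

# Stand-ins for the Record attributes, using the values from the example.
base_datetime = datetime(2148, 8, 16, 9, 0, 17, 566000)
fs = 62.4725
sig_len = 6661120

# Time of the final sample, rounded once (to the nearest microsecond).
end = base_datetime + timedelta(seconds=(sig_len - 1) / fs)

# Evenly spaced timestamps between start and end; no freq is given.
index = pd.date_range(start=base_datetime, end=end, periods=sig_len)

print(index[0])
print(index[-1])  # should agree with get_absolute_time(sig_len - 1) up to rounding
```

One trade-off to be aware of: an index built this way is not expected to carry a fixed `freq`, since the true sample period is not an exact multiple of the timestamp resolution. The same pattern should apply to `timedelta_range` for records without a `base_datetime`.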
