Skip to content

Conversation

@Ivorforce
Copy link
Contributor

@Ivorforce Ivorforce commented Jul 4, 2024

I've recently needed to read files from buffers rather than files (streamed from file selection pickers). I've opened up a Pull Request with a quick fix allowing a buffer override: #491

However, a simple buffer override is suboptimal. Ideally, there are as few diverging paths in the code as possible. Additionally, it makes sense to separate concerns of parsing and I/O, because being agnostic to the buffer source enables flexibility and decreases complexity on both sides. Hence, merging this PR is much preferred to #491.

This Pull Request implements my recommendation for a restructure of rdrecord, rdheader and rdann. rdann was easy to modify, but rdrecord and rdheader's ability to read multi-segment records does not easily translate to reads using streams. This results in code that isn't as nice as I was hoping - so someone needing to read data from buffers must therefore implement surrounding functionality, such as pre-reading the header or acting on multi-segment records, themselves. Still, streamlining the functions addressed in this PR already facilitate accomplishing this goal at all, without needing to re-write all of the parsing functionality of wfdb.

Before Merging

This pull request needs:

  • Unit Tests passing
  • Updated documentation

Concerns and observations

  • Previously, the header parser differentiated between remote and local headers. The new implementation makes it impossible to distinguish between the two. I would appreciate input on whether this differentiation was intentional. If so, we can allow an argument to _rdheader specifying the encoding.
    • Remote headers were decoded using iso-8859-1
    • Local headers were decoded using ascii
    • I'm also open to _rdheader accepting a pre-decoded string, since the file is plaintext. I advocate a binary input though to avoid inconsistent coding behavior.
  • rdheader previously ignored file read errors for local files. I would like input on whether that was intentional, because a file read error could result in corrupted file reads.
  • rdheader previously allowed passing a directory even if pn_dir was supplied. In this case, the file_name directory was cut from the record name and silently ignored. I don't think this is the right call, so I cut this functionality. Let me know if there's a reason to keep it.
  • Some buffering arguments are, as of now, removed. It should be easy to re-introduce them into the new enclosing functions, if needed.
  • This PR removes a lot of now unneeded private functions. While it's unlikely (and discouraged due to internality) they were used before, it could break dependents relying on them.
  • The no_file parameter could be deprecated, instead using a None test against sig_data. This removes a potential failure case because it's unlikely someone passing sig_data doesn't intend it to be used.
@Ivorforce Ivorforce force-pushed the read-stream-streamlined branch 2 times, most recently from e0f7da4 to 52054f2 Compare July 4, 2024 15:02
@Ivorforce Ivorforce force-pushed the read-stream-streamlined branch 3 times, most recently from 25c3704 to e6277a4 Compare July 4, 2024 15:17
@Ivorforce Ivorforce changed the title Streamline file reading to use buffers Streamline file reading to use io streams Jul 4, 2024
@Ivorforce Ivorforce force-pushed the read-stream-streamlined branch 7 times, most recently from d7fc7f0 to 9de2d92 Compare July 30, 2024 16:50
…d annotation files. These methods all now stack between the public functions, handling mainly i/o, and the parsing functions.
@Ivorforce Ivorforce force-pushed the read-stream-streamlined branch from 9de2d92 to 7a83307 Compare July 30, 2024 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant