Skip to content

Conversation

TomBaxter
Copy link
Member

Ticket

SVCS-531

Purpose

[SVCS-531] Improve tabular renderer to handle more TSV cases

Changes

mfr/extensions/tabular/libs/init.py
mfr/extensions/tabular/libs/stdlib_tools.py
mfr/extensions/tabular/settings.py

  • separate tsv_stdlib from csv_stdlib
  • improve exceptions

tests/extensions/tabular/files/invalid_null.csv
tests/extensions/tabular/test_stdlib_tools.py

  • add tests

tests/extensions/ipynb/files/no_metadata.ipynb - quiet test (incidental)

tests/extensions/zip/test_renderer.py - quiet test (incidental)

Side effects

Possible increase in system resource use due to full file being sniffed

QA Notes

In addition to the tsv files found in https://osf.io/pexrv/?view_only=af771fef666049a9aefa85642230e80a also test json_test.csv attached to ticket.
This ticket only effects .tsv and .csv files rendered in MFR.

Deployment Notes

None

AddisonSchiller and others added 4 commits November 22, 2017 13:12
Csv.sniff could cause random characters or spaces to be used as the delimiter. Separating these functions and using a hard coded dialect fixes this display problem.
SVCS-531 Our handling of CVS/TSV files was convoluted and unnecessary, this cleaned up and simplified. Collateral- Updated two tests that were working fine, but producing unnecessary noise in the test logs.
@coveralls
Copy link

coveralls commented Dec 21, 2017

Coverage Status

Coverage increased (+0.5%) to 68.501% when pulling 3048f77 on TomBaxter:feature/SVCS-531 into 8bb2dd4 on CenterForOpenScience:develop.

@cslzchen
Copy link
Contributor

cslzchen commented Jan 5, 2018

@felliott The previous PR was already Phase 2. Do you want to review this or should I do another round of phase 1?

@felliott
Copy link
Member

felliott commented Jan 8, 2018

I'll take it. Phase 2 is good.

@cslzchen
Copy link
Contributor

cslzchen commented Jan 8, 2018

Sounds good! Added Phase 2 label to the PR.

@jonathon-love
Copy link
Contributor

hey guys,

i think you may have some issues with this:

dialect = csv.Sniffer().sniff(fp.read(), ',') 

https://github.com/CenterForOpenScience/modular-file-renderer/pull/308/files#diff-875eb910328d4d6758865a7db8ef38cdR13

i remember having issues with the jamovi csv importer when the data returned didn't end with a complete line. i worked around it thus:

some_data = csvfile.read(4096) if len(some_data) == 4096: # csv sniffer doesn't like partial lines some_data = trim_after_last_newline(some_data) dialect = csv.Sniffer().sniff(some_data, ', \t;') 

and

def trim_after_last_newline(text): index = text.rfind('\r\n') if index == -1: index = text.rfind('\n') if index == -1: index = text.rfind('\r') if index != -1: text = text[:index] return text 

https://github.com/jamovi/jamovi/blob/master/server/jamovi/server/formatio/csv.py#L85

i'd also take exception to this:

# CSVs are always values seperated by commas 

https://github.com/CenterForOpenScience/modular-file-renderer/pull/308/files#diff-875eb910328d4d6758865a7db8ef38cdR11

i don't think excel saves .csv files separated by commas by default

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by @jonathon-love, there are some issues with out implementation and we need to double check our assumptions about CSV and TSV files.

Credits goes to @AddisonSchiller and @TomBaxter for the original work. I will take over for the update/fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
6 participants