[SVCS-531] Improve tabular renderer to handle more TSV cases #308

TomBaxter · 2017-12-21T19:30:55Z

Ticket

SVCS-531

Purpose

[SVCS-531] Improve tabular renderer to handle more TSV cases

Changes

mfr/extensions/tabular/libs/init.py
mfr/extensions/tabular/libs/stdlib_tools.py
mfr/extensions/tabular/settings.py

separate tsv_stdlib from csv_stdlib
improve exceptions

tests/extensions/tabular/files/invalid_null.csv
tests/extensions/tabular/test_stdlib_tools.py

add tests

tests/extensions/ipynb/files/no_metadata.ipynb - quiet test (incidental)

tests/extensions/zip/test_renderer.py - quiet test (incidental)

Side effects

Possible increase in system resource use due to full file being sniffed

QA Notes

In addition to the tsv files found in https://osf.io/pexrv/?view_only=af771fef666049a9aefa85642230e80a also test json_test.csv attached to ticket.
This ticket only effects .tsv and .csv files rendered in MFR.

Deployment Notes

None

Csv.sniff could cause random characters or spaces to be used as the delimiter. Separating these functions and using a hard coded dialect fixes this display problem.

SVCS-531 Our handling of CVS/TSV files was convoluted and unnecessary, this cleaned up and simplified. Collateral- Updated two tests that were working fine, but producing unnecessary noise in the test logs.

coveralls · 2017-12-21T19:37:38Z

Coverage increased (+0.5%) to 68.501% when pulling 3048f77 on TomBaxter:feature/SVCS-531 into 8bb2dd4 on CenterForOpenScience:develop.

cslzchen · 2018-01-05T19:52:25Z

@felliott The previous PR was already Phase 2. Do you want to review this or should I do another round of phase 1?

felliott · 2018-01-08T14:47:52Z

I'll take it. Phase 2 is good.

cslzchen · 2018-01-08T14:49:53Z

Sounds good! Added Phase 2 label to the PR.

jonathon-love · 2018-01-25T07:03:41Z

hey guys,

i think you may have some issues with this:

dialect = csv.Sniffer().sniff(fp.read(), ',')

https://github.com/CenterForOpenScience/modular-file-renderer/pull/308/files#diff-875eb910328d4d6758865a7db8ef38cdR13

i remember having issues with the jamovi csv importer when the data returned didn't end with a complete line. i worked around it thus:

some_data = csvfile.read(4096) if len(some_data) == 4096: # csv sniffer doesn't like partial lines some_data = trim_after_last_newline(some_data) dialect = csv.Sniffer().sniff(some_data, ', \t;')

and

def trim_after_last_newline(text): index = text.rfind('\r\n') if index == -1: index = text.rfind('\n') if index == -1: index = text.rfind('\r') if index != -1: text = text[:index] return text

https://github.com/jamovi/jamovi/blob/master/server/jamovi/server/formatio/csv.py#L85

i'd also take exception to this:

# CSVs are always values seperated by commas

https://github.com/CenterForOpenScience/modular-file-renderer/pull/308/files#diff-875eb910328d4d6758865a7db8ef38cdR11

i don't think excel saves .csv files separated by commas by default

cslzchen

As mentioned by @jonathon-love, there are some issues with out implementation and we need to double check our assumptions about CSV and TSV files.

Credits goes to @AddisonSchiller and @TomBaxter for the original work. I will take over for the update/fix.

AddisonSchiller and others added 4 commits November 22, 2017 13:12

Separate csv and tsv function and remove use of sniff

617306f

Csv.sniff could cause random characters or spaces to be used as the delimiter. Separating these functions and using a hard coded dialect fixes this display problem.

Add tests

0ff8589

Better error handling

b1083c5

Simplify CVS/TVS handling

3048f77

SVCS-531 Our handling of CVS/TSV files was convoluted and unnecessary, this cleaned up and simplified. Collateral- Updated two tests that were working fine, but producing unnecessary noise in the test logs.

TomBaxter mentioned this pull request Dec 21, 2017

[SVCS-531] Separate csv and tsv function and remove use of sniff #285

Closed

cslzchen added the Final Review label Jan 8, 2018

cslzchen requested changes Jun 18, 2018

View reviewed changes

cslzchen added Add'l Dev and removed Final Review labels Jun 18, 2018

cslzchen self-assigned this Jun 18, 2018

felliott unassigned cslzchen Jul 30, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SVCS-531] Improve tabular renderer to handle more TSV cases #308

[SVCS-531] Improve tabular renderer to handle more TSV cases #308

Uh oh!

TomBaxter commented Dec 21, 2017

coveralls commented Dec 21, 2017 •

edited

Loading

cslzchen commented Jan 5, 2018

felliott commented Jan 8, 2018

cslzchen commented Jan 8, 2018

jonathon-love commented Jan 25, 2018

cslzchen left a comment

[SVCS-531] Improve tabular renderer to handle more TSV cases #308

Are you sure you want to change the base?

[SVCS-531] Improve tabular renderer to handle more TSV cases #308

Uh oh!

Conversation

TomBaxter commented Dec 21, 2017

Ticket

Purpose

Changes

Side effects

QA Notes

Deployment Notes

coveralls commented Dec 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

cslzchen commented Jan 5, 2018

felliott commented Jan 8, 2018

cslzchen commented Jan 8, 2018

jonathon-love commented Jan 25, 2018

cslzchen left a comment

Choose a reason for hiding this comment

coveralls commented Dec 21, 2017 •

edited

Loading