pd.read_json(file, lines=True) does not work if json has quotes inside it #15132

@Bigbrd

Code Sample, a copy-pastable example if possible

{"errors":["This check-in does not exist, it may have been deleted."]}, {"list":{"id":487004,"description":"foo.”\r\n\r\n* “I am aware that I’m drafting an email responding to a complaint.”\r\n\r\n* “I am aware that I’m wondering who will win.”\r\n\r\nThe great thing about this exercise is that it is generalizable. You practice during the meditation, but then you use it for your own goals during your day to day."..........}

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: None
numpy: 1.11.3
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

Problem description

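The `list` object in the JSON data has typographic (curly) quotes inside its `description` string. I expected `pd.read_json(file, lines=True)` to read this data line by line, but it raises a UnicodeDecodeError at the position of the first such quote in the description.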
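A minimal sketch of what seems to be going on, assuming the file is UTF-8 encoded (the report does not state the encoding). A curly quote such as U+201C is stored as a multi-byte sequence whose first byte is 0xe2, the byte named in the error. Decoding those bytes as ASCII fails in the same way. The helper name `ascii_decode_error` is made up for this illustration.

```python
# -*- coding: utf-8 -*-
# U+201C (left double quotation mark) as it would be stored in a UTF-8 file.
# Assumption: the input file is UTF-8 encoded.
raw = u"\u201c".encode("utf-8")
first_byte = bytearray(raw)[0]  # bytearray indexing yields an int on Python 2 and 3


def ascii_decode_error(data):
    """Hypothetical helper: return the ASCII decode error message, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = ascii_decode_error(raw)
print(hex(first_byte))
print(message)
```

On Python 2, combining byte strings with a `u'...'` literal triggers this kind of implicit ASCII decode, which would explain why the error only appears once a non-ASCII character shows up in a line.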

Expected Output

The file is read successfully.

Output:

Traceback (most recent call last):
data = pd.read_json(fileName, lines=True)
File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 275, in read_json
json = u'[' + u','.join(lines) + u']'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4924: ordinal not in range(128)
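A possible workaround, sketched under the assumption that the file is UTF-8 encoded and holds one JSON document per line: decode the file explicitly and parse each line with the standard `json` module, so no implicit ASCII decode happens. The function name `read_json_lines` is hypothetical and not part of pandas.

```python
import io
import json


def read_json_lines(path, encoding="utf-8"):
    """Parse a line-delimited JSON file into a list of Python objects.

    Hypothetical helper; assumes one JSON document per non-blank line.
    """
    records = []
    # io.open decodes to unicode text on both Python 2 and Python 3,
    # so curly quotes never pass through the ASCII codec.
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

If each line is a flat object, the resulting list can then be handed to `pd.DataFrame(records)`. Nested objects like the `list` entry in the sample above would likely need flattening first.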

Labels: IO JSON (read_json, to_json, json_normalize), Unicode (Unicode strings)