Posts: 170 Threads: 43 Joined: May 2019

So I'm playing around with parsing a JSON file in Python. I'm able to read in the file and print it to the console, but now I want to extract 3 values from each "section" (not sure what the proper terminology is). Here is an example of the data structure:

    "messages": [
      {
        "sender_name": "Me",
        "timestamp_ms": 1653260883178,
        "content": "There are plenty of leftovers",
        "type": "Generic",
        "is_unsent": false,
        "is_taken_down": false,
        "bumped_message_metadata": {
          "bumped_message": "There are plenty of leftovers",
          "is_bumped": false
        }
      },
      {
        "sender_name": "Me",
        "timestamp_ms": 1653260872966,
        "content": "Watching the new scream movie",
        "type": "Generic",
        "is_unsent": false,
        "is_taken_down": false,
        "bumped_message_metadata": {
          "bumped_message": "Watching the new scream movie",
          "is_bumped": false
        }
      },

I basically need to pull out only the first 3 key/value pairs of each message and save them into a CSV file:

    "sender_name": "Me",
    "timestamp_ms": 1653260883178,
    "content": "There are plenty of leftovers",

Right now I have this basic code, but I need to figure out how to get inside the "messages" section and pull out those 3 values per group:

    import json

    f = open('message_1.json')
    data = json.load(f)
    for i in data['messages']:
        print(i)
    f.close()

Posts: 1,144 Threads: 114 Joined: Sep 2019

Posts: 6,920 Threads: 22 Joined: Feb 2020

    import json
    from datetime import datetime

    json_str = """
    {
      "messages": [
        {
          "sender_name": "Me",
          "timestamp_ms": 1653260883178,
          "content": "There are plenty of leftovers",
          "type": "Generic",
          "is_unsent": false,
          "is_taken_down": false,
          "bumped_message_metadata": {
            "bumped_message": "There are plenty of leftovers",
            "is_bumped": false
          }
        },
        {
          "sender_name": "Me",
          "timestamp_ms": 1653260872966,
          "content": "Watching the new scream movie",
          "type": "Generic",
          "is_unsent": false,
          "is_taken_down": false,
          "bumped_message_metadata": {
            "bumped_message": "Watching the new scream movie",
            "is_bumped": false
          }
        }
      ]
    }
    """

    data = json.loads(json_str)
    for message in data["messages"]:
        timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
        print(
            f"""{timestamp} from {message["sender_name"]}\n{message["content"]}\n"""
        )

Output:

    2022-05-22 18:08:03.178000 from Me
    There are plenty of leftovers

    2022-05-22 18:07:52.966000 from Me
    Watching the new scream movie

Posts: 170 Threads: 43 Joined: May 2019

So here is what I have, and it seems to work. Now I'm trying to save this to a CSV so I can test importing it into my Excel report.

    import json
    from datetime import datetime

    f = open('messages.json')
    data = json.load(f)
    for message in data["messages"]:
        timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
        if 'content' not in message:
            print(
                f"""{timestamp} from {message["sender_name"]}\n"""
            )
        else:
            print(
                f"""{timestamp} from {message["sender_name"]}\n{message["content"]}\n"""
            )
    f.close()

Posts: 170 Threads: 43 Joined: May 2019
May-25-2022, 07:38 PM (This post was last modified: May-25-2022, 07:38 PM by cubangt.)

What am I doing wrong?
    import json
    from datetime import datetime
    import pandas as pd

    f = open('message_1.json')
    data = json.load(f)
    for message in data["messages"]:
        timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
        if 'content' not in message:
            rw = pd.DataFrame([timestamp,message["sender_name"],pd.NA], columns=['Date', 'Name', 'Comment'])
        else:
            rw = pd.DataFrame([timestamp,message["sender_name"], message["content"]], columns=['Date', 'Name', 'Comment'])
        rw.to_csv('igMess.csv',columns=["Date", "Name", "Comment"], header=None, index=None, mode='a')
    f.close()

I get this error:

    Error: ValueError: Shape of passed values is (3, 1), indices imply (3, 3)

Posts: 170 Threads: 43 Joined: May 2019

OK, I got past the error and a file was generated, BUT I'm not sure how to split out the timestamp so that I have a date and a time separated in the CSV.

    import json
    from datetime import datetime
    import pandas as pd

    f = open('message_1.json')
    data = json.load(f)
    lv = []
    for message in data["messages"]:
        timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
        date_val = timestamp.strftime('%Y-%m-%d')
        if 'content' not in message:
            st = date_val +","+message["sender_name"]+","+""
            lv.append(st)
        else:
            st = date_val +","+message["sender_name"]+","+message["content"]
            lv.append(st)
    df = pd.DataFrame(lv)
    df.to_csv('igMess.csv', header=None, index=None, mode='a')
    f.close()

Running the above generated a file with this output:

    "2022-05-22,Me,There are plenty of leftovers"
    "2022-05-22,Me,Watching the new scream movie"

The expected results should look like this:

    5/17/22, 5:28 PM,Me: There are plenty of leftovers
    5/17/22, 5:28 PM,Me: Watching the new scream movie

If you notice, the generated results have "" around each row and are missing the 5:28 PM time.
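A note on the quotes in the generated file: each row is built as one comma-joined string, so the CSV writer sees a single field that happens to contain commas and quotes it. The earlier ValueError has a related cause, since a flat list of three values is read by pandas as three rows of one column rather than one row of three columns. Below is a minimal sketch using only the standard library `csv` module instead of pandas (a swapped-in technique, not the code from the thread). It assumes UTC purely so the result is reproducible, while the thread's code uses local time, and it writes to an in-memory buffer instead of `igMess.csv`.

```python
import csv
import io
from datetime import datetime, timezone

# UTC is used only to make this sketch reproducible; the thread's code uses local time.
timestamp = datetime.fromtimestamp(1653260883178 / 1000, tz=timezone.utc)

# Build the date and time separately, without leading zeros on month, day, or hour.
date_val = f"{timestamp.month}/{timestamp.day}/{timestamp:%y}"
time_val = f"{timestamp.hour % 12 or 12}:{timestamp:%M %p}"

buffer = io.StringIO()  # stands in for a real file opened with newline=""
writer = csv.writer(buffer)

# One pre-joined string is a single field; its commas force the writer to quote it.
writer.writerow([",".join([date_val, time_val, "Me", "There are plenty of leftovers"])])

# Separate values become separate fields, so no quotes are needed here.
writer.writerow([date_val, time_val, "Me", "There are plenty of leftovers"])

joined_line, split_line = buffer.getvalue().splitlines()
print(joined_line)
print(split_line)
```

The same idea applies with pandas: appending a list of values per message (rather than one joined string) and passing column names to `pd.DataFrame` gives one column per value.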
Posts: 170 Threads: 43 Joined: May 2019

OK, I got the time added and working, so now the only question is how to remove the " " around each row in the file. Here is the currently working code:

    import json
    from datetime import datetime
    import pandas as pd

    f = open('message_1.json')
    data = json.load(f)
    lv = []
    for message in data["messages"]:
        timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
        date_val = timestamp.strftime('%Y-%m-%d')
        time_val = timestamp.strftime("%I:%M %p")
        if 'content' not in message:
            st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
            lv.append(st)
        else:
            st = date_val +"," + time_val + "," + message["sender_name"] + "," + message["content"]
            lv.append(st)
    df = pd.DataFrame(lv)
    df.to_csv('igMess.csv', header=None, index=None, mode='a')
    f.close()

Posts: 170 Threads: 43 Joined: May 2019

So I have been running this a few times since the above post and found a few things that I'm hoping I can fix in the above code. I noticed that if a message is very large, it gets split up in my CSV file. I only want my CSV to have 4 columns. Here is the current code, which does work; it just needs some adjustments to make sure my "content" column is all-inclusive and not split out.
When I ran this code today against the newest JSON file, I found data in 4 to 6 other columns; basically, the data for certain rows was spread across columns A through M.

    import json
    from datetime import datetime
    import pandas as pd
    import os
    import csv

    f = open('message_1.json')
    data = json.load(f)
    lv = []
    for message in data["messages"]:
        timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
        lv.append([
            timestamp.strftime("%m/%d/%Y"),
            timestamp.strftime("%I:%M:%S %p"),
            message["sender_name"],
            message["content"] if "content" in message else "Media Link"])
    df = pd.DataFrame(lv, columns=["Date", "Time", "Sender", "Content"])
    df.to_csv('igMess.csv', header=None, index=None, quoting=csv.QUOTE_NONE, escapechar=",", mode='a')
    f.close()

Posts: 6,920 Threads: 22 Joined: Feb 2020

What are all the possible keys that contain content values? How should the content values be combined?
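A likely cause of the extra columns is the combination of `quoting=csv.QUOTE_NONE` and `escapechar=","`: with quoting turned off, a comma inside a message can no longer be protected by quotes, so it ends up acting as a column separator when the file is opened. The sketch below is one possible approach, not a confirmed fix for the real export. It counts which keys actually occur (to help answer the question about possible keys) and lets the standard library `csv.writer` quote only the fields that need it. The sample messages, including the `photos` key, are made up for illustration; UTC and the in-memory buffer are assumptions chosen to keep the example reproducible.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Illustrative messages; the "photos" key and its value are made up for this sketch.
messages = [
    {"sender_name": "Me", "timestamp_ms": 1653260883178,
     "content": "Leftovers: rice, beans, and chicken\nHelp yourself"},
    {"sender_name": "Me", "timestamp_ms": 1653260872966,
     "photos": [{"uri": "example.jpg"}]},
]

# Count which keys appear, to see where content might live besides "content".
key_counts = Counter(key for message in messages for key in message)
print(key_counts)

buffer = io.StringIO()  # stands in for open("igMess.csv", "a", newline="")
writer = csv.writer(buffer)  # default QUOTE_MINIMAL quotes only fields that need it
for message in messages:
    # UTC keeps the sketch reproducible; the thread's code uses local time.
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000, tz=timezone.utc)
    writer.writerow([
        timestamp.strftime("%m/%d/%Y"),
        timestamp.strftime("%I:%M:%S %p"),
        message["sender_name"],
        message.get("content", "Media Link"),  # same fallback text as the thread's code
    ])

# Reading the text back shows every row still has exactly four fields.
rows = list(csv.reader(io.StringIO(buffer.getvalue(), newline="")))
print([len(row) for row in rows])
```

With this approach, quotes appear in the file only around messages that contain commas, quotes, or line breaks, and spreadsheet programs that follow the usual CSV conventions read each such message back as a single cell.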