Python Forum
Too big CSV file management
#1
Hey!

I am pretty new to pandas in Python and would like to ask for some help. I don't think it's complicated; I just can't figure it out. I have a huge CSV file (around 2 GB, 4.4 million lines) that Excel can't open fully. I only need a very small part of it, and everything else could be deleted. I only need the rows where "PUBLIC LIMITED COMPANY" or "PLC" appears as a substring in column A (I need the whole row wherever it appears). These rows could be written to a new CSV/Excel file, or everything else could be deleted from this file so that only the rows I need remain. The filename is "AllCompanies.csv".

Thank you for your help!
#2
Here's an outline of the code you would want:

with open('AllCompanies.csv') as in_file:
    with open('PLCCompanies.csv', 'w') as out_file:
        for line in in_file:
            if line_matches_criteria:  # placeholder: replace with your own test on `line`
                out_file.write(line)
The for loop will read the file one line at a time, so it doesn't clog your memory.
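One way to fill in the outline so that only column A is checked (not the whole line) is to let the csv module split each row. This is a sketch that assumes the file is comma-delimited, UTF-8, and has a header row; `filter_plc_rows` and `KEYWORDS` are made-up names for illustration. It still reads one row at a time. Note that csv.writer re-quotes fields minimally, so quoting in the output may differ slightly from the input.

```python
import csv

# Hypothetical keyword list; "PLC" is not a substring of the longer phrase, so both are needed
KEYWORDS = ("PUBLIC LIMITED COMPANY", "PLC")


def filter_plc_rows(in_path, out_path, keywords=KEYWORDS, encoding="utf-8"):
    """Copy the header plus every row whose first column (column A) contains a keyword.

    Returns the number of data rows written.
    """
    written = 0
    # newline="" lets the csv module handle quoted fields that contain line breaks
    with open(in_path, newline="", encoding=encoding) as in_file, \
         open(out_path, "w", newline="", encoding=encoding) as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)
        header = next(reader, None)  # assumes the first line is a header row
        if header is not None:
            writer.writerow(header)
        for row in reader:  # one row at a time, so memory use stays flat
            if row and any(k in row[0] for k in keywords):
                writer.writerow(row)
                written += 1
    return written
```

Usage would be something like `filter_plc_rows("AllCompanies.csv", "PLCCompanies.csv")`.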
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
#3
Maybe you can convert it to SQLite with csv-to-sqlite (-x $'\t' sets the delimiter to tab):

csv-to-sqlite -x $'\t' -f /path/to/file.csv -o /path/to/file.db
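If installing csv-to-sqlite isn't an option, the same idea can be sketched with only Python's built-in csv and sqlite3 modules. This is not how csv-to-sqlite works internally; it's a stand-in under these assumptions: the file is comma-delimited with a header row, every row has as many fields as the header, and column names contain no double quotes. The table name `companies` and both function names are made up. The query uses SQLite's instr(), which is case-sensitive like Python's `in`; a `LIKE '%PLC%'` filter would instead be case-insensitive for ASCII letters.

```python
import csv
import sqlite3


def load_csv_to_sqlite(csv_path, db_path, table="companies", encoding="utf-8"):
    """Stream a CSV file into an SQLite table (all columns TEXT). Returns the header."""
    with open(csv_path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes a header row
        cols = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        con = sqlite3.connect(db_path)
        try:
            con.execute(f'DROP TABLE IF EXISTS "{table}"')
            con.execute(f'CREATE TABLE "{table}" ({cols})')
            # executemany pulls rows from the reader lazily, so the file is never fully in memory
            con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
            con.commit()
        finally:
            con.close()
    return header


def find_plc_rows(db_path, column, table="companies"):
    """Return rows whose `column` contains "PLC" or "PUBLIC LIMITED COMPANY" (case-sensitive)."""
    sql = (f'SELECT * FROM "{table}" '
           f'WHERE instr("{column}", ?) > 0 OR instr("{column}", ?) > 0 '
           'ORDER BY rowid')  # keep the original file order
    con = sqlite3.connect(db_path)
    try:
        return con.execute(sql, ("PLC", "PUBLIC LIMITED COMPANY")).fetchall()
    finally:
        con.close()
```

Once the data is in SQLite, other filters are just different WHERE clauses, without re-reading the 2 GB CSV each time.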
#4
That was so simple I started to wonder how it didn't occur to me, lol. Thank you, it worked like a charm. I had to add UTF-8 encoding to it; in the end it looked like this:

with open('AllCompanies.csv', encoding="utf-8") as in_file:
    with open('PLCCompanies.csv', 'w', encoding="utf-8") as out_file:
        for line in in_file:
            if "PLC" in line or "PUBLIC LIMITED COMPANY" in line:
                out_file.write(line)
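Since the first post mentioned pandas: the loop above tests the whole line, so a row with "PLC" in some other column would also be kept. A chunked pandas version that only looks at column A could look like the sketch below. It assumes column A is the first column, the file has a header row, and pandas is installed; `filter_plc_pandas` and the chunk size are illustrative choices, not anything from the thread. `dtype=str` and `keep_default_na=False` keep values such as leading zeros or the text "NA" unchanged.

```python
import pandas as pd


def filter_plc_pandas(in_path, out_path, chunksize=100_000, encoding="utf-8"):
    """Write the header plus rows whose first column contains "PLC" or
    "PUBLIC LIMITED COMPANY", reading the CSV in chunks. Returns rows written."""
    written = 0
    first = True
    # chunksize makes read_csv yield DataFrames of at most `chunksize` rows
    for chunk in pd.read_csv(in_path, chunksize=chunksize, dtype=str,
                             keep_default_na=False, encoding=encoding):
        col_a = chunk.iloc[:, 0]  # assumption: column A is the first column
        mask = (col_a.str.contains("PLC", regex=False)
                | col_a.str.contains("PUBLIC LIMITED COMPANY", regex=False))
        matches = chunk[mask]
        # first chunk creates the file and writes the header; later chunks append
        matches.to_csv(out_path, mode="w" if first else "a", header=first,
                       index=False, encoding=encoding)
        written += len(matches)
        first = False
    return written
```

Memory use is bounded by the chunk size rather than the file size, which is the same idea as the line-by-line loop, just in bigger steps.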