Python Forum
formatting string and returning as geojson
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
formatting string and returning as geojson
#1
I have records like this (I cannot alter the origional dataset). The Primary key is always 64 characters, followed by a 3 letter code, three sets of WGS84 coordinates (lat,long,altitutde), then a site name, in this case De Vrijheid.

('b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo52.7642852014.9228121084000000000000De Vrijheid')

First I try to format this:

 cur.execute( "SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'") print("Rows returned: ", cur.rowcount) dashmap=(66, 3, 12, 12, 12, ) row = cur.fetchone() while row is not None: #formatting each row data = row data = str(data).encode('latin1').decode('unicode_escape').replace("', '","") data = re.findall(''.join('(\S{{{}}})'.format(l) for l in dashmap), data) data = str(data) data = data.replace(r'"', '|').replace(r'|(', '').replace(r'|', '').replace(r'(', '').replace(r')', '').replace(r'[', '').replace(r']', '').replace(r"'", "").replace(" ","").split(",")
This returns
['b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo', '52.764285201', '4.9228121084', '000000000000']

I get geojson this way.
geodata = geojson.Point((float(data[3]), float(data[2])))
And write to file

with open('points.json', 'a') as outfile: geojson.dump(geodata, outfile, indent=2) outfile.close()
And finish of
 cur.close() except (Exception, psycopg2.DatabaseError) as error: print(error) # execute a statement print('PostgreSQL database version:') cur.execute('SELECT version()') # display the PostgreSQL database server version db_version = cur.fetchone() print(db_version) # close the communication with the PostgreSQL cur.close() except (Exception, psycopg2.DatabaseError) as error: print(error) finally: if conn is not None: conn.close() print('Database connection closed.')
This gives a geojson file formatted like this, which is not geojson, it seems like it is writing one line at a time as an induvidual point not a multipoint file. I'm a noob to python, so any amendments/alternative solutions are welcome.

{ "type": "Point", "coordinates": [ 45.506527899, 12.081455199 ] }{ "type": "Point", "coordinates": [ 25.4094136, 11.9158859 ] }
Reply
#2
could you post your full code (at least part after executing the sql) as well as sample data (few points) as returned by the database.
Reply
#3
The data straight from the db
read in to data
type(data) tuple
Connecting to the PostgreSQL database... Rows returned: 17 ('9fde26933b7196e3373ffe80e45e7f36e13bf3aba6290d65ce1a2f5360cb246b', 'geo12.08145519945.50652789900000.000000test') ('e46117fb58a13de5d59a6a4a897d3cfd9ae9d4b616278b5d87e254a1c1ea5728', 'geo011.9158859025.40941360000000.000000This is a test') ('eb0cd00281521f241de36a59487b8923c6b2e62894fd411ca3c511a25a958c6d', 'geo45.38584480011.89942250000000.000000Città di Padova') ('bcbc94f00db0cfb5e01b88703ab858ed58082845a4d00309c5d7920418a6aab9', 'geo11.91584370045.409410600000000000000I love geo') ('1008ab688b0fdab30a00e5805d8c774739383288a98787e4740fbee109ea0e8b', 'geo45.40944419911.915854900000000000000') ('5e114a3fa9910b306f880ddedb3a703da9de96464d6c87cd52b5138b7cc78084', 'geo45.40320450011.922245600000000000000') ('aa4bb5d192f58a7f72151d3f47c37917651667ab573b9339f0ab35188ae2b6a3', 'geo000000000000000000000000000000000000') ('813617825c70f30c73c30c8cd20ecde37609b2c137d2b0d1a8ef06d1e33315ee', 'geo44.40320450010.922245600000000000000Hi all!!!') ('b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo52.7642852014.9228121084000000000000De Vrijheid') ('dd7903d0cea1b90fddc60ff15e3312261018b5edfcfc60dcf31e189a784dc76b', 'geo52.7346136774.9371617604000000000000Mienakker') ('d5070e592bc468140090b881e61993ba291cf3f133c3559a067e28b0c98223dc', 'geo45.70526570012.259708200000000000000') ('17cb54f68c451d19b3b7490316e9a099d290cd8ef80cb65509a0a8aa9d97e956', 'geo45.40936250011.916058399000000000000This is a test') ('71e4c6bb1858fcc65fe35a0fdd38c2a310e65edc24a3dc4c78d5ff6315eaf9ec', 'geo45.40947760011.915883800000000000000') ('569f2c49f12f26676e2995ae58631c10500f4c596d18a142f1927f74f123b6df', 'geo000000000000000000000000000000000000') ('3c05130fa815f0f8a4c0560189839327c784381ba5f104b257ce708463259ee8', 'geo45.40943009911.915891000000000000000He man') ('346de63a2a44ef0ac5793942c0c5a05764f16923ca46a4c20bf4566fd5bae3bd', 'geo45.40946890011.915854000000000000000ITEC Padova') ('4d0db19237393cbe98d972def8a47aff99942a498540594ff4abdead2105dac5', 'geo45.49301060012.330827699000000000000This is a test') Database connection closed.
The full code
#!/usr/bin/python import psycopg2 from config import config import re import geojson import tempfile def get_vendorfield(): """ Connect to the PostgreSQL database server """ global cur, data conn = None try: # read connection parameters params = config() # connect to the PostgreSQL server print('Connecting to the PostgreSQL database...') conn = psycopg2.connect(**params) # create a cursor cur = conn.cursor() cur.execute( "SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'") print("Rows returned: ", cur.rowcount) dashmap=(66, 3, 12, 12, 12, ) #this is used for the chunks - but the last chuck is missed because it is of unknown length row = cur.fetchone() while row is not None: #formatting each row - this feels over complicated data = row data = str(data).encode('latin1').decode('unicode_escape').replace("', '","") data = re.findall(''.join('(\S{{{}}})'.format(l) for l in dashmap), data) data = str(data) data = data.replace(r'"', '|').replace(r'|(', '').replace(r'|', '').replace(r'(', '').replace(r')', '').replace(r'[', '').replace(r']', '').replace(r"'", "").replace(" ","").split(",") with open('points.json', 'a') as outfile: geojson.dump(geodata, outfile, indent=2) outfile.close() cur.close() except (Exception, psycopg2.DatabaseError) as error: print(error) # execute a statement print('PostgreSQL database version:') cur.execute('SELECT version()') # display the PostgreSQL database server version db_version = cur.fetchone() print(db_version) # close the communication with the PostgreSQL cur.close() except (Exception, psycopg2.DatabaseError) as error: print(error) finally: if conn is not None: conn.close() print('Database connection closed.')
Reply
#4
something like this (note that I'm not able to test it)

#!/usr/bin/python import psycopg2 from config import config import re import geojson import tempfile def parse_row(row): # extract lat, lon # you may need to round the float number to # correct number of decimal digits lon = float(row[3:15]) lat = float(row[15:27]) return (lon, lat) def get_vendorfield(): """ Connect to the PostgreSQL database server """ # read connection parameters params = config() # connect to the PostgreSQL server print('Connecting to the PostgreSQL database...') with psycopg2.connect(**params) as conn: # create a cursor with conn.cursor() as cur: try: cur.execute("SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'") print("Rows returned: ", cur.rowcount) #dashmap=(66, 3, 12, 12, 12, ) #this is used for the chunks - but the last chuck is missed because it is of unknown length geo_data = [parse_row(row[1]) for row in cur] # you may need to replace cur with cur.fetchall() except psycopg2.DatabaseError as db_error: print(error) geo_data = [] print('Database connection closed.') if geo_data: multipoint_json = geojson.Multipoint(geo_data) with open('points.json', 'w') as outfile: geojson.dump(multipoint_json, outfile, indent=2)
My understanding was you want geojson.Multipoint file
there are number of things you need to consider, e.g. N/S E/W of the coordinates (usually it is represented with -) does it affect the length? also decimal precision - you may need to round the float number for lon/lat to correct precision)
Reply
#5
I'm trying to adapted your code a little to include properties.
I have added the altitude data to the geojson

Also there is an altitude, row[1] alt []

#!/usr/bin/python import psycopg2 from config import config import re import geojson from geojson import Feature, Point import tempfile def parse_row(row): # extract lat, lon, alt # you may need to round the float number to # correct number of decimal digits lon = float(row[3:15]) lat = float(row[15:27]) alt = float(row[28:39]) return (lon, lat, alt) def get_vendorfield(): """ Connect to the PostgreSQL database server """ # read connection parameters params = config() # connect to the PostgreSQL server print('Connecting to the PostgreSQL database...') with psycopg2.connect(**params) as conn: # create a cursor with conn.cursor() as cur: try: cur.execute("SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'") print("Rows returned: ", cur.rowcount) geo_data = [parse_row(row[1]) for row in cur] # you may need to replace cur with cur.fetchall() except psycopg2.DatabaseError as db_error: print(re.error) geo_data = [] print('Database connection closed.') print(geo_data) if geo_data: feature_json = geojson.Feature(geometry = geo_data) #Feature(geometry=my_point, properties={"country": "Spain"}) #{"geometry": {"coordinates": [-3.68..., 40.4...], "type": "Point"}, "id": null, "properties": {"country": "Spain"}, "type": "Feature"} with open('points.json', 'w') as outfile: geojson.dump(feature_json, outfile, indent=2)
I'm having problems returning the geojson as a feature class, I get a TypeError: list indices must be integers or slices, not list.

Once this is achieved I plan the following:
The row[0] is a 64 character primary key, this can be the "id" of the geojson
In row[1] there are the code [0:2] and the site_name, this is the rest of the line from character 41 (to max 64) <-- I guess this might be solved through a (len(row) - 39) to establish the length of the remaining characters?

I'm thinking I will need to return the data like so:
def parse_key(row): #extract key key = str(row[0:64]) return (key) def parse_data(row): #extract three letter code i.e. geo and the site name code = str(row[0:2]) site_name = str(row[3:4]) return (code, site_name)
and them amend geo_data = [parse_row(row[1]) for row in cur] to

geo_data = [parse_row(row[1]) for row in cur] data = [parse_data(row[1]) for row in cur] key = [parse_key(row[0]) for row in cur]
Secondly I will try to incorporate the properties values including the key to the geojson id.
Thank you for your guidance.
Reply
#6
key is just row[0]
parse row[1] and return dict (for readability). Or you may return Feature object
For Feature class I need to do some reading, but I will do tomorrow as it is late here. What properties shall Feature has?
Do you plan on using Feature collection?
By the way, I think Point does not have elevation property. What about Multipoint? did you drop the idea?
Reply
#7
#!/usr/bin/python import psycopg2 from config import config import geojson def parse_row(row): # extract data from row, return geojson.Feature key = row[0] lon = float(row[1][3:15]) lat = float(row[1][15:27]) if len(row[1]) > 39: elevation = float(row[1][27:39]) descr = row[1][39:] else: elevation = float(row[1][27:]) descr = None return geojson.Feature(geometry=geojson.Point((lon, lat)), id=key, properties={'elevation':elevation, 'description':descr}) def db_query(): """ Connect to the PostgreSQL database server """ # read connection parameters params = config() # connect to the PostgreSQL server print('Connecting to the PostgreSQL database...') with psycopg2.connect(**params) as conn: # create a cursor with conn.cursor() as cur: try: cur.execute("SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'") print("Rows returned: ", cur.rowcount) #dashmap=(66, 3, 12, 12, 12, ) #this is used for the chunks - but the last chuck is missed because it is of unknown length geo_data = [parse_row(row[1]) for row in cur] # you may need to replace cur with cur.fetchall() except psycopg2.DatabaseError as db_error: print(error) geo_data = [] finally: print('Database connection closed.') return geo_data def write_to_file(geo_data): with open('points.json', 'w') as outfile: geojson.dump(geo_data, outfile, indent=2) def test_data(): """Just to return test data""" data = (('9fde26933b7196e3373ffe80e45e7f36e13bf3aba6290d65ce1a2f5360cb246b', 'geo12.08145519945.50652789900000.000000test'), ('e46117fb58a13de5d59a6a4a897d3cfd9ae9d4b616278b5d87e254a1c1ea5728', 'geo011.9158859025.40941360000000.000000This is a test'), ('eb0cd00281521f241de36a59487b8923c6b2e62894fd411ca3c511a25a958c6d', 'geo45.38584480011.89942250000000.000000Città di Padova'), ('bcbc94f00db0cfb5e01b88703ab858ed58082845a4d00309c5d7920418a6aab9', 'geo11.91584370045.409410600000000000000I love geo'), ('1008ab688b0fdab30a00e5805d8c774739383288a98787e4740fbee109ea0e8b', 'geo45.40944419911.915854900000000000000'), ('5e114a3fa9910b306f880ddedb3a703da9de96464d6c87cd52b5138b7cc78084', 'geo45.40320450011.922245600000000000000'), ('aa4bb5d192f58a7f72151d3f47c37917651667ab573b9339f0ab35188ae2b6a3', 'geo000000000000000000000000000000000000'), ('813617825c70f30c73c30c8cd20ecde37609b2c137d2b0d1a8ef06d1e33315ee', 'geo44.40320450010.922245600000000000000Hi all!!!'), ('b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo52.7642852014.9228121084000000000000De Vrijheid'), ('dd7903d0cea1b90fddc60ff15e3312261018b5edfcfc60dcf31e189a784dc76b', 'geo52.7346136774.9371617604000000000000Mienakker'), ('d5070e592bc468140090b881e61993ba291cf3f133c3559a067e28b0c98223dc', 'geo45.70526570012.259708200000000000000'), ('17cb54f68c451d19b3b7490316e9a099d290cd8ef80cb65509a0a8aa9d97e956', 'geo45.40936250011.916058399000000000000This is a test'), ('71e4c6bb1858fcc65fe35a0fdd38c2a310e65edc24a3dc4c78d5ff6315eaf9ec', 'geo45.40947760011.915883800000000000000'), ('569f2c49f12f26676e2995ae58631c10500f4c596d18a142f1927f74f123b6df', 'geo000000000000000000000000000000000000'), ('3c05130fa815f0f8a4c0560189839327c784381ba5f104b257ce708463259ee8', 'geo45.40943009911.915891000000000000000He man'), ('346de63a2a44ef0ac5793942c0c5a05764f16923ca46a4c20bf4566fd5bae3bd', 'geo45.40946890011.915854000000000000000ITEC Padova'), ('4d0db19237393cbe98d972def8a47aff99942a498540594ff4abdead2105dac5', 'geo45.49301060012.330827699000000000000This is a test')) return data if __name__ == '__main__': my_data = db_query() my_data = test_data() #just for test, comment out this line in production if my_data: my_feature_collection = geojson.FeatureCollection([parse_row(row) for row in my_data]) write_to_file(my_feature_collection)
output with test data
Reply
#8
I would like MultiPoint but with properties, I couldn't see MultiPoint with properties, so choose feature class http://python-geojson.readthedocs.io/en/...multipoint

Altitude can be included https://bl.ocks.org/fogonwater/340afe26d...0f73eb10f2

So mainly, I want to return the data row[0] (the key as id) and add the 'geo' (the code) and the site_name as two properties, e.g code: 'geo'. I'll look into dictionaries.
Reply
#9
look into my last post, I implemented Feature and FeaturesCollection. You need to adjust parse_row function to include the elevation to Point and then code to properties. as you can see I already added elevation to properties, which is one of the approaches in your link
if you have trouble, I'll look into it tomorrow.
here is the new parse_row


def parse_row(row): # extract lat, lon # you may need to round the float number to # correct number of decimal digits key = row[0] code=row[1][:3] lon = float(row[1][3:15]) lat = float(row[1][15:27]) elevation = float(row[1][27:39]) descr = row[1][39:] return geojson.Feature(geometry=geojson.Point((lon, lat, elevation)), id=key, properties={'code':code, 'elevation':elevation, 'description':descr})
new output with test data

Reply
#10
I tested using the test_data()
As you know that works fine, but when I use db_query() instead it throws an error
ValueError: could not convert string to float

I went through the debugging (step through) and in the def parse_row(row)
key: 'g'
code: 'e'
at geo_data = [parse_row(row[1]) for row in cur]
row: <class 'tuple'>: ('9fde ... 246b', 'geo12.08 ... 45.50 ... 0.000000test')

So it looks like something is going wrong in the following, for some reason the key (row[0])is being ignored and the row[1] is behaving strangely, perhaps tuples behave differently? I've spent today looking into this, but I'm struggling to find the solution, strange that it works for the test_data() but not the database data. Should I add I'm using Python 3?

def parse_row(row): # extract lat, lon # you may need to round the float number to # correct number of decimal digits key = row[0] code=row[1][:3] lat = float(row[1][3:15]) lon = float(row[1][15:27]) elevation = float(row[1][27:39]) descr = row[1][39:] return geojson.Feature(geometry=geojson.Point((lon, lat, elevation)), id=key, properties={'code':code, 'elevation':elevation, 'description':descr})
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Formatting DateTime string and and converting it from AM/PM to 24 hours tester_V 2 1,864 Jun-08-2024, 05:16 PM
Last Post: tester_V
  Formatting a date time string read from a csv file DosAtPython 5 6,974 Jun-19-2023, 02:12 PM
Last Post: DosAtPython
  String formatting (strptime) issues Henrio 2 2,538 Jan-06-2023, 06:57 PM
Last Post: deanhystad
  geojson to json --missing multiple row output yoshi 9 6,693 Mar-06-2022, 08:34 PM
Last Post: snippsat
  confused about string formatting barryjo 7 4,318 Mar-06-2022, 02:03 AM
Last Post: snippsat
  string formatting barryjo 7 4,452 Jan-02-2022, 02:08 AM
Last Post: snippsat
  Help with string formatting in classes brthurr 6 14,924 Dec-17-2021, 04:35 PM
Last Post: Jeff900
  Question on HTML formatting with set string in message Cknutson575 3 5,698 Mar-09-2021, 08:11 AM
Last Post: Cknutson575
  smtplib: string formatting not carrying over to email ClassicalSoul 1 4,172 Apr-22-2020, 09:58 PM
Last Post: bowlofred
  String formatting difficulties mmk1995 3 4,344 Aug-09-2019, 11:18 AM
Last Post: wavic

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020
This forum uses Lukasz Tkacz MyBB addons.
Forum use Krzysztof "Supryk" Supryczynski addons.