Delimited Values to ROW - Lucky Train ?

karthi_python · (This post was last modified: May-29-2019, 11:28 AM by karthi_python.)

All,

I have an XML file which contains a tag called Relation. This tag will have Custom SQL Query something like below.

<relation connection='oraRLe.1vt9plg0hto4k31akb7rk07dlyaz' name='Custom SQL Query' type='text'>Select i.RL_PROGRAM_ID ,i.RL_INVESTMENT_ID ,p.INVESTMENT_ID||': '||p.INVESTMENT_NAME as "Program" ,p.INVESTMENT_ID as "Program ID" ,p.INVESTMENT_NAME as "Program Name" ,CASE WHEN p.IS_ACTIVE = 1 THEN 'Yes' ELSE 'No' END as "Program Is Active?" ,p.MANAGER_FULL_NAME as "Program Manager" ,p.TECH_OWNER_FULL_NAME as "Program Tech Owner" ,sRBS.RBS_NAME1 as "Program Sponsor Lvl1" ,sRBS.RBS_NAME2 as "Program Sponsor Lvl2" ,sRBS.RBS_NAME3 as "Program Sponsor Lvl3" ,p.MC_SPONSOR_RBS_FULL_PATH as "Program Sponsor Path" ,p.INVESTMENT_RBS_FULL_PATH as "Program Investment RBS" ,p.RD_DEMAND_REGION_DESC as "Program Demand Region" ,p.RD_FUND_SGMNT_DESC as "Program Funding Segment" ,p.ACCOUNTING_REGION_DESC as "Program Accounting Region" From RL_PROJ_PROG_XREF i LEFT JOIN RL_PROJECT_DIM p ON i.RL_PROGRAM_ID = p.RL_INVESTMENT_ID LEFT JOIN RL_RBS sRBS ON p.MC_SPONSOR_RBS_UNIT_ID = sRBS.RBS_UNIT_ID Where (p.RELEASE_INDICATOR is null or p.RELEASE_INDICATOR = 'RD_notinarelease') UNION Select i.RL_PROGRAM_ID ,i.RL_PROGRAM_ID as RL_INVESTMENT_ID ,p.INVESTMENT_ID||': '||p.INVESTMENT_NAME as Program ,p.INVESTMENT_ID as "Program ID" ,p.INVESTMENT_NAME as "Program Name" ,CASE WHEN p.IS_ACTIVE = 1 THEN 'Yes' ELSE 'No' END as "Program Is Active?" ,p.MANAGER_FULL_NAME as "Program Manager" ,p.TECH_OWNER_FULL_NAME as "Program Tech Owner" ,sRBS.RBS_NAME1 as "Program Sponsor Lvl1" ,sRBS.RBS_NAME2 as "Program Sponsor Lvl2" ,sRBS.RBS_NAME3 as "Program Sponsor Lvl3" ,p.MC_SPONSOR_RBS_FULL_PATH as "Program Sponsor Path" ,p.INVESTMENT_RBS_FULL_PATH as "Program Investment RBS" ,p.RD_DEMAND_REGION_DESC as "Program Demand Region" ,p.RD_FUND_SGMNT_DESC as "Program Funding Segment" ,p.ACCOUNTING_REGION_DESC as "Program Accounting Region" From RL_PROJ_PROG_XREF i LEFT JOIN RL_PROJECT_DIM p ON i.RL_PROGRAM_ID = p.RL_INVESTMENT_ID LEFT JOIN RL_RBS sRBS ON p.MC_SPONSOR_RBS_UNIT_ID = sRBS.RBS_UNIT_ID Where (p.RELEASE_INDICATOR is null or p.RELEASE_INDICATOR = 'RD_notinarelease')</relation>

My requirement is to get TABLE and its relevant COLUMN NAMES used in he query. I have 120 XML file like this.

Expected Output:

FileName TableName ColumnName
F1 T1 C1
F1 T1 C2
F1 T2 C1

F2 T1 C1
F2 T2 C1

There are 2 cases:

#1 - If alias found in the column name then we have to find the same alias name after the table name and match it with that particular table.
#2 - If alias not found, then we have to map each and very column to every tables used after the FROM clause ( It is like a cartesian product)
I will take the actual table name & columns name from ALL_TAB_COLUMNS( Oracle) and match with the output point#2 to eliminate the unwanted combinations.

Inputs are welcome!

This is my 2nd post and 2nd task in Python...

Let me tell you what are all the options i explored so far.. I dont know is this the right approach to get the desired result.

Note the below result is somehow ok, but i tried to replace JUNK character, with the help of REPLACE(), But no luck.

from pandas import DataFrame a = DataFrame([{'var1': ' c.Type

 ,c.RL_INVESTMENT_ID
-- ,c.RL_RESOURCE_ID
', 'var2': 1}, {'var1': 'd,e,f', 'var2': 2}]) a= a.replace("
","") a= a.replace("&quot;",'"') def tidy_split(df, column, sep='|', keep=False): """ Split the values of a column and expand so the new DataFrame has one split value per row. Filters rows where the column is missing. Params ------ df : pandas.DataFrame dataframe with the column to split and expand column : str the column to split and expand sep : str the string used to split the column's values keep : bool whether to retain the presplit value as it's own row Returns ------- pandas.DataFrame Returns a dataframe with the same columns as `df`. """ indexes = list() new_values = list() df = df.dropna(subset=[column]) print ("df as ", df) for i, presplit in enumerate(df[column].astype(str)): values = presplit.split(sep) print ("values as ", values) print ("i as" , i) if keep and len(values) > 1: indexes.append(i) new_values.append(presplit) for value in values: indexes.append(i) new_values.append(value) new_df = df.iloc[indexes, :].copy() new_df[column] = new_values print( new_df[column]) def main(): print (a) tidy_split(a, 'var1', sep=',') if __name__ == '__main__': main()

karthi_python · May-30-2019, 06:40 AM

any thoughts pls?

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Tab Delimited Strings?	johnywhy	7	3,229	Jan-13-2024, 10:34 PM Last Post: sgrey
	How to fix With n_samples=0, test_size=0.2 and train_size=None, the resulting train s	MrSonoa	2	7,680	Apr-15-2023, 12:02 PM Last Post: MrSonoa
	dataframe write to tab delimited differs from Excel	koh	0	3,057	Aug-01-2021, 02:46 AM Last Post: koh
	How do I split a dataset into test/train/validation according to a particular group?	69195Student	1	3,669	May-12-2021, 08:27 PM Last Post: bowlofred
	Appending data into a file in tab delimited format	metro17	1	6,043	Aug-06-2019, 07:34 AM Last Post: fishhook
	Creating a delimited file from DB Table	anubhav2020	9	11,668	Sep-19-2018, 05:22 PM Last Post: Axel_Erfurt
	Load Comma Delimited csv to Nested Dictionary	Huck	2	10,137	Apr-30-2018, 04:21 PM Last Post: Huck

Delimited Values to ROW - Lucky Train ?

User Panel Messages

Announcements