Python Forum
Delimited Values to ROW - Lucky Train ?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Delimited Values to ROW - Lucky Train ?
#1
All,

I have an XML file which contains a tag called Relation. This tag will have Custom SQL Query something like below.


<relation connection='oraRLe.1vt9plg0hto4k31akb7rk07dlyaz' name='Custom SQL Query' type='text'>Select i.RL_PROGRAM_ID ,i.RL_INVESTMENT_ID ,p.INVESTMENT_ID||&apos;: &apos;||p.INVESTMENT_NAME as &quot;Program&quot; ,p.INVESTMENT_ID as &quot;Program ID&quot; ,p.INVESTMENT_NAME as &quot;Program Name&quot; ,CASE WHEN p.IS_ACTIVE = 1 THEN &apos;Yes&apos; ELSE &apos;No&apos; END as &quot;Program Is Active?&quot; ,p.MANAGER_FULL_NAME as &quot;Program Manager&quot; ,p.TECH_OWNER_FULL_NAME as &quot;Program Tech Owner&quot; ,sRBS.RBS_NAME1 as &quot;Program Sponsor Lvl1&quot; ,sRBS.RBS_NAME2 as &quot;Program Sponsor Lvl2&quot; ,sRBS.RBS_NAME3 as &quot;Program Sponsor Lvl3&quot; ,p.MC_SPONSOR_RBS_FULL_PATH as &quot;Program Sponsor Path&quot; ,p.INVESTMENT_RBS_FULL_PATH as &quot;Program Investment RBS&quot; ,p.RD_DEMAND_REGION_DESC as &quot;Program Demand Region&quot; ,p.RD_FUND_SGMNT_DESC as &quot;Program Funding Segment&quot; ,p.ACCOUNTING_REGION_DESC as &quot;Program Accounting Region&quot; From RL_PROJ_PROG_XREF i LEFT JOIN RL_PROJECT_DIM p ON i.RL_PROGRAM_ID = p.RL_INVESTMENT_ID LEFT JOIN RL_RBS sRBS ON p.MC_SPONSOR_RBS_UNIT_ID = sRBS.RBS_UNIT_ID Where (p.RELEASE_INDICATOR is null or p.RELEASE_INDICATOR = &apos;RD_notinarelease&apos;) UNION Select i.RL_PROGRAM_ID ,i.RL_PROGRAM_ID as RL_INVESTMENT_ID ,p.INVESTMENT_ID||&apos;: &apos;||p.INVESTMENT_NAME as Program ,p.INVESTMENT_ID as &quot;Program ID&quot; ,p.INVESTMENT_NAME as &quot;Program Name&quot; ,CASE WHEN p.IS_ACTIVE = 1 THEN &apos;Yes&apos; ELSE &apos;No&apos; END as &quot;Program Is Active?&quot; ,p.MANAGER_FULL_NAME as &quot;Program Manager&quot; ,p.TECH_OWNER_FULL_NAME as &quot;Program Tech Owner&quot; ,sRBS.RBS_NAME1 as &quot;Program Sponsor Lvl1&quot; ,sRBS.RBS_NAME2 as &quot;Program Sponsor Lvl2&quot; ,sRBS.RBS_NAME3 as &quot;Program Sponsor Lvl3&quot; ,p.MC_SPONSOR_RBS_FULL_PATH as &quot;Program Sponsor Path&quot; ,p.INVESTMENT_RBS_FULL_PATH as &quot;Program Investment RBS&quot; ,p.RD_DEMAND_REGION_DESC as &quot;Program Demand Region&quot; ,p.RD_FUND_SGMNT_DESC as &quot;Program Funding Segment&quot; ,p.ACCOUNTING_REGION_DESC as &quot;Program Accounting Region&quot; From RL_PROJ_PROG_XREF i LEFT JOIN RL_PROJECT_DIM p ON i.RL_PROGRAM_ID = p.RL_INVESTMENT_ID LEFT JOIN RL_RBS sRBS ON p.MC_SPONSOR_RBS_UNIT_ID = sRBS.RBS_UNIT_ID Where (p.RELEASE_INDICATOR is null or p.RELEASE_INDICATOR = &apos;RD_notinarelease&apos;)</relation>


My requirement is to get TABLE and its relevant COLUMN NAMES used in he query. I have 120 XML file like this.

Expected Output:

FileName TableName ColumnName
F1 T1 C1
F1 T1 C2
F1 T2 C1

F2 T1 C1
F2 T2 C1

There are 2 cases:

#1 - If alias found in the column name then we have to find the same alias name after the table name and match it with that particular table.
#2 - If alias not found, then we have to map each and very column to every tables used after the FROM clause ( It is like a cartesian product)
I will take the actual table name & columns name from ALL_TAB_COLUMNS( Oracle) and match with the output point#2 to eliminate the unwanted combinations.

Inputs are welcome!

This is my 2nd post and 2nd task in Python...

Let me tell you what are all the options i explored so far.. I dont know is this the right approach to get the desired result.

Note the below result is somehow ok, but i tried to replace JUNK character, with the help of REPLACE(), But no luck.

from pandas import DataFrame a = DataFrame([{'var1': ' c.Type

 ,c.RL_INVESTMENT_ID
-- ,c.RL_RESOURCE_ID
', 'var2': 1}, {'var1': 'd,e,f', 'var2': 2}]) a= a.replace("
","") a= a.replace("&quot;",'"') def tidy_split(df, column, sep='|', keep=False): """ Split the values of a column and expand so the new DataFrame has one split value per row. Filters rows where the column is missing. Params ------ df : pandas.DataFrame dataframe with the column to split and expand column : str the column to split and expand sep : str the string used to split the column's values keep : bool whether to retain the presplit value as it's own row Returns ------- pandas.DataFrame Returns a dataframe with the same columns as `df`. """ indexes = list() new_values = list() df = df.dropna(subset=[column]) print ("df as ", df) for i, presplit in enumerate(df[column].astype(str)): values = presplit.split(sep) print ("values as ", values) print ("i as" , i) if keep and len(values) > 1: indexes.append(i) new_values.append(presplit) for value in values: indexes.append(i) new_values.append(value) new_df = df.iloc[indexes, :].copy() new_df[column] = new_values print( new_df[column]) def main(): print (a) tidy_split(a, 'var1', sep=',') if __name__ == '__main__': main()
Reply
#2
any thoughts pls?
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Tab Delimited Strings? johnywhy 7 3,229 Jan-13-2024, 10:34 PM
Last Post: sgrey
  How to fix With n_samples=0, test_size=0.2 and train_size=None, the resulting train s MrSonoa 2 7,680 Apr-15-2023, 12:02 PM
Last Post: MrSonoa
  dataframe write to tab delimited differs from Excel koh 0 3,057 Aug-01-2021, 02:46 AM
Last Post: koh
  How do I split a dataset into test/train/validation according to a particular group? 69195Student 1 3,669 May-12-2021, 08:27 PM
Last Post: bowlofred
  Appending data into a file in tab delimited format metro17 1 6,043 Aug-06-2019, 07:34 AM
Last Post: fishhook
  Creating a delimited file from DB Table anubhav2020 9 11,668 Sep-19-2018, 05:22 PM
Last Post: Axel_Erfurt
  Load Comma Delimited csv to Nested Dictionary Huck 2 10,137 Apr-30-2018, 04:21 PM
Last Post: Huck

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020
This forum uses Lukasz Tkacz MyBB addons.
Forum use Krzysztof "Supryk" Supryczynski addons.