Mar-08-2020, 07:48 PM (This post was last modified: Mar-08-2020, 07:48 PM by interjectdirector.)
I've searched thoroughly for an existing topic but can't find anything. I know the answer must be out there; I just don't think I fully understand what it is I'm trying to achieve. Learning regex has been a rocky road for me, and although I have the basics mostly down, I can't figure out how to achieve this particular goal.
I am trying to use a list comprehension to find a regex match for each element of the list firstList. For each element, the exact matched text should be written to the list secondList. If there is no match, the element from firstList will not be written to secondList. However, I also want this list comprehension to strip the path following the domain name before writing to secondList (e.g. "https://gmail.com/test123" at firstList[1] should be written to secondList as "https://gmail.com/").

    import re

    regex = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')
    firstList = ['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch', 'notaurl', '/home/images']
    secondList = [i for i in firstList if regex.match(i)]
    print(firstList)
    print(secondList)

Output:

    ['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch', 'notaurl', '/home/images']
    ['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch']

As desired, my list comprehension is eliminating list elements that do not have URL components, but it is still including the path following the domain. Why is this? If I use

    print(re.match(regex, firstList[1]))

my output shows that the match is only https://gmail.com/:

Output:

    <re.Match object; span=(0, 18), match='https://gmail.com/'>

I understand that my list comprehension adds an element to secondList if there is any regex match at all, but how do I get it to write the matched text, as seen in the re.match output, to secondList instead of the entire element that has a match?

I don't need anyone to post the solution code, I just need help wrapping my head around this concept of regex I'm clearly misunderstanding. Thanks for any help.
EDIT: I should add that the regex I have in my code above matches components of a URL up to the / that defines the path following a domain.
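
For anyone reading later, the distinction the question turns on can be sketched as follows. This is only an illustration under assumptions, not necessarily the best approach: it reuses the pattern and firstList from the post, while the names `filtered` and `matches` are made up for this example. The `if` clause of a comprehension only decides which elements pass; what gets stored is whatever the leading expression evaluates to, so storing the matched text means keeping the match object and calling its `group(0)` method.

```python
import re

# Same pattern and data as in the post above.
regex = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')
firstList = ['http:google.com/test', 'https://gmail.com/test123',
             'http://youtube.com/watch', 'notaurl', '/home/images']

# Filter only: the condition decides which elements pass, but the
# leading expression `i` is still the whole original string.
filtered = [i for i in firstList if regex.match(i)]

# Filter and transform: keep each match object (or None) and emit the
# text it matched. `matches` is a hypothetical name for illustration.
matches = (regex.match(i) for i in firstList)
secondList = [m.group(0) for m in matches if m]

print(filtered)
print(secondList)
```

With the data above, `filtered` still contains the full URLs including their paths, while `secondList` holds only the matched prefixes such as 'https://gmail.com/'. Calling `group(1)` instead of `group(0)` would return just the text captured by the parentheses, i.e. the domain.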
