
I am writing a Windows PowerShell script to split a text file based on a delimiter, and create output filenames with an incrementing number and a captured identifier string.

I've checked for syntax errors and the script runs without errors. The splitting, output file creation and filename numbering all work OK, but the script is not putting the captured identifier into the filename as expected. Instead of the extracted string, it uses 'True'.

I've been using $idValue to hold the character strings I want to extract, but all I get from it is "True". It seems to see the regex match I've used and report that the condition is true (there's a match).

But before I go on, here's a simplified version of the input file, so you can see the structure:

    \id EUCAL
    \v Text
    \v more text
    \z Endsect
    \id LEUCO
    \v Text
    \v more text
    \z Endsect

The strings I want to see in the filename are in the "\id" field, so here they are EUCAL and LEUCO. All these "id" strings are exactly 5 characters long, and I've taken advantage of that in the regular expression I use to extract them. The regex definitely works. But instead of getting output filenames 01-EUCAL.txt and 02-LEUCO.txt (etc.), I get 01-True.txt and 02-True.txt, which is puzzling.

I've verified that the regular expression works correctly. I suspect the issue might be with how I'm accessing the captured group value in $idValue. Any help in resolving this would be appreciated.

Here's the PowerShell script:

    $Text = Get-Content -Path "D:\Test_input.txt" -raw # Read the file content
    $SplitText = $Text -split "z Endsect\r?\n?" # Split with empty lines included
    $SplitText = $SplitText -notmatch '^$' # Filter out empty lines
    $i = 1
    foreach ($File in $SplitText) {
        # Append "z Endsect" to every entry
        $File += "z Endsect"
        # Extract the identifier (assuming 5 characters after \id ) using regex
        $idValue = $File -match "\\id\s(.{5})" -replace '(?<=\\id\s)', '' # Capture 5 characters after \id and space/tab
        # Pad the incrementing number with a leading zero if necessary
        $paddedNumber = "{0:00}" -f $i
        # Construct the new filename with separator and padded number
        $NewFilename = "D:\$paddedNumber-$idValue.txt"
        # Write the content to the new files
        $File | Out-File -FilePath $NewFilename
        $i++
    }

3 Answers


The True value comes from the -match: applied to a single string, -match returns a Boolean.

You should separate the two operations, -match and -replace:

    if ($File -match "\\id\s(.{5})") {
        $idValue = $File -replace '\\id\s', ''
    }
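A minimal sketch of that separation that skips -replace altogether, assuming the same 5-character IDs: when -match is applied to a single string and succeeds, PowerShell stores the capture groups in the automatic $Matches variable, so $Matches[1] already holds just the ID. The function name Get-SectionId and the sample text are hypothetical, for illustration only.

```powershell
# Hypothetical helper: returns the 5-character ID that follows "\id " in a section,
# or $null if the section has no \id line.
function Get-SectionId {
    param([string]$Section)

    # On a single string, -match returns $true/$false and fills $Matches on success
    if ($Section -match '\\id\s(\w{5})') {
        return $Matches[1]   # capture group 1 holds only the ID, not the rest of the chunk
    }
    return $null
}

# Illustrative section in the same layout as the question's input
$sample = "\id EUCAL`r`n\v Text`r`n\v more text`r`n\z Endsect"
Get-SectionId -Section $sample
```

Inside the question's loop this would become $idValue = Get-SectionId -Section $File, or the if/$Matches lines could simply be inlined.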
  • OK, that was a good suggestion, but it hasn't solved the whole problem. The PowerShell regex -match operator seems to be a problem too, at least in how I've used it. The capture seems to run on beyond the expected capture group until it encounters the next \id\s pattern. So I've also tried -match "\\id\s(.{5,5})(?=\r|\n)" to try to limit the capture, but it still seems to run on. I can see this with Write-Host "Extracted ID: $idValue", which prints the identifier as covering multiple lines, up to the next \id\s pattern. This makes the filename fail because of illegal characters. Commented Mar 21, 2024 at 13:43
  • @IanS that has nothing to do with the -match operator, which, as the name suggests, only matches. I'll add a fuller answer. Commented Mar 21, 2024 at 16:21

There are a few problems with your script:

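  • The -match operator, as the other answer mentioned, returns a Boolean value when applied to a single string. You then used -replace on that Boolean, which PowerShell converts to the string "True"; your pattern finds nothing to replace in it, so "True" ends up in the filename. When a result has an unexpected type, checking the documentation is always a good idea.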

  • The $idValue assignment assumes that the $File variable is a single line, but it's not; it's a chunk of text. And you're only removing the \id part from it with the -replace operator, leaving the rest intact.
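To see where "True" comes from, the sketch below evaluates the question's expression step by step on an illustrative section (the sample text is made up in the question's layout). -match and -replace have equal precedence and are evaluated left to right, so -replace receives the Boolean result of -match rather than the text.

```powershell
# Illustrative section in the question's layout
$section = "\id EUCAL`r`n\v Text`r`n\z Endsect"

# Step 1: -match on a single string yields a Boolean, not the captured text
$matched = $section -match '\\id\s(.{5})'
$matched.GetType().Name

# Step 2: -replace works on the string form of its left operand ("True");
# the lookbehind never matches inside "True", so it comes back unchanged
$idValue = $matched -replace '(?<=\\id\s)', ''
$idValue

# The original one-liner is parsed as ($section -match ...) -replace ...
$oneLiner = $section -match '\\id\s(.{5})' -replace '(?<=\\id\s)', ''
$oneLiner -eq $idValue
```

The captured text was available the whole time in $Matches[1]; it just never reached $idValue.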

What you actually need to do is simply extract the ID. A single match is enough; no match/replace combination is needed. There are many ways to do this, e.g. with Select-String:

    $idValue = $File | Select-String -Pattern "(?m)^\\id\s(\w{5})\r?$" | % { $_.Matches.Groups[1].Value } | Select -First 1

Explanation:

  • the regex matches the start of a line (^), then \id followed by a single whitespace character (\s), then captures 5 word characters into a group (\w{5}), and lastly makes sure the line ends there ($). Because $File is one multi-line string, the (?m) inline option is needed so that ^ and $ match at each line rather than only at the start and end of the whole chunk; \r? allows for Windows CRLF line endings
  • % { } is shorthand for ForEach-Object, in case a chunk ends up producing multiple match objects. You say that's not the case, but it's good to cover potential edge cases regardless. You could skip this and the last bullet if you're 100% sure only one match per chunk will exist.
  • $_.Matches.Groups[1].Value gets the value of the 1st capture group (the ID) and outputs it to the pipeline
  • Select -First 1 ensures that we get only the 1st ID

This could also be optimized a bit to:

    $idValue = ($File | Select-String -Pattern "(?m)^\\id\s(\w{5})\r?$" | Select -First 1).Matches.Groups[1].Value
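One detail worth checking with a line-anchored pattern: each section is piped to Select-String as a single multi-line string, and Select-String matches each input string as a whole. Without the (?m) inline option, ^ and $ only match at the start and end of the whole section, so an anchored pattern finds nothing. The sketch compares both forms on an illustrative section; in multiline mode $ matches just before \n, which is why \r? is included for CRLF files.

```powershell
# Illustrative section: one multi-line string, as produced by -split on -Raw content
$section = "\id EUCAL`r`n\v Text`r`n\v more text`r`n\z Endsect"

# Anchored pattern without (?m): ^ and $ refer to the whole string, so no match
$plain = $section | Select-String -Pattern '^\\id\s(\w{5})$'

# With (?m), ^ and $ match at each line; \r? absorbs the CR of a CRLF line ending
$multi = $section | Select-String -Pattern '(?m)^\\id\s(\w{5})\r?$'

$null -eq $plain                      # no MatchInfo was produced
$multi.Matches[0].Groups[1].Value     # the captured ID
```

Dropping the anchors entirely (just \\id\s(\w{5})) also works on this input, since the ID is the only thing following \id.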

I would also recommend naming variables more clearly; e.g. $File is confusing, and I'm not surprised you forgot it's a whole chunk because of it.


For completeness of this post, I'll copy in the resolved script, which now works. It relies on Destroy666's advice. Thank you to Destroy666 and Toto for their attention and time.

This PowerShell script is complete and could be useful for anyone wishing to split a large text file at a keyword delimiter. The sample input file is the same as in the original question above. Note that the script below includes both code fragments suggested by Destroy666 for extracting the ID string. In practice only one is needed; choose one and delete the other.

    $Fulltext = Get-Content -Path "D:\Test_input.txt" -Raw # Read file content
    $SplitText = $Fulltext -split "Endsect\r?\n?" # Split at keyword
    $SplitText = $SplitText -notmatch '^$' # Filter out empty lines
    $i = 1
    foreach ($Section in $SplitText) {
        # Append "Endsect" to every entry, to replace the keyword that split omits
        $Section += "Endsect"
        # Create number for new filename, pad the incrementing number with a leading zero if necessary
        $paddedNumber = "{0:00}" -f $i
        # Extract ID string for new filename (coding alternative version 1)
        $idValue = $Section | Select-String -Pattern "\\id\s(\w{5})" | % { $_.Matches.Groups[1].Value } | Select -First 1
        # Extract ID string for new filename (coding alternative version 2)
        $idValue = ($Section | Select-String -Pattern "\\id\s(\w{5})" | Select -First 1).Matches.Groups[1].Value
        # Write new filenames with padded number and ID string
        $Section | Out-File "D:\Output\$paddedNumber-$idValue.txt" # Write the content to new files
        $i++
    }
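To try the logic without writing into D:\, the loop can be wrapped in a function that returns the would-be file names and contents instead of calling Out-File. This is only a sketch: the function name Split-Sections, its FileName/Content properties and the 'NOID' fallback are hypothetical, and it uses -match with $Matches for the ID instead of Select-String.

```powershell
# Hypothetical function: splits raw text at the "Endsect" keyword and returns one
# object per section with the file name the script above would use.
function Split-Sections {
    param([string]$Text)

    # Split at the keyword, then drop empty (or whitespace-only) pieces
    $sections = ($Text -split "Endsect\r?\n?") -notmatch '^\s*$'
    $i = 1
    foreach ($section in $sections) {
        $section += "Endsect"   # restore the keyword removed by -split
        # 'NOID' is a hypothetical placeholder for sections without an \id line
        $id = if ($section -match '\\id\s(\w{5})') { $Matches[1] } else { 'NOID' }
        [pscustomobject]@{
            FileName = '{0:00}-{1}.txt' -f $i, $id   # zero-padded number plus ID
            Content  = $section
        }
        $i++
    }
}

$raw = "\id EUCAL`r`n\v Text`r`n\z Endsect`r`n\id LEUCO`r`n\v more text`r`n\z Endsect`r`n"
Split-Sections -Text $raw | Select-Object -ExpandProperty FileName
```

Writing the files would then be something like Split-Sections -Text (Get-Content -Path "D:\Test_input.txt" -Raw) | ForEach-Object { $_.Content | Out-File -FilePath (Join-Path 'D:\Output' $_.FileName) }.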
