I created a PowerShell script to do certain tasks for our sales/finance department.

Our system creates a large PDF file with several pages where each page is one invoice.

The PDF file needs to be split into single pages and renamed with their invoice number.

One of our plants creates 5-digit invoice numbers, which is a bit problematic because my regex search finds the ZIP code of our plant before it finds the invoice number in the document.

Right now I have created the script in a way that it excludes the 3 ZIP codes that our plants have. But if we ever get to the point where Invoice # = ZIP code, then my script will fail.

If I could get the 2nd match of the regex pattern, it would work every time. The overall structure of our invoices is always the same, so the invoice number will always be the 2nd match.

This is the current snippet of my script:

    $regexPattern = "((?!11111|22222|33333)\d{5,6})"
    if ($documentcontent -match $regexpattern) {
        $invoiceNumber = $matches[1]
    }

$documentcontent is basically a Word COM object filled with the content of a single-page PDF. I do the splitting before I get to this section.

The "\d{5,6}" is there because only 1 plant has 5-digit invoice numbers; the other 2 have 6-digit invoice numbers. The script in its entirety works fine as of now, but I run the risk of 3 invoices failing once the invoice number matches one of our ZIP codes.

Is there a way to change the "-match" or "matches[1]" so it always skips the first match it finds?

Thanks in advance and please tell me if I can provide you any more information!

Edit: A bit more information

    $word = New-Object -ComObject Word.Application
    $document = $word.Documents.Open($invoice.Fullname)
    $documentcontent = $document.content.text

$invoice is the current single page invoice (this is happening in a foreach loop).

  • Could you post sample text? Is there some significant text before/after ZIP or invoice? Commented Aug 1 at 15:56
  • There isn't much text to be honest, at least as far as PowerShell is concerned. The reason is that when the PDF is converted to a Word object, most fields become Word shapes with text fields in them. I had someone create an additional field that duplicates the invoice number in the top right corner of the PDF invoices and change its colour to white, so I can search for it. In the same line there is our company name, address, ZIP code, phone (xxx) 123-4567, fax (xxx) 123-4567. The 2nd line has our website and the word "invoice". That's it. The rest are shapes. Commented Aug 1 at 16:08
  • I believe you need some conditional logic with a count of matches and -skip 1 -first 1 to do this. I'm not sure how complex your requirements truly are, but explore this as a starting point. If there are exceptions where a document only ever has one match, then taking the 2nd match would bring back a null value. You may need additional conditional logic with $matches.Count -ge 1 and $matches.Count -ge 2, plus some if logic to process each case differently based on the match count. Commented Aug 1 at 18:23
  • Is the invoice number alone on its own line? If so, you could disambiguate it from the ZIPs using line markers like ^ and $: "^((?!11111|22222|33333)\d{5,6})$" and then take the first result. Commented Aug 1 at 21:27
  • @padawan_IT It is ok, go ahead and add the basic logic examples for a self-answer. Most of these questions are just trivial, and I'm glad to give pointers via comments without answers sometimes. Please self-answer with a generic and simple example of the type of logic you used to meet your needs. I'm just glad you found a fix; that's what matters! Keep coding, keep asking, keep answering, and keep leveling up. We're all human, we all make mistakes, and that's how we learn. Commented Aug 4 at 1:04
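
The "-skip 1 -first 1" idea from the comments can be sketched roughly as follows. This is only an illustration, not the code the asker ended up using: the sample text is made up, and it assumes the page text always lists the ZIP code before the invoice number, as described in the question. It uses [regex]::Matches instead of -match, because -match only reports the first match.

```powershell
# Made-up sample page text: ZIP code first, invoice number second.
$documentcontent = "Example Corp, 1 Main St, Springfield 11111, Phone (555) 123-4567 Invoice 11111"

# \b keeps the pattern from matching inside longer digit runs.
$regexPattern = '\b\d{5,6}\b'

# [regex]::Matches returns every match; -match stops at the first one.
$allMatches = [regex]::Matches($documentcontent, $regexPattern)

# Skip the first match (the ZIP code) and take the one after it.
$invoiceNumber = $allMatches |
    Select-Object -Skip 1 -First 1 |
    ForEach-Object { $_.Value }

if ($null -eq $invoiceNumber) {
    # Fewer than two matches: nothing to rename the file with.
    Write-Warning "Fewer than two matches found; no invoice number extracted."
}
```

With this approach the ZIP codes no longer need to be excluded in the pattern, so an invoice number that happens to equal a ZIP code is still picked up as the 2nd match.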

1 Answer

I ended up doing 2 foreach loops, one for 5-digit invoices and one for 6-digit invoices. I am sure there are better ways to do it, but it seems to work.

Since there is only one 6-digit number in the documents, I don't need to exclude the zip codes in the regex expression.

During my loop through the 5-digit invoices, I put in a Word search for each ZIP code. If any one of those counts comes to 2, it means that the ZIP code and the invoice number are the same by coincidence. That way I was able to cover the logical mistake in my code.

    $destination = "Enter your destination folder here"
    $splitPath   = "Enter your folder with the PDFs here"
    $zipCodes    = "11111", "22222", "33333"

    $regexPatternFive = "((?!11111|22222|33333)\d{5,5})"
    $regexPatternSix  = "(\d{6,6})"

    $word = New-Object -ComObject Word.Application

    # First pass: 6-digit invoices. Each document contains only one 6-digit number.
    $invoices = Get-ChildItem -Path $splitPath -Filter "*.pdf"
    foreach ($invoice in $invoices) {
        $document = $word.Documents.Open($invoice.FullName)
        $documentContent = $document.Content.Text
        $document.Close()

        # Only rename and move when a 6-digit number was actually found.
        if ($documentContent -match $regexPatternSix) {
            $invoiceNumber = $Matches[1]
            Rename-Item -Path $invoice.FullName -NewName "$invoiceNumber.pdf"
            Move-Item -Path (Join-Path $splitPath "$invoiceNumber.pdf") -Destination $destination
        }
    }

    # All 6-digit invoices are moved away, refilling the variable and starting anew
    $invoices = Get-ChildItem -Path $splitPath -Filter "*.pdf"
    foreach ($invoice in $invoices) {
        $document = $word.Documents.Open($invoice.FullName)
        $documentContent = $document.Content.Text
        $invoiceNumber = $null

        if ($documentContent -match $regexPatternFive) {
            $invoiceNumber = $Matches[1]
        }

        # A ZIP code that occurs twice is also the invoice number.
        foreach ($zip in $zipCodes) {
            $count = 0
            $word.Selection.HomeKey(6) | Out-Null   # 6 = wdStory: search from the top
            $find = $word.Selection.Find
            $find.Text = $zip
            $find.Forward = $true
            while ($find.Execute()) { $count++ }
            if ($count -eq 2) { $invoiceNumber = $zip }
        }

        $document.Close()

        if ($invoiceNumber) {
            Rename-Item -Path $invoice.FullName -NewName "$invoiceNumber.pdf"
            Move-Item -Path (Join-Path $splitPath "$invoiceNumber.pdf") -Destination $destination
        }
    }

    # Quit Word only after both loops are done.
    $word.Quit()

I have the actual script in an environment where it is hard to copy-paste out of it, so I hope I have no typos or mistakes in it. Basically retyped the whole code in here.

For anyone who wants to copy this, this is very important:

This only works if you have no other 5 or 6 digit numbers in your document! If you do, you have to find a different solution!

As mentioned in the comments, my documents are filled with shapes that have text fields in them. Their contents won't be found by my search terms, so it works perfectly for my needs.
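
For anyone who wants to try the logic without Word, here is a minimal sketch of the same decision rules applied to a plain string. It is not the script I actually run: Get-InvoiceNumber is a hypothetical helper name, the sample texts are made up, and the ZIP count is done with a regex on the extracted text instead of Word's Find. The same warning applies: it assumes there are no other 5- or 6-digit numbers in the text.

```powershell
# Hypothetical helper, not part of the original script.
function Get-InvoiceNumber {
    param(
        [string]   $Text,
        [string[]] $ZipCodes
    )

    # A 6-digit number is unambiguous, so take it if present.
    $six = [regex]::Match($Text, '\b\d{6}\b')
    if ($six.Success) { return $six.Value }

    # A ZIP code that appears twice is both the ZIP code and the invoice number.
    foreach ($zip in $ZipCodes) {
        $count = [regex]::Matches($Text, "\b$([regex]::Escape($zip))\b").Count
        if ($count -eq 2) { return $zip }
    }

    # Otherwise take the first 5-digit number that is not one of the ZIP codes.
    $five = [regex]::Matches($Text, '\b\d{5}\b') |
        Where-Object { $ZipCodes -notcontains $_.Value } |
        Select-Object -First 1
    if ($five) { return $five.Value }

    return $null
}

$zips = '11111', '22222', '33333'
Get-InvoiceNumber -Text 'Plant A, 11111 Sometown, Invoice 54321' -ZipCodes $zips    # 54321
Get-InvoiceNumber -Text 'Plant A, 11111 Sometown, Invoice 11111' -ZipCodes $zips    # 11111
Get-InvoiceNumber -Text 'Plant B, 22222 Othertown, Invoice 654321' -ZipCodes $zips  # 654321
```

Checking for the 6-digit number first mirrors the two loops above, and the count of 2 covers the case where the invoice number equals the ZIP code.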
