I've been struggling with this and wondering if someone can help. I have a large text file that has extra data in it that I want to strip out. Here is a sample of the input file:

Text In Page - 1 S Dept l<m RKB) "1915 slightly 234234 "sil dsf 56 "gr gl 1920 100 1925 100 1930 100 Cls "1935 100 Cl Text In Page - 2 l<m RKB) "1915 slightly "sil "gr glauc 1920 100 1925 100 1930 100 Cls "1935 100 Cl 

I want to remove the following:

  • Any blank lines
  • Any " at the beginning of lines
  • Any lines that begin with a letter A-Z, a-z

So with the above example I'd be left with:

1915 1920 100 1925 100 1930 100 Cls 1935 100 Cl 1915 1920 100 1925 100 1930 100 Cls 1935 100 Cl 
  • Would it be equivalent to say that you only want to keep lines that begin with a digit? Commented Oct 15, 2015 at 17:44
  • @roaima no, because he wants to remove the character " at the beginning of lines, not get rid of those lines. Commented Oct 15, 2015 at 17:53
  • I would have liked to have seen a little independent research from the asker. However, RegEx is typically a programmatic concept and might not have been easily researched. Commented Oct 16, 2015 at 13:40

1 Answer

I'm thinking:

(gc D:\test.txt) -replace '^"' | sls '\S' | sls -NotMatch '^[A-Za-z]' | sc out.txt 

Which does:

  • gets the lines of the file, and if the first character of a line is a quote, replaces it with nothing
  • selects lines which match "not whitespace" (i.e. empty lines get filtered out)
  • selects lines which don't start with A-Za-z
  • writes the results to out.txt

There are various ways to write the long version depending on how much you like chaining things with the pipeline versus working with variables over and over, but it's doing this:

$lines = Get-Content D:\test.txt
$lines = $lines -replace '^"'
$lines = $lines | Select-String '\S'
$lines = $lines | Select-String -NotMatch '^[A-Za-z]'
$lines | Set-Content out.txt
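
If you want to try the three filters without touching a file, here is a minimal sketch that runs them on an in-memory array. The sample lines are hypothetical (the line breaks in the question's sample are my assumption), and I've swapped Select-String for Where-Object with -match / -notmatch, which is not what the one-liner above uses, just an alternative that keeps the results as plain strings:

```powershell
# Hypothetical sample lines; the line breaks are an assumption,
# not taken verbatim from the question's file.
$lines = @(
    'Text In Page - 1',
    '',
    '"1915',
    'slightly',
    '"sil',
    '1920 100',
    '1930 100 Cls',
    '"1935 100 Cl'
)

# 1. Strip a leading double quote from every line
#    (-replace is applied to each element of the array).
$clean = $lines -replace '^"', ''

# 2. Drop empty or whitespace-only lines.
$clean = @($clean | Where-Object { $_ -match '\S' })

# 3. Drop lines that start with a letter.
$clean = @($clean | Where-Object { $_ -notmatch '^[A-Za-z]' })

# Show what is left.
$clean
```

The order matters: the quote is stripped first, so a line like "sil turns into sil and is then removed by the letter filter, while "1915 survives as 1915.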
  • Last one could also be shortened to (gc D:\test.txt)-replace'^"'|sls '\S'|sls '^\d'|sc out.txt :) Commented Oct 15, 2015 at 23:21
  • @MathiasR.Jessen If you are assuming that "remove lines starting A-Z" means the same as "include lines starting with a digit", then you can get rid of sls '\S', because sls '^\d' will filter out empty lines. Commented Oct 15, 2015 at 23:45
