0

I'm on macOS Mojave and trying to use a regex to output the results of a match in 2 separate columns.

I have a file that contains these strings:

JJ1111-Aaaaaa-AB-22222222-f_2-777777_S1_L000_trtrt
JJ1111-Baaaaa-AB-22322222-f_2-777777_S1_L000_trtrt
JJ1111-Caaaaa-AB-22222322-f_2-777777_S1_L000_trtrt

I want to extract "Aaaaaa" (that is, the string of 6 consecutive letters) and the string of 2 capital letters, "AB".

Now the command

egrep -oh '[a-zA-Z]{6}' my.txt 

will return

Aaaaaa
Baaaaa
Caaaaa

And

egrep -oh '\-[A-Z]{2}' my.txt | sed 's/-//g' 

will return

AB
AB
AB

Is there a way (I'm thinking of awk) to write the two matches to a new file as 2 tab-separated columns? I've tried this:

awk '{$1 ~ /[a-zA-Z]{6}/; print $1}' my.txt 

But it only gives me the original string of characters.

1
  • There isn't any separator in the input. Try making - the separator:
    awk -F'-' '$2 ~ /[a-zA-Z]{6}/ && $3 ~ /[A-Z]{2}/ {print $2"\t"$3}'
    Commented Feb 14, 2020 at 23:45

2 Answers

1

I think the most straightforward tool here is cut:

cut -sf 2,3 -d '-' --output-delimiter=$'\t' my.txt > output.txt

As you can see, with - as the delimiter, it fetches the 2nd and 3rd fields and converts the dash between them into a tab. The output is written to output.txt.
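One caveat: --output-delimiter is a GNU cut extension, and as far as I know the BSD cut that ships with macOS does not accept it. A minimal awk sketch of the same field selection follows, assuming each record sits on its own line in my.txt; the sample file is recreated here only for illustration.

```shell
# Recreate the sample input from the question
# (one record per line is an assumption about the original file).
printf '%s\n' \
  'JJ1111-Aaaaaa-AB-22222222-f_2-777777_S1_L000_trtrt' \
  'JJ1111-Baaaaa-AB-22322222-f_2-777777_S1_L000_trtrt' \
  'JJ1111-Caaaaa-AB-22222322-f_2-777777_S1_L000_trtrt' > my.txt

# Split each line on '-' and print fields 2 and 3 joined by a tab.
# NF > 1 skips lines that contain no dash, similar to cut -s.
awk -F'-' -v OFS='\t' 'NF > 1 { print $2, $3 }' my.txt > output.txt
```

Like the cut command, this selects by position only; it does not check that the fields actually look like 6 letters and 2 capitals.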

0

Since you have already come up with working regexes, why not keep using them?
If you have the data in 'inputfile':

 sed -rne 's/.*([a-zA-Z]{6})\-([A-Z]{2}).*/\1\t\2/p' <inputfile 

To verify what you get, add

| od -t x1z -w10

at the end, and you will see this:

 0000000 41 61 61 61 61 61 09 41 42 0a  >Aaaaaa.AB.<
 0000012 42 61 61 61 61 61 09 41 42 0a  >Baaaaa.AB.<
 0000024 43 61 61 61 61 61 09 41 42 0a  >Caaaaa.AB.<
 0000036
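The -r flag and the \t escape in the replacement are GNU sed conveniences and may not behave the same way in the sed shipped with macOS. A hedged, more portable sketch uses -E and a literal tab held in a shell variable (the variable name tab and the output file name are my own choices, and the sample input is recreated only for illustration):

```shell
# Recreate the sample input (one record per line is an assumption).
printf '%s\n' \
  'JJ1111-Aaaaaa-AB-22222222-f_2-777777_S1_L000_trtrt' \
  'JJ1111-Baaaaa-AB-22322222-f_2-777777_S1_L000_trtrt' \
  'JJ1111-Caaaaa-AB-22222322-f_2-777777_S1_L000_trtrt' > inputfile

# Hold a literal tab character, since not every sed understands \t.
tab="$(printf '\t')"

# -E enables extended regexes; -n with the p flag prints only lines that matched.
sed -E -n "s/.*([a-zA-Z]{6})-([A-Z]{2}).*/\1${tab}\2/p" < inputfile > output.txt

# Show the result with control characters made visible.
od -c output.txt
```

Unlike the field-based approaches, this keeps the original regexes, so lines that do not contain 6 letters followed by a dash and 2 capitals produce no output.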
