I have some very long strings of text where the first 30 characters are all identical but the following characters are all random except that the last character is always a '>'. How can I use RegEx to search for all of these strings?

In the example below, the "Lots of Garbage Text Here" placeholder (shown with leading and trailing asterisks) stands for any run of standard printable characters other than >. The first > that appears is the final character of the string that I am searching for.

<?php if(!isset($GLOBALS["\x61***Lots of Garbage Text Here***> 
  • What text editor are you using? What operating system? Can you not just paste the text that you want into the Find/Replace box and leave the replace box empty? Does the text span lines? It might be useful to show us the full text that you want to remove so that people can suggest complete options. Commented Dec 21, 2014 at 9:47
  • Restore the PHP files from a backup, or re-install the scripts from a known-good source, such as version control? Commented Dec 21, 2014 at 9:57
  • First work out how the malware got there in the first place and fix the security hole(s). Then restore from your backups ... Commented Dec 21, 2014 at 14:14
  • Lovely to hear about finding malware, restoring from backups, etc. This is not about an OS, nor is it about a text editor. This question is only about RegEx. Nothing more. Now if anyone knows RegEx, I'd like to know how to search for the string that starts with a left pointing < and ends with a right pointing >, but has the string above from the beginning of the string through the x61 plus more garbage. Commented Dec 21, 2014 at 14:57

1 Answer

For:

<?php if(!isset($GLOBALS["\x61***Lots of Garbage Text Here***> 

The regex is:

^\<\?php if\(\!isset\(\$GLOBALS\[\"\\x61([^\>]*)\> 

The various back-slashes above escape each special character.

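(Not all of those characters need to be escaped. In PCRE-style engines such as PHP's, the extra escapes are harmless, but in some flavours, such as GNU grep and Vim, \< and \> are word-boundary anchors, so check your tool before escaping everything.)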

The key part is: ([^\>]*)

which translates as:

Match any character which isn't >, any number of times (including zero).
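
As a quick sanity check, here is a minimal Python sketch of the pattern above. Python is only an assumption (the question doesn't name a tool), the sample lines are invented for illustration, and re.MULTILINE is added so that ^ can match at the start of every line rather than only at the start of the text:

```python
import re

# The answer's pattern in a raw string, so the backslashes reach the regex engine unchanged.
# re.MULTILINE lets ^ match at the start of each line.
PATTERN = re.compile(
    r'^\<\?php if\(\!isset\(\$GLOBALS\[\"\\x61([^\>]*)\>',
    re.MULTILINE,
)

# Invented sample text: one injected-looking line and one ordinary PHP line.
sample = "\n".join([
    r'<?php if(!isset($GLOBALS["\x61abc123"])){ junk }?>',
    r'<?php echo "hello"; ?>',
])

# Each match runs from the fixed prefix up to and including the first '>'.
matches = [m.group(0) for m in PATTERN.finditer(sample)]
print(len(matches))  # 1
```

Only the first line is reported, because the second line does not begin with the fixed prefix.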

  • Awesome. I've been studying this and have a question, if I may... With <, > and " not being special characters, why was there any need to escape them? Also, why the use of the asterisk instead of the +, given that the first portion is always wanted? Commented Dec 22, 2014 at 20:41
  • What is the purpose of the leading ^, and why does the asterisk belong inside the trailing parenthesis instead of following it? Commented Dec 22, 2014 at 21:00
  • I'll answer your questions as best as I can, @Robert - though do bear in mind I am an "intermediate" at Regex, rather than an expert. 1) I am not yet familiar with the entire library of Regex command characters, so I tend to escape everything that isn't A-Za-z0-9, just to be safe. In most engines it does no harm to escape a punctuation character that doesn't require escaping. 2) Yes, you're quite right - given that we know that the Garbage text is never 0 characters, then + is probably more appropriate than *. 3) The ^ anchors the match to the start of the string (or of a line, in multiline mode). A $ anchors it to the end. You can use either or both. Commented Dec 22, 2014 at 21:31
  • 4) The * belongs inside the parentheses because it immediately follows the ] - and the parentheses wrap around the whole thing. Basically, the * (or +) says: match as many of these [^\>] (i.e. characters which aren't >) as you find. The parentheses say: "Whatever you've just found, remember it." I understand that you didn't necessarily need to record what you'd matched, but I tend to record match-strings by default (again, just in case). Commented Dec 22, 2014 at 21:37
  • If this response answers your question, @Robert, please accept it, so that others with similar questions may find it more easily. Many thanks! Commented Dec 23, 2014 at 16:27
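
The points raised in these comments (* versus +, what the parentheses capture, and what ^ does) can be sketched in a few lines of Python. This is only an illustration under assumptions: Python's re module stands in for whatever tool is actually being used, the payload strings are invented, and the prefix is written with only the escapes that Python requires:

```python
import re

# The question's fixed prefix, with only the escapes Python's re needs.
prefix = r'<\?php if\(!isset\(\$GLOBALS\["\\x61'

star = re.compile(prefix + r'([^>]*)>')  # zero or more non-'>' characters
plus = re.compile(prefix + r'([^>]+)>')  # one or more non-'>' characters

# Invented test strings (raw, so \x61 stays as four literal characters).
empty_payload = r'<?php if(!isset($GLOBALS["\x61>'
real_payload = r'<?php if(!isset($GLOBALS["\x61garbage>'

star_empty = star.search(empty_payload)        # matches; the captured group is ''
plus_empty = plus.search(empty_payload)        # None, because + needs at least one character
captured = plus.search(real_payload).group(1)  # 'garbage', the text the parentheses remembered

# Without ^ the pattern is found anywhere in the text; with ^ it must start the string.
shifted = 'echo 1; ' + real_payload
unanchored = star.search(shifted)                            # matches mid-string
anchored = re.search('^' + prefix + r'([^>]*)>', shifted)    # None
```

If the injected code can appear in the middle of a line, dropping the ^ is probably the safer choice.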
