This is the approach I use to modify HTML with regular expressions in PHP.
HTML Tag Regex
/\<(?<tag>[a-z][a-z0-9\-]*)(\s+([\s\S]*?))?\/?\>(([\s\S]*?)\<\/(?P=tag)\>)?/
The above RegExp can be broken down as:
- /: The regex opening delimiter.
- \<: Matches the character < of an opening tag.
- (?<tag>[a-z][a-z0-9\-]*): Matches a valid HTML tag name and captures it as tag. The name must start with a letter from a to z and may continue with letters from a to z, digits from 0 to 9, and the character -. The pattern has no i flag, so only lowercase tag names match.
- (\s+([\s\S]*?))?: Matches all of the tag's attributes, including the whitespace between them, but only if any are present.
- \/?: Matches the optional character / of a self-closing tag.
- \>: Matches the character >, which closes the opening tag.
- (([\s\S]*?)\<\/(?P=tag)\>)?: Matches the content (text or HTML) inside the tag followed by the closing tag, but only if the tag is not self-closing. (?P=tag) is a backreference to the captured tag name, so the closing tag must have the same name as the opening tag.
- /: The regex closing delimiter.
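To see which capture group holds what, here is a minimal sketch that runs the tag pattern over a small sample. The sample markup is an assumption made up for illustration; the pattern is the one above, unchanged.

```php
<?php
// Same tag pattern as above; only the sample input below is new.
$tags_regexp = '/\<(?<tag>[a-z][a-z0-9\-]*)(\s+([\s\S]*?))?\/?\>(([\s\S]*?)\<\/(?P=tag)\>)?/';

// Hypothetical sample: one element with attributes and content, one self-closing tag.
$html = '<a href="/docs" class="link">Docs</a><br/>';

preg_match_all( $tags_regexp, $html, $matches, PREG_SET_ORDER );

foreach ( $matches as $match ) {
    // Group 3 holds the attributes and group 5 the inner content.
    // Optional groups that did not take part in the match may be absent, hence the ?? fallback.
    printf(
        "tag=%s | attributes=%s | inner=%s\n",
        $match['tag'],
        $match[3] ?? '',
        $match[5] ?? ''
    );
}
// Prints:
// tag=a | attributes=href="/docs" class="link" | inner=Docs
// tag=br | attributes= | inner=
```

Group 2 is the same as group 3 but keeps the leading whitespace, which is harmless when the string is passed on to an attribute parser.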
HTML Tag Attributes Regex
/([\w\-]+)(\s*\=\s*(?|(?<quot>[\'"])([\s\S]*?)(?P=quot)|(?<quot>)([\w\-]+)))?/
The above RegExp can be broken down as:
- /: The regex opening delimiter.
- ([\w\-]+): Matches the attribute's key/name.
- (\s*\=\s*(?|(?<quot>[\'"])([\s\S]*?)(?P=quot)|(?<quot>)([\w\-]+)))?: Matches the value of the attribute, if there is one. The value can be anything wrapped in single quotes (') or double quotes ("), or it can be naked (not wrapped in quotes). A naked value may only contain the letters a to z and A to Z, the digits 0 to 9, _ (underscore), and - (hyphen). (?|...) is a branch reset group, so the value ends up in capture group 4 whether it is quoted or not. The whole group is optional, so boolean attributes with no value also match.
- /: The regex closing delimiter.
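The four value styles described above can be checked with a short sketch. The attribute string is a made-up sample, and the variable names $attributes, $found and $parsed are only illustrative.

```php
<?php
// Same attribute pattern as above.
$atts_regexp = '/([\w\-]+)(\s*\=\s*(?|(?<quot>[\'"])([\s\S]*?)(?P=quot)|(?<quot>)([\w\-]+)))?/';

// Hypothetical sample: double-quoted, single-quoted, naked, and boolean attributes.
$attributes = 'type="checkbox" name=\'agree\' value=yes checked';

preg_match_all( $atts_regexp, $attributes, $found );

// Group 1 holds the names, group 4 the values (an empty string for boolean attributes).
$parsed = array_combine( $found[1], $found[4] );

print_r( $parsed );
// $parsed is now:
// [ 'type' => 'checkbox', 'name' => 'agree', 'value' => 'yes', 'checked' => '' ]
```

Note that a boolean attribute and an attribute with an empty value (for example alt="") both come back as an empty string, so this sketch cannot tell them apart.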
Example Usage
<?php
// HTML elements
$content = <<<EOL
<p>Text paragraph.</p>
<img src="http://example.com/image-200x320.png" width="200" height="320">
EOL;

// Tags matching RegExp
$tags_regexp = '/\<(?<tag>[a-z][a-z0-9\-]*)(\s+([\s\S]*?))?\/?\>(([\s\S]*?)\<\/(?P=tag)\>)?/';

// Attributes matching RegExp
$atts_regexp = '/([\w\-]+)(\s*\=\s*(?|(?<quot>[\'"])([\s\S]*?)(?P=quot)|(?<quot>)([\w\-]+)))?/';

// Match all the valid elements in the HTML
preg_match_all( $tags_regexp, $content, $matches, PREG_SET_ORDER );

// Loop through and make the necessary changes
foreach ( $matches as $match ) {
    // We are going to modify only image tags
    if ( 'img' !== $match['tag'] ) {
        continue;
    }

    // Match all the attributes (group 2 is the raw attribute string)
    preg_match_all( $atts_regexp, $match[2], $atts_match );

    // Combine the keys (group 1) and the values (group 4)
    $atts_match = array_combine( $atts_match[1], $atts_match[4] );

    // Rebuild a valid HTML attribute string
    $atts = '';
    foreach ( $atts_match as $name => $value ) {
        $atts .= sprintf( ' %s="%s"', $name, $value );
    }

    // Replacement for the tag
    $amp = sprintf( '<amp-img%s></amp-img>', $atts );

    // Replace the complete tag match with the new replacement
    $content = str_replace( $match[0], $amp, $content );
}

// The AMPified HTML
/**
 * <p>Text paragraph.</p>
 * <amp-img src="http://example.com/image-200x320.png" width="200" height="320"></amp-img>
 */
echo $content;

The above could also be extended into a complete HTML-to-AMP converter for simple pages.
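As one possible starting point for such a converter, the loop can be folded into a reusable function. This is only a sketch: the function name ampify_images() is hypothetical, and it swaps the preg_match_all() plus str_replace() combination for preg_replace_callback(), which rewrites each match in a single pass.

```php
<?php
/**
 * Hypothetical helper (not part of the example above): converts <img> tags
 * to <amp-img> using the same two patterns.
 */
function ampify_images( string $html ): string {
    $tags_regexp = '/\<(?<tag>[a-z][a-z0-9\-]*)(\s+([\s\S]*?))?\/?\>(([\s\S]*?)\<\/(?P=tag)\>)?/';
    $atts_regexp = '/([\w\-]+)(\s*\=\s*(?|(?<quot>[\'"])([\s\S]*?)(?P=quot)|(?<quot>)([\w\-]+)))?/';

    $result = preg_replace_callback(
        $tags_regexp,
        function ( array $match ) use ( $atts_regexp ) {
            // Leave every tag other than <img> untouched.
            if ( 'img' !== $match['tag'] ) {
                return $match[0];
            }

            // Group 2 may be absent when the tag has no attributes.
            preg_match_all( $atts_regexp, $match[2] ?? '', $found );

            $atts = '';
            foreach ( array_combine( $found[1], $found[4] ) as $name => $value ) {
                $atts .= sprintf( ' %s="%s"', $name, $value );
            }

            return sprintf( '<amp-img%s></amp-img>', $atts );
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return null === $result ? $html : $result;
}

echo ampify_images( '<img src="a.png" width="10" height="20"><br>' );
// Prints: <amp-img src="a.png" width="10" height="20"></amp-img><br>
```

Like the loop above, this only sees top-level matches. An <img> nested inside a matched element such as <p>...</p> is consumed as part of the outer match and is not converted, so a fuller converter would need to recurse into the captured inner content.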