Tutorial 2 Altering Markdown Rendering
While many extensions to Python-Markdown add new syntax, occasionally, you want to simply alter the way Markdown renders the existing syntax. For example, you may want to display some images inline, but require externally hosted images to simply be links which point to the image.
Suppose the following Markdown was provided:

    ![a local image](/path/to/image.jpg)

    ![a remote image](http://example.com/image.jpg)

We would like Python-Markdown to return the following HTML:

    <p><img alt="a local image" src="/path/to/image.jpg" /></p>
    <p><a href="http://example.com/image.jpg">a remote image</a></p>

Note: This tutorial is very generic and assumes a basic Python 3 development environment. A basic understanding of Python development is expected.
Let's consider the options available to us:
- Override the image-related inline patterns. While this would work, we don't need to alter the existing patterns. The parser is recognizing the syntax just fine. All we need to do is alter the HTML output. We also want to support both inline image links and reference-style image links, which would require redefining both inline patterns, doubling the work.
- Leave the existing patterns alone and use a Treeprocessor to alter the HTML. This does not alter the tokenization of the Markdown syntax in any way. We can be sure that anything which represents an image will be included, even any new image syntax added by other third-party extensions.
Given the above, let's use option two.
To begin, let's create a new Treeprocessor:

    from markdown.treeprocessors import Treeprocessor

    class InlineImageProcessor(Treeprocessor):
        def run(self, root):
            # Modify the HTML here
            pass

The run method of a Treeprocessor receives a root argument, which is an ElementTree Element object (the root of the parsed document). We need to iterate over all of the img elements within that object and alter those which contain external URLs. Therefore, add the following code to the run method:

    # Iterate over img elements only
    for element in root.iter('img'):
        # Copy the element's attributes for later use
        attrib = dict(element.attrib)
        # Check for links to external images
        if attrib['src'].startswith('http'):
            # Save the tail
            tail = element.tail
            # Reset the element
            element.clear()
            # Change the element to a link
            element.tag = 'a'
            # Copy src to href
            element.set('href', attrib.pop('src'))
            # Copy alt to label
            element.text = attrib.pop('alt')
            # Reassign tail
            element.tail = tail
            # Copy all remaining attributes to element
            for k, v in attrib.items():
                element.set(k, v)

A few things to note about the above code:
- We make a copy of the element's attributes so that we don't lose them when we later reset the element with element.clear(). The same applies to the tail. As img elements don't have text, we don't need to worry about that.
- We explicitly set the href attribute and the element.text as those are assigned to different attribute names on a elements than on img elements. When doing so, we pop the src and alt attributes from attrib so that they are no longer present when we copy all remaining attributes in the last step.
- We don't need to make changes to img elements which point to internal images, so there is no need to reference them in the code (they simply get skipped).
- The test for external links (startswith('http')) could be improved and is left as an exercise for the reader.
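The element manipulation described in these notes can be tried on its own, without Markdown, using the standard library's ElementTree. The following is a minimal sketch under stated assumptions: img_to_link is a hypothetical helper name, and the sample paragraph is made up for illustration. It takes a real copy of the attributes with dict(), so the result does not depend on how element.clear() treats the original attribute dictionary.

```python
import xml.etree.ElementTree as etree


def img_to_link(element):
    """Turn an <img> element into an <a> element in place (hypothetical helper)."""
    attrib = dict(element.attrib)  # real copy, unaffected by clear()
    tail = element.tail            # text that follows the element
    element.clear()                # drops tag contents, attributes, and tail
    element.tag = 'a'
    element.set('href', attrib.pop('src'))
    element.text = attrib.pop('alt')
    element.tail = tail
    # Whatever is left (for example, title) is carried over unchanged.
    for key, value in attrib.items():
        element.set(key, value)


# Made-up sample: an image followed by trailing text inside a paragraph.
p = etree.fromstring(
    '<p><img alt="a remote image" src="http://example.com/image.jpg" '
    'title="A title." /> trailing text</p>'
)
img_to_link(p[0])
result = etree.tostring(p, encoding='unicode')
print(result)
```

The trailing text survives because the tail was saved and restored, and the title attribute is carried over by the final loop.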
Now we need to inform Markdown of our new Treeprocessor with an Extension subclass:

    from markdown.extensions import Extension

    class ImageExtension(Extension):
        def extendMarkdown(self, md):
            # Register the new treeprocessor
            md.treeprocessors.register(InlineImageProcessor(md), 'inlineimageprocessor', 15)

We register the Treeprocessor with a priority of 15. Higher priorities run first, and the built-in inline treeprocessor is registered with a priority of 20, so ours runs after all inline processing is done.
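The effect of the priority can be checked by asking the registry where each treeprocessor sits. This is a rough sketch, assuming a Python-Markdown 3.x install in which the built-in treeprocessors are registered under the names 'inline' and 'prettify'. NoOpProcessor and the name 'noop' are hypothetical stand-ins used only for this check.

```python
import markdown
from markdown.treeprocessors import Treeprocessor


class NoOpProcessor(Treeprocessor):
    """Hypothetical stand-in for InlineImageProcessor; it changes nothing."""

    def run(self, root):
        # Returning None tells Markdown the tree was left as-is / edited in place.
        return None


md = markdown.Markdown()
md.treeprocessors.register(NoOpProcessor(md), 'noop', 15)

# A lower index means the processor runs earlier.
inline_index = md.treeprocessors.get_index_for_name('inline')
noop_index = md.treeprocessors.get_index_for_name('noop')
prettify_index = md.treeprocessors.get_index_for_name('prettify')
print(inline_index, noop_index, prettify_index)
```

If the assumptions hold, the processor registered at 15 lands after 'inline' and before 'prettify', which is the ordering the tutorial relies on.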
Let's see that all together:
ImageExtension.py

    from markdown.treeprocessors import Treeprocessor
    from markdown.extensions import Extension

    class InlineImageProcessor(Treeprocessor):
        def run(self, root):
            for element in root.iter('img'):
                # Copy the attributes so they survive element.clear()
                attrib = dict(element.attrib)
                if attrib['src'].startswith('http'):
                    tail = element.tail
                    element.clear()
                    element.tag = 'a'
                    element.set('href', attrib.pop('src'))
                    element.text = attrib.pop('alt')
                    element.tail = tail
                    for k, v in attrib.items():
                        element.set(k, v)

    class ImageExtension(Extension):
        def extendMarkdown(self, md):
            md.treeprocessors.register(InlineImageProcessor(md), 'inlineimageprocessor', 15)

Now, pass our extension to Markdown:
Test.py

    import markdown
    from ImageExtension import ImageExtension

    text = """
    ![a local image](/path/to/image.jpg "A title.")

    ![a remote image](http://example.com/image.jpg "A title.")
    """

    html = markdown.markdown(text, extensions=[ImageExtension()])
    print(html)

And running python Test.py correctly returns the following output:

    <p><img alt="a local image" src="/path/to/image.jpg" title="A title." /></p>
    <p><a href="http://example.com/image.jpg" title="A title.">a remote image</a></p>

Success! Note that we included a title for each image, which was also properly retained.
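One reason given earlier for choosing a Treeprocessor was that reference-style images would be handled without extra work. The sketch below is a quick way to check that claim. It repeats the extension in a single file so it can run on its own, and it assumes Python-Markdown 3.x is installed. The reference label remote is made up for the example.

```python
import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor


class InlineImageProcessor(Treeprocessor):
    def run(self, root):
        for element in root.iter('img'):
            attrib = dict(element.attrib)  # copy before clear()
            if attrib['src'].startswith('http'):
                tail = element.tail
                element.clear()
                element.tag = 'a'
                element.set('href', attrib.pop('src'))
                element.text = attrib.pop('alt')
                element.tail = tail
                for k, v in attrib.items():
                    element.set(k, v)


class ImageExtension(Extension):
    def extendMarkdown(self, md):
        md.treeprocessors.register(InlineImageProcessor(md), 'inlineimageprocessor', 15)


# Reference-style image: the URL and title live in a separate definition.
text = """
![a remote image][remote]

[remote]: http://example.com/image.jpg "A title."
"""

html = markdown.markdown(text, extensions=[ImageExtension()])
print(html)
```

Because the treeprocessor only looks at img elements in the finished tree, it does not need to know which syntax produced them.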
Suppose we want to allow the user to provide a list of known image hosts. Any img tags which point at images on those hosts may be inlined, but any other images should be external links. Of course, we want to keep the existing behavior for internal (relative) links.
First we need to add the configuration option to our Extension subclass:

    class ImageExtension(Extension):
        def __init__(self, **kwargs):
            # Define a config with defaults
            self.config = {'hosts': [[], 'List of approved hosts']}
            super().__init__(**kwargs)

We defined a hosts configuration setting which defaults to an empty list. Now, we need to pass that option on to our treeprocessor in the extendMarkdown method:

    def extendMarkdown(self, md):
        # Pass hosts to the treeprocessor
        md.treeprocessors.register(
            InlineImageProcessor(md, hosts=self.getConfig('hosts')),
            'inlineimageprocessor',
            15
        )

Next, we need to modify our treeprocessor to accept the new setting:

    class InlineImageProcessor(Treeprocessor):
        def __init__(self, md, hosts):
            # Let the base class store the Markdown instance as self.md
            super().__init__(md)
            # Assign the setting to the hosts attribute of the class instance
            self.hosts = hosts

Then, we can add a method which uses the setting to test a URL:

    from urllib.parse import urlparse

    class InlineImageProcessor(Treeprocessor):
        ...

        def is_unknown_host(self, url):
            url = urlparse(url)
            # Return False if the network location is empty or a known host
            return bool(url.netloc) and url.netloc not in self.hosts

Finally, we can make use of the test method by replacing the if attrib['src'].startswith('http'): line of the run method with if self.is_unknown_host(attrib['src']):.
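To get a feel for how the host test behaves, the same logic can be tried as a plain function using only the standard library. This is an illustrative sketch rather than part of the extension: the standalone function signature, the sample URLs, and the results dictionary are assumptions made for the example.

```python
from urllib.parse import urlparse


def is_unknown_host(url, hosts):
    """Return True only for URLs that name a host missing from hosts."""
    netloc = urlparse(url).netloc
    return bool(netloc) and netloc not in hosts


hosts = ['example.com']
results = {
    url: is_unknown_host(url, hosts)
    for url in (
        '/path/to/image.jpg',                 # relative: no host at all
        'http://example.com/image.jpg',       # approved host
        'http://exclude.com/image.jpg',       # host not in the list
        '//exclude.com/image.jpg',            # protocol-relative URL
        'http://example.com:8080/image.jpg',  # netloc includes the port
    )
}
for url, unknown in results.items():
    print(url, unknown)
```

Note that netloc includes any port, so example.com:8080 is treated as a different host from example.com. If that matters, comparing urlparse(url).hostname (lowercased, without the port) is one possible alternative.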
The final result should look like this:
ImageExtension.py

    from markdown.treeprocessors import Treeprocessor
    from markdown.extensions import Extension
    from urllib.parse import urlparse

    class InlineImageProcessor(Treeprocessor):
        def __init__(self, md, hosts):
            super().__init__(md)
            self.hosts = hosts

        def is_unknown_host(self, url):
            url = urlparse(url)
            return bool(url.netloc) and url.netloc not in self.hosts

        def run(self, root):
            for element in root.iter('img'):
                # Copy the attributes so they survive element.clear()
                attrib = dict(element.attrib)
                if self.is_unknown_host(attrib['src']):
                    tail = element.tail
                    element.clear()
                    element.tag = 'a'
                    element.set('href', attrib.pop('src'))
                    element.text = attrib.pop('alt')
                    element.tail = tail
                    for k, v in attrib.items():
                        element.set(k, v)

    class ImageExtension(Extension):
        def __init__(self, **kwargs):
            self.config = {'hosts': [[], 'List of approved hosts']}
            super().__init__(**kwargs)

        def extendMarkdown(self, md):
            md.treeprocessors.register(
                InlineImageProcessor(md, hosts=self.getConfig('hosts')),
                'inlineimageprocessor',
                15
            )

Let's test that out:
Test.py
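
    import markdown
    from ImageExtension import ImageExtension

    text = """
    ![a local image](/path/to/image.jpg)

    ![a remote image](http://example.com/image.jpg)

    ![an excluded remote image](http://exclude.com/image.jpg)
    """

    html = markdown.markdown(text, extensions=[ImageExtension(hosts=['example.com'])])
    print(html)

And running python Test.py returns the following output: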

    <p><img alt="a local image" src="/path/to/image.jpg" /></p>
    <p><img alt="a remote image" src="http://example.com/image.jpg" /></p>
    <p><a href="http://exclude.com/image.jpg">an excluded remote image</a></p>

Wrapping the above extension up into a package for distribution is left as an exercise for the reader.
