Domain Specific Languages in Python Siddharta Govindaraj siddharta@silverstripesoftware.com
What are DSLs? Specialized mini-languages for specific problem domains that make it easier to work in that domain
Example: SQL SQL is a mini language specialized to retrieve data from a relational database
Example: Regular Expressions Regular Expressions are mini languages specialized to express string patterns to match
Life Without Regular Expressions def is_ip_address(ip_address): components = ip_address_string.split(".") if len(components) != 4: return False try: int_components = [int(component) for component in components] except ValueError: return False for component in int_components: if component < 0 or component > 255: return False return True
Life With Regular Expressions def is_ip(ip_address_string): match = re.match(r"^(d{1,3}).(d{1,3}).(d{1,3}). (d{1,3})$", ip_address_string) if not match: return False for component in match.groups(): if int(component) < 0 or int(component) > 255: return False return True
The DSL that simplifies our life ^(d{1,3}).(d{1,3}).(d{1,3}).(d{1,3})$
Why DSL - Answered When working in a particular domain, write your code in a syntax that fits the domain. When working with patterns, use RegEx When working with RDBMS, use SQL When working in your domain – create your own DSL
The two types of DSLs External DSL – The code is written in an external file or as a string, which is read and parsed by the application
The two types of DSLs Internal DSL – Use features of the language (like metaclasses) to enable people to write code in python that resembles the domain syntax
Creating Forms – No DSL <form> <label>Name:</label><input type=”text” name=”name”/> <label>Email:</label><input type=”text” name=”email”/> <label>Password:</label><input type=”password” name=”name”/> </form>
Creating Forms – No DSL – Requires HTML knowledge to maintain – Therefore it is not possible for the end user to change the structure of the form by themselves
Creating Forms – External DSL UserForm name->CharField label:Username email->EmailField label:Email Address password->PasswordField This text file is parsed and rendered by the app
Creating Forms – External DSL + Easy to understand form structure + Can be easily edited by end users – Requires you to read and parse the file
Creating Forms – Internal DSL class UserForm(forms.Form): username = forms.RegexField(regex=r'^w+$', max_length=30) email = forms.EmailField(maxlength=75) password = forms.CharField(widget=forms.PasswordInput()) Django uses metaclass magic to convert this syntax to an easily manipulated python class
Creating Forms – Internal DSL + Easy to understand form structure + Easy to work with the form as it is regular python + No need to read and parse the file – Cannot be used by non-programmers – Can sometimes be complicated to implement – Behind the scenes magic → debugging hell
Creating an External DSL UserForm name:CharField -> label:Username size:25 email:EmailField -> size:32 password:PasswordField Lets write code to parse and render this form
Options for Parsing Using string functions → You have to be crazy Using regular expressions → Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski Writing a parser → ✓ (we will use PyParsing)
Step 1: Get PyParsing pip install pyparsing
Step 2: Design the Grammar form ::= form_name newline field+ field ::= field_name colon field_type [arrow property+] property ::= key colon value form_name ::= word field_name ::= word field_type ::= CharField | EmailField | PasswordField key ::= word value ::= alphanumeric+ word ::= alpha+ newline ::= n colon ::= : arrow ::= ->
Quick Note Backus-Naur Form (BNF) is a syntax for specifying grammers
Step 3: Implement the Grammar newline = "n" colon = ":" arrow = "->" word = Word(alphas) key = word value = Word(alphanums) field_type = oneOf("CharField EmailField PasswordField") field_name = word form_name = word field_property = key + colon + value field = field_name + colon + field_type + Optional(arrow + OneOrMore(field_property)) + newline form = form_name + newline + OneOrMore(field)
Quick Note PyParsing itself implements a neat little internal DSL for you to describe the parser grammer Notice how the PyParsing code almost perfectly reflects the BNF grammer
Output > print form.parseString(input_form) ['UserForm', 'n', 'name', ':', 'CharField', '->', 'label', ':', 'Username', 'size', ':', '25', 'n', 'email', ':', 'EmailField', '->', 'size', ':', '25', 'n', 'password', ':', 'PasswordField', 'n'] PyParsing has neatly parsed our form input into tokens. Thats nice, but we can do more.
Step 4: Suppressing Noise Tokens newline = Suppress("n") colon = Suppress(":") arrow = Suppress("->")
Output > print form.parseString(input_form) ['UserForm', 'name', 'CharField', 'label', 'Username', 'size', '25', 'email', 'EmailField', 'size', '25', 'password', 'PasswordField'] All the noise tokens are now removed from the parsed output
Step 5: Grouping Tokens field_property = Group(key + colon + value) field = Group(field_name + colon + field_type + Group(Optional(arrow + OneOrMore(field_property))) + newline)
Output > print form.parseString(input_form) ['UserForm', ['name', 'CharField', [['label', 'Username'], ['size', '25']]], ['email', 'EmailField', [['size', '25']]], ['password', 'PasswordField',[]]] Related tokens are now grouped together in a list
Step 6: Give Names to Tokens form_name = word.setResultsName("form_name") field = Group(field_name + colon + field_type + Group(Optional(arrow + OneOrMore(field_property))) + newline).setResultsName("form_field")
Output > parsed_form = form.parseString(input_form) > print parsed_form.form_name UserForm > print parsed_form.fields[1].field_type EmailField Now we can refer to parsed tokens by name
Step 7: Convert Properties to Dict def convert_prop_to_dict(tokens): prop_dict = {} for token in tokens: prop_dict[token.property_key] = token.property_value return prop_dict field = Group(field_name + colon + field_type + Optional(arrow + OneOrMore(field_property)) .setParseAction(convert_prop_to_dict) + newline).setResultsName("form_field")
Output > print form.parseString(input_form) ['UserForm', ['name', 'CharField', {'size': '25', 'label': 'Username'}], ['email', 'EmailField', {'size': '32'}], ['password', 'PasswordField', {}] ] Sweet! The field properties are parsed into a dict
Step 7: Generate HTML Output We need to walk through the parsed form and generate a html string out of it
def get_field_html(field): properties = field[2] label = properties["label"] if "label" in properties else field.field_name label_html = "<label>" + label + "</label>" attributes = {"name":field.field_name} attributes.update(properties) if field.field_type == "CharField" or field.field_type == "EmailField": attributes["type"] = "text" else: attributes["type"] = "password" if "label" in attributes: del attributes["label"] attributes_html = " ".join([name+"='"+value+"'" for name,value in attributes.items()]) field_html = "<input " + attributes_html + "/>" return label_html + field_html + "<br/>" def render(form): fields_html = "".join([get_field_html(field) for field in form.fields]) return "<form id='" + form.form_name.lower() +"'>" + fields_html + "</form>"
Output > print render(form.parseString(input_form)) <form id='userform'> <label>Username</label> <input type='text' name='name' size='25'/><br/> <label>email</label> <input type='text' name='email' size='32'/><br/> <label>password</label> <input type='password' name='password'/><br/> </form>
It works, but.... Yuck! The output rendering code is an UGLY MESS
Wish we could do this... > print Form(CharField(name=”user”,size=”25”,label=”ID”), id=”myform”) <form id='myform'> <label>ID</label> <input type='text' name='name' size='25'/><br/> </form> Neat, clean syntax that matches the output domain well. But how do we create this kind of syntax?
Lets create an Internal DSL
class HtmlElement(object): default_attributes = {} tag = "unknown_tag" def __init__(self, *args, **kwargs): self.attributes = kwargs self.attributes.update(self.default_attributes) self.children = args def __str__(self): attribute_html = " ".join(["{}='{}'".format(name, value) for name,value in self.attributes.items()]) if not self.children: return "<{} {}/>".format(self.tag, attribute_html) else: children_html = "".join([str(child) for child in self.children]) return "<{} {}>{}</{}>".format(self.tag, attribute_html, children_html, self.tag)
> print HtmlElement(id=”test”) <unknown_tag id='test'/> > print HtmlElement(HtmlElement(name=”test”), id=”id”) <unknown_tag id='id'><unknown_tag name='test'/></unknown_tag>
class Input(HtmlElement): tag = "input" def __init__(self, *args, **kwargs): HtmlElement.__init__(self, *args, **kwargs) self.label = self.attributes["label"] if "label" in self.attributes else self.attributes["name"] if "label" in self.attributes: del self.attributes["label"] def __str__(self): label_html = "<label>{}</label>".format(self.label) return label_html + HtmlElement.__str__(self) + "<br/>"
> print InputElement(name=”username”) <label>username</label><input name='username'/><br/> > print InputElement(name=”username”, label=”User ID”) <label>User ID</label><input name='username'/><br/>
class Form(HtmlElement): tag = "form" class CharField(Input): default_attributes = {"type":"text"} class EmailField(CharField): pass class PasswordField(Input): default_attributes = {"type":"password"}
Now... > print Form(CharField(name=”user”,size=”25”,label=”ID”), id=”myform”) <form id='myform'> <label>ID</label> <input type='text' name='name' size='25'/><br/> </form> Nice!
Step 7 Revisited: Output HTML def render(form): field_dict = {"CharField": CharField, "EmailField": EmailField, "PasswordField": PasswordField} fields = [field_dict[field.field_type] (name=field.field_name, **field[2]) for field in form.fields] return Form(*fields, id=form.form_name.lower()) Now our output code uses our Internal DSL!
INPUT UserForm name:CharField -> label:Username size:25 email:EmailField -> size:32 password:PasswordField OUTPUT <form id='userform'> <label>Username</label> <input type='text' name='name' size='25'/><br/> <label>email</label> <input type='text' name='email' size='32'/><br/> <label>password</label> <input type='password' name='password'/><br/> </form>
Get the whole code http://bit.ly/pyconindia_dsl
Summary + DSLs make your code easier to read + DSLs make your code easier to write + DSLs make it easy to for non-programmers to maintain code + PyParsing makes is easy to write External DSLs + Python makes it easy to write Internal DSLs

Creating Domain Specific Languages in Python

  • 1.
    Domain Specific Languagesin Python Siddharta Govindaraj siddharta@silverstripesoftware.com
  • 2.
    What are DSLs? Specializedmini-languages for specific problem domains that make it easier to work in that domain
  • 3.
    Example: SQL SQL isa mini language specialized to retrieve data from a relational database
  • 4.
    Example: Regular Expressions RegularExpressions are mini languages specialized to express string patterns to match
  • 5.
    Life Without RegularExpressions def is_ip_address(ip_address): components = ip_address_string.split(".") if len(components) != 4: return False try: int_components = [int(component) for component in components] except ValueError: return False for component in int_components: if component < 0 or component > 255: return False return True
  • 6.
    Life With RegularExpressions def is_ip(ip_address_string): match = re.match(r"^(d{1,3}).(d{1,3}).(d{1,3}). (d{1,3})$", ip_address_string) if not match: return False for component in match.groups(): if int(component) < 0 or int(component) > 255: return False return True
  • 7.
    The DSL thatsimplifies our life ^(d{1,3}).(d{1,3}).(d{1,3}).(d{1,3})$
  • 8.
    Why DSL -Answered When working in a particular domain, write your code in a syntax that fits the domain. When working with patterns, use RegEx When working with RDBMS, use SQL When working in your domain – create your own DSL
  • 9.
    The two typesof DSLs External DSL – The code is written in an external file or as a string, which is read and parsed by the application
  • 10.
    The two typesof DSLs Internal DSL – Use features of the language (like metaclasses) to enable people to write code in python that resembles the domain syntax
  • 11.
    Creating Forms –No DSL <form> <label>Name:</label><input type=”text” name=”name”/> <label>Email:</label><input type=”text” name=”email”/> <label>Password:</label><input type=”password” name=”name”/> </form>
  • 12.
    Creating Forms –No DSL – Requires HTML knowledge to maintain – Therefore it is not possible for the end user to change the structure of the form by themselves
  • 13.
    Creating Forms –External DSL UserForm name->CharField label:Username email->EmailField label:Email Address password->PasswordField This text file is parsed and rendered by the app
  • 14.
    Creating Forms –External DSL + Easy to understand form structure + Can be easily edited by end users – Requires you to read and parse the file
  • 15.
    Creating Forms –Internal DSL class UserForm(forms.Form): username = forms.RegexField(regex=r'^w+$', max_length=30) email = forms.EmailField(maxlength=75) password = forms.CharField(widget=forms.PasswordInput()) Django uses metaclass magic to convert this syntax to an easily manipulated python class
  • 16.
    Creating Forms –Internal DSL + Easy to understand form structure + Easy to work with the form as it is regular python + No need to read and parse the file – Cannot be used by non-programmers – Can sometimes be complicated to implement – Behind the scenes magic → debugging hell
  • 17.
    Creating an ExternalDSL UserForm name:CharField -> label:Username size:25 email:EmailField -> size:32 password:PasswordField Lets write code to parse and render this form
  • 18.
    Options for Parsing Usingstring functions → You have to be crazy Using regular expressions → Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski Writing a parser → ✓ (we will use PyParsing)
  • 19.
    Step 1: GetPyParsing pip install pyparsing
  • 20.
    Step 2: Designthe Grammar form ::= form_name newline field+ field ::= field_name colon field_type [arrow property+] property ::= key colon value form_name ::= word field_name ::= word field_type ::= CharField | EmailField | PasswordField key ::= word value ::= alphanumeric+ word ::= alpha+ newline ::= n colon ::= : arrow ::= ->
  • 21.
    Quick Note Backus-Naur Form(BNF) is a syntax for specifying grammers
  • 22.
    Step 3: Implementthe Grammar newline = "n" colon = ":" arrow = "->" word = Word(alphas) key = word value = Word(alphanums) field_type = oneOf("CharField EmailField PasswordField") field_name = word form_name = word field_property = key + colon + value field = field_name + colon + field_type + Optional(arrow + OneOrMore(field_property)) + newline form = form_name + newline + OneOrMore(field)
  • 23.
    Quick Note PyParsing itselfimplements a neat little internal DSL for you to describe the parser grammer Notice how the PyParsing code almost perfectly reflects the BNF grammer
  • 24.
    Output > print form.parseString(input_form) ['UserForm','n', 'name', ':', 'CharField', '->', 'label', ':', 'Username', 'size', ':', '25', 'n', 'email', ':', 'EmailField', '->', 'size', ':', '25', 'n', 'password', ':', 'PasswordField', 'n'] PyParsing has neatly parsed our form input into tokens. Thats nice, but we can do more.
  • 25.
    Step 4: SuppressingNoise Tokens newline = Suppress("n") colon = Suppress(":") arrow = Suppress("->")
  • 26.
    Output > print form.parseString(input_form) ['UserForm','name', 'CharField', 'label', 'Username', 'size', '25', 'email', 'EmailField', 'size', '25', 'password', 'PasswordField'] All the noise tokens are now removed from the parsed output
  • 27.
    Step 5: GroupingTokens field_property = Group(key + colon + value) field = Group(field_name + colon + field_type + Group(Optional(arrow + OneOrMore(field_property))) + newline)
  • 28.
    Output > print form.parseString(input_form) ['UserForm', ['name', 'CharField', [['label', 'Username'], ['size', '25']]], ['email', 'EmailField', [['size', '25']]], ['password', 'PasswordField',[]]] Related tokens are now grouped together in a list
  • 29.
    Step 6: GiveNames to Tokens form_name = word.setResultsName("form_name") field = Group(field_name + colon + field_type + Group(Optional(arrow + OneOrMore(field_property))) + newline).setResultsName("form_field")
  • 30.
    Output > parsed_form =form.parseString(input_form) > print parsed_form.form_name UserForm > print parsed_form.fields[1].field_type EmailField Now we can refer to parsed tokens by name
  • 31.
    Step 7: ConvertProperties to Dict def convert_prop_to_dict(tokens): prop_dict = {} for token in tokens: prop_dict[token.property_key] = token.property_value return prop_dict field = Group(field_name + colon + field_type + Optional(arrow + OneOrMore(field_property)) .setParseAction(convert_prop_to_dict) + newline).setResultsName("form_field")
  • 32.
    Output > print form.parseString(input_form) ['UserForm', ['name', 'CharField', {'size': '25', 'label': 'Username'}], ['email', 'EmailField', {'size': '32'}], ['password', 'PasswordField', {}] ] Sweet! The field properties are parsed into a dict
  • 33.
    Step 7: GenerateHTML Output We need to walk through the parsed form and generate a html string out of it
  • 34.
    def get_field_html(field): properties = field[2] label = properties["label"] if "label" in properties else field.field_name label_html = "<label>" + label + "</label>" attributes = {"name":field.field_name} attributes.update(properties) if field.field_type == "CharField" or field.field_type == "EmailField": attributes["type"] = "text" else: attributes["type"] = "password" if "label" in attributes: del attributes["label"] attributes_html = " ".join([name+"='"+value+"'" for name,value in attributes.items()]) field_html = "<input " + attributes_html + "/>" return label_html + field_html + "<br/>" def render(form): fields_html = "".join([get_field_html(field) for field in form.fields]) return "<form id='" + form.form_name.lower() +"'>" + fields_html + "</form>"
  • 35.
    Output > print render(form.parseString(input_form)) <formid='userform'> <label>Username</label> <input type='text' name='name' size='25'/><br/> <label>email</label> <input type='text' name='email' size='32'/><br/> <label>password</label> <input type='password' name='password'/><br/> </form>
  • 36.
    It works, but.... Yuck! The output rendering code is an UGLY MESS
  • 37.
    Wish we coulddo this... > print Form(CharField(name=”user”,size=”25”,label=”ID”), id=”myform”) <form id='myform'> <label>ID</label> <input type='text' name='name' size='25'/><br/> </form> Neat, clean syntax that matches the output domain well. But how do we create this kind of syntax?
  • 38.
    Lets create anInternal DSL
  • 39.
    class HtmlElement(object): default_attributes = {} tag = "unknown_tag" def __init__(self, *args, **kwargs): self.attributes = kwargs self.attributes.update(self.default_attributes) self.children = args def __str__(self): attribute_html = " ".join(["{}='{}'".format(name, value) for name,value in self.attributes.items()]) if not self.children: return "<{} {}/>".format(self.tag, attribute_html) else: children_html = "".join([str(child) for child in self.children]) return "<{} {}>{}</{}>".format(self.tag, attribute_html, children_html, self.tag)
  • 40.
    > print HtmlElement(id=”test”) <unknown_tagid='test'/> > print HtmlElement(HtmlElement(name=”test”), id=”id”) <unknown_tag id='id'><unknown_tag name='test'/></unknown_tag>
  • 41.
    class Input(HtmlElement): tag = "input" def __init__(self, *args, **kwargs): HtmlElement.__init__(self, *args, **kwargs) self.label = self.attributes["label"] if "label" in self.attributes else self.attributes["name"] if "label" in self.attributes: del self.attributes["label"] def __str__(self): label_html = "<label>{}</label>".format(self.label) return label_html + HtmlElement.__str__(self) + "<br/>"
  • 42.
    > print InputElement(name=”username”) <label>username</label><inputname='username'/><br/> > print InputElement(name=”username”, label=”User ID”) <label>User ID</label><input name='username'/><br/>
  • 43.
    class Form(HtmlElement): tag = "form" class CharField(Input): default_attributes = {"type":"text"} class EmailField(CharField): pass class PasswordField(Input): default_attributes = {"type":"password"}
  • 44.
    Now... > print Form(CharField(name=”user”,size=”25”,label=”ID”), id=”myform”) <form id='myform'> <label>ID</label> <input type='text' name='name' size='25'/><br/> </form> Nice!
  • 45.
    Step 7 Revisited:Output HTML def render(form): field_dict = {"CharField": CharField, "EmailField": EmailField, "PasswordField": PasswordField} fields = [field_dict[field.field_type] (name=field.field_name, **field[2]) for field in form.fields] return Form(*fields, id=form.form_name.lower()) Now our output code uses our Internal DSL!
  • 46.
    INPUT UserForm name:CharField -> label:Usernamesize:25 email:EmailField -> size:32 password:PasswordField OUTPUT <form id='userform'> <label>Username</label> <input type='text' name='name' size='25'/><br/> <label>email</label> <input type='text' name='email' size='32'/><br/> <label>password</label> <input type='password' name='password'/><br/> </form>
  • 47.
    Get the wholecode http://bit.ly/pyconindia_dsl
  • 48.
    Summary + DSLs makeyour code easier to read + DSLs make your code easier to write + DSLs make it easy to for non-programmers to maintain code + PyParsing makes is easy to write External DSLs + Python makes it easy to write Internal DSLs