Sandboxing HTML pages with API wrappers as a countermeasure to malicious 3rd party code attacks. A lesson learned. Stefano Di Paola, CTO & Chief Scientist Minded Security
$ Whoami  Security Since 1999  OWASP Italy Director of Research  Server Side (HPP, Expression Language Injection...)  Client Side (UXSS, SWFIntruder, DOMinator)  Work @ Minded Security  CTO & Chief Scientist
Agenda  Intro  The Problem  Attempts  Drawbacks  Conclusion
Browser Features History Same Origin Policy 1995 XMLHttpRequest 2002
Subverting SOP – Traditional way  taintedInput=<script>evilJs</script> is reflected into the page: <html>.. <script>evilJs</script> ..</html>  Solution is easy: encode ALL dangerous inputs to HTML entities, right? Not quite.
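Why "not quite": entity encoding only neutralizes input that lands in an HTML text context. A minimal sketch, assuming a hypothetical `escapeHtml` helper (not from the slides), shows a `javascript:` URL passing through the encoder unchanged when the tainted value is written into an `href` attribute:

```javascript
// Hypothetical helper (an assumption, not from the slides): encodes the
// characters that are dangerous in an HTML text or quoted-attribute context.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Text context: the script tag is neutralized.
const asText = escapeHtml('<script>evilJs</script>');

// URL attribute context: nothing in the payload needs encoding, so it
// passes through unchanged and would still run when the link is followed.
const asHref = '<a href="' + escapeHtml('javascript:evilJs()') + '">profile</a>';

console.log(asText);
console.log(asHref);
```

The encoder is doing its job in both cases; the second case fails because the output context (a URL) has rules the encoder knows nothing about.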
The Problem  Browser SOP  Industry needs to:  Safely Allow external 3rd Party code (Advertising et al)  Let users customize pages (Facebook et al.)
Industry Needs - Server Side Filtering  Social portals want their users to be free to customize their home page.  SOP is too loose.  Solution?  Create server side filters that allow only an HTML subset.  It's 2005: MySpace is the most used social site, with 40 million unique visitors.
Server Side HTML Filtering - MySpace  MySpace Approach:  Whitelisted Tags/Attributes  only img,embed,div  (blocked <script>,on* etc )  Style Allowed? Yes  Word blacklisting:  E.g.: Javascript
MySpace Worm – Samy is my hero  Samy bypasses:  JavaScript: stripped from EVERYWHERE. Replaced with '..' http://namb.la/popular/tech.html
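The word blacklist can be sketched as below. The filter function is a hypothetical stand-in (MySpace's server-side code is not public); the line-break trick is the one described in Samy's write-up linked above, where some browsers of the time still read the split word as `javascript` while the filter did not.

```javascript
// Hypothetical stand-in for a word blacklist: replace every occurrence of
// "javascript" with "..", case-insensitively.
function stripBlacklistedWords(html) {
  return html.replace(/javascript/gi, '..');
}

// The plain payload is caught by the blacklist.
const blocked = stripBlacklistedWords(
  '<div style="background:url(\'javascript:alert(1)\')">'
);

// A line break splits the blacklisted word, so the filter no longer matches.
const bypass = stripBlacklistedWords(
  '<div style="background:url(\'java\nscript:alert(1)\')">'
);

console.log(blocked);
console.log(JSON.stringify(bypass));
```

The filter and the browser disagree on what counts as the word "javascript", which is the recurring theme of the following slides.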
Server Side HTML Filtering - OWASP Anti Samy  Attempts to solve the problems of the MySpace approach on the server side.  Author: Arshan Dabirsiaghi  Translates HTML to well-formed XML  Whitelist of tags and attributes  Everything else is encoded  One bypass so far (usual problem) – Fixed: <![CDATA[ .. ]]> ← AntiSamy expects this, but ... <![CDATA[ .. ]> ← works on every browser. https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project
Server Side JS/HTML Filter - Caja  ~2008  Author: Google  The 3rd party code problem Aka Sandboxed Advertising [...The Caja Compiler is a tool for making third party HTML, CSS and JavaScript safe to embed in your website. ..] https://developers.google.com/caja/docs/about/
Google Attempt – Server Side Filter Takes JS/ES5 strict mode, HTML, and CSS input and rewrites it into: * safe subset of HTML & CSS * JavaScript function with no free variables. (Code Rewrite)
Google Caja – Some Bypasses  2009 Issues:  Arbitrary code execution via DOM wrappers.  Flaw in JSON parsing.  JavaScript URLs in style attributes not sanitized (Samy anyone?).  2013:  JavaScript parsers differ on whether they interpret escaped sequences of letters spelling a reserved word, such as "de\u006Cete", as an identifier or a reserved word. https://github.com/google/caja/wiki/SecurityAdvisories
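The 2013 issue comes down to classifying a token before its escape sequences are decoded. A minimal sketch, assuming a hypothetical reserved-word check (this is not Caja's code, and the word list is only illustrative):

```javascript
// Illustrative subset of reserved words (an assumption for this sketch).
const RESERVED = new Set(['delete', 'typeof', 'with', 'new']);

// Decode \uXXXX escapes in a token before classifying it.
function decodeUnicodeEscapes(token) {
  return token.replace(/\\u([0-9a-fA-F]{4})/g, function (_, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// Naive check: compares the raw source text.
function isReservedNaive(token) {
  return RESERVED.has(token);
}

// Normalizing check: decodes escapes first, then compares.
function isReservedNormalized(token) {
  return RESERVED.has(decodeUnicodeEscapes(token));
}

// The raw source characters of the token from the advisory.
const token = 'de\\u006Cete';

console.log(isReservedNaive(token), isReservedNormalized(token));
```

A rewriter that uses the naive check treats the token as a harmless identifier, while an engine that decodes first may treat it as `delete`.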
Server Side Filtering – Lessons Learned  Negative security models are error prone  Different browsers behave differently:  Hard to make general assumptions  Browsers do not always strictly implement RFCs and siblings  Filtering something that is going to be parsed by a different parser is hard (Models Impedance Mismatch)
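Client side HTML Filtering - HTML Data Bindings  2007 HTML Data Binding – Author: Stefano Di Paola  Tries to overcome the server side problem  HTML Sanitizer using JS + SQL Prepared Statements approach  and uses the browser native parser. http://www.wisec.it/sectou.php?id=46c5843ea4900
    // URL native parsing: let the browser's own URL parser extract the scheme.
    // (The function wrapper and the href assignment are assumed; the slide
    // shows only the createElement and return lines.)
    function getProtocol(url) {
      var a = document.createElement('A');
      a.href = url;
      return a.protocol;
    }
    // Data binding: insert the untrusted value as a text node, never as markup.
    var el = document.getElementById('someid');
    var val = document.createTextNode(binding['id']);
    el.appendChild(val);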
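The same "ask a real parser" idea can be tried outside the browser. The sketch below assumes the WHATWG `URL` class (available in browsers and as a global in Node.js) as a stand-in for the anchor-element trick; the protocol allowlist, the base URL, and the `isSafeUrl` name are illustrative assumptions, not part of the original proof of concept.

```javascript
// Illustrative allowlist and base URL (assumptions for this sketch).
const ALLOWED_PROTOCOLS = ['http:', 'https:'];
const BASE = 'https://example.com/';

// Let the URL parser decide what the scheme is, then compare it
// against the allowlist. Unparseable input is rejected.
function isSafeUrl(candidate) {
  let parsed;
  try {
    parsed = new URL(candidate, BASE);
  } catch (e) {
    return false;
  }
  return ALLOWED_PROTOCOLS.includes(parsed.protocol);
}

// A relative path resolves against the base and keeps its https: scheme.
console.log(isSafeUrl('/profile/42'));

// The parser strips the tab before reading the scheme, so this is
// recognized as a javascript: URL and rejected.
console.log(isSafeUrl('java\tscript:alert(1)'));
```

Because the check runs on the parser's view of the URL rather than on the raw string, tricks such as embedded tabs or mixed case do not need dedicated filter rules.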
HTML Data Bindings - Bypass  Variable-width charsets could mess up every tag  The <plaintext> tag! Leads to a bypass of the binding area (Half-Fixed)  (Fixed...sort of...)  Had to face browser complexity + some wrong assumptions!  Anyway, it was just a proof of concept.. :) http://www.wisec.it/ph/test.php
Client Side JS Filtering - JSReg.*  ~2010  Author: Gareth Heyes  JS Sandbox. Uses a JS Tokenizer based on complex  RegExp.  Code: https://code.google.com/p/jsreg/source/browse/trunk/JSReg/JSReg.js
Client Side Filtering - JSReg.*  Code Rewriting Approach + Sandboxed Checks http://www.businessinfo.co.uk/labs/jsreg/jsreg.html https://code.google.com/p/jsreg/
JSReg.* – Bypasses 2010-2011 (/[/]/)[/(/))/+alert(top)+"/"/i] .. first is the failure to strip the single line comment which then fools the regex rule into thinking that the code is a regex object and not function calls .. http://marc.info/?l=websecurity&m=126855547523766 http://www.thespanner.co.uk/2010/10/31/jsreg-bypasses/
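Why regular expressions struggle here: whether `/` starts a comment, a regex literal, or a division depends on parser state that a pattern match does not track. A deliberately naive sketch (a hypothetical comment stripper, not JSReg's code) shows the filter and the JS engine disagreeing about what the code is:

```javascript
// Hypothetical pre-pass: drop everything from "//" to the end of the line.
function stripLineComments(source) {
  return source.replace(/\/\/.*$/gm, '');
}

// To a real JS parser the "//" below sits inside a string literal, so the
// call to alert() is live code. The regex sees a comment and removes it.
const source = 'var u = "http://example.com"; alert(top)';
const seenByFilter = stripLineComments(source);

console.log(seenByFilter);
```

Any later check that runs on `seenByFilter` is inspecting code the engine will never execute, and missing code that it will.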
Client Side JS Filtering - MentalJS  2011  Author: Gareth Heyes  ES5 JS Sandbox  Parser based, no RegExp this time  Rewrites the JS code. https://github.com/hackvertor/MentalJS
MentalJS - Bypasses  2014-2015  Whitelisted attribute innerHTML for script elements:
    x=document.createElement('script');
    x.innerHTML='alert(location)';
    document.body.appendChild(x);
 insertBefore with null/undefined, a bypass through unexpected browser behavior:
    s=document.createElement('script');
    s.insertBefore(document.createTextNode('alert(location)'),null);
    document.body.appendChild(s);
 Other very interesting bypasses: http://www.thespanner.co.uk/2015/05/03/how-i-smashed-mentaljs/
Client Side JS Filtering – Evel (Secure Eval)  2013  Author: Nathan Vander Wilt  JS Sandbox using Browser Parser and Environment redefinition (runtime sandbox). https://github.com/natevw/evel/
Evel - Bypasses 2013  All of them defeated it by reaching the window object.  If one reaches Function(), the this object is always the window.  Function is the constructor of all functions. http://perfectionkills.com/global-eval-what-are-the-options/#indirect_eval_call_theory
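The escape route can be sketched with a deliberately naive runtime sandbox. `naiveSandbox` is a hypothetical illustration, not Evel's code: it hides the global names by shadowing them with undefined parameters, and the untrusted expression gets the global object back through the constructor of any function.

```javascript
// Hypothetical sandbox: shadow the well-known global names with
// undefined parameters, then evaluate the untrusted expression.
function naiveSandbox(code) {
  const run = new Function('window', 'globalThis', 'self', 'return (' + code + ');');
  return run(undefined, undefined, undefined);
}

// The shadowing works for direct references.
const direct = naiveSandbox('typeof window');

// Every function's constructor is Function. Code built with Function()
// runs in the global scope, and when it is called without a receiver in
// non-strict mode its `this` is the global object.
const leaked = naiveSandbox("(function(){}).constructor('return this')()");

console.log(direct);
console.log(leaked === globalThis);
```

Redefining names is not enough: any reachable function value is a path back to `Function`, so a runtime sandbox has to cut off `constructor` as well.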
Client Side HTML Filtering - DOMPurify  2014  Author: Mario Heiderich  Uses internal browser parser to create a DOM model and then sanitize untrusted HTML: https://github.com/cure53/DOMPurify
Client Side HTML Filtering - DOMPurify  Sanitize over a whitelist of tags and attributes:  Uses JS to access the DOM:  As seen in Data Binding there are several browsers subtelties. https://github.com/cure53/DOMPurify
DOMPurify – Bypasses  DOM Clobbering checks bypass:  Attack: oroush.secproject.com/blog/2014/04/how-did-i-bypass-everything-in-modsecurity-evasion-c http://www.thespanner.co.uk/2013/05/16/dom-clobbering/
AngularJS - Sandbox  2013 -  Author: Google
AngularJS – Bypasses  Blacklisted Functions call prevention bypass: https://code.google.com/p/mustache-security/wiki/AngularJS
JS Sandbox Approach - Attacks  Fool the parser into a wrong state (Impedance Mismatch) (different parsers) – code rewrite  Fool the sandbox via unexpected Browser behavior. Eg. DOM Clobbering et al.  Access unsanitized members (constructor, prototype etc)
HTML Sandbox Approach - Attacks  Fool the parser into a wrong state (Impedance Mismatch) (different parsers) – different parser  Fool the native parser (bad assumptions), as browsers have 5+ parsers.  Eg: is createHTMLDocument implemented as expected?  Fool the sandbox via unexpected Browser DOM behavior.  Eg: are attribute names and values correctly normalized?  Access unsanitized members (constructor, prototype etc)
Client Side – Lessons Learned  It may really be the right direction but:  JS: Code rewriting is hard in the JS context as it requires a separate parser.  HTML: Using the browser's parsers makes it possible to identify automatically, with no particular effort, the right context, at the right time, with the right charset  HTML: Browsers still have 5+ (!!) parsers (HTML, URL, CSS, JS, SVG,...). Need to apply them at the right time!  HTML + JS: Intricate ecosystem (DOM Clobbering)
Disclaimer  The bypasses of the sandboxes were all fixed  Not all sandboxes are still maintained, but most of them are.  It's still unproven that the proposed solutions are completely safe  ..but without breakers' feedback, no bypass would have been found. (The world needs good brains to break things – and then fix them!)  Authors are brave and smart people trying to solve a complex problem with passion and reasoning
Conclusions  The filtering approach to sanitize/sandbox untrusted code is a hard problem  Several attempts have been made over the years  Using a different layer to sanitize code that will be interpreted in a complex environment is usually a bad idea. (Eg. Server Side → Client Side)  A fully functional, unbreakable solution is yet to be released  Browser vendors and sandbox builders should join together to solve the problem
Questions? Stefano Di Paola Mail: Stefano.dipaola@mindedsecurity.com Twitter: @WisecWisec Blog: blog.mindedsecurity.com Site: www.mindedsecurity.com Thanks!  

Sandboxing JS and HTML. A Lesson Learned
