Skip to content

Conversation

@pgomulka
Copy link
Contributor

in https://community.sonarsource.com/t/blog-post-regular-expressions-present-challenges-even-for-not-so-regular-developers/38304
an author pointed out that Elasticsearch has a bug in its tchar regex.

_

So now it should take you only one second to find a bug in this Elasticsearch code which is commented "defined by RFC7230 section 3.2.6" for this expression:

Pattern.compile("[a-zA-z0-9!#$%&'*+\\-.\\^_`|~]+"); 

Source (RestRequest.java)

_

and indeed the A-z is a mistake and includes acciental [, ], / characters

This commits fixes this.

Just to explain why we validate this way:
the correct value of a content-type is a Media type https://www.rfc-editor.org/rfc/rfc7231#section-3.1.1.5
a media-type is defined in https://www.rfc-editor.org/rfc/rfc7231#section-3.1.1.1 and uses token
and a token is defined in https://www.rfc-editor.org/rfc/rfc7230#section-3.2.6
and it uses tchar defined in the same RFC

@pgomulka pgomulka added >bug :Core/Infra/REST API REST infrastructure and utilities v8.9.0 labels May 29, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label May 29, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @pgomulka, I've created a changelog YAML for you.

Copy link
Contributor

@pugnascotia pugnascotia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, but there are other instances of A-z in the codebase. In particular, these look like they should be fixed or refactored:

libs/x-content/src/main/java/org/elasticsearch/xcontent/ParsedMediaType.java x-pack/plugin/sql/sql-proto/src/main/java/org/elasticsearch/xpack/sql/proto/content/ParsedMediaType.java 

I'm not sure about the other instances. I searched using rg -F A-z

@pgomulka
Copy link
Contributor Author

The changes look good, but there are other instances of A-z in the codebase. In particular, these look like they should be fixed or refactored:
libs/x-content/src/main/java/org/elasticsearch/xcontent/ParsedMediaType.java
x-pack/plugin/sql/sql-proto/src/main/java/org/elasticsearch/xpack/sql/proto/content/ParsedMediaType.java
I'm not sure about the other instances. I searched using rg -F A-z

good point on the ParsedMediaType. I fixed those too. It hurts a little to have this duplicated, but I guess we would have to create a utility class somewhere in x-content?core? lib with all http related rfc patterns.

I think we only want to fix a-zA-z as if someone is using A-z it means they probably want a quick way to mark all letters (plus some accidental [,],_ etc)

@pgomulka pgomulka requested a review from pugnascotia May 30, 2023 13:22
@pgomulka pgomulka merged commit 66a951e into elastic:main May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Core/Infra/REST API REST infrastructure and utilities Team:Core/Infra Meta label for core/infra team v8.9.0

3 participants