
JoniRegularExpression does not work with character class escapes (\s, \w, etc) within character classes ([]) #1192

@tearfur

Description


Version: 1.5.8

You can reproduce this bug by trying to match the JSON string "e" with the pattern [\w]:

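import com.networknt.schema.InputFormat;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SchemaValidatorsConfig;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;
import com.networknt.schema.regex.JoniRegularExpressionFactory;

// Schema whose "pattern" is the regex [\w] (the backslash is escaped once for the
// Java text block and once for JSON). A text block needs a line break after the opening quotes.
final String schemaData = """
        {"$schema":"https://json-schema.org/draft/2020-12/schema","type":"string","pattern":"[\\\\w]"}""";
final JsonSchemaFactory factory = JsonSchemaFactory.builder(JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V202012))
        .build();
// Select the Joni-based regex engine instead of the default one.
final SchemaValidatorsConfig config = SchemaValidatorsConfig.builder()
        .regularExpressionFactory(JoniRegularExpressionFactory.getInstance())
        .build();
final JsonSchema schema = factory.getSchema(schemaData, config);
final String json = "\"e\"";
// Print every validation error; a correct implementation prints nothing.
for (ValidationMessage msg : schema.validate(json, InputFormat.JSON)) {
    System.out.println(msg.getMessage());
}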

The example code above should not print any output, but in practice it prints "does not match the regex pattern [\w]".
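For comparison, the same pattern behaves as expected under a standard engine. This minimal sketch uses java.util.regex purely as a stand-in for the ECMA-262 dialect that JSON Schema recommends for "pattern"; the two agree on a simple class like [\w], though not on every construct. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class ExpectedMatch {

    // JSON Schema's "pattern" keyword is not implicitly anchored, so a
    // match anywhere in the string counts: use find(), not matches().
    static boolean patternMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "[\\w]" in Java source is the regex [\w] from the schema.
        System.out.println(patternMatches("[\\w]", "e")); // true
    }
}
```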

The cause is that JoniRegularExpression rewrites character class escapes into its own bracketed character classes using plain string replacement, without checking whether the escape already sits inside a character class:

String s = regex
.replace("\\d", "[0-9]")
.replace("\\D", "[^0-9]")
.replace("\\w", "[a-zA-Z0-9_]")
.replace("\\W", "[^a-zA-Z0-9_]")
.replace("\\s", "[ \\f\\n\\r\\t\\v\\u00a0\\u1680\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000\\ufeff]")
.replace("\\S", "[^ \\f\\n\\r\\t\\v\\u00a0\\u1680\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000\\ufeff]");

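So the pattern in the given example becomes [[a-zA-Z0-9_]], which does not match the JSON string "e". Assuming Joni applies ECMAScript rules here, a [ inside a character class is just a literal character and the first ] closes the class, so the rewritten pattern reads as one character from the set {[, a-z, A-Z, 0-9, _} followed by a literal ]. An input containing no ] can therefore never match.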
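To make the failure mode concrete, here is a sketch that re-creates the blind rewrite with String.replace (only the \d and \w rules, to keep it short) next to a hypothetical context-aware variant that omits the extra brackets when the escape already sits inside a class. The class and method names are illustrative, not part of the library, and the class tracking assumes ECMAScript syntax (no nested classes, the first unescaped ] closes the class). Negated escapes such as \W, \D and \S are deliberately copied unchanged, because they cannot be spliced into a class like [a\W] by simple text substitution.

```java
import java.util.Map;

public class EscapeRewriteDemo {

    // Expansions for positive escapes, written WITHOUT surrounding brackets
    // so they can be spliced into an existing character class.
    // (The \s rule is omitted to keep the sketch short.)
    private static final Map<Character, String> CLASS_BODY = Map.of(
            'd', "0-9",
            'w', "a-zA-Z0-9_");

    // Mirrors the approach quoted in the issue: a blind string replacement
    // that ignores whether the escape sits inside [...].
    static String naiveRewrite(String regex) {
        return regex
                .replace("\\d", "[0-9]")
                .replace("\\w", "[a-zA-Z0-9_]");
    }

    // Hypothetical context-aware rewrite: tracks whether the scanner is inside
    // a character class and only adds brackets when it is not.
    static String contextAwareRewrite(String regex) {
        StringBuilder out = new StringBuilder();
        boolean inClass = false;
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(i + 1);
                String body = CLASS_BODY.get(next);
                if (body != null) {
                    out.append(inClass ? body : "[" + body + "]");
                } else {
                    // Any other escape (including \\ and \]) is copied verbatim.
                    out.append(c).append(next);
                }
                i++; // skip the escaped character
            } else {
                if (c == '[' && !inClass) {
                    inClass = true;
                } else if (c == ']' && inClass) {
                    inClass = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pattern = "[\\w]"; // the regex [\w]
        System.out.println(naiveRewrite(pattern));        // [[a-zA-Z0-9_]]
        System.out.println(contextAwareRewrite(pattern)); // [a-zA-Z0-9_]
        System.out.println(contextAwareRewrite("\\d+"));  // [0-9]+
    }
}
```

Scanning the pattern character by character also means an escaped backslash followed by w (the regex \\w) is copied verbatim, whereas the blind replacement cannot tell it apart from \w.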
