
incorrect matching of strings that contain unicode #3

@mcmtroffaes

Description

```haskell
import Text.Regex.PCRE

main :: IO ()
main = do
  putStrLn ("R X" =~ "[^ ]+")
  putStrLn ("ℝ X" =~ "[^ ]+")
```

prints

```
R
ℝ X
```

The first line is correct, but the second is wrong: it should only print `ℝ`.
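A sketch of one possible explanation for the exact wrong output, which the report does not confirm: `ℝ` (U+211D) occupies 3 bytes in UTF-8. Suppose the `String` is handed to PCRE as UTF-8 bytes and the match is then cut out of the original `String` using PCRE's byte-based length. Under that assumption, a 3-byte match turns into a 3-character slice, which is `ℝ X`. The snippet below reproduces only that arithmetic in plain Haskell. `utf8Len` is a hypothetical helper written for illustration and is not part of regex-pcre.

```haskell
import Data.Char (ord)

-- Hypothetical helper: number of bytes a character occupies in UTF-8.
utf8Len :: Char -> Int
utf8Len c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

main :: IO ()
main = do
  let subject  = "ℝ X"
      -- Assumption: a byte-oriented match of [^ ]+ covers only the
      -- UTF-8 bytes of 'ℝ', so the reported match length is 3 (bytes).
      matchLen = utf8Len 'ℝ'
  print matchLen
  -- Slicing the String by that length counts characters, not bytes,
  -- so it yields "ℝ X" (3 characters) instead of "ℝ".
  putStrLn (take matchLen subject)
```

If this reading is right, a fix would also need to convert PCRE's byte offsets back to character offsets, in addition to whatever build flags are required.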

I tried fixing this. I think (but cannot confirm) that `SUPPORT_UCP`, `SUPPORT_UTF`, and perhaps also `SUPPORT_PCRE8` need to be defined in `pcre/config.h` so that PCRE is compiled with Unicode support. (Unfortunately, when I built with these options, the Haskell side failed with an unknown-symbol error, and I could not track down the exact cause.)
