lens-regex-pcre: A lensy interface to regular expressions

[ bsd3, library, regex ]

Versions 0.1.0.0, 0.1.0.1, 0.1.1.0, 0.2.0.0, 0.3.0.0, 0.3.1.0, 1.0.0.0, 1.0.0.1, 1.1.0.0, 1.1.1.0, 1.1.2.0
Change log ChangeLog.md
Dependencies base (>=4.7 && <5), bytestring, containers, lens, pcre-heavy, pcre-light (>=0.4.1.0), template-haskell, text
License BSD-3-Clause
Copyright 2019 Chris Penner
Author Chris Penner
Maintainer christopher.penner@gmail.com
Category Regex
Home page https://github.com/ChrisPenner/lens-regex-pcre#readme
Bug tracker https://github.com/ChrisPenner/lens-regex-pcre/issues
Source repo head: git clone https://github.com/ChrisPenner/lens-regex-pcre
Uploaded by ChrisPenner at 2024-12-11T17:41:40Z
Distributions LTSHaskell:1.1.2.0, NixOS:1.1.2.0, Stackage:1.1.2.0
Reverse Dependencies 3 direct, 1 indirect
Downloads 4315 total (42 in the last 30 days)
Status Docs available
Last success reported on 2024-12-11

Readme for lens-regex-pcre-1.1.2.0

lens-regex-pcre

Based on pcre-heavy, so it should support any regexes or options that pcre-heavy supports.

Performance is equal to, and sometimes better than, that of pcre-heavy alone.

Which module should you use?

If you need Unicode support, use Control.Lens.Regex.Text; if not, Control.Lens.Regex.ByteString is faster.
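A small illustration of why the choice matters, using only the text and bytestring packages rather than lens-regex-pcre itself: Text works in characters, while a ByteString is raw bytes, so anything you do to a ByteString match (measuring, slicing, case-mapping) sees UTF-8 bytes. The names `cafe`, `charCount` and `byteCount` are made up for this sketch.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Illustrative only: "café" with a precomposed é (U+00E9, decimal 233).
cafe :: T.Text
cafe = T.pack "caf\233"

-- Text length counts characters: c, a, f, é.
charCount :: Int
charCount = T.length cafe

-- UTF-8 encodes é as two bytes (0xC3 0xA9), so the ByteString form
-- of the same word is one unit longer.
byteCount :: Int
byteCount = BS.length (TE.encodeUtf8 cafe)
```

Here `charCount` is 4 and `byteCount` is 5.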

Working with regexes in Haskell kinda sucks: it's tough to figure out which libs to use, and even after you pick one it's tough to figure out how to use it. lens-regex-pcre hopes to replace most other solutions by being fast, easy to set up, and more adaptable, with a more consistent interface.

It helps that there are already HUNDREDS of combinators which interop with lenses 😄.

As it turns out, regexes are a very lens-like tool: traversals allow you to select and alter zero or more matches, and they can even carry indexes so you know which match or group you're working on.
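To make the traversal idea concrete without pulling in lens or PCRE, here is a hand-rolled sketch. `wordsT` is a hypothetical traversal that focuses on whitespace-separated words as a stand-in for regex matches, `iwordsT` is an indexed variant that also passes each match's position, and `toListOf'`/`over'` are simplified stand-ins for lens's `toListOf` (`^..`) and `over` (`%~`). None of this is lens-regex-pcre's actual code.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Char (isSpace, toUpper)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A van Laarhoven traversal: zero or more foci of type a inside an s.
type Traversal' s a = forall f. Applicative f => (a -> f a) -> s -> f s

-- Hypothetical stand-in for a regex traversal: focuses on every run of
-- non-space characters and rebuilds the separators unchanged.
wordsT :: Traversal' String String
wordsT _ [] = pure []
wordsT f s@(c : _)
  | isSpace c = let (sep, rest) = span isSpace s in (sep ++) <$> wordsT f rest
  | otherwise = let (w, rest) = break isSpace s in (++) <$> f w <*> wordsT f rest

-- Indexed variant: the function also receives the 0-based match number.
iwordsT :: Applicative f => (Int -> String -> f String) -> String -> f String
iwordsT f = go 0
  where
    go _ [] = pure []
    go i s@(c : _)
      | isSpace c = let (sep, rest) = span isSpace s in (sep ++) <$> go i rest
      | otherwise = let (w, rest) = break isSpace s in (++) <$> f i w <*> go (i + 1) rest

-- Read every focus, like lens's toListOf / (^..).
toListOf' :: Traversal' s a -> s -> [a]
toListOf' t = getConst . t (\a -> Const [a])

-- Modify every focus, like lens's over / (%~).
over' :: Traversal' s a -> (a -> a) -> s -> s
over' t g = runIdentity . t (Identity . g)

-- Use the index to edit only the third match.
shoutThird :: String -> String
shoutThird = runIdentity . iwordsT (\i w -> Identity (if i == 2 then map toUpper w else w))
```

Because `wordsT` is written once against any Applicative, the same definition serves both reading (via `Const`) and editing (via `Identity`); for example `shoutThird "raindrops on roses and whiskers on kittens"` gives `"raindrops on ROSES and whiskers on kittens"`.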

Examples

    {-# LANGUAGE OverloadedStrings #-}
    {-# LANGUAGE QuasiQuotes #-}
    {-# LANGUAGE TypeApplications #-}

    import Control.Lens
    import Control.Lens.Regex.Text
    import Data.List (sort)
    import Data.Text (Text)
    import qualified Data.Text as T
    import Data.Text.Lens (unpacked)

    txt :: Text
    txt = "raindrops on roses and whiskers on kittens"

    -- Search
    >>> has [regex|whisk|] txt
    True

    -- Get matches
    >>> txt ^.. [regex|\br\w+|] . match
    ["raindrops","roses"]

    -- Edit matches
    >>> txt & [regex|\br\w+|] . match %~ T.intersperse '-' . T.toUpper
    "R-A-I-N-D-R-O-P-S on R-O-S-E-S and whiskers on kittens"

    -- Get Groups
    >>> txt ^.. [regex|(\w+) on (\w+)|] . groups
    [["raindrops","roses"],["whiskers","kittens"]]

    -- Edit Groups
    >>> txt & [regex|(\w+) on (\w+)|] . groups %~ reverse
    "roses on raindrops and kittens on whiskers"

    -- Get the third match
    >>> txt ^? [regex|\w+|] . index 2 . match
    Just "roses"

    -- Match integers, 'Read' them into ints, then sort them in-place,
    -- dumping them back into the source text afterwards.
    >>> "Monday: 29, Tuesday: 99, Wednesday: 3" & partsOf ([regex|\d+|] . match . unpacked . _Show @Int) %~ sort
    "Monday: 3, Tuesday: 29, Wednesday: 99"

Basically anything you want to do is possible somehow.

Performance

See the benchmarks.

Summary

Caveat: I'm by no means a benchmarking expert; if you have tips on how to do this better I'm all ears!

  • Search: lens-regex-pcre is marginally slower than pcre-heavy, but well within acceptable margins (within 0.6%)
  • Replace: lens-regex-pcre beats pcre-heavy by ~10%
  • Modify: pcre-heavy doesn't support this operation at all, so I guess lens-regex-pcre wins here :)

How can it possibly be faster if it's based on pcre-heavy? lens-regex-pcre only uses pcre-heavy for finding the matches, not for substitution/replacement. After that, it splits the text into chunks and traverses over them with whichever operation you've chosen. This implementation is a lot easier to understand than imperative implementations of the same thing, which makes it pretty easy to edit, and it's also the reason we can support arbitrary traversals/actions. It was easy enough, so I went ahead and made the whole thing use ByteString Builders, which sped it up a lot. I suspect that pcre-heavy could benefit from the same optimization if anyone feels like back-porting it; it could be done (almost) as nicely using a simple traverse without any lenses. The whole thing is only about 25 LOC.
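A rough sketch of the chunk-and-traverse idea, assuming the regex engine has already reported each match as an (offset, length) pair. The names `Chunk`, `chunk`, `traverseSpans`, `upperMatches` and `collectMatches` are hypothetical, and this is not the library's source; it only illustrates why one traversal over the chunks, reassembled with a ByteString Builder, supports both editing and reading.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Char (toUpper)
import Data.Functor.Identity (Identity (..))

-- One piece of the input: untouched text or the text of a single match.
data Chunk = Plain BS.ByteString | Hit BS.ByteString

-- Split the input into alternating plain/match chunks. Assumes the spans
-- are (offset, length) pairs, sorted and non-overlapping, as a regex
-- engine might report them.
chunk :: [(Int, Int)] -> BS.ByteString -> [Chunk]
chunk = go 0
  where
    -- pos is the absolute offset of the start of the remaining input.
    go _ [] s = [Plain s]
    go pos ((off, len) : rest) s =
      let (before, fromMatch) = BS.splitAt (off - pos) s
          (hit, remaining) = BS.splitAt len fromMatch
      in Plain before : Hit hit : go (off + len) rest remaining

-- Run any Applicative action over the matches, then glue every chunk
-- back together with a Builder.
traverseSpans
  :: Applicative f
  => [(Int, Int)] -> (BS.ByteString -> f BS.ByteString) -> BS.ByteString -> f BS.ByteString
traverseSpans spans f s = build <$> traverse visit (chunk spans s)
  where
    visit (Plain p) = pure (B.byteString p)
    visit (Hit h) = B.byteString <$> f h
    build = BL.toStrict . B.toLazyByteString . mconcat

-- Editing: upper-case every match (the Identity functor).
upperMatches :: [(Int, Int)] -> BS.ByteString -> BS.ByteString
upperMatches spans = runIdentity . traverseSpans spans (Identity . BC.map toUpper)

-- Reading: collect the matches (the writer-like Applicative of pairs).
collectMatches :: [(Int, Int)] -> BS.ByteString -> [BS.ByteString]
collectMatches spans = fst . traverseSpans spans (\h -> ([h], h))
```

With the input "raindrops on roses" and spans [(0, 9), (13, 5)], `upperMatches` gives "RAINDROPS on ROSES" and `collectMatches` gives the two matched words; only the Applicative changes between the two.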

I'm neither a benchmarks nor stats person, so please open an issue if anything here seems fishy.

Without pcre-light and pcre-heavy this library wouldn't be possible, so huge thanks to all contributors!

Here are the benchmarks on my 2013 MacBook (2.6 GHz i5):

    benchmarking static pattern search/pcre-heavy ... took 20.78 s, total 56 iterations
    benchmarked static pattern search/pcre-heavy
    time                 375.3 ms   (372.0 ms .. 378.5 ms)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 378.1 ms   (376.4 ms .. 380.8 ms)
    std dev              3.747 ms   (922.3 μs .. 5.609 ms)

    benchmarking static pattern search/lens-regex-pcre ... took 20.79 s, total 56 iterations
    benchmarked static pattern search/lens-regex-pcre
    time                 379.5 ms   (376.2 ms .. 382.4 ms)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 377.3 ms   (376.5 ms .. 378.4 ms)
    std dev              1.667 ms   (1.075 ms .. 2.461 ms)

    benchmarking complex pattern search/pcre-heavy ... took 95.95 s, total 56 iterations
    benchmarked complex pattern search/pcre-heavy
    time                 1.741 s    (1.737 s .. 1.746 s)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 1.746 s    (1.744 s .. 1.749 s)
    std dev              4.499 ms   (3.186 ms .. 6.080 ms)

    benchmarking complex pattern search/lens-regex-pcre ... took 97.26 s, total 56 iterations
    benchmarked complex pattern search/lens-regex-pcre
    time                 1.809 s    (1.736 s .. 1.908 s)
                         0.996 R²   (0.991 R² .. 1.000 R²)
    mean                 1.757 s    (1.742 s .. 1.810 s)
    std dev              42.83 ms   (11.51 ms .. 70.69 ms)

    benchmarking simple replacement/pcre-heavy ... took 23.32 s, total 56 iterations
    benchmarked simple replacement/pcre-heavy
    time                 423.8 ms   (422.4 ms .. 425.3 ms)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 424.0 ms   (422.9 ms .. 426.2 ms)
    std dev              2.684 ms   (1.239 ms .. 4.270 ms)

    benchmarking simple replacement/lens-regex-pcre ... took 20.84 s, total 56 iterations
    benchmarked simple replacement/lens-regex-pcre
    time                 382.8 ms   (374.3 ms .. 391.5 ms)
                         0.999 R²   (0.999 R² .. 1.000 R²)
    mean                 378.2 ms   (376.3 ms .. 381.0 ms)
    std dev              3.794 ms   (2.577 ms .. 5.418 ms)

    benchmarking complex replacement/pcre-heavy ... took 24.77 s, total 56 iterations
    benchmarked complex replacement/pcre-heavy
    time                 448.1 ms   (444.7 ms .. 450.0 ms)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 450.8 ms   (449.5 ms .. 453.9 ms)
    std dev              3.129 ms   (947.0 μs .. 4.841 ms)

    benchmarking complex replacement/lens-regex-pcre ... took 21.99 s, total 56 iterations
    benchmarked complex replacement/lens-regex-pcre
    time                 399.9 ms   (398.4 ms .. 402.2 ms)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 399.6 ms   (399.0 ms .. 400.4 ms)
    std dev              1.135 ms   (826.2 μs .. 1.604 ms)

    Benchmark lens-regex-pcre-bench: FINISH
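The summary percentages can be re-derived from the mean times in the log. The sketch below only does that arithmetic; the summary does not say which statistic its figures were based on, so using the means is an assumption, and the names `pctChange`, `means` and `changes` are made up for the example.

```haskell
-- Percentage change of lens-regex-pcre relative to pcre-heavy.
-- Negative values mean lens-regex-pcre took less time.
pctChange :: Double -> Double -> Double
pctChange baseline candidate = (candidate - baseline) / baseline * 100

-- (benchmark, pcre-heavy mean, lens-regex-pcre mean), in milliseconds,
-- copied from the "mean" lines of the benchmark log.
means :: [(String, Double, Double)]
means =
  [ ("static pattern search", 378.1, 377.3)
  , ("complex pattern search", 1746, 1757)
  , ("simple replacement", 424.0, 378.2)
  , ("complex replacement", 450.8, 399.6)
  ]

-- One (benchmark, percentage change) pair per row of `means`.
changes :: [(String, Double)]
changes = [(name, pctChange heavy lensPcre) | (name, heavy, lensPcre) <- means]
```

With these means the changes come out to roughly -0.2% and +0.6% for the two searches and -10.8% and -11.4% for the two replacements, in line with the summary.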

Behaviour

Precise expected behaviour (and examples) can be found in the test suites: