Commit e20ad25

Merge pull request #5 from tom-lord/random_example
Regexp#random_example added
2 parents 5e2850e + 9196766 commit e20ad25

9 files changed: +155 / -76 lines

README.md

Lines changed: 24 additions & 9 deletions
@@ -3,9 +3,11 @@
 [![Build Status](https://travis-ci.org/tom-lord/regexp-examples.svg?branch=master)](https://travis-ci.org/tom-lord/regexp-examples/builds)
 [![Coverage Status](https://coveralls.io/repos/tom-lord/regexp-examples/badge.svg?branch=master)](https://coveralls.io/r/tom-lord/regexp-examples?branch=master)

-Extends the Regexp class with the method: Regexp#examples
+Extends the Regexp class with the methods: `Regexp#examples` and `Regexp#random_example`

-This method generates a list of (some\*) strings that will match the given regular expression.
+`Regexp#examples` generates a list of all\* strings that will match the given regular expression.
+
+`Regexp#random_example` returns one, random string (from all possible strings!!) that matches the regex.

 \* If the regex has an infinite number of possible strings that match it, such as `/a*b+c{2,}/`,
 or a huge number of possible matches, such as `/.\w/`, then only a subset of these will be listed.
@@ -31,6 +33,14 @@ For more detail on this, see [configuration options](#configuration-options).
 |
 \u{28}\u2310\u25a0\u{5f}\u25a0\u{29}
 /x.examples #=> ["(•_•)", "( •_•)>⌐■-■ ", "(⌐■_■)"]
+
+###################################################################################
+
+# Obviously, you will get different results if you try these yourself!
+/\w{10}@(hotmail|gmail)\.com/.random_example #=> "TTsJsiwzKS@gmail.com"
+/\p{Greek}{80}/.random_example
+#=> "ΖΆΧͷᵦμͷηϒϰΟᵝΔ΄θϔζΌψΨεκᴪΓΕπι϶ονϵΓϹᵦΟπᵡήϴϜΦϚϴϑ͵ϴΉϺ͵ϹϰϡᵠϝΤΏΨϹϊϻαώΞΰϰΑͼΈΘͽϙͽξΆΆΡΡΉΓς"
+/written by tom lord/i.random_example #=> "WrITtEN bY tOM LORD"
 ```

 ## Installation
@@ -51,7 +61,7 @@ Or install it yourself as:

 ## Supported syntax

-Short answer: **Everything** is supported, apart from "irregular" aspects of the regexp language -- see [impossible features](#impossible-features-illegal-syntax)
+Short answer: **Everything** is supported, apart from "irregular" aspects of the regexp language -- see [impossible features](#impossible-features-illegal-syntax).

 Long answer:

@@ -89,7 +99,7 @@ Long answer:
 ## Bugs and Not-Yet-Supported syntax

 * There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy! However, patterns like this are highly unusual...)
-* Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This will be fixed in version 1.1.0 (see the pending pull request)!
+* Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This will be fixed in version 1.1.1 (see the pending pull request)!

 Since the Regexp language is so vast, it's quite likely I've missed something (please raise an issue if you find something)! The only missing feature that I'm currently aware of is:
 * Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
@@ -127,33 +137,38 @@ When generating examples, the gem uses 2 configurable values to limit how many e
 * `[h-s]` is equivalent to `[hijkl]`
 * `(1|2|3|4|5|6|7|8)` is equivalent to `[12345]`

+Regexp#examples makes use of *both* these options; Regexp#random_example only uses `max_repeater_variance`, since the other option is redundant!
+
 To use an alternative value, simply pass the configuration option as follows:

 ```ruby
 /a*/.examples(max_repeater_variance: 5)
 #=> ['', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa']
 /[F-X]/.examples(max_group_results: 10)
 #=> ['F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
+/.*/.random_example(max_repeater_variance: 50)
+#=> "A very unlikely result!"
 ```

-_**WARNING**: Choosing huge numbers, along with a "complex" regex, could easily cause your system to freeze!_
+_**WARNING**: Choosing huge numbers for `Regexp#examples`, along with a "complex" regex, could easily cause your system to freeze!_

 For example, if you try to generate a list of _all_ 5-letter words: `/\w{5}/.examples(max_group_results: 999)`, then since there are actually `63` "word" characters (upper/lower case letters, numbers and "\_"), this will try to generate `63**5 #=> 992436543` (almost 1 _billion_) examples!

 In other words, think twice before playing around with this config!

-A more sensible use case might be, for example, to generate one random 1-4 digit string:
+A more sensible use case might be, for example, to generate all 1-4 digit strings:
+
+`/\d{1,4}/.examples(max_repeater_variance: 3, max_group_results: 10)`

-`/\d{1,4}/.examples(max_repeater_variance: 3, max_group_results: 10).sample(1)`
+Due to code optimisation, this is not something you need to worry about (much) for `Regexp#random_example`. For instance, the following takes no more than ~1 second on my machine:

-(Note: I may develop a much more efficient way to "generate one example" in a later release of this gem.)
+`/.*\w+\d{100}/.random_example(max_repeater_variance: 1000)`

 ## TODO

 * Performance improvements:
 * Use of lambdas/something (in [constants.rb](lib/regexp-examples/constants.rb)) to improve the library load time. See the pending pull request.
 * (Maybe?) add a `max_examples` configuration option and use lazy evaluation, to ensure the method never "freezes".
-* Potential future feature: `Regexp#random_example` - but implementing this properly is non-trivial, due to performance issues that need addressing first!
 * Write a blog post about how this amazing gem works! :)

 ## Contributing
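The README's claim that `Regexp#random_example` stays fast on huge patterns follows from picking one result per group instead of enumerating every combination. The sketch below is a hand-rolled, stdlib-only illustration of that idea for the README's email pattern. It does not use the gem, and the helper name `random_email_example` and the `WORD_CHARS` pool (ASCII `\w`) are assumptions made for illustration.

```ruby
# Hypothetical illustration (not the gem's code): build one random match for
# /\w{10}@(hotmail|gmail)\.com/ by sampling each part independently, instead
# of enumerating all 63**10 * 2 possible matches.
WORD_CHARS = [*'a'..'z', *'A'..'Z', *'0'..'9', '_'].freeze # the 63 ASCII \w chars

def random_email_example(rng = Random.new)
  local_part = Array.new(10) { WORD_CHARS.sample(random: rng) }.join # \w{10}
  domain = %w[hotmail gmail].sample(random: rng)                     # (hotmail|gmail)
  "#{local_part}@#{domain}.com"                                      # \.com
end

example = random_email_example
puts example
puts example.match?(/\A\w{10}@(hotmail|gmail)\.com\z/) # => true
```

The work done here grows with the length of the pattern, not with the number of strings it can match. That is why a per-group "random result" can stay cheap where full enumeration explodes.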

lib/regexp-examples/constants.rb

Lines changed: 3 additions & 2 deletions
@@ -17,7 +17,7 @@ class ResultCountLimiters

     class << self
       attr_reader :max_repeater_variance, :max_group_results
-      def configure!(max_repeater_variance, max_group_results)
+      def configure!(max_repeater_variance, max_group_results = nil)
         @max_repeater_variance = (max_repeater_variance || MaxRepeaterVarianceDefault)
         @max_group_results = (max_group_results || MaxGroupResultsDefault)
       end
@@ -44,7 +44,8 @@ module CharSets
     Whitespace = [' ', "\t", "\n", "\r", "\v", "\f"]
     Control = (0..31).map(&:chr) | ["\x7f"]
     # Ensure that the "common" characters appear first in the array
-    Any = Lower | Upper | Digit | Punct | (0..127).map(&:chr)
+    # Also, ensure "\n" comes first, to make it obvious when included
+    Any = ["\n"] | Lower | Upper | Digit | Punct | (0..127).map(&:chr)
     AnyNoNewLine = Any - ["\n"]
   end.freeze

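The new `Any` definition relies on `Array#|` returning a union in first-occurrence order. Prepending `["\n"]` therefore moves the newline to index 0, and the later copy from `(0..127).map(&:chr)` is dropped. The sketch below checks that behaviour with small stand-in arrays. It also shows how the `nil` default lets `configure!` be called with one argument, as `random_example` now does. `LOWER`, `DIGIT`, `Limiters` and its default values are hypothetical placeholders, not the gem's constants.

```ruby
# Stand-ins for the gem's character sets (hypothetical, much smaller).
LOWER = %w[a b c]
DIGIT = %w[1 2 3]
ASCII = (0..127).map(&:chr)

# Array#| keeps the first occurrence of each element, in order, so "\n"
# stays at index 0 and is not repeated when ASCII contributes it again.
ANY = ["\n"] | LOWER | DIGIT | ASCII
ANY_NO_NEWLINE = ANY - ["\n"]

# Hypothetical limiter mirroring the `configure!(a, b = nil)` signature.
class Limiters
  REPEATER_DEFAULT = 2
  GROUP_DEFAULT = 5

  class << self
    attr_reader :max_repeater_variance, :max_group_results

    def configure!(max_repeater_variance, max_group_results = nil)
      # `|| default` also covers an explicit nil from an unset keyword option
      @max_repeater_variance = max_repeater_variance || REPEATER_DEFAULT
      @max_group_results = max_group_results || GROUP_DEFAULT
    end
  end
end

Limiters.configure!(10) # second argument omitted, falls back to the default
p ANY.first(5)          # => ["\n", "a", "b", "c", "1"]
p [Limiters.max_repeater_variance, Limiters.max_group_results] # => [10, 5]
```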
lib/regexp-examples/core_extensions/regexp/examples.rb

Lines changed: 21 additions & 4 deletions
@@ -1,12 +1,29 @@
 module CoreExtensions
   module Regexp
     module Examples
-      def examples(config_options={})
-        full_examples = RegexpExamples.map_results(
-          RegexpExamples::Parser.new(source, options, config_options).parse
+      def examples(**config_options)
+        RegexpExamples::ResultCountLimiters.configure!(
+          config_options[:max_repeater_variance],
+          config_options[:max_group_results]
         )
-        RegexpExamples::BackReferenceReplacer.new.substitute_backreferences(full_examples)
+        examples_by_method(:map_results)
+      end
+
+      def random_example(**config_options)
+        RegexpExamples::ResultCountLimiters.configure!(
+          config_options[:max_repeater_variance]
+        )
+        examples_by_method(:map_random_result).first
       end
+
+      private
+      def examples_by_method(method)
+        full_examples = RegexpExamples.public_send(
+          method,
+          RegexpExamples::Parser.new(source, options).parse
+        )
+        RegexpExamples::BackReferenceReplacer.new.substitute_backreferences(full_examples)
+      end
     end
   end
 end
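After this refactor, `examples` and `random_example` share one pipeline and differ only in the name of the mapping method. That name is passed as a symbol and dispatched with `Object#public_send`. A minimal stand-alone sketch of the pattern follows. `ToyPipeline`, `map_all`, `map_one` and `ToyPattern` are hypothetical names, and the nested string lists stand in for parsed repeaters.

```ruby
# Hypothetical stand-in for RegexpExamples' two mapping helpers.
module ToyPipeline
  # Every combination: one element from each list, joined.
  def self.map_all(lists)
    first, *rest = lists
    first.product(*rest).map(&:join)
  end

  # One random combination, wrapped in an array like map_all's result.
  def self.map_one(lists)
    [lists.map(&:sample).join]
  end
end

class ToyPattern
  def initialize(lists)
    @lists = lists
  end

  # `**_options` accepts keyword arguments only; they are unused in this toy.
  def examples(**_options)
    run(:map_all)
  end

  def random_example(**_options)
    run(:map_one).first
  end

  private

  # Shared pipeline: only the mapping method differs between the callers.
  def run(method)
    ToyPipeline.public_send(method, @lists)
  end
end

pattern = ToyPattern.new([%w[a b], %w[x y]])
p pattern.examples       # => ["ax", "ay", "bx", "by"]
p pattern.random_example # one of the four strings above
```

One consequence of switching from `config_options={}` to `**config_options` is worth checking. On Ruby 3.x, passing an explicit hash literal such as `examples({max_group_results: 10})` raises `ArgumentError`. The bare keyword form used throughout the README still works.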

lib/regexp-examples/groups.rb

Lines changed: 40 additions & 11 deletions
@@ -37,7 +37,14 @@ def result
       end
     end
 
+  module RandomResultBySample
+    def random_result
+      result.sample(1)
+    end
+  end
+
   class SingleCharGroup
+    include RandomResultBySample
     prepend GroupWithIgnoreCase
     def initialize(char, ignorecase)
       @char = char
@@ -48,17 +55,19 @@ def result
     end
   end
 
-  # Used as a workaround for when a grep is expected to be returned,
+  # Used as a workaround for when a group is expected to be returned,
   # but there are no results for the group.
   # i.e. PlaceHolderGroup.new.result == '' == SingleCharGroup.new('').result
   # (But using PlaceHolderGroup makes it clearer what the intention is!)
   class PlaceHolderGroup
+    include RandomResultBySample
     def result
       [GroupResult.new('')]
     end
   end
 
   class CharGroup
+    include RandomResultBySample
     prepend GroupWithIgnoreCase
     def initialize(chars, ignorecase)
       @chars = chars
@@ -74,6 +83,7 @@ def result
   end
 
   class DotGroup
+    include RandomResultBySample
     attr_reader :multiline
     def initialize(multiline)
       @multiline = multiline
@@ -94,37 +104,56 @@ def initialize(groups, group_id)
       @group_id = group_id
     end
 
-    # Generates the result of each contained group
-    # and adds the filled group of each result to
-    # itself
     def result
-      strings = @groups.map {|repeater| repeater.result}
+      result_by_method(:result)
+    end
+
+    def random_result
+      result_by_method(:random_result)
+    end
+
+    private
+    # Generates the result of each contained group
+    # and adds the filled group of each result to itself
+    def result_by_method(method)
+      strings = @groups.map {|repeater| repeater.public_send(method)}
       RegexpExamples.permutations_of_strings(strings).map do |result|
         GroupResult.new(result, group_id)
       end
     end
   end
 
-  class MultiGroupEnd
-  end
-
   class OrGroup
     def initialize(left_repeaters, right_repeaters)
       @left_repeaters = left_repeaters
       @right_repeaters = right_repeaters
     end
 
-
     def result
-      left_result = RegexpExamples.map_results(@left_repeaters)
-      right_result = RegexpExamples.map_results(@right_repeaters)
+      result_by_method(:map_results)
+    end
+
+    def random_result
+      # TODO: This logic is flawed in terms of choosing a truly "random" example!
+      # E.g. /a|b|c|d/.random_example will choose a letter with the following probabilities:
+      # a = 50%, b = 25%, c = 12.5%, d = 12.5%
+      # In order to fix this, I must either apply some weighted selection logic,
+      # or change how the OrGroup examples are generated - i.e. make this class work with >2 repeaters
+      result_by_method(:map_random_result).sample(1)
+    end
+
+    private
+    def result_by_method(method)
+      left_result = RegexpExamples.public_send(method, @left_repeaters)
+      right_result = RegexpExamples.public_send(method, @right_repeaters)
       left_result.concat(right_result).flatten.uniq.map do |result|
         GroupResult.new(result)
       end
     end
   end
 
   class BackReferenceGroup
+    include RandomResultBySample
     attr_reader :id
     def initialize(id)
       @id = id
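The TODO in `OrGroup#random_result` describes a real bias. Its probabilities imply that `a|b|c|d` is handled as nested two-way choices, `a|(b|(c|d))`. If each choice picks left or right with equal probability, `a` gets a 1/2 chance while `d` gets only 1/8. The comment names weighted selection as one possible fix. The sketch below models this with plain nested nodes rather than the gem's classes; `Alt`, `leaf_count`, `naive_pick` and `weighted_pick` are hypothetical. Each branch is weighted by the number of alternatives it contains, which makes every leaf equally likely.

```ruby
# Hypothetical model of an alternation: a leaf string, or a two-way Alt node.
# Following the TODO's probabilities, /a|b|c|d/ is modelled as
# Alt(a, Alt(b, Alt(c, d))).
Alt = Struct.new(:left, :right)

# Number of leaf alternatives under a node.
def leaf_count(node)
  node.is_a?(Alt) ? leaf_count(node.left) + leaf_count(node.right) : 1
end

# Biased behaviour: every Alt picks either side with 50% probability.
def naive_pick(node, rng)
  return node unless node.is_a?(Alt)
  naive_pick(rng.rand(2).zero? ? node.left : node.right, rng)
end

# Weighted fix: pick the left side with probability
# leaf_count(left) / leaf_count(node), so each leaf is equally likely overall.
def weighted_pick(node, rng)
  return node unless node.is_a?(Alt)
  go_left = rng.rand(leaf_count(node)) < leaf_count(node.left)
  weighted_pick(go_left ? node.left : node.right, rng)
end

tree = Alt.new('a', Alt.new('b', Alt.new('c', 'd')))
rng = Random.new(42)
trials = 40_000
naive = Hash.new(0)
weighted = Hash.new(0)
trials.times do
  naive[naive_pick(tree, rng)] += 1
  weighted[weighted_pick(tree, rng)] += 1
end

frequencies = ->(counts) { counts.sort.map { |k, v| [k, v.fdiv(trials).round(2)] }.to_h }
p frequencies.call(naive)    # roughly a: 0.5, b: 0.25, c: 0.125, d: 0.125
p frequencies.call(weighted) # roughly 0.25 each
```

The other option in the TODO is to let one `OrGroup` hold more than two repeaters. That would make a flat `sample` uniform without any weighting.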

lib/regexp-examples/helpers.rb

Lines changed: 12 additions & 5 deletions
@@ -1,8 +1,6 @@
 module RegexpExamples
-  # Given an array of arrays of strings,
-  # returns all possible permutations,
-  # for strings created by joining one
-  # element from each array
+  # Given an array of arrays of strings, returns all possible permutations
+  # for strings, created by joining one element from each array
   #
   # For example:
   # permutations_of_strings [ ['a'], ['b'], ['c', 'd', 'e'] ] #=> ['abc', 'abd', 'abe']
@@ -29,8 +27,17 @@ def self.join_preserving_capture_groups(result)
   end
 
   def self.map_results(repeaters)
+    generic_map_result(repeaters, :result)
+  end
+
+  def self.map_random_result(repeaters)
+    generic_map_result(repeaters, :random_result)
+  end
+
+  private
+  def self.generic_map_result(repeaters, method)
     repeaters
-      .map {|repeater| repeater.result}
+      .map {|repeater| repeater.public_send(method)}
       .instance_eval do |partial_results|
         RegexpExamples.permutations_of_strings(partial_results)
       end
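Two notes on this helper file. First, `permutations_of_strings` as documented (one element from each array, joined) is a Cartesian product, which core Ruby expresses with `Array#product`. Second, a bare `private` only changes the default visibility of instance methods, so `def self.generic_map_result` above stays publicly callable. `private_class_method` is the usual way to hide a singleton method. The sketch below demonstrates both points on a hypothetical `HelperSketch` module. It ignores the gem's capture-group handling (`join_preserving_capture_groups`).

```ruby
module HelperSketch
  # Cartesian product of the arrays, each combination joined into a string.
  # Assumes at least one array is given.
  def self.permutations_of_strings(arrays)
    first, *rest = arrays
    first.product(*rest).map(&:join)
  end

  def self.map_results(lists)
    generic_map(lists) # implicit receiver, so calling a private singleton method is fine
  end

  # A bare `private` only changes the default visibility of instance
  # methods defined after it; `def self.` methods stay public.
  private

  def self.still_public
    :still_public
  end

  def self.generic_map(lists)
    permutations_of_strings(lists)
  end
  private_class_method :generic_map # this does make the singleton method private
end

p HelperSketch.permutations_of_strings([['a'], ['b'], %w[c d e]]) # => ["abc", "abd", "abe"]
p HelperSketch.map_results([%w[x y], %w[1 2]])                    # => ["x1", "x2", "y1", "y2"]
p HelperSketch.still_public                                        # => :still_public

begin
  HelperSketch.generic_map([['a']])
rescue NoMethodError => e
  puts "generic_map is private (#{e.class})"
end
```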

lib/regexp-examples/parser.rb

Lines changed: 6 additions & 13 deletions
@@ -2,24 +2,19 @@ module RegexpExamples
   IllegalSyntaxError = Class.new(StandardError)
   class Parser
     attr_reader :regexp_string
-    def initialize(regexp_string, regexp_options, config_options={})
+    def initialize(regexp_string, regexp_options)
       @regexp_string = regexp_string
       @ignorecase = !(regexp_options & Regexp::IGNORECASE).zero?
       @multiline = !(regexp_options & Regexp::MULTILINE).zero?
       @extended = !(regexp_options & Regexp::EXTENDED).zero?
       @num_groups = 0
       @current_position = 0
-      ResultCountLimiters.configure!(
-        config_options[:max_repeater_variance],
-        config_options[:max_group_results]
-      )
     end
 
     def parse
       repeaters = []
-      while @current_position < regexp_string.length
+      until end_of_regexp
         group = parse_group(repeaters)
-        break if group.is_a? MultiGroupEnd
         if group.is_a? OrGroup
           return [OneTimeRepeater.new(group)]
         end
@@ -35,8 +30,6 @@ def parse_group(repeaters)
       case next_char
       when '('
         group = parse_multi_group
-      when ')'
-        group = parse_multi_end_group
       when '['
         group = parse_char_group
       when '.'
@@ -241,10 +234,6 @@ def regexp_options_toggle(on, off)
       @extended = false if (off.include? "x")
     end
 
-    def parse_multi_end_group
-      MultiGroupEnd.new
-    end
-
     def parse_char_group
       @current_position += 1 # Skip past opening "["
       chargroup_parser = ChargroupParser.new(rest_of_string)
@@ -345,6 +334,10 @@ def rest_of_string
     def next_char
       regexp_string[@current_position]
     end
+
+    def end_of_regexp
+      next_char == ")" || @current_position >= regexp_string.length
+    end
   end
 end
 
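With `MultiGroupEnd` removed, `parse` now stops either at the end of the string or on reaching a `)`. The caller that opened the group is then left to consume the closing parenthesis. The toy recursive-descent parser below sketches that control flow for a tiny language of literal characters and parentheses only. `ToyParser`, `parse_sequence` and the nested-array output are hypothetical. This is not the gem's code, and the real `parse_multi_group` is not shown in this diff.

```ruby
# Toy grammar (hypothetical): sequence := (char | "(" sequence ")")*
class ToyParser
  def initialize(source)
    @source = source
    @position = 0
  end

  def parse
    result = parse_sequence
    # A stray ")" ends the top-level loop early, so check all input was used.
    raise ArgumentError, "unmatched ')' at #{@position}" unless @position == @source.length
    result
  end

  private

  # Same stopping rule as Parser#end_of_regexp: a ")" or the end of input.
  def end_of_sequence?
    next_char == ')' || @position >= @source.length
  end

  def next_char
    @source[@position]
  end

  def parse_sequence
    items = []
    until end_of_sequence?
      if next_char == '('
        @position += 1          # skip "("
        items << parse_sequence # recurse; stops at the matching ")"
        raise ArgumentError, "missing ')'" unless next_char == ')'
        @position += 1          # skip ")"
      else
        items << next_char
        @position += 1
      end
    end
    items
  end
end

p ToyParser.new('a(bc(d))e').parse # => ["a", ["b", "c", ["d"]], "e"]
```

The whole-input check in `parse` is an addition of this sketch. Ruby itself rejects unbalanced parentheses in a Regexp literal, so the gem may not need an equivalent check.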