chen0040/spark-ml-regex-generator


spark-ml-regex-generator

A Spark implementation that takes a set of texts and uses genetic programming to discover a grok regex that will also match other, similar texts.
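To illustrate the general idea, here is a minimal, self-contained sketch in plain Java (no Spark, no grok library). It is not this library's algorithm. It is a mutation-only evolutionary search over linear genomes rather than tree-based genetic programming. The class name GrokGpSketch, the three-entry pattern table and the token-based cost function are all hypothetical. Each genome is a sequence of pattern names. Its cost is the number of mismatched or missing tokens divided by the longer of the two lengths, averaged over the training texts, so 0.0 means every text is matched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.regex.Pattern;

// Hypothetical sketch, not the library's code: evolve sequences of grok-like
// pattern names so that, token by token, they match every training text.
class GrokGpSketch {
    // Simplified stand-ins for grok patterns; real grok definitions are far richer.
    static final String[] NAMES = {"WORD", "INT", "IPV4"};
    static final Pattern[] PATTERNS = {
        Pattern.compile("[A-Za-z]+"),
        Pattern.compile("\\d+"),
        Pattern.compile("(?:\\d{1,3}\\.){3}\\d{1,3}")
    };
    static final int MAX_LEN = 8;

    // Cost on one text: mismatched tokens plus length difference, scaled to [0, 1].
    // Texts are split on single spaces to keep the sketch simple.
    static double cost(int[] genome, String text) {
        String[] tokens = text.split(" ");
        int shared = Math.min(genome.length, tokens.length);
        int errors = Math.abs(genome.length - tokens.length);
        for (int i = 0; i < shared; i++) {
            if (!PATTERNS[genome[i]].matcher(tokens[i]).matches()) errors++;
        }
        return (double) errors / Math.max(genome.length, tokens.length);
    }

    // Average cost over all training texts; 0.0 means every text is matched.
    static double cost(int[] genome, List<String> texts) {
        double sum = 0;
        for (String t : texts) sum += cost(genome, t);
        return sum / texts.size();
    }

    // One random mutation: replace, insert or delete a single gene (never empty, never > MAX_LEN).
    static int[] mutate(int[] g, Random rnd) {
        List<Integer> genes = new ArrayList<>();
        for (int x : g) genes.add(x);
        int op = rnd.nextInt(3);
        if (op == 0 || (op == 2 && genes.size() == 1) || (op == 1 && genes.size() == MAX_LEN)) {
            genes.set(rnd.nextInt(genes.size()), rnd.nextInt(PATTERNS.length));
        } else if (op == 1) {
            genes.add(rnd.nextInt(genes.size() + 1), rnd.nextInt(PATTERNS.length));
        } else {
            genes.remove(rnd.nextInt(genes.size()));
        }
        return genes.stream().mapToInt(Integer::intValue).toArray();
    }

    // Keep the best half each generation (elitism) and refill with mutated survivors.
    static int[] evolve(List<String> texts, int popSize, int generations, long seed) {
        Random rnd = new Random(seed);
        List<int[]> pop = new ArrayList<>();
        for (int i = 0; i < popSize; i++) {
            int[] g = new int[1 + rnd.nextInt(MAX_LEN)];
            for (int j = 0; j < g.length; j++) g[j] = rnd.nextInt(PATTERNS.length);
            pop.add(g);
        }
        Comparator<int[]> byCost = Comparator.comparingDouble(g -> cost(g, texts));
        for (int gen = 0; gen < generations; gen++) {
            pop.sort(byCost);
            List<int[]> next = new ArrayList<>(pop.subList(0, popSize / 2));
            while (next.size() < popSize) {
                next.add(mutate(next.get(rnd.nextInt(popSize / 2)), rnd));
            }
            pop = next;
        }
        pop.sort(byCost);
        return pop.get(0);
    }

    // Render a genome as a grok-style expression such as "%{WORD} %{IPV4}".
    static String toGrok(int[] genome) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < genome.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append("%{").append(NAMES[genome[i]]).append('}');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> texts = List.of("user root login at 127.0.0.1", "user admin login at 10.0.0.2");
        int[] best = evolve(texts, 200, 50, 42L);
        System.out.println(toGrok(best) + "  cost=" + cost(best, texts));
    }
}
```

With the two sample lines in main, the search is expected to move toward %{WORD} %{WORD} %{WORD} %{WORD} %{IPV4}, but the exact result depends on the seed, population size and generation count. Keeping the best half unchanged means the best cost never gets worse from one generation to the next.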

Install

Add the following dependency to your POM file:

<dependency>
    <groupId>com.github.chen0040</groupId>
    <artifactId>spark-ml-regex-generator</artifactId>
    <version>1.0.1</version>
</dependency>

Usage

The sample code below shows how the GP regex cultivator (GpCultivator) discovers a grok regex for the message "user root login at 127.0.0.1":

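// GpCultivator, SparkContextFactory, Grok and Match come from this library and its
// grok dependency; their import statements are not shown in the original sample.
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaSparkContext;

GpCultivator generator = new GpCultivator();
generator.setDisplayEvery(2);        // print progress every 2 generations
generator.setPopulationSize(1000);   // candidate patterns per generation
generator.setMaxGenerations(50);     // stop after at most 50 generations

List<String> trainingData = new ArrayList<>();
trainingData.add("user root login at 127.0.0.1");

JavaSparkContext context = SparkContextFactory.createSparkContext("testing-1");
Grok generated_grok = generator.fit(context.parallelize(trainingData));

System.out.println("user root login at 127.0.0.1");
System.out.println(generator.getRegex()); // this is the regex generated

Match matched = generated_grok.match("user root login at 127.0.0.1");
matched.captures(); // extract the named captures before converting them to JSON
System.out.println(matched.toJson());

context.stop(); // release Spark resources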

Below is the printout from the sample code above:

...
Generation: 4 (Pop: 1000), elapsed: 3 seconds
Global Cost: 0.2	Current Cost: 0.2
...
Global Cost: 0.14285714285714285	Current Cost: 0.16666666666666666
user root login at 127.0.0.1
%{LOGLEVEL} %{USER} %{URIPROTO} %{URIHOST} %{IPV4}
{"IPORHOST":"at","IPV4":"127.0.0.1","LOGLEVEL":"er","URIHOST":"at","URIPROTO":"login","USER":"root"}
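The printout ends with the grok expression the run produced and a JSON object of its named captures. The sketch below shows, in plain java.util.regex, roughly how such an expression turns into captures: each %{NAME} is expanded into a named capture group wrapping that pattern's definition. The class GrokExpansionSketch and its three simplified pattern definitions are hypothetical stand-ins, not the grok library's implementation. The sketch also matches the whole line, whereas the real run appears to match a substring (LOGLEVEL captured the "er" at the end of "user"). It does not expand nested references either: in the standard grok pattern set URIHOST is defined via IPORHOST, which is why IPORHOST also appears in the JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: expand grok-style %{NAME} references into named groups.
class GrokExpansionSketch {
    // Simplified, assumed definitions; real grok pattern files are much richer.
    static final Map<String, String> DEFS = Map.of(
        "WORD", "[A-Za-z]+",
        "USER", "[A-Za-z0-9._-]+",
        "IPV4", "(?:\\d{1,3}\\.){3}\\d{1,3}");

    static final Pattern REF = Pattern.compile("%\\{([A-Z0-9]+)\\}");

    // Replace every %{NAME} with (?<NAME>definition). Names must be unique,
    // because java.util.regex rejects duplicate group names.
    static String expand(String grok) {
        Matcher m = REF.matcher(grok);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String def = DEFS.get(m.group(1));
            if (def == null) throw new IllegalArgumentException("unknown pattern " + m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement("(?<" + m.group(1) + ">" + def + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Match the whole text and collect captures in expression order (empty map if no match).
    static Map<String, String> captures(String grok, String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile(expand(grok)).matcher(text);
        if (!m.matches()) return result;
        Matcher names = REF.matcher(grok);
        while (names.find()) result.put(names.group(1), m.group(names.group(1)));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical expression, not the one produced by the GP run above.
        String grok = "user %{USER} login at %{IPV4}";
        System.out.println(captures(grok, "user root login at 127.0.0.1")); // {USER=root, IPV4=127.0.0.1}
    }
}
```

Literal text between references (here "user", "login at") is kept as regex text, which is also how grok treats it.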
