Posted on Feb 16, 2021 • Edited on Mar 1, 2021

Regex Words, Vowels, Consonants, and Sentences in Ruby

A word counter is a common feature in web applications. With Ruby, we can use regex to pull a lot of information out of a piece of text in very few lines of code. I wanted to write a readable, comprehensive guide to how well Ruby handles operations like this. Feel free to look at the code and test it out on repl.it.

Creating a TextAnalyzer Class

The first step is to move all of the methods that run regexes against the text and gather information about it into their own class. Here we will call it TextAnalyzer.

At the beginning our class might look something like this: it is initialized with a text attribute and normalizes it to uppercase (this will help us down the road; lowercase would work just as well, but looks less appetizing).

```ruby
class TextAnalyzer
  attr_reader :text

  def initialize(text)
    # upcase for ease of counting
    @text = text.upcase
  end
end
```
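As a quick sanity check of the normalization step, here is a minimal sketch that builds an analyzer and reads the text back through the attr_reader. The class is repeated so the snippet runs on its own, and the sample string is just an illustration.

```ruby
class TextAnalyzer
  attr_reader :text

  def initialize(text)
    # upcase for ease of counting
    @text = text.upcase
  end
end

original = "Hey! Isn't ruby amazing?!?"
analyzer = TextAnalyzer.new(original)

puts analyzer.text # the upcased copy stored on the analyzer
puts original      # String#upcase returns a new string, so the input is untouched
```

Since upcase does not mutate its receiver, the caller's string stays exactly as it was passed in.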

Adding Our Basic Methods

Counting Words

Since our text is one long string, we simply split it on whitespace and count the size of the resulting array. That gives us the number of words in the string. We don't care about punctuation or digits here, only the whitespace between words.

Our method would look something like:

```ruby
def word_count
  text.split(' ').size
end
```
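One detail that makes this approach hold up on messy input: splitting on the string ' ' treats any run of whitespace (including newlines) as a single separator and skips leading whitespace. A small sketch with a made-up sample string, comparing it to splitting on a single-space regex:

```ruby
# Hypothetical sample text; the extra spaces and the newline are deliberate.
sample = "  Hey!   Isn't ruby\namazing?!?  "

# split(' ') collapses runs of whitespace and ignores leading whitespace,
# so no empty strings end up in the array.
words = sample.split(' ')
word_total = words.size

# Splitting on a regex that matches exactly one space keeps empty strings
# for the leading and repeated spaces, which inflates the count.
naive_total = sample.split(/ /).size

puts words.inspect
puts word_total
puts naive_total
```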

Counting Characters

First we need to think of our pre-requisite:

A character is any text within a string (including spaces, punctuation, digits, etc)

Since we don't care what the actual character is, we can simply split the text after every character, which would look like:

```ruby
def chars
  text.split('')
end

# and to count would be simply
def character_count
  chars.size
end
```
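For comparison, Ruby's String already ships with #chars and #length, which give the same results as splitting on an empty string. A minimal sketch with an illustrative sample string:

```ruby
sample = "HEY! 123"

# splitting on an empty string yields one-character strings
split_chars = sample.split('')

# String#chars builds the same array without the split
builtin_chars = sample.chars

puts split_chars == builtin_chars
puts sample.length == split_chars.size
```

Either form works for the counter; the built-ins just save a step.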

Counting Letters

Again, let's think about our prerequisites for a letter:

Can't be punctuation
Can't be a digit
Can't be whitespace

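Using a sweet tool called Rubular we can test our regex before implementing it. The regex \W selects any non-word character, but Ruby counts digits and underscores as word characters, so \W alone would let them through. Because our text is already upcased, the negated character class [^A-Z] selects everything that is not a letter (this assumes ASCII text). Using String#gsub we can then replace those characters with empty strings ('').

All together it would look something like:

```ruby
def letters
  # strip everything that is not an uppercase ASCII letter
  text.gsub(/[^A-Z]/, '')
end

# a little helper method to make the string into an array
def letters_array
  letters.split('')
end

# counts the letters in our long string
def letter_count
  # don't forget to split our string into an array first!
  letters_array.size
end
```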
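It is worth checking how \W behaves against the "can't be a digit" requirement. In Ruby, word characters include digits and the underscore, so \W leaves them in place. A short sketch on a made-up, already-upcased string, next to a negated A-Z range (which assumes ASCII letters only):

```ruby
sample = "RUBY_3 IS #1!"

# \W removes only non-word characters, so digits and "_" survive
with_non_word = sample.gsub(/\W/, '')

# a negated range keeps only the uppercase ASCII letters A-Z
letters_only = sample.gsub(/[^A-Z]/, '')

puts with_non_word
puts letters_only
```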

Counting Consonants and Vowels

Again, let's check our prerequisites for a vowel:

Must be A, E, I, O, or U

And for consonants:

Must not be a vowel

For both operations we can use the String#scan method that Ruby provides, once again with a regex to find either all vowels or all consonants. The regex is a character class: square brackets [ ] containing the letters we want to search for.

All together the methods would look something like:

```ruby
# counts the vowels from our #letters string
def vowel_count
  letters.scan(/[AEIOU]/).size
end

# counts the consonants from our #letters string
def consonant_count
  # adding ^ before our characters within the scan
  # will find anything except the given characters
  letters.scan(/[^AEIOU]/).size
end
```
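As a cross-check on the scan approach, String#count accepts the same kind of character set, including a leading ^ for negation. The sketch below uses a hypothetical string that has already been reduced to uppercase letters and compares both techniques:

```ruby
# Hypothetical input: already upcased and stripped down to letters.
letters = "HEYISNTRUBYAMAZING"

# scan builds an array of matches, which we then size
vowels_by_scan = letters.scan(/[AEIOU]/).size
consonants_by_scan = letters.scan(/[^AEIOU]/).size

# count tallies the characters directly, without the intermediate array
vowels_by_count = letters.count('AEIOU')
consonants_by_count = letters.count('^AEIOU')

puts vowels_by_scan
puts consonants_by_scan
puts vowels_by_count == vowels_by_scan
puts consonants_by_count == consonants_by_scan
```

Note that under these rules Y always counts as a consonant.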

It is important to note that because of our operations in our letters method, we don't need to check for punctuation, digits, or whitespace.

Finding the Most Common Letter(s)

To find the most common letter, many people build a large while loop or a confusing regex. Ruby is simple, expandable, and flexible, and we should code that way.

In the case of a tie we wouldn't want our #most_common_letters method to return just the first or last letter. We want to return every tied letter and let the caller decide which one to choose (whether it's the first, the last, or somewhere in between).

```ruby
# finds the most common letter(s) from our #letters string
def most_common_letters
  # a default of 0 lets us increment without checking for the key first
  char_hash = Hash.new(0)
  letters_array.each { |c| char_hash[c] += 1 }

  # finds the highest occurrence
  # could use char_hash.max_by { |_k, v| v } to get the max and character at the same time
  # although we would rather return ALL in the case of a tie
  max = char_hash.values.max

  # returns all in the case of a tie
  char_hash.map { |k, v| { k => v } unless v < max }.compact
end
```

I've found the best way to do this is to create a hash that uses each unique letter as a key and increments its value by 1 every time that letter occurs. The method then returns the most common letter(s), each paired with its count.
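On Ruby 2.7 or newer, Enumerable#tally builds the same letter-to-count hash in one call. The helper below is a hypothetical standalone variant (its name and its array argument are assumptions for illustration, not part of the class) that keeps the same "return every tie" behaviour:

```ruby
# Hypothetical helper, not part of the TextAnalyzer class in this post.
# Takes an array of single-letter strings and returns every letter that
# shares the highest count, each as a one-pair hash.
def most_common_by_tally(letters_array)
  counts = letters_array.tally # e.g. {"A"=>2, "B"=>1}
  max = counts.values.max
  counts.select { |_letter, count| count == max }
        .map { |letter, count| { letter => count } }
end

result = most_common_by_tally("HEYISNTRUBYAMAZING".chars)
puts result.inspect
```

Because Ruby hashes keep insertion order, tied letters come back in the order they first appear in the text.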

Counting the Most Common Letter

Now that we can find our most common letters along with their frequency, it is up to us how to choose a winner in the case of a tie (remember that we are returning an array of hashes, one per letter).

Let's say we want the first of the tied letters: we can call most_common_letters.first. Since the method returns an array of hashes (letters with their frequency), we then need to pull the letter and the frequency out of that hash.

Our code would look something like this:

```ruby
# holds the key, value pair for the most common letter
def most_common
  # gets the first match from our array
  most_common_letters.first
end

# gets the letter from the most common hash
def most_common_letter
  most_common.keys.first
end

# gets the most common letter's value
def most_common_letter_count
  most_common.values.first
end
```

Using Our TextAnalyzer

Now we are all set up and ready to go. Displaying the results is as easy as initializing the class with a string and calling the associated methods.

```ruby
text_to_analyze = "Hey! Isn't ruby amazing?!?"
text = TextAnalyzer.new(text_to_analyze)

# NOTE: #sentence_count is not defined in the snippets above;
# define it (or drop that line) before running this.
display_string = <<~STR
  Word Count: #{text.word_count}
  Sentence Count: #{text.sentence_count}
  Character Count: #{text.character_count}
  Letter Count: #{text.letter_count}
  Vowel Count: #{text.vowel_count}
  Consonant Count: #{text.consonant_count}
  Most Common Letter: #{text.most_common_letter} used #{text.most_common_letter_count} times.
STR

puts display_string
```
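The output calls #sentence_count, which none of the snippets in this post define. As a stand-in, here is a minimal sketch that treats every run of ., ! or ? as one sentence ending. This is an assumption about what such a method might look like, not the implementation from the completed project, and it is written as a standalone method taking the text as an argument so it runs on its own.

```ruby
# Hypothetical stand-in for the undefined #sentence_count.
# Assumption: a sentence ends at a run of terminal punctuation,
# so "?!?" counts once rather than three times.
def sentence_count(text)
  text.scan(/[.!?]+/).size
end

puts sentence_count("HEY! ISN'T RUBY AMAZING?!?")
```

A rule this simple will miscount abbreviations and decimals such as "3.14", so treat it as a starting point.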

Conclusion

Ruby is awesome! It lets us use regex and hashes to operate on strings and count occurrences under many different conditions. If you find yourself looping over a string to find certain characters, you might want to use regex!

Combining regex with other Ruby features such as hashes and loops makes it extremely powerful for keeping track of occurrences or sorting by any set of prerequisites.

View the completed project on repl.it

A Reminder on how to Regex

One way I love to work out the regex for a string is to ask myself:

What do I want after the regex?
What is going into the regex?
Should I look for included characters or excluded characters?

It is very helpful to list out your requirements for the string you want after the regex.
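To make the included-versus-excluded question concrete, here is a small sketch (the sample string is illustrative) that reaches the same letters-only result both ways: once by listing the characters we want, and once by removing everything else.

```ruby
text = "HEY! ISN'T RUBY AMAZING?!?"

# included characters: list what we want and collect the matches
included = text.scan(/[A-Z]/)

# excluded characters: negate the class and remove everything else
excluded = text.gsub(/[^A-Z]/, '')

puts included.join
puts excluded
puts included.join == excluded
```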

Rubular - Regex Quick Reference Guide
