Chapter 6 String Class, String Builder, and Pattern Matching

Strings are common to most computer programs. Certain types of programs, such as word processors and web applications, make heavy use of strings, which forces the programmer of such applications to pay special attention to the efficiency of string processing. In this chapter, we examine how C# works with strings, how to use the String class, and finally, how to work with the StringBuilder class. The StringBuilder class is used when a program must make many changes to a String object because strings and String objects are immutable, whereas StringBuilder objects are mutable.

6.1 String Class

A string is a series of characters that can include letters, numbers, and other symbols. String literals are created in C# by enclosing a series of characters within a set of double quotation marks. Here are some examples of string literals:

    "Haitham El Ghareeb"
    "the quick brown fox jumped over the lazy dog"
    "123 45 6789"
    "helghareeb@mans.edu.eg"

A string can consist of any character that is part of the Unicode character set. A string can also consist of no characters. This is a special string called the empty string, and it is written as two double quotation marks next to each other (""). Keep in mind that this is not the string that represents a space; that string looks like this: " ".

Strings in C# have a special nature: they are both native types and objects of a class. To be more precise, we can work with strings as if they were native data values, but in reality every string created is an object of the String class.

6.1.1 Creating String Object

Strings are created like this:

    string name = "Haitham A. El Ghareeb";

though you can, of course, declare the variable and assign it data in two separate statements. The declaration syntax makes name look like a regular variable, but it is actually an instance of a String object. C# strings also allow you to place escape characters inside the strings. Escape characters are used to place format characters such as line breaks (\n) and tab stops (\t) within a string.

6.1.2 String Class Methods

Although there are many operations we can perform on strings, a small set of operations dominates. Three of the top operations are as follows:

1. finding a substring in a string,
2. determining the length of a string, and
3. determining the position of a character in a string.

The following short program demonstrates how to perform these operations. A String object is instantiated to the string "Haitham El-Ghareeb". We then break the string into its two constituent pieces: the first word and the second word. Here's the code, followed by an explanation of the String methods used:

    using System;

    class MyStringExample
    {
        static void Main()
        {
            string string1 = "Haitham El-Ghareeb";
            int len = string1.Length;
            int pos = string1.IndexOf(" ");
            string firstWord, secondWord;
            // Everything before the space is the first word.
            firstWord = string1.Substring(0, pos);
            // Everything after the space is the second word.
            secondWord = string1.Substring(pos + 1, len - (pos + 1));
            Console.WriteLine("First word: " + firstWord);
            Console.WriteLine("Second word: " + secondWord);
            Console.Read();
        }
    }

The first thing we do is use the Length property to determine the length of string1. The length is simply the total number of characters in the string.

To break up a two-word phrase into separate words, we need to know what separates the words. In a well-formed phrase, a space separates words, so we want to find the space between the two words in this phrase. We can do this with the IndexOf method. This method takes a character or a string and returns the position of its first occurrence in the string. Strings in C# are zero-based: the first character in the string is at position 0, the second character is at position 1, and so on. If the character can't be found in the string, -1 is returned.

The IndexOf method finds the position of the space separating the two words, and that position is used in the next method, Substring, to actually pull the first word out of the string. The Substring method takes two arguments: a starting position and the number of characters to pull. Look at the following example:

    string m = "Now is the time";
    string sub = m.Substring(0, 3);

The value of sub is "Now". The Substring method will pull as many characters out of a string as you ask it to, but if you try to go beyond the end of the string, an exception (ArgumentOutOfRangeException) is thrown.

The first word is pulled out of the string by starting at position 0 and pulling out pos characters. This may seem odd, since pos contains the position of the space, but because strings are zero-based, the position of the space equals the number of characters that precede it.

The next step is to pull out the second word. Since we know where the space is, we know that the second word starts at pos + 1 (again, we're assuming we're working with a well-formed phrase where each word is separated by exactly one space).
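The IndexOf and Substring behavior described so far can be sketched as follows. This is a minimal illustration rather than part of the chapter's listing: the class name SubstringSketch and the helper FirstWord are hypothetical, and the sketch assumes a single space character is the only word separator.

```csharp
using System;

static class SubstringSketch
{
    // Hypothetical helper: returns the text before the first space,
    // or the whole string when IndexOf reports -1 (no space found).
    public static string FirstWord(string phrase)
    {
        int pos = phrase.IndexOf(' ');
        if (pos == -1)
            return phrase;
        // pos characters precede the space because positions start at 0.
        return phrase.Substring(0, pos);
    }

    static void Main()
    {
        Console.WriteLine(FirstWord("Now is the time")); // Now
        Console.WriteLine(FirstWord("Now"));             // Now

        try
        {
            // Asking for more characters than the string holds throws.
            "Now".Substring(0, 10);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Substring went past the end of the string");
        }
    }
}
```

Checking for -1 before calling Substring matters because a negative length argument also raises ArgumentOutOfRangeException.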
The harder part is deciding exactly how many characters to pull out, knowing that an exception will be thrown if we try to go beyond the end of the string. There is a formula of sorts we can use for this calculation. First, we add 1 to the position where the space was found and then subtract that value from the length of the string. That tells the method exactly how many characters to extract.

Although this short program is interesting, it's not very useful. What we really need is a program that will pull the words out of a well-formed phrase of any length. There are several different algorithms we can use to do this. The algorithm we'll use here contains the following steps:
1. Find the position of the first space in the string.
2. Extract the word.
3. Build a new string starting at the position past the space and continuing until the end of the string.
4. Look for another space in the new string.
5. If there isn't another space, extract the word from that position to the end of the string.
6. Otherwise, loop back to step 2.

Here is the code we built from this algorithm (each word extracted from the string is stored in a collection named words):

    using System;
    using System.Collections;

    class separateWords
    {
        static void Main()
        {
            string astring = "I love String Manipulation";
            int pos;
            string word;
            ArrayList words = new ArrayList();
            pos = astring.IndexOf(" ");
            while (pos > 0)
            {
                word = astring.Substring(0, pos);
                words.Add(word);
                // Keep only the text past the space we just found.
                astring = astring.Substring(pos + 1, astring.Length - (pos + 1));
                pos = astring.IndexOf(" ");
                if (pos == -1)
                {
                    // No more spaces: what remains is the last word.
                    word = astring.Substring(0, astring.Length);
                    words.Add(word);
                }
            }
            Console.Read();
        }
    }

Of course, if we were going to actually use this algorithm in a program we'd make it a function and have it return a collection, like this:

    using System;
    using System.Collections;

    class StringManipulation
    {
        static void Main()
        {
            string astring = "I still Love String Manipulation";
            ArrayList words = SplitWords(astring);
            foreach (string word in words)
                Console.Write(word + " ");
            Console.Read();
        }

        static ArrayList SplitWords(string astring)
        {
            ArrayList words = new ArrayList();
            int pos;
            string word;
            pos = astring.IndexOf(" ");
            while (pos > 0)
            {
                word = astring.Substring(0, pos);
                words.Add(word);
                astring = astring.Substring(pos + 1, astring.Length - (pos + 1));
                // Search the shortened string; without this the loop never ends.
                pos = astring.IndexOf(" ");
                if (pos == -1)
                {
                    word = astring.Substring(0, astring.Length);
                    words.Add(word);
                }
            }
            return words;
        }
    }

It turns out, though, that the String class already has a method for splitting a string into parts (the Split method) as well as a method that can take a data collection and combine its parts into a string (the Join method).

6.1.3 Join and Split Methods

Breaking up strings into individual pieces of data is a very common function. Many programs, from Web applications to everyday office applications, store data in some type of string format. To simplify the process of breaking up strings and putting them back together, the String class provides two methods: the Split method for breaking up strings and the Join method for making a string out of the data stored in an array.

The Split method takes a string, breaks it into constituent pieces, and puts those pieces into a String array. The method works by focusing on a separating character to determine where to break up the string. In the example in the last section, the SplitWords function always used the space as the separator. We can specify what separator to look for when using the Split method. In fact, the separator is the first argument to the method. The argument must come in the form of a char array; each character in the array is treated as a delimiter.

Many application programs export data by writing out strings of data separated by commas. These are called comma-separated value strings, or CSVs for short. Some authors use the term comma-delimited.
A comma-delimited string looks like this:

    "Haitham, El-Ghareeb, Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Egypt,35516"

Each logical piece of data in this string is separated by a comma. We can put each of these logical pieces into an array using the Split method like this:

    string data = "Haitham, El-Ghareeb, Information Systems Department, " +
                  "Faculty of Computers and Information Sciences, " +
                  "Mansoura University, Egypt,35516";
    string[] sdata;
    char[] delimiter = new char[] { ',' };
    sdata = data.Split(delimiter, data.Length);

Now we can access this data using standard array techniques:

    foreach (string word in sdata)
        Console.Write(word + " ");

There is one more parameter we can pass to the Split method: the maximum number of elements we want to store in the array. (The call above passes data.Length, which is large enough that it never limits the result.) For example, if I want to put the first string element in the first position of the array and the rest of the string in the second element, I would call the method like this:

    sdata = data.Split(delimiter, 2);

The elements in the array are:

• 0th element: Haitham
• 1st element: El-Ghareeb, Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Egypt,35516

Note that Split does not trim whitespace, so the 1st element keeps the space that followed the first comma.

We can go the other way, from an array to a string, using the Join method. This method takes two arguments: a separator string and the array to join. A string is built consisting of the array elements with the separator placed between each pair of elements. We should also mention that Join is a class (static) method, meaning we call the method from the String class itself and not from a String instance. Here's an example using the same data we used for the Split method:

    using System;

    class JoinString
    {
        static void Main()
        {
            string data = "Haitham, El-Ghareeb, Information Systems Department, " +
                          "Faculty of Computers and Information Sciences, " +
                          "Mansoura University, Egypt,35516";
            string[] sdata;
            char[] delimiter = new char[] { ',' };
            sdata = data.Split(delimiter, data.Length);
            foreach (string word in sdata)
                Console.Write(word + " ");
            string joined;
            // The separator comes first, then the array.
            joined = String.Join(",", sdata);
            Console.Write(joined);
        }
    }
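The Split and Join round trip described above can be sketched as follows. This is an illustration under stated assumptions: the class SplitJoinSketch and the helper SplitFields are hypothetical names, the data string is a shortened version of the one used in this section, and trimming each field is an added step that Split does not perform on its own.

```csharp
using System;

static class SplitJoinSketch
{
    // Hypothetical helper: splits on commas and trims the whitespace
    // that Split leaves around each field.
    public static string[] SplitFields(string csv)
    {
        string[] fields = csv.Split(new char[] { ',' });
        for (int i = 0; i < fields.Length; i++)
            fields[i] = fields[i].Trim();
        return fields;
    }

    static void Main()
    {
        // Shortened sample data, for illustration only.
        string data = "Haitham, El-Ghareeb, Mansoura University, Egypt,35516";

        string[] fields = SplitFields(data);
        Console.WriteLine(fields.Length);            // 5
        Console.WriteLine(String.Join("|", fields)); // Haitham|El-Ghareeb|Mansoura University|Egypt|35516

        // A count of 2 keeps everything after the first comma together,
        // including the leading space.
        string[] two = data.Split(new char[] { ',' }, 2);
        Console.WriteLine("[" + two[1] + "]");       // [ El-Ghareeb, Mansoura University, Egypt,35516]
    }
}
```

Joining with a different separator than the one used for splitting is a simple way to confirm that the separator appears only between elements, never after the last one.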
These methods are useful for getting data into your program from another source (the Split method) and sending data out of your program to another source (the Join method).

6.1.4 Comparing Strings

There are several ways to compare String objects in C#. The most obvious way is to use the equality operators (== and !=), which for most situations will work just fine. However, C# does not define <, >, <=, or >= for strings, so when we want to know whether a string is greater than, less than, or equal to another string, we have to use methods found in the String class.

Strings are compared with each other much as we compare numbers. However, since it's not obvious whether "a" is greater than or less than "H", we need some sort of numeric scale. That scale is the Unicode table. Each character (actually every symbol) has a Unicode value, which is used to convert a character's binary representation to that character. Some languages provide an ASC function for looking up this value; the name refers to ASCII, an older numeric code that Unicode later subsumed. C# has no ASC function. To find the numeric code for a character, simply convert the character to an integer using a cast, like this:

    int charCode;
    charCode = (int)'a';

The value 97 is stored in the variable. Two strings are tested for equality, then, by comparing their numeric codes. The strings "a" and "b" are not equal because code 97 is not code 98. The CompareTo method lets us determine the exact relationship between two String objects. We'll see how to use that method shortly.

The first comparison method we'll examine is the Equals method. This method is called from a String object and takes another String object as its argument. It then compares the two String objects character-by-character.
If they contain the same characters (based on their numeric codes), the method returns true. Otherwise, the method returns false. The method is called like this:

    string s1 = "Haitham";
    string s2 = "Haitham";
    if (s1.Equals(s2))
        Console.WriteLine("They are the same.");
    else
        Console.WriteLine("They are not the same.");
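Because Equals works on numeric codes, it treats uppercase and lowercase letters as different characters. The sketch below shows this, together with the cast that reveals each code. The class EqualsSketch and the helper SameIgnoringCase are hypothetical names used only for illustration; the Equals overload that accepts a StringComparison value is part of the String class.

```csharp
using System;

static class EqualsSketch
{
    // Hypothetical helper: compares two strings while ignoring case.
    public static bool SameIgnoringCase(string a, string b)
    {
        return a.Equals(b, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        // 'a' and 'A' have different codes, so they are different characters.
        Console.WriteLine((int)'a'); // 97
        Console.WriteLine((int)'A'); // 65

        Console.WriteLine("Haitham".Equals("haitham"));            // False
        Console.WriteLine(SameIgnoringCase("Haitham", "haitham")); // True
    }
}
```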
The next method for comparing strings is CompareTo. This method also takes a String as an argument, but it doesn't return a Boolean value. Instead, the method returns an integer that in practice is 1, -1, or 0 (strictly, only the sign is guaranteed), depending on the relationship between the passed-in string and the string instance calling the method. Here are some examples:

    string s1 = "Haitham";
    string s2 = "Haitham";
    Console.WriteLine(s1.CompareTo(s2)); // returns 0
    s2 = "foofoo";
    Console.WriteLine(s1.CompareTo(s2)); // returns 1
    s2 = "fooaar";
    Console.WriteLine(s2.CompareTo(s1)); // returns -1

If two strings are equal, the CompareTo method returns 0; if the method-calling string sorts before the passed-in string, the method returns -1; if the method-calling string sorts after the passed-in string, the method returns 1. Note that CompareTo uses culture-sensitive (dictionary-style) ordering by default rather than raw numeric codes, which is why "Haitham" sorts after "foofoo" here even though the code for "H" is lower than the code for "f". The String.CompareOrdinal method compares the numeric codes directly.

An alternative to the CompareTo method is the Compare method, which is called as a class (static) method. This method performs the same type of comparison as the CompareTo method and returns the same values for the same comparisons. The Compare method is used like this:

    static void Main()
    {
        string s1 = "Haitham";
        string s2 = "Haitham";
        int compVal = String.Compare(s1, s2);
        switch (compVal)
        {
            case 0:
                Console.WriteLine(s1 + " " + s2 + " are equal");
                break;
            case -1:
                Console.WriteLine(s1 + " is less than " + s2);
                break;
            case 1:
                Console.WriteLine(s1 + " is greater than " + s2);
                break;
            default:
                Console.WriteLine("Can't compare");
                break;
        }
    }

Two other comparison methods that can be useful when working with strings are StartsWith and EndsWith. These instance methods take a string as an argument and return true if the instance either starts with or ends with the string argument. Following are two short programs that demonstrate the use of these methods. First, we'll demonstrate the EndsWith method:

    using System;
    using System.Collections;

    class StringEndings
    {
        static void Main()
        {
            string[] nouns = new string[] { "apples", "oranges", "banana",
                                            "cherry", "tomatoes" };
            ArrayList pluralNouns = new ArrayList();
            foreach (string noun in nouns)
                if (noun.EndsWith("s"))
                    pluralNouns.Add(noun);
            foreach (string noun in pluralNouns)
                Console.Write(noun + " ");
        }
    }

First, we create an array of nouns, some of which are in plural form. Then we loop through the elements of the array, checking to see if any of the nouns are plurals. If so, they're added to a collection. Then we loop through the collection, displaying each plural.

We use the same basic idea in the next program to determine which words start with the prefix "tri":

    using System;
    using System.Collections;

    class StringPrefixes
    {
        static void Main()
        {
            string[] words = new string[] { "triangle", "diagonal", "trimester",
                                            "bifocal", "triglycerides" };
            ArrayList triWords = new ArrayList();
            foreach (string word in words)
                if (word.StartsWith("tri"))
                    triWords.Add(word);
            foreach (string word in triWords)
                Console.Write(word + " ");
        }
    }

6.1.5 String Manipulation

String processing usually involves making changes to strings. We need to insert new characters into a string, remove characters that don't belong anymore, replace old characters with new characters, change the case of certain characters, and add or remove space from strings, just to name a few operations. There are methods in the String class for all of these operations, and in this section we'll examine them.

We'll start with the Insert method. This method inserts a string into another string at a specified position. Insert returns a new string. The method is called like this:

    String1 = String0.Insert(Position, String)
An example of using this code is:

    using System;

    class StringManipulation
    {
        static void Main()
        {
            string s1 = "Hello, . Welcome to Data Structures and Algorithms class.";
            string name = "Ahmed";
            int pos = s1.IndexOf(",");
            s1 = s1.Insert(pos + 2, name);
            Console.WriteLine(s1);
        }
    }

The output is

    Hello, Ahmed. Welcome to Data Structures and Algorithms class.

The program creates a string, s1, which deliberately leaves space for a name, much like you'd do with a letter you plan to run through a mail merge. We add two to the position where we find the comma to make sure there is a space between the comma and the name.

The next most logical method after Insert is Remove. This method takes two int arguments: a starting position and a count, which is the number of characters you want to remove. Here's the code that removes a name from a string after the name has been inserted:

    using System;

    class StringManipulation
    {
        static void Main()
        {
            string s1 = "Hello, . Welcome to Data Structures and Algorithms class.";
            string name = "Saed";
            int pos = s1.IndexOf(",");
            s1 = s1.Insert(pos + 2, name);
            Console.WriteLine(s1);
            s1 = s1.Remove(pos + 2, name.Length);
            Console.WriteLine(s1);
        }
    }

The Remove method uses the same position used for inserting the name to remove the name, and the count is calculated by taking the length of the name variable. This allows us to remove any name inserted into the string.

The next logical method is the Replace method. This method takes two arguments: a string of characters to remove and a string of characters to replace them with. The method returns the new string. Here's how to use Replace:

    using System;

    class StringManipulation
    {
        static void Main()
        {
            string[] words = new string[] { "recieve", "decieve", "reciept" };
            for (int i = 0; i <= words.GetUpperBound(0); i++)
            {
                words[i] = words[i].Replace("cie", "cei");
                Console.WriteLine(words[i]);
            }
        }
    }

The only tricky part of this code is the way the Replace method is called. Since we're accessing each String object via an array element, we have to use array addressing followed by the method name, causing us to write this fragment:

    words[index].Replace("cie", "cei");

There is no problem with doing this, of course, because the compiler knows that words[index] evaluates to a String object.

When displaying data from our programs, we often want to align the data within a printing field in order to line the data up nicely. The String class includes two methods for performing this alignment: PadLeft and PadRight. The PadLeft method right-aligns a string and the PadRight method left-aligns a string. For example, if you want to print the word "Hello" right-aligned in a 10-character field, you would write this:

    string s1 = "Hello";
    Console.WriteLine(s1.PadLeft(10));
    Console.WriteLine("world");

The output is:

         Hello
    world

Here's an example using PadRight. A field width of 15 is used because a string that is already as long as the field width is returned unchanged:

    string s1 = "Haitham";
    string s2 = "Abdel Monem";
    string s3 = "El Ghareeb";
    Console.Write(s1.PadRight(15));
    Console.WriteLine(s2.PadRight(15));
    Console.Write(s3.PadRight(15));
    Console.WriteLine(s2.PadRight(15));

We end this section with a discussion of the Trim and TrimEnd methods. String objects sometimes have extra spaces or other formatting characters at the beginning or at the end of the string. The Trim method removes spaces or other characters from both ends of a string, and the TrimEnd method removes them from the end only. You can specify either a single character to trim or an array of characters. If you specify an array of characters, any of the characters in the array that are found at the ends of the string will be trimmed. Let's first look at an example that trims spaces from the beginning and end of a set of string values:

    using System;

    class StringManipulation
    {
        static void Main()
        {
            string[] names = new string[] { " Haitham", " Mohamed", "Ahmed ", "Saed " };
            Console.WriteLine();
            showNames(names);
            Console.WriteLine();
            trimVals(names);
            Console.WriteLine();
            showNames(names);
        }

        static void showNames(string[] arr)
        {
            for (int i = 0; i <= arr.GetUpperBound(0); i++)
                Console.Write(arr[i]);
        }

        static void trimVals(string[] arr)
        {
            char[] charArr = new char[] { ' ' };
            for (int i = 0; i <= arr.GetUpperBound(0); i++)
            {
                // Trim handles both ends; TrimEnd is shown only to
                // demonstrate the call and has nothing left to remove.
                arr[i] = arr[i].Trim(charArr[0]);
                arr[i] = arr[i].TrimEnd(charArr[0]);
            }
        }
    }

6.2 String Builder

The StringBuilder class provides access to mutable String objects. Objects of the String class are immutable, meaning that they cannot be changed. Every time you change the value of a String object, a new object is created to hold the value. StringBuilder objects, on the other hand, are mutable. When you make a change to a StringBuilder object, you are changing the original object, not working with a copy. In this section, we discuss how to use the StringBuilder class for those situations where many changes are to be made to the String objects in your programs.

The StringBuilder class is found in the System.Text namespace, so you must import this namespace into your program before you can use StringBuilder objects. You can construct a StringBuilder object in one of three ways. The first way is to create the object using the default constructor:

    StringBuilder stBuff1 = new StringBuilder();
This line creates the object stBuff1 with the capacity to hold a string 16 characters in length. This capacity is assigned by default, but it can be changed by passing in a new capacity in a constructor call, like this:

    StringBuilder stBuff2 = new StringBuilder(25);

This line builds an object that can initially hold 25 characters. The final constructor call takes a string as the argument:

    StringBuilder stBuff3 = new StringBuilder("Hello, world");

The capacity is set to 16 because the string argument didn't exceed 16 characters. Had the string argument been longer than 16 characters, a larger capacity would have been chosen (32 in the .NET versions this text was written for). Whenever the capacity of a StringBuilder object is exceeded, the capacity is increased automatically; the exact growth policy is an implementation detail that has varied between .NET versions, so programs should not rely on specific numbers.

There are several properties in the StringBuilder class that you can use to obtain information about a StringBuilder object. The Length property specifies the number of characters in the current instance and the Capacity property returns the current capacity of the instance. The MaxCapacity property returns the maximum capacity the instance can ever grow to; unlike Capacity, it does not change as characters are added. The following program fragment demonstrates how to use these properties:

    StringBuilder stBuff = new StringBuilder("Haitham El Ghareeb");
    Console.WriteLine("Length of stBuff: " + stBuff.Length);
    Console.WriteLine("Capacity of stBuff: " + stBuff.Capacity);
    Console.WriteLine("Maximum capacity of stBuff: " + stBuff.MaxCapacity);

The Length property can also be used to set the current length of a StringBuilder object, as in

    stBuff.Length = 10;
    Console.Write(stBuff);

This code outputs "Haitham El".
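The relationship between Length and Capacity can be sketched as follows. This is a minimal illustration: the class CapacitySketch and the helper Truncate are hypothetical names. The sketch deliberately avoids printing specific capacity values, since those depend on the .NET version, and checks only that the capacity is never smaller than the length.

```csharp
using System;
using System.Text;

static class CapacitySketch
{
    // Hypothetical helper: shortens text to at most 'length' characters
    // by assigning to the StringBuilder's Length property.
    public static string Truncate(string text, int length)
    {
        StringBuilder sb = new StringBuilder(text);
        if (length < sb.Length)
            sb.Length = length;
        return sb.ToString();
    }

    static void Main()
    {
        StringBuilder sb = new StringBuilder();
        Console.WriteLine(sb.Length);                // 0
        Console.WriteLine(sb.Capacity >= sb.Length); // True

        sb.Append("Haitham El Ghareeb");
        Console.WriteLine(sb.Length);                // 18
        Console.WriteLine(sb.Capacity >= sb.Length); // True

        Console.WriteLine(Truncate("Haitham El Ghareeb", 10)); // Haitham El
    }
}
```

Setting Length to a value larger than the current length is also allowed; the extra positions are filled with the null character, which is why Truncate only ever shrinks the text.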
To ensure that a minimum capacity is maintained for a StringBuilder instance, you can call the EnsureCapacity method, passing in an integer that states the minimum capacity for the object. Here's an example:

    stBuff.EnsureCapacity(25);

Another property you can use is the Chars property. In C#, Chars is exposed as the class indexer, so you use square brackets rather than the name Chars. The indexer either returns the character at the specified position or sets the character at that position. The following code shows a simple example:

    StringBuilder stBuff = new StringBuilder("Haitham El Ghareeb");
    if (stBuff[0] != 'D')
        stBuff[0] = 'D';

6.2.1 Modifying StringBuilder Objects

We can modify a StringBuilder object by appending new characters to the end of the object, inserting characters into an object, replacing a set of characters in an object with different characters, and removing characters from an object.

You can add characters to the end of a StringBuilder object by using the Append method. This method takes a string value as an argument and concatenates the string to the end of the current value in the object. The following program demonstrates how the Append method works:

    using System;
    using System.Text;

    class StringBuilderManipulation
    {
        static void Main()
        {
            StringBuilder stBuff = new StringBuilder();
            string[] words = new string[] { "now ", "is ", "the ", "time ",
                                            "for ", "all ", "good ", "men ",
                                            "to ", "come ", "to ", "the ",
                                            "aid ", "of ", "their ", "party" };
            for (int i = 0; i <= words.GetUpperBound(0); i++)
                stBuff.Append(words[i]);
            Console.WriteLine(stBuff);
        }
    }

The output is, of course

    now is the time for all good men to come to the aid of their party

A formatted string can be appended to a StringBuilder object. A formatted string is a string that includes a format specification embedded in the string. There are too many format specifications to cover in this section, so we'll just demonstrate a common specification. We can place a formatted number within a StringBuilder object like this:

    using System;
    using System.Text;

    class StringBuilderManipulation
    {
        static void Main()
        {
            StringBuilder stBuff = new StringBuilder();
            Console.WriteLine();
            // {0:D4} formats argument 0 as an integer of at least four digits.
            stBuff.AppendFormat("Your order is for {0:D4} widgets.", 234);
            stBuff.AppendFormat("\nWe have {0:D4} widgets left.", 12);
            Console.WriteLine(stBuff);
        }
    }

The format specification is enclosed within curly braces that are embedded in a string literal. The number before the colon selects which of the arguments after the comma to use, and the text after the colon says how to format it. The data after the comma is placed into the specification when the code is executed, so this program displays 0234 and 0012.

Next is the Insert method. This method allows us to insert a string into the current StringBuilder object. The method can take up to three arguments. The first argument specifies the position to begin the insertion. The second argument is the string you want to insert. The third argument, which is optional, is an integer that specifies the number of times you want to insert the string into the object. Here's a small program that demonstrates how the Insert method is used:

    using System;
    using System.Text;

    class StringBuilderManipulation
    {
        static void Main()
        {
            StringBuilder stBuff = new StringBuilder();
            stBuff.Insert(0, "Hello");
            stBuff.Append("world");
            stBuff.Insert(5, ", ");
            Console.WriteLine(stBuff);
            char[] chars = new char[] { 't', 'h', 'e', 'r', 'e' };
            // Build a string from the char array before concatenating.
            stBuff.Insert(5, " " + new string(chars));
            Console.WriteLine(stBuff);
        }
    }

The output is

    Hello, world
    Hello there, world

The following program utilizes the Insert method using the third argument for specifying the number of insertions to make:

    StringBuilder stBuff = new StringBuilder();
    stBuff.Insert(0, "and on ", 6);
    Console.WriteLine(stBuff);

The output is

    and on and on and on and on and on and on

The StringBuilder class has a Remove method for removing characters from a StringBuilder object. This method takes two arguments: a starting position and the number of characters to remove. Here's how it works:

    StringBuilder stBuff = new StringBuilder("noise in +++++string");
    stBuff.Remove(9, 5);
    Console.WriteLine(stBuff);
The output is

    noise in string

We can replace characters in a StringBuilder object with the Replace method. This method takes two arguments: the old string to replace and the new string to put in its place. The following code fragment demonstrates how the method works:

    StringBuilder stBuff = new StringBuilder("recieve decieve reciept");
    stBuff.Replace("cie", "cei");
    Console.WriteLine(stBuff);

Each "cie" is replaced with "cei".

When working with StringBuilder objects, you will often want to convert them to strings, perhaps in order to use a method that isn't found in the StringBuilder class. You can do this with the ToString method. This method returns a String instance of the current StringBuilder instance. An example is shown:

    using System;
    using System.Text;

    class StringBuilderManipulation
    {
        static void Main()
        {
            StringBuilder stBuff = new StringBuilder("HELLO WORLD");
            string st = stBuff.ToString();
            st = st.ToLower();
            // Replace changes every occurrence of the first letter; that is
            // fine here because "h" appears only once in "hello world".
            st = st.Replace(st.Substring(0, 1), st.Substring(0, 1).ToUpper());
            stBuff.Replace(stBuff.ToString(), st);
            Console.WriteLine(stBuff);
        }
    }

This program displays the string "Hello world" by first converting stBuff to a string (the st variable), making all the characters in the string lowercase, capitalizing the first letter in the string, and then replacing the old string in the StringBuilder object with the value of st. The ToString method is used in the first argument to Replace because the first parameter is supposed to be a string.

6.3 Pattern Matching

Whereas the String and StringBuilder classes provide a set of methods that can be used to process string-based data, the Regex class and its supporting classes provide much more power for string-processing tasks. String processing mostly involves looking for patterns in strings (pattern matching), and it is performed via a special language called a regular expression.
6.3.1 Regular Expression

A regular expression is a language that describes patterns of characters in strings, along with descriptors for repeating characters, alternatives, and groupings of characters. Regular expressions can be used to perform both searches in strings and substitutions in strings. A regular expression itself is just a string of characters that defines a pattern we want to search for in another string. Generally, the characters in a regular expression match themselves, so that the regular expression “the” matches that sequence of characters wherever they are found in a string. A regular expression can also include special characters that are called metacharacters. Metacharacters are used to signify repetition, alternation, or grouping.

To use regular expressions, we have to import the Regex class into our programs. This class is found in the System.Text.RegularExpressions namespace. Once we have the class imported into our program, we have to decide what we want to do with the Regex class. If we want to perform matching, we need to use the Match class. If we’re going to do substitutions, we don’t need the Match class. Instead, we can use the Replace method of the Regex class.

Let’s start by looking at how to match words in a string. Given a sample string, “the quick brown fox jumped over the lazy dog”, we want to find out where the word “the” is found in the string. The following program performs this task:

    using System;
    using System.Text.RegularExpressions;

    class MyRegExpSample {
        static void Main() {
            Regex reg = new Regex("the");
            string str1 = "the quick brown fox jumped over the lazy dog";
            Match matchSet;
            int matchPos;
            matchSet = reg.Match(str1);
            if (matchSet.Success) {
                matchPos = matchSet.Index;
                Console.WriteLine("found match at position: " + matchPos);
            }
        }
    }

The first thing we do is create a new Regex object and pass the constructor the regular expression we’re trying to match. After we initialize a string to match against, we declare a Match object, matchSet. The Match class provides properties and methods for storing data concerning a match made with the regular expression. The if statement uses one of the Match class properties, Success, to determine if there was a successful match. If the value is true, then the regular expression matched at least one substring in the string. Otherwise, the value stored in Success is false.

There’s another way a program can check to see if a match is successful. You can pre-test the regular expression by passing it and the target string to the IsMatch method. This method returns true if a match is generated by the regular expression and false otherwise. The method works like this:

    if (Regex.IsMatch(str1, "the")) {
        Match aMatch;
        aMatch = reg.Match(str1);
    }

One problem with the Match class is that it only stores one match. In the preceding example, there are two matches for the substring “the”. To find them all, we can call the Matches method of the Regex class, which returns a MatchCollection object that lets us work with every match found. Here’s an example:

    using System;
    using System.Text.RegularExpressions;

    class chapter8 {
        static void Main() {
            Regex reg = new Regex("the");
            string str1 = "the quick brown fox jumped over the lazy dog";
            MatchCollection matchSet;
            matchSet = reg.Matches(str1);
            if (matchSet.Count > 0)
                foreach (Match aMatch in matchSet)
                    Console.WriteLine("found a match at: " + aMatch.Index);
            Console.Read();
        }
    }

6.4 Summary

String processing is a common operation in most C# programs. The String class provides methods for most of the operations we commonly perform on strings. Although the “classic” Visual Basic string functions (Mid, InStr, etc.) can still be reached from C# through the Microsoft.VisualBasic namespace, we should prefer the String class methods to these functions, both for performance and for clarity. String class objects in C# are immutable, meaning that every time we make a change to an object, a new copy of the object is created.
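The difference between the two classes can be seen in a few lines. The following is a minimal sketch rather than a benchmark: it shows that Replace on a String leaves the original untouched and hands back a new object, while Replace on a StringBuilder changes the buffer in place and returns that same buffer. The class name ImmutabilitySketch and the helper BuildList are our own illustrative names.

```csharp
using System;
using System.Text;

class ImmutabilitySketch
{
    // Hypothetical helper: builds "0,1,...,count-1," in a single buffer
    // instead of creating a new String object on every concatenation.
    public static string BuildList(int count)
    {
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < count; i++)
            buffer.Append(i).Append(',');
        return buffer.ToString();
    }

    static void Main()
    {
        // String: the change produces a new object.
        string original = "recieve";
        string corrected = original.Replace("cie", "cei");
        Console.WriteLine(original);    // recieve (the original is untouched)
        Console.WriteLine(corrected);   // receive

        // StringBuilder: the change happens inside the same object.
        StringBuilder stBuff = new StringBuilder("recieve");
        StringBuilder same = stBuff.Replace("cie", "cei");
        Console.WriteLine(stBuff);                                 // receive
        Console.WriteLine(object.ReferenceEquals(stBuff, same));   // True

        Console.WriteLine(BuildList(5));                           // 0,1,2,3,4,
    }
}
```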
If you are creating long strings, or are making many changes to the same object, you should use the StringBuilder class instead. StringBuilder objects are mutable, allowing for much better performance.

Regular expressions present powerful options for performing text processing and pattern matching. Regular expressions can run the gamut from very simple (“a”) to complex combinations that look more like line noise than executable code. Nonetheless, learning to use regular expressions will allow you to perform text-processing tasks that you would not even consider attempting with the methods of the String class alone.

6.5 Exercises

1. Write a function that converts a phrase into pig Latin. A word is converted to pig Latin by removing the first character of the word, placing it at the back of the word, and adding the characters “ay” to the word. For example, “hello world” in pig Latin is “ellohay orldway.” Your function can assume that each word consists of at least two letters and that each word is separated by one space, with no punctuation marks.

2. Write a function that counts the occurrences of a word in a string. The function should return an integer. Do not assume that just one space separates words; the string can also contain punctuation. Write the function so that it works with either a String argument or a StringBuilder object.

3. Write a function that takes a number, such as 52, and returns the number as a word, as in fifty-two.

4. Write a function that takes a simple sentence in noun-verb-object form and parses the sentence into its different parts. For example, the sentence “Mary walked the dog” is parsed into this:

    Noun: Mary
    Verb: walked
    Object: the dog

   This function should work with both String objects and StringBuilder objects.

5. Write regular expressions to match the following:
   • a string that consists of an “x”, followed by any three characters, and then a “y”
   • a word ending in “ed”
   • a phone number
   • an HTML anchor tag
6. Write a regular expression that finds all the words in a string that contain double letters, such as “deep” and “book”.

7. Write a regular expression that finds all the header tags (<h1>, <h2>, etc.) in a Web page.

8. Write a function that uses a regular expression to perform a simple search and replace in a string.
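A closing note on substitutions: section 6.3.1 says that they are done with the Replace method of the Regex class, but the listings in that section only demonstrate matching. As a starting point, here is a minimal sketch of the static Regex.Replace(input, pattern, replacement) call. It collapses each run of whitespace to a single space, which a plain String Replace cannot do in one call. The class name RegexReplaceSketch and the helper CollapseSpaces are our own illustrative names, and \s+ (one or more whitespace characters) is standard .NET pattern syntax not covered in the chapter.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexReplaceSketch
{
    // Hypothetical helper: replaces each run of whitespace
    // in input with a single space.
    public static string CollapseSpaces(string input)
    {
        // \s+ matches one or more whitespace characters.
        return Regex.Replace(input, @"\s+", " ");
    }

    static void Main()
    {
        string messy = "the   quick \t brown    fox";
        Console.WriteLine(CollapseSpaces(messy));   // the quick brown fox
    }
}
```

The same call with a literal pattern, such as Regex.Replace(str1, "the", "a"), behaves like an ordinary search and replace.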

    6.1. STRING CLASS 107 Console . WriteLine ( ” F i r s t word : ” + firstWord ) ; Console . WriteLine ( ” Second word : ” + secondWord ) ; Console . Read ( ) ; } } The first thing we do is use Length property to determine the length of the object string. The length is simply the total number of all the characters in the string. To break up a two-word phrase into separate words, we need to know what separates the words. In a well-formed phrase, a space separates words and so we want to find the space between the two words in this phrase. We can do this with the IndexOf method. This method takes a character and returns the character’s position in the string. Strings in C# are zero-based and therefore the first character in the string is at position 0, the second character is at position 1, and so on. If the character can’t be found in the string, a-1 is returned. The IndexOf method finds the position of the space separating the two words and is used in the next method, Substring, to actually pull the first word out of the string. The Substring method takes two arguments: a starting position and the number of characters to pull. Look at the following example: string m = ”Now i s t h e time ” ; string sub = m . Substring ( 0 , 3 ) ; The value of sub is “Now”. The Substring method will pull as many char- acters out of a string as you ask it to, but if you try to go beyond the end of the string, an exception is thrown. The first word is pulled out of the string by starting at position 0 and pulling out pos number of characters. This may seem odd, since pos contains the position of the space, but because strings are zero-based, this is the correct number. The next step is to pull out the second word. Since we know where the space is, we know that the second word starts at pos+1 (again, we’re assuming we’re working with a well-formed phrase where each word is separated by exactly one space). 
The harder part is deciding exactly how many characters to pull out, knowing that an exception will be thrown if we try to go beyond the end of the string. There is a formula of sorts we can use for this calculation. First, we add 1 to the position where the space was found and then subtract that value from the length of the string. That will tell the method exactly how many char- acters to extract. Although this short program is interesting, it’s not very useful. What we really need is a program that will pull out the words out of a well-formed phrase of any length. There are several di↵erent algorithms we can use to do this. The algorithm we’ll use here contains the following steps:
  • 4.
    108CHAPTER 6. STRINGCLASS, STRING BUILDER, AND PATTERN MATCHING 1. Find the position of the first space in the string. 2. Extract the word. 3. Build a new string starting at the position past the space and continuing until the end of the string. 4. Look for another space in the new string. 5. If there isn’t another space, extract the word from that position to the end of the string. 6. Otherwise, loop back to step 2. Here is the code we built from this algorithm (each word extracted from the string is stored in a collection named words): using System ; class separateWords { static void Main ( ) { string astring = ” I l o v e S t r i n g M a n i p u l a t i o n ” ; int pos ; string word ; ArrayList words = new ArrayList ( ) ; pos = astring . IndexOf ( ” ” ) ; While ( pos > 0 ) { word = astring . Substring ( 0 , pos ) ; words . Add ( word ) ; astring = astring . Substring ( pos +1 , astring . Length ( pos + 1 ) ) - ; pos = astring . IndexOf ( ” ” ) ; i f ( pos == 1) { word = astring . Substring ( 0 , asstring . Length ) ; words . Add ( word ) ; } Console . Read ( ) ; } } } Of course, if we were going to actually use this algorithm in a program we’d make it a function and have it return a collection, like this: using System ; using System . Collections ; class S t r i n g M a n i p u l a t i o n { static void Main ( ) { string astring = ” I s t i l l Love S t r i n g M a n i p u l a t i o n ” ; ArrayList words = new ArrayList ( ) ; words = SplitWords ( astring ) ; f o r e a c h ( string word in words ) Console . Write ( word + ” ” ) ; Console . Read ( ) ; } static ArrayList SplitWords ( string astring ) {
  • 5.
    6.1. STRING CLASS 109 string [ ] ws = new string [ astring . Length 1 ] ; ArrayList words = new ArrayList ( ) ; int pos ; string word ; pos = astring . IndexOf ( ” ” ) ; w h i l e ( pos > 0 ) { word = astring . Substring ( 0 , pos ) ; words . Add ( word ) ; astring = astring . Substring ( pos +1 , astring . Length (pos +1) ) ; i f ( pos == 1) { word = astring . Substring ( 0 , astring . Length ) ; words . Add ( word ) ; } } return words ; } } It turns out, though, that the String class already has a method for splitting a string into parts (the Split method) as well as a method that can take a data collection and combine its parts into a string (the Join method). 6.1.3 Join and Split Methods Breaking up strings into individual pieces of data is a very common function. Many programs, from Web applications to everyday o ce applications, store data in some type of string format. To simplify the process of breaking up strings and putting them back together, the String class provides two meth- ods to use: the Split method for breaking up strings and the Join method for making a string out of the data stored in an array. The Split method takes a string, breaks it into constituent pieces, and puts those pieces into a String array. The method works by focusing on a separating character to determine where to break up the string. In the example in the last section, the Split- Words function always used the space as the separator. We can specify what separator to look for when using the Split method. In fact, the separator is the first argument to the method. The argument must come in the form of a char array, with the first element of the array being the character used as the delimiter. Many application programs export data by writing out strings of data separated by commas. These are called comma-separated value strings or CSVs for short. Some authors use the term comma-delimited. 
A comma- delimited string looks like this: "Haitham, El-Ghareeb, Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Egypt,35516" Each logical piece of data in this string is separated by a comma. We can put each of these logical pieces into an array using the Split method like this:
  • 6.
    110CHAPTER 6. STRINGCLASS, STRING BUILDER, AND PATTERN MATCHING string data = ” Haitham , El Ghareeb , I n f o r m a t i o n Systems Department , - F a c u l t y o f Computers and I n f o r m a t i o n S c i e n c e s , Mansoura U n i v e r s i t y , - Egypt , 3 5 5 1 6 ” ; string [ ] sdata ; char [ ] delimiter = new char [ ] { ' , ' } ; sdata = data . Split ( delimiter , data . Length ) ; Now we can access this data using standard array techniques: f o r e a c h ( string word in sdata ) Console . Write ( word + ” ” ) ; There is one more parameter we can pass to the Split method—the number of elements we want to store in the array. For example, if I want to put the first string element in the first position of the array and the rest of the string in the second element, I would call the method like this: sdata = data . Split ( delimiter , 2 ) ; The elements in the array are: • 0th element-Haitham • 1st element-El-Ghareeb, Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Egypt,35516 We can go the other way, from an array to a string, using the Join method. This method takes two arguments:the original array and a character to sepa- rate the elements. A string is built consisting of each array element followed by the separator element. We should also mention that this method is often called as a class method, meaning we call the method from the String class itself and not from a String instance. Here’s an example using the same data we used for the Split method: using System ; class JoinString { static void Main ( ) { string data = ” Haitham , El Ghareeb , I n f o r m a t i o n Systems Department - , F a c u l t y o f Computers and I n f o r m a t i o n S c i e n c e s , Mansoura - U n i v e r s i t y , Egypt , 3 5 5 1 6 ” ; string [ ] sdata ; char [ ] delimiter = new char [ ] { ' , ' } ; sdata = data . Split ( delimiter , data . Length ) ; f o r e a c h ( string word in sdata ) Console . 
Write ( word + ” ” ) ; string joined ; joined = String . Join ( ' , ' , sdata ) ; Console . Write ( joined ) ; } }
  • 7.
    6.1. STRING CLASS 111 These methods are useful for getting data into your program from another source (the Split method) and sending data out of your program to another source (the Join method). 6.1.4 Comparing Strings There are several ways to compare String objects in C#. The most obvious ways are to use the relational operators, which for most situations will work just fine. However, there are situations where other comparison techniques are more useful, such as if we want to know if a string is greater than, less than, or equal to another string, and for situations like that we have to use methods found in the String class. Strings are compared with each other much as we compare numbers. However, since it’s not obvious if “a” is greater than or less than “H”, we have to have some sort of numeric scale to use. That scale is the Unicode table. Each character (actually every symbol) has a Unicode value, which the operating system uses to convert a character’s binary representation to that character. You can determine a character’s Unicode value by using the ASC function. ASC actually refers to the ASCII code of a number. ASCII is an older numeric code that precedes Unicode, and the ASC function was first developed before Unicode subsumed ASCII. To find the ASCII value for a character, simply convert the character to an integer using a cast, like this: int charCode ; charCode = ( int ) ' a ' ; The value 97 is stored in the variable. Two strings are compared, then, by actually comparing their numeric codes. The strings “a” and “b” are not equal because code 97 is not code 98. The compareTo method actually lets us determine the exact relationship between two String objects. We’ll see how to use that method shortly. The first comparison method we’ll examine is the Equals method. This method is called from a String object and takes another String object as its argument. It then compares the two String objects character-by-character. 
If they contain the same characters (based on their numeric codes), the method returns True. Otherwise, the method returns False. The method is called like this: string s1 = ” Haitham ” ; string s2 = ” Haitham ” ; i f ( s1 . Equals ( s2 ) ) Console . WriteLine ( ”They a r e t h e same . ” ) ; else Console . WriteLine ( ”They a r e not t h e same . ” ) ;
  • 8.
    112CHAPTER 6. STRINGCLASS, STRING BUILDER, AND PATTERN MATCHING The next method for comparing strings is CompareTo. This method also takes a String as an argument but it doesn’t return a Boolean value. Instead, the method returns either 1, -1, or 0, depending on the relationship between he passed-in string and the string instance calling the method. Here are some examples: string s1 = ” Haitham ” ; string s2 = ” Haitham ” ; Console . WriteLine ( s1 . CompareTo ( s2 ) ) ; // returns 0 s2 = ” f o o f o o ” ; Console . WriteLine ( s1 . CompareTo ( s2 ) ) ; // returns 1 s2 = ” f o o a a r ” ; Console . WriteLine ( s1 . CompareTo ( s2 ) ) ; // returns 1 If two strings are equal, the CompareTo method returns a 0; if the passed-in string is “below” the method-calling string, the method returns a -1; if the passed-in string is “above” the method-calling string, the method returns a 1. An alternative to the CompareTo method is the Compare method, which is usually called as a class method. This method performs the same type of comparison as the CompareTo method and returns the same values for the same comparisons. The Compare method is used like this: static void Main ( ) { string s1 = ” Haitham ” ; string s2 = ” Haitham ” ; int compVal = String . Compare ( s1 , s2 ) ; s w i t c h ( compVal ) { case 0 : Console . WriteLine ( s1 + ” ” + s2 + ” a r e e q u a l ” ) ; break ; case 1 : Console . WriteLine ( s1 + ” i s l e s s than ” + s2 ) ; break ; case 2 : Console . WriteLine ( s1 + ” i s g r e a t e r than ” + s2 ) ; break ; default : Console . WriteLine ( ”Can ' t compare ” ) ; } } Two other comparison methods that can be useful when working with strings are StartsWith and EndsWith. These instance methods take a string as an argument and return True if the instance either starts with or ends with the string argument. Following are two short programs that demonstrate the use of these methods. First, we’ll demonstrate the EndsWith method: using System ;
  • 9.
    6.1. STRING CLASS 113 using System . Collections ; class S t r i n g C o m p a r i s o n { static void Main ( ) { string [ ] nouns = new string [ ] { ” a p p l e s ” , ” o r a n g e s ” , ” banana ” , ” c h e r r y ” - , ” tomatoes ” } ; ArrayList pluralNouns = new ArrayList ( ) ; f o r e a c h ( string noun in nouns ) i f ( noun . EndsWith ( ” s ” ) ) pluralNouns . Add ( noun ) ; f o r e a c h ( string noun in pluralNouns ) Console . Write ( noun + ” ” ) ; } } First, we create an array of nouns, some of which are in plural form. Then we loop through the elements of the array, checking to see if any of the nouns are plurals. If so, they’re added to a collection. Then we loop through the collection, displaying each plural. We use the same basic idea in the next program to determine which words start with the prefix “tri”: using System ; using System . Collections ; class StringEndings { static void Main ( ) { string [ ] words = new string [ ] { ” t r i a n g l e ” , ” d i a g o n a l ” , ” t r i m e s t e r ” , ” - bifocal ” , ” triglycerides ” }; ArrayList triWords = new ArrayList ( ) ; f o r e a c h ( string word in words ) i f ( word . StartsWith ( ” t r i ” ) ) triWords . Add ( word ) ; f o r e a c h ( string word in triWords ) Console . Write ( word + ” ” ) ; } } 6.1.5 String Manipulation String processing usually involves making changes to strings. We need to insert new characters into a string, remove characters that don’t belong any- more, replace old characters with new characters, change the case of certain characters, and add or remove space from strings, just to name a few opera- tions. There are methods in the String class for all of these operations, and in this section we’ll examine them. We’ll start with the Insert method. This method inserts a string into another string at a specified position. Insert returns a new string. The method is called like this: String1 = String0 . Insert ( Position , String )
  • 10.
    114CHAPTER 6. STRINGCLASS, STRING BUILDER, AND PATTERN MATCHING An example of using this code is: using System ; class S t r i n g M a n i p u l a t i o n { static void Main ( ) { string s1 = ” H e l l o , . Welcome t o Data S t r u c t u r e s and A l g o r i t h m s c l a s s . - ”; string name = ”Ahmed” ; int pos = s1 . IndexOf ( ” , ” ) ; s1 = s1 . Insert ( pos +2 , name ) ; Console . WriteLine ( s1 ) ; } } The output is Hello, Ahmed. Welcome to Data Structures and Algorithms class. The program creates a string, s1, which deliberately leaves space for a name, much like you’d do with a letter you plan to run through a mail merge. We add two to the position where we find the comma to make sure there is a space between the comma and the name. The next most logical method af- ter Insert is Remove. This method takes two Integer arguments: a starting position and a count, which is the number of characters you want to remove. Here’s the code that removes a name from a string after the name has been inserted: using System ; class S t r i n g M a n i p u l a t i o n { static void Main ( ) { string s1 = ” H e l l o , . Welcome t o Data S t r u c t u r e s and A l g o r i t h m s - class .”; string name = ” Saed ” ; int pos = s1 . IndexOf ( ” , ” ) ; s1 = s1 . Insert ( pos +2 , name ) ; Console . WriteLine ( s1 ) ; s1 = s1 . Remove ( pos +2 , name . Length ) ; Console . WriteLine ( s1 ) ; } } The Remove method uses the same position for inserting a name to remove the name, and the count is calculated by taking the length of the name vari- able. This allows us to remove any name inserted into the string. The next logical method is the Replace method. This method takes two arguments: a string of characters to remove and a string of characters to replace them with. The method returns the new string. Here’s how to use Replace: using System ; class S t r i n g M a n i p u l a t i o n { static void Main ( ) {
  • 11.
    6.1. STRING CLASS 115 string [ ] words = new string [ ] { ” r e c i e v e ” , ” d e c i e v e ” , ” r e c i e p t ” } ; for ( int i = 0 ; i <= words . GetUpperBound ( 0 ) ; i++) { words [ i ] = words [ i ] . Replace ( ” c i e ” , ” c e i ” ) ; Console . WriteLine ( words [ i ] ) ; } } } The only tricky part of this code is the way the Replace method is called. Since we’re accessing each String object via an array element, we have to use array addressing followed by the method name, causing us to write this fragment: words ( index ) . Replace ( ” c i e ” , ” c e i ” ) ; There is no problem with doing this, of course, because the compiler knows that words(index) evaluates to a String object. When displaying data from our programs, we often want to align the data within a printing field in order to line the data up nicely. The String class includes two methods for perform- ing this alignment: PadLeft and PadRight. The PadLeft method right-aligns a string and the PadRight method left-aligns a string. For example, if you want to print the word “Hello” in a 10-character field right-aligned, you would write this: string s1 = ” H e l l o ” ; Console . WriteLine ( s1 . PadLeft ( 1 0 ) ) ; Console . WriteLine ( ” world ” ) ; The output is: Hello world Here’s an example using PadRight: string s1 = ” Haitham ” ; string s2 = ” Abdel Monem” ; string s3 = ” El Ghareeb ” ; Console . Write ( s1 . PadLeft ( 1 0 ) ) ; Console . WriteLine ( s2 . PadLeft ( 1 0 ) ) ; Console . Write ( s3 . PadLeft ( 1 0 ) ) ; Console . WriteLine ( s2 . Padleft ( 1 0 ) ) ; We end this section with a discussion of the Trim and TrimEnd methods. When working with String objects, they sometimes have extra spaces or other formatting characters at the beginning or at the end of the string. The Trim and TrimEnd methods will remove spaces or other characters from either end
  • 12.
    116CHAPTER 6. STRINGCLASS, STRING BUILDER, AND PATTERN MATCHING of a string. You can specify either a single character to trim or an array of characters. If you specify an array of characters, if any of the characters in the array are found, they will be trimmed from the string. Let’s first look at an example that trims spaces from the beginning and end of a set of string values: using System ; class S t r i n g M a n i p u l a t i o n { static void Main ( ) { string [ ] names = new string [ ] { ” Haitham ” , ” Mohamed ” , ”Ahmed ” , - ” Saed ” } ; Console . WriteLine ( ) ; showNames ( names ) ; Console . WriteLine ( ) ; trimVals ( names ) ; Console . WriteLine ( ) ; showNames ( names ) ; } static void showNames ( string [ ] arr ) { for ( int i = 0 ; i <= arr . GetUpperBound ( 0 ) ; i++) Console . Write ( arr [ i ] ) ; } static void trimVals ( string [ ] arr ) { char [ ] charArr = new char [ ] { ' ' } ; for ( int i = 0 ; i<= arr . GetUpperBound ( 0 ) ; i++) { arr [ i ] = arr [ i ] . Trim ( charArr [ 0 ] ) ; arr [ i ] = arr [ i ] . TrimEnd ( charArr [ 0 ] ) ; } } } 6.2 String Builder The StringBuilder class provides access to mutable String objects. Objects of the String class are immutable, meaning that they cannot be changed. Every time you change the value of a String object, a new object is created to hold the value. StringBuilder objects, on the other hand, are mutable. When you make a change to a StringBuilder object, you are changing the original object, not working with a copy. In this section, we discuss how to use the StringBuilder class for those situations where many changes are to be to the String objects in your programs. The StringBuilder class is found in the System.Text namespace so you must import this namespace into your program before you can use StringBuilder objects. You can construct a StringBuilder object in one of three ways. 
The first way is to create the object using the default constructor: StringBuilder stBuff1 = new StringBuilder ( ) ;
  • 13.
    6.2. STRING BUILDER 117 This line creates the object stBu↵1 with the capacity to hold a string 16 characters in length. This capacity is assigned by default, but it can be changed by passing in a new capacity in a constructor call, like this: StringBuilder stBuff2 = New StringBuilder ( 2 5 ) ; This line builds an object that can initially hold 25 characters. The final constructor call takes a string as the argument: StringBuilder stBuff3 = New StringBuilder ( ” H e l l o , world ” ) ; The capacity is set to 16 because the string argument didn’t exceed 16 char- acters. Had the string argument been longer than 16, the capacity would have been set to 32. Every time the capacity of a StringBuilder object is exceeded, the capacity is increased by 16 characters. There are several prop- erties in the StringBuilder class that you can use to obtain information about a StringBuilder object. The Length property specifies the number of char- acters in the current instance and the Capacity property returns the current capacity of the instance. The MaxCapacity property returns the maximum number of characters allowed in the current instance of the object (though this is automatically increased if more characters are added to the object). The following program fragment demonstrates how to use these properties: StringBuilder stBuff = new StringBuilder ( ” Haitham El Ghareeb ” ) ; Console . WriteLine ( ” Length o f s t B u f f 3 : ” & stBuff . Length ( ) ) ; Console . WriteLine ( ” C a p a c i t y o f s t B u f f 3 : ” & stBuff . Capacity ( ) ) ; Console . WriteLine ( ”Maximum c a p a c i t y o f s t B u f f 3 : ” + stBuff . MaxCapacity ) ; The Length property can also be used to set the current length of a String- Builder object, as in stBuff . Length = 1 0 ; Console . Write ( stBuff3 ) ; This code outputs “Haitham El”. 
To ensure that a minimum capacity is maintained for a StringBuilder instance, you can call the EnsureCapacity method, passing in an integer that states the minimum capacity for the object. Here’s an example:

    stBuff.EnsureCapacity(25);

Another property you can use is the Chars property. In C#, this property is exposed as the class indexer, so you use square brackets either to return the character at a given position or to set it. The following code shows a simple example using the Chars property.

    StringBuilder stBuff = new StringBuilder("Haitham El Ghareeb");
    if (stBuff[0] != 'D')
        stBuff[0] = 'D';

6.2.1 Modifying StringBuilder Objects

We can modify a StringBuilder object by appending new characters to the end of the object, inserting characters into the object, replacing a set of characters in the object with different characters, and removing characters from the object. You can add characters to the end of a StringBuilder object by using the Append method. This method takes a string value as an argument and concatenates the string to the end of the current value in the object. The following program demonstrates how the Append method works:

    using System;
    using System.Text;

    class StringBuilderManipulation
    {
        static void Main()
        {
            StringBuilder stBuff = new StringBuilder();
            // Each word except the last carries its own trailing space.
            string[] words = new string[] { "now ", "is ", "the ", "time ",
                "for ", "all ", "good ", "men ", "to ", "come ", "to ", "the ",
                "aid ", "of ", "their ", "party" };
            for (int i = 0; i <= words.GetUpperBound(0); i++)
                stBuff.Append(words[i]);
            Console.WriteLine(stBuff);
        }
    }

The output is, of course

    now is the time for all good men to come to the aid of their party

A formatted string can be appended to a StringBuilder object. A formatted string is a string that includes a format specification embedded in the string. There are too many format specifications to cover in this section, so we’ll just demonstrate a common specification. We can place a formatted number within a StringBuilder object like this:

    using System;
    using System.Text;

    class StringBuilderManipulation
    {
        static void Main()
        {
            StringBuilder stBuff = new StringBuilder();
            Console.WriteLine();
            stBuff.AppendFormat("Your order is for {0:0000} widgets.", 234);
            stBuff.AppendFormat("\nWe have {0:0000} widgets left.", 12);
            Console.WriteLine(stBuff);
        }
    }

The format specification is enclosed within curly braces that are embedded in a string literal. The data after the comma is placed into the specification when the code is executed. Here the 0 before the colon is the index of the argument, and 0000 pads the number with leading zeros to at least four digits, so the program displays 0234 and 0012.

Next is the Insert method. This method allows us to insert a string into the current StringBuilder object. The method can take up to three arguments. The first argument specifies the position to begin the insertion. The second argument is the string you want to insert. The third argument, which is optional, is an integer that specifies the number of times you want to insert the string into the object. Here’s a small program that demonstrates how the Insert method is used:

    using System;
    using System.Text;

    class StringBuilderManipulation
    {
        static void Main()
        {
            StringBuilder stBuff = new StringBuilder();
            stBuff.Insert(0, "Hello");
            stBuff.Append("world");
            stBuff.Insert(5, ", ");
            Console.WriteLine(stBuff);
            char[] chars = new char[] { 't', 'h', 'e', 'r', 'e' };
            // Build a string from the char array before concatenating.
            stBuff.Insert(5, " " + new string(chars));
            Console.WriteLine(stBuff);
        }
    }

The output is

    Hello, world
    Hello there, world

The following fragment calls the Insert method with the third argument to specify the number of insertions to make:

    StringBuilder stBuff = new StringBuilder();
    stBuff.Insert(0, "and on ", 6);
    Console.WriteLine(stBuff);

The output is

    and on and on and on and on and on and on

The StringBuilder class has a Remove method for removing characters from a StringBuilder object. This method takes two arguments: a starting position and the number of characters to remove. Here’s how it works:

    StringBuilder stBuff = new StringBuilder("noise in +++++string");
    stBuff.Remove(9, 5);
    Console.WriteLine(stBuff);
The output is

    noise in string

We can replace characters in a StringBuilder object with the Replace method. This method takes two arguments: the old string to replace and the new string to put in its place. The following code fragment demonstrates how the method works:

    StringBuilder stBuff = new StringBuilder("recieve decieve reciept");
    stBuff.Replace("cie", "cei");
    Console.WriteLine(stBuff);

Each “cie” is replaced with “cei”, so the fragment displays “receive deceive receipt”.

When working with StringBuilder objects, you will often want to convert them to strings, perhaps in order to use a method that isn’t found in the StringBuilder class. You can do this with the ToString method. This method returns a String instance of the current StringBuilder instance. An example is shown:

    using System;
    using System.Text;

    class StringBuilderManipulation
    {
        static void Main()
        {
            StringBuilder stBuff = new StringBuilder("HELLO WORLD");
            string st = stBuff.ToString();
            st = st.ToLower();
            // Capitalize only the first character; replacing every
            // occurrence of that letter would also change later ones.
            st = st.Substring(0, 1).ToUpper() + st.Substring(1);
            stBuff.Replace(stBuff.ToString(), st);
            Console.WriteLine(stBuff);
        }
    }

This program displays the string “Hello world” by first converting stBuff to a string (the st variable), making all the characters in the string lowercase, capitalizing the first letter in the string, and then replacing the old string in the StringBuilder object with the value of st. The ToString method is used in the first argument to Replace because the first parameter is supposed to be a string.

6.3 Pattern Matching

Whereas the String and StringBuilder classes provide a set of methods that can be used to process string-based data, the Regex class and its supporting classes provide much more power for string-processing tasks.
String processing mostly involves looking for patterns in strings (pattern matching), and those patterns are described in a special language called regular expressions.
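One way to make the difference concrete is to count occurrences of a word. The following is a minimal sketch, not a definitive implementation: a plain substring search also finds the text inside longer words, whereas a regular expression can restrict the match to whole words. The class and method names are hypothetical, the value passed to CountSubstring is assumed to be non-empty, and the \b word-boundary metacharacter and Regex.Escape are standard .NET features that this chapter does not otherwise cover.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
class PatternVersusSubstring
{
    // Counts every place the literal text occurs, even inside longer words.
    // Assumes value is not empty.
    public static int CountSubstring(string text, string value)
    {
        int count = 0;
        int pos = text.IndexOf(value, StringComparison.Ordinal);
        while (pos >= 0)
        {
            count++;
            pos = text.IndexOf(value, pos + 1, StringComparison.Ordinal);
        }
        return count;
    }

    // Counts only occurrences that stand alone as a word.
    public static int CountWholeWord(string text, string word)
    {
        // \b matches a word boundary; Escape guards any special characters.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern).Count;
    }

    static void Main()
    {
        string str1 = "the other fox bathed near the lazy dog";
        Console.WriteLine(CountSubstring(str1, "the")); // prints 4
        Console.WriteLine(CountWholeWord(str1, "the")); // prints 2
    }
}
```

The substring search also counts the “the” hidden in “other” and “bathed”; the pattern skips both without any extra code.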
6.3.1 Regular Expression

A regular expression is a language that describes patterns of characters in strings, along with descriptors for repeating characters, alternatives, and groupings of characters. Regular expressions can be used to perform both searches in strings and substitutions in strings. A regular expression itself is just a string of characters that defines a pattern we want to search for in another string. Generally, the characters in a regular expression match themselves, so that the regular expression “the” matches that sequence of characters wherever it is found in a string. A regular expression can also include special characters that are called metacharacters. Metacharacters are used to signify repetition, alternation, or grouping.

To use regular expressions, we have to import the Regex class into our programs. This class is found in the System.Text.RegularExpressions namespace. Once we have the class imported into our program, we have to decide what we want to do with the Regex class. If we want to perform matching, we need to use the Match class. If we’re going to do substitutions, we don’t need the Match class. Instead, we can use the Replace method of the Regex class.

Let’s start by looking at how to match words in a string. Given a sample string, “the quick brown fox jumped over the lazy dog”, we want to find out where the word “the” is found in the string. The following program performs this task:

    using System;
    using System.Text.RegularExpressions;

    class MyRegExpSample
    {
        static void Main()
        {
            Regex reg = new Regex("the");
            string str1 = "the quick brown fox jumped over the lazy dog";
            Match matchSet;
            int matchPos;
            matchSet = reg.Match(str1);
            if (matchSet.Success)
            {
                // Index is the position of the first match.
                matchPos = matchSet.Index;
                Console.WriteLine("found match at position: " + matchPos);
            }
        }
    }

The first thing we do is create a new Regex object and pass the constructor the regular expression we’re trying to match. After we initialize a string to match against, we declare a Match object, matchSet. The Match class provides members for storing data concerning a match made with the regular expression. The if statement uses one of the Match class properties, Success, to determine if there was a successful match. If the value is true, then the regular expression matched at least one substring in the string. Otherwise, the value stored in Success is false.

There’s another way a program can check to see if a match is successful. You can pre-test the regular expression by passing it and the target string to the IsMatch method. This method returns true if a match is generated by the regular expression and false otherwise. The method works like this:

    if (Regex.IsMatch(str1, "the"))
    {
        Match aMatch;
        aMatch = reg.Match(str1);
    }

One problem with the Match class is that it only stores one match. In the preceding example, there are two matches for the substring “the”. We can use another method of the Regex class, Matches, to find every match of a regular expression. It returns the matches in a MatchCollection object so that we can work with all the matches found. Here’s an example:

    using System;
    using System.Text.RegularExpressions;

    class chapter8
    {
        static void Main()
        {
            Regex reg = new Regex("the");
            string str1 = "the quick brown fox jumped over the lazy dog";
            MatchCollection matchSet;
            matchSet = reg.Matches(str1);
            if (matchSet.Count > 0)
                foreach (Match aMatch in matchSet)
                    Console.WriteLine("found a match at: " + aMatch.Index);
            Console.Read();   // wait for a key press before exiting
        }
    }

6.4 Summary

String processing is a common operation in most C# programs. The String class provides a multitude of methods for performing most common operations on strings. Although the “classic” Visual Basic string functions (Mid, InStr, etc.) remain available through the Microsoft.VisualBasic namespace, C# programs should prefer the String class methods to these functions, both for performance and for clarity. String class objects in C# are immutable, meaning that every time we make a change to an object, a new copy of the object is created.
If you are creating long strings, or are making many changes to the same object, you should use the StringBuilder class instead. StringBuilder objects are mutable, allowing for much better performance. Regular expressions present powerful options for performing text processing and pattern matching. Regular expressions can run the gamut from very simple (“a”) to complex combinations that look more like line noise than executable code. Nonetheless, learning to use regular expressions will allow you to perform text-processing tasks that you would not even attempt with the methods of the String class.

6.5 Exercises

1. Write a function that converts a phrase into pig Latin. A word is converted to pig Latin by removing the first character of the word, placing it at the back of the word, and adding the characters “ay” to the word. For example, “hello world” in pig Latin is “ellohay orldway.” Your function can assume that each word consists of at least two letters and that each word is separated by one space, with no punctuation marks.

2. Write a function that counts the occurrences of a word in a string. The function should return an integer. Do not assume that just one space separates words, and allow for strings that contain punctuation. Write the function so that it works with either a String argument or a StringBuilder object.

3. Write a function that takes a number, such as 52, and returns the number as a word, as in fifty-two.

4. Write a function that takes a simple sentence in noun-verb-object form and parses the sentence into its different parts. For example, the sentence “Mary walked the dog” is parsed into this:

    Noun: Mary
    Verb: walked
    Object: the dog

   This function should work with both String objects and StringBuilder objects.

5. Write regular expressions to match the following:
   • a string consisting of an “x”, followed by any three characters, and then a “y”
   • a word ending in “ed”
   • a phone number
   • an HTML anchor tag
6. Write a regular expression that finds all the words in a string that contain double letters, such as “deep” and “book”.

7. Write a regular expression that finds all the header tags (<h1>, <h2>, etc.) in a Web page.

8. Write a function that uses a regular expression to perform a simple search and replace in a string.