String Replace in Python | Python String Replace
By Rohit Sharma
Updated on Jul 03, 2023 | 8 min read | 6.87K+ views
Replacing characters and substrings is a routine task in data cleaning and text processing with Python. Your data might contain stray characters that need to be removed, category labels might be misspelled, and so on. In text preprocessing for NLP problems, string replacement is one of the most basic and important steps in preparing textual data.
In this tutorial, we go over multiple ways to replace different kinds of strings: the built-in replace() method, the regex sub() function, join() with filter(), and techniques for removing digits.
The string replace() method in Python swaps one or more occurrences of a character or substring in the original string with a new character or substring. Strings are immutable, so replace() does not modify the original; it returns a new string in which the old values have been replaced by the new ones.
The replace(old_str, new_str, count) method takes 3 arguments:
old_str: the substring to search for.
new_str: the substring to put in its place.
count (optional): the maximum number of occurrences to replace, counted from the left.
Return Type: str
Replacing every instance of the old substring with the new one, this method returns a modified copy of the string. If the optional argument count is provided, only the first count occurrences are replaced.
If the old substring passed to replace() is not present in the original string, the string is returned unchanged.
Time Complexity: O(n)
Space Complexity: O(n)
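The behaviour described above can be checked with a short sketch. The sample text and variable names here are made up for illustration; only str.replace() itself comes from the tutorial.

```python
text = "one two one two one"

# Without count, every occurrence is replaced.
all_replaced = text.replace("one", "1")

# With count=2, only the first two occurrences (from the left) are replaced.
first_two = text.replace("one", "1", 2)

# A substring that is not present leaves the text unchanged.
no_match = text.replace("three", "3")

print(all_replaced)  # 1 two 1 two 1
print(first_two)     # 1 two 1 two one
print(no_match)      # one two one two one
print(text)          # one two one two one (the original is never modified)
```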
Imagine that while writing an article for your project, you made the same typing mistake throughout the piece. With replace(), you can fix every occurrence in a single call. Let's code this.
oldString = 'CSE is an acronym for Computer Scince Engineering.'
newString = oldString.replace('Scince', 'Science')
print('old string:', oldString)
print('new string:', newString)
Output:
old string: CSE is an acronym for Computer Scince Engineering.
new string: CSE is an acronym for Computer Science Engineering.
Using the replace function, we can also replace every full form that appears in the text with its corresponding abbreviation. Let's code this.
oldString = 'Computer Science Engineering'
newString = oldString.replace('Computer Science Engineering', 'CSE')
print('old string:', oldString)
print('new string:', newString)
Output:
old string: Computer Science Engineering
new string: CSE
Let’s go over a few examples to understand the working.
Single replace
Mystr = "This is a sample string"
Newstr = Mystr.replace('is', 'was')
#Output: Thwas was a sample string
If you recall, strings in Python are immutable. So when we call the replace method, it creates another string object with the modified data. We also didn't specify the count parameter in the above example. If it is not specified, the replace method replaces all occurrences of the substring.
Multiple replace
Mystr = "This is a sample string"
Newstr = Mystr.replace("s", "X")
#Output: ThiX iX a Xample Xtring
Multiple replace first n occurrences
If you only want to replace the first N occurrences, pass N as the count argument.
Mystr = "This is a sample string"
Newstr = Mystr.replace("s", "X", 3)
#Output: ThiX iX a Xample string
Multiple strings replace
In the above examples, we replaced one substring a different number of times. What if you want to replace several different substrings in the same string? We can write a small function that applies the same method to each of them in turn.
Consider the same example as above, but now we want to replace "h", "is" and "ng" with "X".
def MultipleStrings(mainStr, strReplaceList, newStr):
    # Iterating over the strings to be replaced
    for elem in strReplaceList:
        # Checking if string is in the main string
        if elem in mainStr:
            # Replace the string
            mainStr = mainStr.replace(elem, newStr)
    return mainStr

Mystr = "This is a sample string"
Newstr = MultipleStrings(Mystr, ['h', 'is', 'ng'], "X")
#Output: TXX X a sample striX
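The loop above applies each replacement to the result of the previous one, so the order of the list can matter when replacements overlap. As an alternative sketch (not the tutorial's method), the replacements can be done in a single pass with re.sub() and a dictionary. The helper name replace_many and the mapping argument are hypothetical, introduced only for this illustration.

```python
import re

def replace_many(text, mapping):
    """Replace each key of `mapping` with its value in one pass over `text`."""
    if not mapping:
        return text
    # Longer keys first so that overlapping keys prefer the longest match.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Look up the replacement for whichever key matched.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# Swapping two words works because already-replaced text is never rescanned.
swapped = replace_many("cat chases dog", {"cat": "dog", "dog": "cat"})
print(swapped)  # dog chases cat

# For the tutorial's example, the result matches the loop version.
same_as_loop = replace_many("This is a sample string", {"h": "X", "is": "X", "ng": "X"})
print(same_as_loop)  # TXX X a sample striX
```

With the sequential loop, swapping "cat" and "dog" would first turn everything into "dog" and then into "cat", which is why a single pass can be the safer choice for such cases.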
Python's re module is built for working with text data using regular expressions, whether for finding substrings, replacing strings, or other pattern-based operations. It provides the sub() function to find and replace/substitute substrings easily. Let's go over its syntax and a few use cases.
The re.sub(pattern, replacement, original_string) function takes 3 arguments:
pattern: the regular expression to search for.
replacement: the string that replaces each match.
original_string: the string to search in.
Like the replace method, re.sub() returns a new string object with the modifications applied. Let's go over a few working examples.
Removing whitespace
Whitespaces can be treated as special characters and replaced with other characters. In the below example, we intend to replace whitespaces with “X”.
import re
Mystr = "This is a sample string"
# Replace all whitespaces in Mystr with 'X'
Newstr = re.sub(r"\s+", 'X', Mystr)
#Output: ThisXisXaXsampleXstring
As we see, all the whitespaces were replaced. The pattern r"\s+" matches one or more consecutive whitespace characters, so each run of whitespace is replaced by a single 'X'.
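The difference between \s+ and \s only shows up when whitespace characters appear next to each other. A small sketch with a made-up input string:

```python
import re

messy = "a  b\t\nc"  # two spaces, then a tab followed by a newline

# \s+ treats each run of whitespace as one match.
collapsed = re.sub(r"\s+", "X", messy)

# \s matches every whitespace character individually.
one_by_one = re.sub(r"\s", "X", messy)

print(collapsed)   # aXbXc
print(one_by_one)  # aXXbXXc
```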
Removing all special characters
To remove all the special characters, we will pass a pattern which matches with all the special characters.
import re
import string
Mystr = "Tempo@@&[(000)]%%$@@66isit$$#$%-+Str"
# re.escape keeps characters such as ] \ ^ - literal inside the character class
pattern = r'[' + re.escape(string.punctuation) + r']'
# Replace all special characters in a string with X
Newstr = re.sub(pattern, 'X', Mystr)
#Output: TempoXXXXX000XXXXXXX66isitXXXXXXXStr
Removing substring as case insensitive
In real-life data, the same word may appear in many versions with different upper and lower case characters. Listing every variant separately in the pattern wouldn't be effective. The regex sub() function accepts the flag re.IGNORECASE to ignore case. Let's see how it works.
import re
Mystr = "This IS a sample Istring"
# Replace substring in a string with a case-insensitive approach
Newstr = re.sub(r'is', '**', Mystr, flags=re.IGNORECASE)
#Output: Th** ** a sample **tring
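Note that the pattern above also matches "is" inside other words such as "This". If only the standalone word should be replaced, one option is to add \b word boundaries to the pattern. This is a sketch of that variation, not part of the original example:

```python
import re

text = "This IS a sample Istring"

# Without boundaries, 'is' also matches inside "This" and "Istring".
anywhere = re.sub(r"is", "**", text, flags=re.IGNORECASE)

# \b anchors the match at word boundaries, so only the standalone word is replaced.
whole_word = re.sub(r"\bis\b", "**", text, flags=re.IGNORECASE)

print(anywhere)    # Th** ** a sample **tring
print(whole_word)  # This ** a sample Istring
```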
Removing multiple characters using regex
The regex function can easily remove multiple characters from a string. Below is an example.
import re
Mystr = "This is a sample string"
pattern = r'[hsa]'
# Remove characters 'h', 's' and 'a' from a string
Newstr = re.sub(pattern, '', Mystr)
#Output: Ti i  mple tring
Replacing using join()
Another way to remove or replace characters is to iterate through the string and check them against some condition.
charList = ['h', 's', 'a']
Mystr = "This is a sample string"
# Remove all characters in list, from the string
Newstr = ''.join(elem for elem in Mystr if elem not in charList)
#Output: Ti i  mple tring
Replacing using join() and filter()
The above example can also be written using the filter function.
Mystr = "This is a sample string"
charList = ['h', 's', 'a']
# Remove all characters in list, from the string
Newstr = "".join(filter(lambda k: k not in charList, Mystr))
#Output: Ti i  mple tring
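The join() examples above remove characters. The same iteration can also replace them by emitting a substitute instead of skipping the character. A minimal sketch, where the name chars_to_mask and the replacement "X" are chosen only for illustration:

```python
chars_to_mask = {"h", "s", "a"}  # a set gives fast membership checks
text = "This is a sample string"

# Keep each character unless it is in the set, in which case emit "X" instead.
masked = "".join("X" if ch in chars_to_mask else ch for ch in text)

print(masked)  # TXiX iX X XXmple Xtring
```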
Numerical data is often embedded in strings and may need to be removed and processed separately as a different feature. Let's go over a few examples to see how this can be implemented.
Using regex
Consider the below string from which we need to remove the numeric data.
import re
Mystr = "Sample string9211 of year 20xx"
pattern = r'[0-9]'
# Match all digits in the string and replace them by empty string
Newstr = re.sub(pattern, "", Mystr)
#Output: Sample string of year xx
In the above code, we use the pattern r'[0-9]' to match all the digits.
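When the numbers need to be kept for separate processing rather than simply discarded, re.findall() can collect them before they are removed. This is a sketch using the same sample string; the variable names are illustrative.

```python
import re

text = "Sample string9211 of year 20xx"

# \d+ groups consecutive digits, so each number is captured as one item.
numbers = re.findall(r"\d+", text)

# The same pattern removes the digits from the text.
without_digits = re.sub(r"\d+", "", text)

print(numbers)         # ['9211', '20']
print(without_digits)  # Sample string of year xx
```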
Using join() function
We can also iterate over the string and filter out the digits using the isdigit() method, which returns True for digit characters and False for letters and other characters.
Mystr = "Sample string9211 of year 20xx"
# Iterates over the chars in the string and joins all characters except digits
Newstr = "".join(item for item in Mystr if not item.isdigit())
#Output: Sample string of year xx
Using join() and filter()
Similarly, we can pass the condition to the filter function, which keeps only the characters for which the condition returns True.
Mystr = "Sample string9211 of year 20xx"
# Filter all the digits from characters in string & join remaining chars
Newstr = "".join(filter(lambda item: not item.isdigit(), Mystr))
#Output: Sample string of year xx
As a final example, consider a rebranding: Facebook rebranded as Meta. The Python replace() method can be used to update the documentation accordingly.
text = "Facebook is an online social media service platform. Facebook is an American international technology conglomerate based in California."
text_replace = text.replace("Facebook", "Meta")
print("String after replace: " + text_replace)
Output:
String after replace: Meta is an online social media service platform. Meta is an American international technology conglomerate based in California.
We covered many examples showing different ways to remove or replace characters, whitespace, and numbers in a string. We highly recommend that you try other ways of solving the above examples and experiment with examples of your own.
The replace function becomes very useful when you are applying data cleansing techniques. Unnecessary or garbage characters can be easily removed using this function. Replacing strings or characters is used not only in data cleansing but also in NLP text processing.
Having these types of methods to deal with strings is important because the whole data science field deals with huge chunks of data.
Strings are useful because they let you store large amounts of text data with ease. Python supports a good number of useful methods to perform operations on strings. Also, strings are immutable in Python, which reduces errors caused by unintended modification.
The string is one of the most used built-in data types in Python. Many other languages also provide it as a pre-defined data type and support various methods to operate on it.
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...