758. Bold Words in String

MediumTrieArrayHash TableStringString Matching
Leetcode Link

Problem Description

You are given an array of keyword strings words and a string s. Your task is to make all occurrences of the keywords in s bold by wrapping them with HTML bold tags <b> and </b>.

The key requirements are:

  1. Find all occurrences of each keyword from words that appear in string s
  2. Wrap these occurrences with <b> and </b> tags to make them bold
  3. Use the minimum number of tags possible - if keywords overlap or are adjacent, they should share the same pair of bold tags
  4. The tags must form valid HTML combinations

For example, if you have overlapping keywords that both need to be bolded, instead of creating nested or separate tags like <b>key</b><b>word</b>, you should merge them into a single bolded section <b>keyword</b>.

The algorithm uses a Trie data structure to efficiently find all keyword matches in the string. It:

  • Builds a Trie from all the keywords
  • Finds all matching intervals [start, end] where keywords appear in s
  • Merges overlapping or adjacent intervals to minimize the number of bold tags
  • Constructs the final string by inserting <b> and </b> tags at the appropriate positions

The solution ensures that overlapping or consecutive bold sections are merged into single continuous bold regions, achieving the requirement of using the least number of tags possible.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The core challenge is efficiently finding all occurrences of multiple keywords in a string and then merging overlapping regions. Let's think about how to approach this step by step.

First, we need to find all keyword matches. A naive approach would be to check each keyword at every position in the string, but this would be inefficient with time complexity O(n * m * k) where n is the length of string, m is the number of keywords, and k is the average keyword length.

This is where the Trie comes in. By building a Trie from all keywords, we can simultaneously check for all possible keyword matches starting from any position in just one traversal. At each position i in the string, we traverse the Trie character by character. Whenever we reach a node marked as is_end, we know we've found a complete keyword match from position i to the current position.

Once we have all the matching intervals, we face the second challenge: merging overlapping or adjacent intervals. Why do we need this? Consider the string "aaabbcc" with keywords ["aaa", "aab", "bc"]. Without merging, we might get <b>aaa</b><b>b</b><b>bc</b>c, but the optimal result should be <b>aaabbc</b>c with just one pair of tags.

The merging logic follows a simple rule: if two intervals overlap or are adjacent (meaning the end of one interval is at position j and the start of the next is at position j+1 or less), they should be combined into a single interval. We sort the intervals by start position and then iterate through them, extending the current interval when overlaps occur or starting a new interval when there's a gap.

Finally, we reconstruct the string by iterating through the original string and the merged intervals simultaneously, inserting bold tags at the appropriate positions. This ensures we maintain the original string content while adding the minimum number of tags needed.

The beauty of this approach is that it separates the problem into three independent, manageable steps: finding matches efficiently with a Trie, merging intervals to minimize tags, and reconstructing the final string with proper tag placement.

Learn more about Trie patterns.

Solution Implementation

1class Trie: 2 """Trie data structure for efficient prefix matching""" 3 4 def __init__(self): 5 # Initialize children array for ASCII characters (128 slots) 6 self.children = [None] * 128 7 # Flag to mark if current node represents end of a word 8 self.is_end = False 9 10 def insert(self, word: str) -> None: 11 """Insert a word into the trie""" 12 node = self 13 for char in word: 14 # Get ASCII value of character 15 index = ord(char) 16 # Create new node if path doesn't exist 17 if node.children[index] is None: 18 node.children[index] = Trie() 19 # Move to next node 20 node = node.children[index] 21 # Mark the end of word 22 node.is_end = True 23 24 25class Solution: 26 def boldWords(self, words: List[str], s: str) -> str: 27 """ 28 Add bold tags around substrings that appear in words list. 29 30 Args: 31 words: List of words to search for in string s 32 s: Target string to add bold tags to 33 34 Returns: 35 String with bold tags added around matching words 36 """ 37 # Build trie from all words 38 trie = Trie() 39 for word in words: 40 trie.insert(word) 41 42 n = len(s) 43 # Store intervals [start, end] of matched words 44 matched_intervals = [] 45 46 # Find all matching substrings starting from each position 47 for i in range(n): 48 node = trie 49 for j in range(i, n): 50 index = ord(s[j]) 51 # Stop if no matching path in trie 52 if node.children[index] is None: 53 break 54 node = node.children[index] 55 # Record interval if we found a complete word 56 if node.is_end: 57 matched_intervals.append([i, j]) 58 59 # Return original string if no matches found 60 if not matched_intervals: 61 return s 62 63 # Merge overlapping or adjacent intervals 64 merged_intervals = [] 65 start, end = matched_intervals[0] 66 67 for curr_start, curr_end in matched_intervals[1:]: 68 # Check if intervals are disjoint (not adjacent or overlapping) 69 if end + 1 < curr_start: 70 # Save current interval and start new one 71 merged_intervals.append([start, end]) 72 start, end = curr_start, curr_end 73 else: 74 # Merge intervals by extending end position 75 end = max(end, curr_end) 76 # Add the last interval 77 merged_intervals.append([start, end]) 78 79 # Build result string with bold tags 80 result = [] 81 string_index = 0 82 interval_index = 0 83 84 while string_index < n: 85 # If no more intervals, append remaining string 86 if interval_index == len(merged_intervals): 87 result.append(s[string_index:]) 88 break 89 90 interval_start, interval_end = merged_intervals[interval_index] 91 92 # Append characters before the bold section 93 if string_index < interval_start: 94 result.append(s[string_index:interval_start]) 95 96 # Add bold tags and the bold text 97 result.append('<b>') 98 result.append(s[interval_start:interval_end + 1]) 99 result.append('</b>') 100 101 # Move to next interval and update string position 102 interval_index += 1 103 string_index = interval_end + 1 104 105 return ''.join(result) 106
1/** 2 * Trie (Prefix Tree) data structure for efficient word searching 3 */ 4class Trie { 5 // Array to store child nodes for ASCII characters (128 possible values) 6 Trie[] children = new Trie[128]; 7 // Flag to mark if current node represents end of a word 8 boolean isEndOfWord; 9 10 /** 11 * Inserts a word into the trie 12 * @param word the word to insert 13 */ 14 public void insert(String word) { 15 Trie currentNode = this; 16 17 // Traverse through each character in the word 18 for (char character : word.toCharArray()) { 19 // Create new node if path doesn't exist 20 if (currentNode.children[character] == null) { 21 currentNode.children[character] = new Trie(); 22 } 23 // Move to the child node 24 currentNode = currentNode.children[character]; 25 } 26 // Mark the last node as end of word 27 currentNode.isEndOfWord = true; 28 } 29} 30 31/** 32 * Solution class to add bold tags around words found in the string 33 */ 34class Solution { 35 /** 36 * Adds bold HTML tags around words that appear in the given string 37 * @param words array of words to be bolded 38 * @param s the target string 39 * @return string with bold tags added 40 */ 41 public String boldWords(String[] words, String s) { 42 // Build trie with all words 43 Trie trie = new Trie(); 44 for (String word : words) { 45 trie.insert(word); 46 } 47 48 // Store all intervals [start, end] where words are found 49 List<int[]> intervals = new ArrayList<>(); 50 int stringLength = s.length(); 51 52 // Find all word occurrences starting from each position 53 for (int startPos = 0; startPos < stringLength; startPos++) { 54 Trie currentNode = trie; 55 56 // Try to match words starting from current position 57 for (int endPos = startPos; endPos < stringLength; endPos++) { 58 int charIndex = s.charAt(endPos); 59 60 // No match found, break inner loop 61 if (currentNode.children[charIndex] == null) { 62 break; 63 } 64 65 // Move to next node in trie 66 currentNode = currentNode.children[charIndex]; 67 68 // If we found a complete word, record the interval 69 if (currentNode.isEndOfWord) { 70 intervals.add(new int[] {startPos, endPos}); 71 } 72 } 73 } 74 75 // No words found, return original string 76 if (intervals.isEmpty()) { 77 return s; 78 } 79 80 // Merge overlapping or adjacent intervals 81 List<int[]> mergedIntervals = new ArrayList<>(); 82 int currentStart = intervals.get(0)[0]; 83 int currentEnd = intervals.get(0)[1]; 84 85 for (int i = 1; i < intervals.size(); i++) { 86 int nextStart = intervals.get(i)[0]; 87 int nextEnd = intervals.get(i)[1]; 88 89 // If intervals are not adjacent or overlapping, save current and start new 90 if (currentEnd + 1 < nextStart) { 91 mergedIntervals.add(new int[] {currentStart, currentEnd}); 92 currentStart = nextStart; 93 currentEnd = nextEnd; 94 } else { 95 // Extend current interval if overlapping or adjacent 96 currentEnd = Math.max(currentEnd, nextEnd); 97 } 98 } 99 // Add the last interval 100 mergedIntervals.add(new int[] {currentStart, currentEnd}); 101 102 // Build result string with bold tags 103 int currentPos = 0; 104 int intervalIndex = 0; 105 StringBuilder result = new StringBuilder(); 106 107 while (currentPos < stringLength) { 108 // All intervals processed, append remaining string 109 if (intervalIndex == mergedIntervals.size()) { 110 result.append(s.substring(currentPos)); 111 break; 112 } 113 114 int intervalStart = mergedIntervals.get(intervalIndex)[0]; 115 int intervalEnd = mergedIntervals.get(intervalIndex)[1]; 116 117 // Append non-bold portion before current interval 118 if (currentPos < intervalStart) { 119 result.append(s.substring(currentPos, intervalStart)); 120 } 121 122 // Append bold portion with tags 123 intervalIndex++; 124 result.append("<b>"); 125 result.append(s.substring(intervalStart, intervalEnd + 1)); 126 result.append("</b>"); 127 128 // Update position to after current interval 129 currentPos = intervalEnd + 1; 130 } 131 132 return result.toString(); 133 } 134} 135
1class Trie { 2public: 3 vector<Trie*> children; 4 bool isEnd; 5 6 Trie() { 7 children.resize(128); // ASCII character set 8 isEnd = false; 9 } 10 11 // Insert a word into the trie 12 void insert(string word) { 13 Trie* node = this; 14 for (char c : word) { 15 if (!node->children[c]) { 16 node->children[c] = new Trie(); 17 } 18 node = node->children[c]; 19 } 20 node->isEnd = true; 21 } 22}; 23 24class Solution { 25public: 26 string boldWords(vector<string>& words, string s) { 27 // Build trie with all words 28 Trie* trie = new Trie(); 29 for (const string& word : words) { 30 trie->insert(word); 31 } 32 33 int n = s.size(); 34 vector<pair<int, int>> intervals; 35 36 // Find all matching word intervals in string s 37 for (int i = 0; i < n; ++i) { 38 Trie* node = trie; 39 for (int j = i; j < n; ++j) { 40 int charIndex = s[j]; 41 if (!node->children[charIndex]) { 42 break; // No matching prefix, stop searching 43 } 44 node = node->children[charIndex]; 45 if (node->isEnd) { 46 // Found a complete word, record interval [i, j] 47 intervals.push_back({i, j}); 48 } 49 } 50 } 51 52 // No words found in string 53 if (intervals.empty()) { 54 return s; 55 } 56 57 // Merge overlapping or adjacent intervals 58 vector<pair<int, int>> mergedIntervals; 59 int start = intervals[0].first; 60 int end = intervals[0].second; 61 62 for (int i = 1; i < intervals.size(); ++i) { 63 int currentStart = intervals[i].first; 64 int currentEnd = intervals[i].second; 65 66 if (end + 1 < currentStart) { 67 // Non-overlapping and non-adjacent interval 68 mergedIntervals.push_back({start, end}); 69 start = currentStart; 70 end = currentEnd; 71 } else { 72 // Overlapping or adjacent interval, merge them 73 end = max(end, currentEnd); 74 } 75 } 76 mergedIntervals.push_back({start, end}); 77 78 // Build result string with bold tags 79 string result = ""; 80 int stringIndex = 0; 81 int intervalIndex = 0; 82 83 while (stringIndex < n) { 84 if (intervalIndex == mergedIntervals.size()) { 85 // No more intervals, append remaining string 86 result += s.substr(stringIndex); 87 break; 88 } 89 90 int intervalStart = mergedIntervals[intervalIndex].first; 91 int intervalEnd = mergedIntervals[intervalIndex].second; 92 93 // Append non-bold part before current interval 94 if (stringIndex < intervalStart) { 95 result += s.substr(stringIndex, intervalStart - stringIndex); 96 } 97 98 // Append bold part 99 result += "<b>"; 100 result += s.substr(intervalStart, intervalEnd - intervalStart + 1); 101 result += "</b>"; 102 103 stringIndex = intervalEnd + 1; 104 ++intervalIndex; 105 } 106 107 return result; 108 } 109}; 110
1// Trie node structure for efficient prefix matching 2class TrieNode { 3 children: (TrieNode | null)[]; 4 isEnd: boolean; 5 6 constructor() { 7 this.children = new Array(128).fill(null); // ASCII character set 8 this.isEnd = false; 9 } 10} 11 12// Insert a word into the trie 13function insertWord(root: TrieNode, word: string): void { 14 let node = root; 15 for (const char of word) { 16 const charCode = char.charCodeAt(0); 17 if (!node.children[charCode]) { 18 node.children[charCode] = new TrieNode(); 19 } 20 node = node.children[charCode]!; 21 } 22 node.isEnd = true; 23} 24 25function boldWords(words: string[], s: string): string { 26 // Build trie with all words for efficient prefix matching 27 const trie = new TrieNode(); 28 for (const word of words) { 29 insertWord(trie, word); 30 } 31 32 const n = s.length; 33 const intervals: [number, number][] = []; 34 35 // Find all matching word intervals in string s 36 for (let i = 0; i < n; i++) { 37 let node: TrieNode | null = trie; 38 for (let j = i; j < n; j++) { 39 const charCode = s.charCodeAt(j); 40 if (!node.children[charCode]) { 41 break; // No matching prefix, stop searching from this position 42 } 43 node = node.children[charCode]; 44 if (node!.isEnd) { 45 // Found a complete word, record interval [i, j] 46 intervals.push([[i, j]]); 47 } 48 } 49 } 50 51 // No words found in string, return original 52 if (intervals.length === 0) { 53 return s; 54 } 55 56 // Merge overlapping or adjacent intervals 57 const mergedIntervals: [number, number][] = []; 58 let start = intervals[0][0]; 59 let end = intervals[0][1]; 60 61 for (let i = 1; i < intervals.length; i++) { 62 const currentStart = intervals[i][0]; 63 const currentEnd = intervals[i][1]; 64 65 if (end + 1 < currentStart) { 66 // Non-overlapping and non-adjacent interval 67 mergedIntervals.push([start, end]); 68 start = currentStart; 69 end = currentEnd; 70 } else { 71 // Overlapping or adjacent interval, merge them 72 end = Math.max(end, currentEnd); 73 } 74 } 75 mergedIntervals.push([start, end]); 76 77 // Build result string with bold tags 78 let result = ""; 79 let stringIndex = 0; 80 let intervalIndex = 0; 81 82 while (stringIndex < n) { 83 if (intervalIndex === mergedIntervals.length) { 84 // No more intervals, append remaining string 85 result += s.substring(stringIndex); 86 break; 87 } 88 89 const intervalStart = mergedIntervals[intervalIndex][0]; 90 const intervalEnd = mergedIntervals[intervalIndex][1]; 91 92 // Append non-bold part before current interval 93 if (stringIndex < intervalStart) { 94 result += s.substring(stringIndex, intervalStart); 95 } 96 97 // Append bold part with tags 98 result += "<b>"; 99 result += s.substring(intervalStart, intervalEnd + 1); 100 result += "</b>"; 101 102 stringIndex = intervalEnd + 1; 103 intervalIndex++; 104 } 105 106 return result; 107} 108

Solution Approach

The solution implements a three-phase approach: building a Trie for efficient pattern matching, finding all keyword occurrences, and merging overlapping intervals.

Phase 1: Building the Trie

We create a Trie class with 128 children (to handle ASCII characters) and an is_end flag to mark complete words. The insert method adds each keyword character by character:

def insert(self, word):  node = self  for c in word:  idx = ord(c) # Get ASCII value  if node.children[idx] is None:  node.children[idx] = [Trie](/problems/trie_intro)()  node = node.children[idx]  node.is_end = True # Mark end of word

Phase 2: Finding All Keyword Matches

For each position i in string s, we traverse the Trie to find all keywords starting at that position:

pairs = [] for i in range(n):  node = [trie](/problems/trie_intro)  for j in range(i, n):  idx = ord(s[j])  if node.children[idx] is None:  break # No more matches possible  node = node.children[idx]  if node.is_end:  pairs.append([i, j]) # Found keyword from i to j

This generates intervals [start, end] for each keyword occurrence. For example, if "abc" appears at position 2, we get [2, 4].

Phase 3: Merging Overlapping Intervals

We merge intervals that overlap or are adjacent to minimize bold tags:

st, ed = pairs[0] t = [] for a, b in pairs[1:]:  if ed + 1 < a: # Gap between intervals  t.append([st, ed])  st, ed = a, b  else: # Overlapping or adjacent  ed = max(ed, b) # Extend current interval t.append([st, ed])

The condition ed + 1 < a checks if there's a gap. If ed = 3 and a = 4, they're adjacent (positions 3 and 4), so we merge them. If a = 5, there's a gap at position 4, so we start a new interval.

Phase 4: Reconstructing the String

Finally, we build the result string by iterating through both the original string and merged intervals:

ans = [] i = j = 0 while i < n:  if j == len(t):  ans.append(s[i:]) # Append remaining string  break  st, ed = t[j]  if i < st:  ans.append(s[i:st]) # Add non-bold portion  ans.append('<b>')  ans.append(s[st : ed + 1]) # Add bold portion  ans.append('</b>')  j += 1  i = ed + 1

The algorithm maintains two pointers: i for the current position in string s, and j for the current interval in the merged list. It adds non-bold portions between intervals and wraps bold portions with tags.

Time Complexity: O(n * m) where n is the length of string s and m is the maximum length of any keyword, as we check for keywords starting at each position.

Space Complexity: O(k * m) for the Trie where k is the number of keywords and m is their average length, plus O(n) for storing intervals.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to understand how the solution works.

Input:

  • words = ["ab", "bc"]
  • s = "aabcd"

Step 1: Build the Trie

We construct a Trie from the keywords:

 root  / \  a b  | |  b* c*

The asterisk (*) indicates is_end = True, marking complete words.

Step 2: Find All Keyword Matches

We check each position in s = "aabcd":

  • Position 0 ('a'):

    • Follow 'a' in Trie → found
    • Follow 'a' → 'b' → found "ab" at [0, 1] ✓
    • No more matches from position 0
  • Position 1 ('a'):

    • Follow 'a' in Trie → found
    • Follow 'a' → 'b' → found "ab" at [1, 2] ✓
    • No more matches from position 1
  • Position 2 ('b'):

    • Follow 'b' in Trie → found
    • Follow 'b' → 'c' → found "bc" at [2, 3] ✓
    • No more matches from position 2
  • Position 3 ('c'):

    • Follow 'c' in Trie → not found
    • No matches from position 3
  • Position 4 ('d'):

    • Follow 'd' in Trie → not found
    • No matches from position 4

Found intervals: [[0, 1], [1, 2], [2, 3]]

Step 3: Merge Overlapping/Adjacent Intervals

Sort intervals (already sorted): [[0, 1], [1, 2], [2, 3]]

Merging process:

  • Start with [0, 1]
  • Check [1, 2]: Since 1 + 1 = 2 ≥ 1 (adjacent), merge → [0, 2]
  • Check [2, 3]: Since 2 + 1 = 3 ≥ 2 (adjacent), merge → [0, 3]

Merged intervals: [[0, 3]]

Step 4: Reconstruct String with Bold Tags

Original string: "aabcd" Merged intervals: [[0, 3]]

Building the result:

  • i = 0, j = 0
  • Interval [0, 3]: Add <b> + s[0:4] + </b><b>aabc</b>
  • i = 4, j = 1 (no more intervals)
  • Add remaining s[4:] → "d"

Final Result: "<b>aabc</b>d"

The keywords "ab" (appearing twice) and "bc" all overlap or are adjacent, so they're merged into one continuous bold section, using just one pair of bold tags instead of three.

Time and Space Complexity

Time Complexity: O(m * k + n^2)

  • Building the Trie: O(m * k) where m is the number of words and k is the average length of each word. Each word requires traversing and potentially creating nodes for each character.

  • Finding all matching intervals: O(n^2) where n is the length of string s. For each starting position i in the string (n positions), we potentially check up to n - i characters to find all words starting at position i. In the worst case, this gives us O(n^2) operations.

  • Merging intervals: O(p) where p is the number of matching pairs found. In the worst case, p could be O(n^2) if many overlapping words are found.

  • Building the final string: O(n) as we traverse through the string once to construct the result.

Overall time complexity is dominated by O(m * k + n^2).

Space Complexity: O(m * k * 128 + n)

  • Trie storage: O(m * k * 128) in the worst case. Each Trie node has an array of 128 children (for ASCII characters). With m words of average length k, we could have up to m * k nodes, each consuming O(128) space.

  • Pairs list: O(p) where p is the number of matching intervals, which could be O(n^2) in the worst case.

  • Merged intervals list t: O(n) in the worst case when all intervals are disjoint.

  • Result string construction: O(n) for the answer list and final string.

The space complexity is dominated by the Trie structure, giving us O(m * k * 128 + n).

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Interval Merging Logic

The most common pitfall is incorrectly determining when intervals should be merged. Many developers make the mistake of only checking for overlapping intervals (end >= curr_start) but forget to handle adjacent intervals.

Incorrect approach:

# This only merges overlapping intervals, not adjacent ones if end >= curr_start:  end = max(end, curr_end) else:  merged_intervals.append([start, end])  start, end = curr_start, curr_end

Why it's wrong: If you have the word "abc" at position [0,2] and "def" at position [3,5], and both are keywords, the output would be <b>abc</b><b>def</b> instead of the desired <b>abcdef</b>.

Correct approach:

# Check if there's a gap between intervals if end + 1 < curr_start: # Gap exists when end+1 < curr_start  merged_intervals.append([start, end])  start, end = curr_start, curr_end else: # Adjacent or overlapping  end = max(end, curr_end)

2. Off-by-One Errors in String Slicing

Another frequent mistake occurs when reconstructing the final string with bold tags. The intervals store inclusive end positions, but Python string slicing uses exclusive end indices.

Incorrect approach:

# Forgetting that interval_end is inclusive result.append(s[interval_start:interval_end]) # Missing the last character!

Correct approach:

# Add 1 to interval_end since it's inclusive result.append(s[interval_start:interval_end + 1])

3. Not Handling Empty or Edge Cases

Failing to check for empty matched intervals before attempting to merge them causes index errors.

Incorrect approach:

# Directly accessing first element without checking start, end = matched_intervals[0] # IndexError if matched_intervals is empty

Correct approach:

# Check if any matches were found if not matched_intervals:  return s  start, end = matched_intervals[0]

4. Inefficient Trie Implementation for ASCII Characters

Using a dictionary for Trie children when dealing with ASCII characters adds unnecessary overhead and complexity.

Less efficient approach:

class Trie:  def __init__(self):  self.children = {} # Dictionary approach  self.is_end = False   def insert(self, word):  node = self  for c in word:  if c not in node.children:  node.children[c] = Trie()  node = node.children[c]

More efficient approach:

class Trie:  def __init__(self):  self.children = [None] * 128 # Array for ASCII characters  self.is_end = False

5. Incorrect Pointer Updates After Adding Bold Tags

When reconstructing the string, updating the string index incorrectly can lead to duplicated or skipped characters.

Incorrect approach:

# Using interval_end instead of interval_end + 1 string_index = interval_end # This would duplicate the character at interval_end

Correct approach:

# Move past the last character of the current bold section string_index = interval_end + 1

Prevention Strategy

To avoid these pitfalls:

  1. Draw out examples with adjacent intervals like [0,2] and [3,5] to verify merge logic
  2. Test with edge cases including empty words list, no matches, and single character strings
  3. Use clear variable names like inclusive_end to remind yourself about index boundaries
  4. Add assertions during development to catch off-by-one errors early
  5. Walk through the algorithm with a simple example like words = ["ab", "bc"] and s = "abc" to ensure correct merging
Loading...
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which of the following problems can be solved with backtracking (select multiple)


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More