3104. Find Longest Self-Contained Substring

Problem Description

Given a string s, you need to find the length of the longest self-contained substring.

A substring t of string s is called self-contained if:

  1. t != s (the substring cannot be the entire string)
  2. For every character in t, that character does not appear anywhere else in s outside of t

In other words, a self-contained substring is a portion of the string where all occurrences of every character within that substring are completely contained within the substring itself - none of those characters appear in the rest of the string.

For example:

  • If s = "abcabc", the substring "abc" (from index 0 to 2) is self-contained because the characters 'a', 'b', and 'c' at positions 0, 1, 2 don't appear anywhere after index 2 in the remaining string.
  • If s = "abcab", there is no self-contained substring because every character appears both inside and outside any potential substring.

Return the length of the longest self-contained substring if one exists, otherwise return -1.

The key insight is that a self-contained substring must start at the first occurrence of some character and must include all occurrences of every character it contains. The algorithm tracks the first and last positions of each character, then checks potential substrings starting from each character's first occurrence to find valid self-contained substrings.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key observation is that if a substring is self-contained, it must include all occurrences of every character that appears in it. This means if character 'a' appears in our substring, we must include every 'a' in the entire string for the substring to be self-contained.

This leads to an important insight: a self-contained substring can only start at a position where a character appears for the first time in the string. Why? Because if we start at the second or later occurrence of a character, we've already left out at least one occurrence of that character, violating the self-contained property.

Consider the string "abcabc". If we want a substring containing 'a', we must start at index 0 (first 'a') and extend at least to index 3 (last 'a'). Starting at any other position would leave some 'a' outside our substring.

The approach then becomes:

  1. For each character, record where it first appears and where it last appears
  2. Try starting a substring at each character's first occurrence position
  3. As we extend the substring, keep track of the rightmost position we must reach to include all occurrences of characters we've seen so far
  4. If we encounter a character whose first occurrence is before our starting position, this substring cannot be self-contained
  5. If we successfully reach a point where we've included all necessary occurrences and haven't used the entire string, we have a valid self-contained substring

This greedy expansion approach works because once we commit to including a character, we must include all its occurrences, which might force us to include more characters, creating a cascading effect until we either find a valid boundary or determine the substring cannot be self-contained.

Learn more about Binary Search and Prefix Sum patterns.

Solution Implementation

1class Solution: 2 def maxSubstringLength(self, s: str) -> int: 3 # Build dictionaries to track first and last occurrence of each character 4 first_occurrence = {} 5 last_occurrence = {} 6 7 # Populate the occurrence dictionaries 8 for index, char in enumerate(s): 9 if char not in first_occurrence: 10 first_occurrence[char] = index 11 last_occurrence[char] = index 12 13 # Initialize result and get string length 14 max_length = -1 15 string_length = len(s) 16 17 # Try each character as a potential starting point for a valid substring 18 for char, start_index in first_occurrence.items(): 19 # Track the maximum index we need to include to satisfy all character occurrences 20 required_end = last_occurrence[char] 21 22 # Expand from start_index and check if we can form a valid substring 23 for current_index in range(start_index, string_length): 24 current_char = s[current_index] 25 char_first = first_occurrence[current_char] 26 char_last = last_occurrence[current_char] 27 28 # If current character's first occurrence is before our start, invalid substring 29 if char_first < start_index: 30 break 31 32 # Update the required end to include all occurrences of current character 33 required_end = max(required_end, char_last) 34 35 # Check if we've reached a valid substring endpoint 36 # Valid if: we've covered all required occurrences AND it's not the entire string 37 if required_end == current_index and current_index - start_index + 1 < string_length: 38 max_length = max(max_length, current_index - start_index + 1) 39 40 return max_length 41
1class Solution { 2 public int maxSubstringLength(String s) { 3 // Arrays to store the first and last occurrence index of each character 4 int[] firstOccurrence = new int[26]; 5 int[] lastOccurrence = new int[26]; 6 7 // Initialize first occurrence array with -1 (indicating character not found yet) 8 Arrays.fill(firstOccurrence, -1); 9 10 int stringLength = s.length(); 11 12 // Record first and last occurrence of each character 13 for (int i = 0; i < stringLength; ++i) { 14 int charIndex = s.charAt(i) - 'a'; 15 16 // If this is the first time we see this character 17 if (firstOccurrence[charIndex] == -1) { 18 firstOccurrence[charIndex] = i; 19 } 20 // Update last occurrence (will be updated for every occurrence) 21 lastOccurrence[charIndex] = i; 22 } 23 24 int maxLength = -1; 25 26 // Try starting a substring from each character's first occurrence 27 for (int charIdx = 0; charIdx < 26; ++charIdx) { 28 int startPos = firstOccurrence[charIdx]; 29 30 // Skip if this character doesn't exist in the string 31 if (startPos == -1) { 32 continue; 33 } 34 35 // Track the rightmost position we need to include 36 int requiredEndPos = lastOccurrence[charIdx]; 37 38 // Expand from start position and check if we can form a valid substring 39 for (int currentPos = startPos; currentPos < stringLength; ++currentPos) { 40 int currentCharIndex = s.charAt(currentPos) - 'a'; 41 int currentCharFirstPos = firstOccurrence[currentCharIndex]; 42 int currentCharLastPos = lastOccurrence[currentCharIndex]; 43 44 // If current character's first occurrence is before our start position, 45 // we cannot form a valid substring 46 if (currentCharFirstPos < startPos) { 47 break; 48 } 49 50 // Update the required end position to include all occurrences 51 // of characters we've seen so far 52 requiredEndPos = Math.max(requiredEndPos, currentCharLastPos); 53 54 // If we've reached the required end position and the substring 55 // is not the entire string, update the maximum length 56 if (requiredEndPos == currentPos && currentPos - startPos + 1 < stringLength) { 57 maxLength = Math.max(maxLength, currentPos - startPos + 1); 58 } 59 } 60 } 61 62 return maxLength; 63 } 64} 65
1class Solution { 2public: 3 int maxSubstringLength(string s) { 4 // Store the first occurrence index of each character ('a' to 'z') 5 // Initialize with -1 to indicate character hasn't appeared yet 6 vector<int> firstOccurrence(26, -1); 7 8 // Store the last occurrence index of each character ('a' to 'z') 9 vector<int> lastOccurrence(26); 10 11 int stringLength = s.length(); 12 13 // Populate first and last occurrence arrays for each character 14 for (int i = 0; i < stringLength; ++i) { 15 int charIndex = s[i] - 'a'; // Convert character to 0-based index 16 17 // Record first occurrence if not seen before 18 if (firstOccurrence[charIndex] == -1) { 19 firstOccurrence[charIndex] = i; 20 } 21 22 // Always update last occurrence 23 lastOccurrence[charIndex] = i; 24 } 25 26 int maxLength = -1; // Maximum valid substring length found 27 28 // Try each character as a potential starting point 29 for (int charIdx = 0; charIdx < 26; ++charIdx) { 30 int startPos = firstOccurrence[charIdx]; 31 32 // Skip if this character doesn't exist in the string 33 if (startPos == -1) { 34 continue; 35 } 36 37 // Track the maximum required ending position for current substring 38 int maxEndPos = lastOccurrence[charIdx]; 39 40 // Expand substring from startPos, checking validity 41 for (int currentPos = startPos; currentPos < stringLength; ++currentPos) { 42 int currentCharIndex = s[currentPos] - 'a'; 43 int currentCharFirstPos = firstOccurrence[currentCharIndex]; 44 int currentCharLastPos = lastOccurrence[currentCharIndex]; 45 46 // If current character appears before our start position, 47 // we cannot form a valid substring 48 if (currentCharFirstPos < startPos) { 49 break; 50 } 51 52 // Update maximum required ending position 53 maxEndPos = max(maxEndPos, currentCharLastPos); 54 55 // Check if we've reached a valid substring endpoint: 56 // - All required characters are included (maxEndPos == currentPos) 57 // - The substring is not the entire string (length < stringLength) 58 if (maxEndPos == currentPos && currentPos - startPos + 1 < stringLength) { 59 maxLength = max(maxLength, currentPos - startPos + 1); 60 } 61 } 62 } 63 64 return maxLength; 65 } 66}; 67
1/** 2 * Finds the maximum length of a substring where all characters in the substring 3 * appear only within that substring and nowhere else in the string. 4 * 5 * @param s - The input string containing only lowercase letters 6 * @returns The maximum length of such a substring, or -1 if no valid substring exists 7 */ 8function maxSubstringLength(s: string): number { 9 // Arrays to track first and last occurrence indices for each letter (a-z) 10 const firstOccurrence: number[] = Array(26).fill(-1); 11 const lastOccurrence: number[] = Array(26).fill(0); 12 const stringLength: number = s.length; 13 14 // Record the first and last occurrence index for each character 15 for (let i = 0; i < stringLength; ++i) { 16 const charIndex: number = s.charCodeAt(i) - 97; // Convert 'a'-'z' to 0-25 17 18 if (firstOccurrence[charIndex] === -1) { 19 firstOccurrence[charIndex] = i; 20 } 21 lastOccurrence[charIndex] = i; 22 } 23 24 let maxLength: number = -1; 25 26 // Try each character as a potential starting point for a valid substring 27 for (let charCode = 0; charCode < 26; ++charCode) { 28 const startIndex: number = firstOccurrence[charCode]; 29 30 // Skip if this character doesn't exist in the string 31 if (startIndex === -1) { 32 continue; 33 } 34 35 // Track the maximum required ending position for the current substring 36 let maxEndIndex: number = lastOccurrence[charCode]; 37 38 // Expand the substring from startIndex, checking each character 39 for (let currentIndex = startIndex; currentIndex < stringLength; ++currentIndex) { 40 const currentCharIndex: number = s.charCodeAt(currentIndex) - 97; 41 const currentCharFirstOccurrence: number = firstOccurrence[currentCharIndex]; 42 43 // If character appears before our start, this substring is invalid 44 if (currentCharFirstOccurrence < startIndex) { 45 break; 46 } 47 48 // Update the required ending position to include all occurrences 49 const currentCharLastOccurrence: number = lastOccurrence[currentCharIndex]; 50 maxEndIndex = Math.max(maxEndIndex, currentCharLastOccurrence); 51 52 // Check if we've found a complete valid substring 53 // (current position matches required end and it's not the entire string) 54 if (maxEndIndex === currentIndex && currentIndex - startIndex + 1 < stringLength) { 55 maxLength = Math.max(maxLength, currentIndex - startIndex + 1); 56 } 57 } 58 } 59 60 return maxLength; 61} 62

Solution Approach

First, we create two hash tables (or dictionaries) first and last to record the first and last occurrence positions of each character in the string:

first, last = {}, {} for i, c in enumerate(s):  if c not in first:  first[c] = i  last[c] = i

Next, we enumerate each character c as a potential starting point for our self-contained substring. We only consider positions where characters appear for the first time (stored in first):

for c, i in first.items():  mx = last[c] # Initialize mx with the last occurrence of character c

Here, i is the starting position of our potential substring, and mx tracks the rightmost position we must reach to include all occurrences of characters we've encountered.

For each starting position i, we traverse from position i to the end of the string:

for j in range(i, n):  a, b = first[s[j]], last[s[j]]

At each position j, we get:

  • a: the first occurrence position of character s[j]
  • b: the last occurrence position of character s[j]

If a < i, it means character s[j] has already appeared before our starting position i. This violates the self-contained property, so we break out of the inner loop:

if a < i:  break

Otherwise, we update mx to ensure we include all occurrences of s[j]:

mx = max(mx, b)

If mx == j, it means we've reached a position where we've included all necessary occurrences of all characters seen so far. If additionally j - i + 1 < n (ensuring the substring is not the entire string), we have found a valid self-contained substring:

if mx == j and j - i + 1 < n:  ans = max(ans, j - i + 1)

The algorithm continues this process for all possible starting positions and returns the maximum length found, or -1 if no valid self-contained substring exists.

The time complexity is O(n²) where n is the length of the string, as we potentially examine each substring starting from each character's first occurrence. The space complexity is O(k) where k is the number of unique characters in the string.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with the string s = "abcabc".

Step 1: Build first and last occurrence maps

  • Iterate through the string to record positions:
    • 'a': first = 0, last = 3
    • 'b': first = 1, last = 4
    • 'c': first = 2, last = 5

Step 2: Try each character's first occurrence as a starting point

Starting from 'a' (position 0):

  • Initialize: i = 0, mx = 3 (last position of 'a')
  • Position 0: character 'a'
    • First occurrence of 'a' = 0 (not < 0, so continue)
    • Update mx = max(3, 3) = 3
    • Check: mx (3) ≠ j (0), continue
  • Position 1: character 'b'
    • First occurrence of 'b' = 1 (not < 0, so continue)
    • Update mx = max(3, 4) = 4
    • Check: mx (4) ≠ j (1), continue
  • Position 2: character 'c'
    • First occurrence of 'c' = 2 (not < 0, so continue)
    • Update mx = max(4, 5) = 5
    • Check: mx (5) ≠ j (2), continue
  • Position 3: character 'a'
    • First occurrence of 'a' = 0 (not < 0, so continue)
    • Update mx = max(5, 3) = 5
    • Check: mx (5) ≠ j (3), continue
  • Position 4: character 'b'
    • First occurrence of 'b' = 1 (not < 0, so continue)
    • Update mx = max(5, 4) = 5
    • Check: mx (5) ≠ j (4), continue
  • Position 5: character 'c'
    • First occurrence of 'c' = 2 (not < 0, so continue)
    • Update mx = max(5, 5) = 5
    • Check: mx (5) = j (5) ✓ and 5 - 0 + 1 = 6 = n, not valid (equals entire string)

Starting from 'b' (position 1):

  • Initialize: i = 1, mx = 4 (last position of 'b')
  • Position 1: character 'b'
    • First occurrence of 'b' = 1 (not < 1, so continue)
    • Update mx = max(4, 4) = 4
    • Check: mx (4) ≠ j (1), continue
  • Position 2: character 'c'
    • First occurrence of 'c' = 2 (not < 1, so continue)
    • Update mx = max(4, 5) = 5
    • Check: mx (5) ≠ j (2), continue
  • Position 3: character 'a'
    • First occurrence of 'a' = 0 (0 < 1) ✗ Break!
    • Character 'a' appears before our start position, invalid

Starting from 'c' (position 2):

  • Initialize: i = 2, mx = 5 (last position of 'c')
  • Position 2: character 'c'
    • First occurrence of 'c' = 2 (not < 2, so continue)
    • Update mx = max(5, 5) = 5
    • Check: mx (5) ≠ j (2), continue
  • Position 3: character 'a'
    • First occurrence of 'a' = 0 (0 < 2) ✗ Break!
    • Character 'a' appears before our start position, invalid

Step 3: Return result No valid self-contained substring was found, so return -1.


Let's try another example where a valid substring exists: s = "abcdefabc"

Step 1: Build maps

  • 'a': first = 0, last = 6
  • 'b': first = 1, last = 7
  • 'c': first = 2, last = 8
  • 'd': first = 3, last = 3
  • 'e': first = 4, last = 4
  • 'f': first = 5, last = 5

Step 2: Try starting from 'd' (position 3):

  • Initialize: i = 3, mx = 3
  • Position 3: character 'd'
    • First occurrence = 3 (not < 3, continue)
    • Update mx = max(3, 3) = 3
    • Check: mx (3) = j (3) ✓ and 3 - 3 + 1 = 1 < 9
    • Found valid substring "d" with length 1

Continuing to check 'e' (position 4):

  • Similar process yields "e" with length 1

Continuing to check 'f' (position 5):

  • Similar process yields "f" with length 1

The longest self-contained substring has length 1 (any of "d", "e", or "f").

Time and Space Complexity

Time Complexity: O(n × |Σ|)

The algorithm consists of two main parts:

  1. The first loop iterates through the string once to build the first and last dictionaries, taking O(n) time.
  2. The second nested loop structure:
    • The outer loop iterates through each unique character in the first dictionary, which can be at most |Σ| characters
    • For each character, the inner loop can iterate up to n positions in the worst case
    • Therefore, this part takes O(|Σ| × n) time

The overall time complexity is O(n) + O(|Σ| × n) = O(n × |Σ|), where n is the length of string s and |Σ| represents the size of the character set (26 for lowercase letters).

Space Complexity: O(|Σ|)

The algorithm uses:

  • Two dictionaries first and last to store the first and last occurrence indices of each character
  • Each dictionary can store at most |Σ| key-value pairs (one for each unique character in the string)
  • A few additional variables (ans, n, mx, etc.) that use constant space

Therefore, the space complexity is O(|Σ|), where |Σ| = 26 for lowercase English letters.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Not Breaking Early When Character Appears Before Start Position

A critical pitfall is forgetting to break the inner loop when encountering a character that has already appeared before the current starting position. This would lead to incorrect results by considering invalid substrings.

Incorrect Implementation:

for current_index in range(start_index, string_length):  current_char = s[current_index]  char_first = first_occurrence[current_char]  char_last = last_occurrence[current_char]   # MISSING: Should break if char_first < start_index   required_end = max(required_end, char_last)  # ... rest of code

Why This Fails: Without the break condition, the algorithm would continue processing even when it encounters characters that violate the self-contained property. For example, with s = "abcabc", when starting from index 3 (second 'a'), it would incorrectly consider the substring "abc" at indices 3-5 as valid, even though 'a', 'b', 'c' also appear at indices 0-2.

Solution: Always include the break condition:

if char_first < start_index:  break

2. Incorrect Validation of Substring Length

Another common mistake is using <= instead of < when checking if the substring length is less than the total string length.

Incorrect Implementation:

if required_end == current_index and current_index - start_index + 1 <= string_length:  max_length = max(max_length, current_index - start_index + 1)

Why This Fails: Using <= would allow the entire string to be considered as a valid self-contained substring, which violates the constraint that t != s. This would return the length of the entire string instead of -1 for strings like s = "aaa" where no proper self-contained substring exists.

Solution: Use strict inequality:

if required_end == current_index and current_index - start_index + 1 < string_length:  max_length = max(max_length, current_index - start_index + 1)

3. Not Initializing required_end Correctly

Forgetting to initialize required_end with the last occurrence of the starting character can cause the algorithm to miss valid substrings.

Incorrect Implementation:

for char, start_index in first_occurrence.items():  required_end = start_index # Wrong: should be last_occurrence[char]  # ... rest of code

Why This Fails: If a character appears multiple times, we must ensure all its occurrences are included in the substring. Starting with required_end = start_index would fail to account for later occurrences of the starting character itself.

Solution: Initialize with the last occurrence of the starting character:

required_end = last_occurrence[char]
Loading...
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which technique can we use to find the middle of a linked list?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More