Table of contents
Open Table of contents
Problem
Given a string s
, partition the string into one or more substrings such that the characters in each substring are unique. The goal is to minimize the number of substrings in such a partition.
Example 1:
Input: s = "abacaba"
Output: 4
Explanation:
Possible partitions are ("a","ba","cab","a") and ("ab","a","ca","ba"). Minimum number of substrings is 4.
Example 2:
Input: s = "ssssss"
Output: 6
Explanation:
The only valid partition is ("s","s","s","s","s","s").
Constraints:
1 <= s.length <= 10^5
s
consists of only English lowercase letters.
Explanation/Approach
The problem can be approached using a greedy strategy. We need to traverse the string and create substrings, ensuring that each character appears only once in each substring.
Greedy Approach:
- Initialization: Start with an empty substring and a set to track characters in the current substring.
- Traversing the String: For each character in
s
, check if it is already in the current substring (using the set). - Creating New Substrings: If the character is already in the set, start a new substring and clear the set.
- Counting Substrings: Keep track of the number of substrings formed during the traversal.
Solution: Greedy Approach
class Solution:
def partitionString(self, s: str) -> int:
# Set to store unique characters of the current substring
unique_chars = set()
# Count of substrings
substring_count = 0
for char in s:
# If char is already in the current substring, start a new one
# Increment the counter, this means a substring has been found
if char in unique_chars:
unique_chars.clear()
substring_count += 1
unique_chars.add(char)
# Increment count for the last substring
substring_count += 1
return substring_count
Time and Memory Complexity
The time complexity of this solution is O(n), where n is the length of the string s
. This is because we traverse the string once. The space complexity is O(1), as the set unique_chars
will have at most 26 characters (all lowercase English letters).
Explanation
This approach uses a greedy strategy, where we continuously form substrings with unique characters. When we encounter a character that is already in the current substring, we start a new substring. The count of substrings is incremented each time we start a new one.
Visual Representation
Consider the input s = "abacaba"
.
-
Initialization:
- Substring count = 0
- Unique characters set =
{}
-
Processing Each Character:
- Process ‘a’: Substring count = 1, Set =
{a}
- Process ‘b’: Substring count = 1, Set =
{a, b}
- Process ‘a’: Substring count = 2, Start new substring, Set =
{a}
- Process ‘c’: Substring count = 2, Set =
{a, c}
- Process ‘a’: Substring count = 3, Start new substring, Set =
{a}
- Process ‘b’: Substring count = 3, Set =
{a, b}
- Process ‘a’: Substring count = 4, Start new substring, Set =
{a}
- Process ‘a’: Substring count = 1, Set =
-
Final Substring Count:
- Total substrings = 4 (The last increment happens after the loop).
The process of creating and tracking substrings can be visualized as follows:
Solution: Using Last Seen Indices
class Solution:
def partitionString(self, s):
# Array to track each characters and the last index it was seen
last_seen = [-1]*26
count = 1
# The index of current substring start
sub_start = 0
for i, char in enumerate(s):
# If this char was seen at any index after `sub_start` it means it is a duplicate
if last_seen[ord(char) - ord("a")] >= sub_start:
count += 1
# Update substring start index to the current index
sub_start = i
# Update last time the char has been seen
last_seen[ord(char) - ord("a")] = i
return count
Time and Memory Complexity
- Time: O(n)
- Space: O(1)
Explanation
This optimized approach keeps track of the last seen index of each character in the string. As we traverse the string, we update these indices. We also keep track of the farthest last seen index. When our current index in the string matches this farthest index, it signals the end of a substring with unique characters, prompting us to start a new substring.
Visual Representation
Consider the input s = "abacaba"
.
-
Tracking Last Seen Indices:
- Process ‘a’: last_seen[‘a’] = 0, sub_start = 0
- Process ‘b’: last_seen[‘b’] = 1, sub_start = 0
- Process ‘a’: last_seen[‘a’] = 2, sub_start = 2
- Process ‘c’: last_seen[‘c’] = 3, sub_start = 2
- Process ‘a’: last_seen[‘a’] = 4, sub_start = 4
- Process ‘b’: last_seen[‘b’] = 5, sub_start = 4
- Process ‘a’: last_seen[‘a’] = 6, sub_start = 6
-
Incrementing Substring Count:
- Increment count at indices 2, 4 and 6.
-
Final Substring Count:
- Total substrings = 4.