Subsequence
We have 1) subset 2) subarray and 3) subsequence problems. In this article, I am going to summary some subsequence problems.
792. Number of Matching Subsequences
https://leetcode.com/problems/number-of-matching-subsequences/
class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
s_i = collections.defaultdict(list)
for i,ch in enumerate(S):
s_i[ch].append(i)
from functools import lru_cache
@lru_cache(None)
def good(i,j,word):
if not word:return True
w = word[0]
if w not in s_i:return False
begin_i = bisect.bisect(s_i[w], j)
valid_idx = s_i[w][begin_i:]
if not valid_idx:return False
return good(i+1,valid_idx[0], word[1:])
return sum(good(0,-1,w) for w in words)
Above solution is my first version naive DFS solution and got TLE
28 / 49 test cases passed.Status:
Time Limit Exceeded
Similar idea this code works
class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
def isMatch(word, w_i, d_i):
if w_i == len(word): return True
l = dict_idxs[word[w_i]]
if len(l) == 0 or d_i > l[-1]: return False
i = l[bisect.bisect_left(l, d_i)]
return isMatch(word, w_i + 1, i + 1)dict_idxs = collections.defaultdict(list)
for i in range(len(S)):
dict_idxs[S[i]].append(i)
return sum(isMatch(word, 0, 0) for word in words)
The above code is from
Finally, I realize that my original code is on the edge of AC.
This version is TLE
class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
s_i = collections.defaultdict(list)
for i,ch in enumerate(S):
s_i[ch].append(i)
#from functools import lru_cache
#@lru_cache(None)
def good(i,j,word):
if i==len(word):return True
w = word[i]
d = s_i[w]
if not d or d[-1]<=j:return False
valid_i = bisect.bisect(d, j)
valid_idx = d[valid_i:]
if not valid_idx:return False
return good(i+1,valid_idx[0], word)
#return good(i+1,d[valid_i], word)
return sum(good(0,-1,w) for w in words)
If we don’t generate a valid_idx list. Instead, if we directly use valid_i, we will get AC. From this example, one lesson is learned. Do not slice a list from a large list, it is kind of time and memory consuming.
class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
s_i = collections.defaultdict(list)
for i,ch in enumerate(S):
s_i[ch].append(i)
#from functools import lru_cache
#@lru_cache(None)
def good(i,j,word):
if i==len(word):return True
w = word[i]
d = s_i[w]
if not d or d[-1]<=j:return False
valid_i = bisect.bisect(d, j)
"""
valid_idx = d[valid_i:]
if not valid_idx:return False
return good(i+1,valid_idx[0], word)
"""
return good(i+1,d[valid_i], word)
return sum(good(0,-1,w) for w in words)
runtime
Runtime: 812 ms, faster than 33.27% of Python3 online submissions for Number of Matching Subsequences.Memory Usage: 53.2 MB, less than 25.00% of Python3 online submissions for Number of Matching Subsequences.
This problem can also be solved by using another elegant way. See Stefan's post
class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
waiting = collections.defaultdict(list)
for it in map(iter, words):
waiting[next(it)].append(it)
for c in S:
for it in waiting.pop(c, ()):
waiting[next(it, None)].append(it)
return len(waiting[None])
Complexity analysis
Since ‘The length of S will be in the range of [1, 50000]’, in the worst case, we will have the occurrence of the letter occurring the most as 50000.
Stefan’s solution complexity is O(len(S) + O(numbers of letters in all words))
This one in the worst case:
O(len(S) + O(numbers of letters in all words)*log(len(S)))
For reference, math.log(50000, 2) = 15.609640474436812