Subsequence

Jimmy (xiaoke) Shen
3 min readJan 24, 2020

--

We have 1) subset 2) subarray and 3) subsequence problems. In this article, I am going to summary some subsequence problems.

792. Number of Matching Subsequences

https://leetcode.com/problems/number-of-matching-subsequences/

class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
s_i = collections.defaultdict(list)
for i,ch in enumerate(S):
s_i[ch].append(i)
from functools import lru_cache
@lru_cache(None)
def good(i,j,word):
if not word:return True
w = word[0]
if w not in s_i:return False
begin_i = bisect.bisect(s_i[w], j)
valid_idx = s_i[w][begin_i:]
if not valid_idx:return False
return good(i+1,valid_idx[0], word[1:])

return sum(good(0,-1,w) for w in words)

Above solution is my first version naive DFS solution and got TLE

28 / 49 test cases passed.Status:

Time Limit Exceeded

Similar idea this code works

class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
def isMatch(word, w_i, d_i):
if w_i == len(word): return True
l = dict_idxs[word[w_i]]
if len(l) == 0 or d_i > l[-1]: return False
i = l[bisect.bisect_left(l, d_i)]
return isMatch(word, w_i + 1, i + 1)
dict_idxs = collections.defaultdict(list)
for i in range(len(S)):
dict_idxs[S[i]].append(i)
return sum(isMatch(word, 0, 0) for word in words)

The above code is from

Finally, I realize that my original code is on the edge of AC.

This version is TLE

class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
s_i = collections.defaultdict(list)
for i,ch in enumerate(S):
s_i[ch].append(i)
#from functools import lru_cache
#@lru_cache(None)
def good(i,j,word):
if i==len(word):return True
w = word[i]
d = s_i[w]
if not d or d[-1]<=j:return False
valid_i = bisect.bisect(d, j)
valid_idx = d[valid_i:]
if not valid_idx:return False
return good(i+1,valid_idx[0], word)
#return good(i+1,d[valid_i], word)

return sum(good(0,-1,w) for w in words)

If we don’t generate a valid_idx list. Instead, if we directly use valid_i, we will get AC. From this example, one lesson is learned. Do not slice a list from a large list, it is kind of time and memory consuming.

class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
s_i = collections.defaultdict(list)
for i,ch in enumerate(S):
s_i[ch].append(i)
#from functools import lru_cache
#@lru_cache(None)
def good(i,j,word):
if i==len(word):return True
w = word[i]
d = s_i[w]
if not d or d[-1]<=j:return False
valid_i = bisect.bisect(d, j)
"""
valid_idx = d[valid_i:]
if not valid_idx:return False
return good(i+1,valid_idx[0], word)
"""
return good(i+1,d[valid_i], word)

return sum(good(0,-1,w) for w in words)

runtime

Runtime: 812 ms, faster than 33.27% of Python3 online submissions for Number of Matching Subsequences.Memory Usage: 53.2 MB, less than 25.00% of Python3 online submissions for Number of Matching Subsequences.

This problem can also be solved by using another elegant way. See Stefan's post

https://leetcode.com/problems/number-of-matching-subsequences/discuss/117634/Efficient-and-simple-go-through-words-in-parallel-with-explanation

class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
waiting = collections.defaultdict(list)
for it in map(iter, words):
waiting[next(it)].append(it)
for c in S:
for it in waiting.pop(c, ()):
waiting[next(it, None)].append(it)
return len(waiting[None])

Complexity analysis

Since ‘The length of S will be in the range of [1, 50000]’, in the worst case, we will have the occurrence of the letter occurring the most as 50000.

Stefan’s solution complexity is O(len(S) + O(numbers of letters in all words))
This one in the worst case:
O(len(S) + O(numbers of letters in all words)*log(len(S)))
For reference, math.log(50000, 2) = 15.609640474436812

--

--

No responses yet