Using bit manipulation to speed up the computation

Solution 1: brute force by organizing the words in 26 categories (run time 1016 ms).

  • Naively change words into integers by using bit minupulation.
  • Organize the words in 26 categories. (has “a”, has “b”, …)
  • For each puzzle, check the words in the category of puzzle[0]

The complexity for this one is hard to analysis. I though it may get TLE, actually, it can get AC. The reason should be C++ and bit manipulation are fast.

class Solution {
public:
int get_word_mask(string &word){
int ret = 0;
for (auto & ch: word){
ret |= (1 << (ch - 'a'));
}
return ret;
}
vector<int> findNumOfValidWords(vector<string>& words, vector<string>& puzzles) {
vector<vector<int>> data(26);
for (auto &word: words){
auto word_mask = get_word_mask(word);
for (int i = 0; i < 26; ++i){
if ((1 << i) & word_mask){
if(__builtin_popcount(word_mask) <= 7) data[i].push_back(word_mask);
}
}
}
const int n = puzzles.size();
vector<int> ret(n, 0);
for (int i = 0; i < n; ++i){
auto p = puzzles[i];
auto p_mask = get_word_mask(p);
for (auto & x: data[p[0] - 'a']){
ret[i] += ((x & p_mask) == x);
}
}
return ret;
}
};

Solution 2: Check each subset of a puzzle (108 ms).

The idea is checking each subset of a puzzle

As each puzzle has length of 7 and we have to include the first one, so we only have 2⁶ possibilities. The complexity will be O(M*2⁶) where M is the length of the puzzles.

class Solution {
public:
int get_word_mask(string &word){
int ret = 0;
for (auto & ch: word){
ret |= (1 << (ch - 'a'));
}
return ret;
}
vector<int> findNumOfValidWords(vector<string>& words, vector<string>& puzzles) {
unordered_map<int, int> m;
for (auto &word: words){
auto word_mask = get_word_mask(word);
if (__builtin_popcount(word_mask) > 7) continue;
m[word_mask]++;
}
const int n = puzzles.size();
vector<int> ret(n, 0);
for (int i = 0; i < n; ++i){
auto p = puzzles[i];
int p_subset = 1 << (p[0] - 'a');
// find all subsets of this puzzle
for (int mask = 0; mask < (1 << 6); ++mask){
int this_subset = p_subset;
for (int k = 0; k < 6; ++k){
if ((1 << k) & mask){
this_subset |= 1 << (p[k+1]-'a');
}
}
ret[i] += m[this_subset];
}
}
return ret;
}
};

Python code will be much shorter by using frozen set. The reason of using frozen set is it is hashable.

class Solution:
def findNumOfValidWords(self, words: List[str], puzzles: List[str]) -> List[int]:
count = collections.Counter(frozenset(w) for w in words)
res = []
for p in puzzles:
cur = 0
for k in range(len(p)):
for c in itertools.combinations(p[1:], k):
cur += count[frozenset(tuple(p[0]) + c)]
res.append(cur)
return res

Thanks for reading.

--

--