LRU and LFU
For this kind of design problem, C++ feels somewhat easier because it ships with more ready-made data structures, such as std::list (a doubly linked list). In this article I am going to use Python to implement the design problems.
LRU is relatively easier than LFU. However, I found a post about an LFU implementation that is very clear, so I would like to share that post first and then follow a similar approach to design and implement LRU.
LFU
Excellent implementation from Leetcode
Please read the original post. I am putting it here because I really like this one.
What is my data structure?
1. A Doubly linked Node
class Node:
+ key: int
+ value: int
+ freq: int
+ prev: Node
+ next: Node
2. A Doubly Linked List
Note: this part could be replaced by OrderedDict; it is implemented by hand here for clarity (an OrderedDict-based sketch follows after this outline).
class DLinkedList:
- sentinel: Node
+ size: int
+ append(node: Node) -> None
+ pop(node: Node) -> Node
3. Our LFUCache
class LFUCache:
- node: dict[key: int, node: Node]
- freq: dict[freq: int, lst: DLinkedList]
- minfreq: int
+ get(key: int) -> int
+ put(key: int, value: int) -> None
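As the note above says, the hand-rolled DLinkedList can be replaced by OrderedDict. Below is a minimal sketch of that variant; the class name LFUCacheOD and its attribute names are mine, not from the original post, and each frequency bucket is an OrderedDict whose insertion order plays the role of the doubly linked list.
from collections import OrderedDict, defaultdict

class LFUCacheOD:
    def __init__(self, capacity: int):
        self._capacity = capacity
        self._entries = {}                        # key -> (value, freq)
        self._buckets = defaultdict(OrderedDict)  # freq -> keys in recency order (MRU last)
        self._minfreq = 0

    def _touch(self, key):
        # Move key from its current frequency bucket to the next one.
        val, f = self._entries[key]
        del self._buckets[f][key]
        if self._minfreq == f and not self._buckets[f]:
            self._minfreq = f + 1
        self._buckets[f + 1][key] = None
        self._entries[key] = (val, f + 1)

    def get(self, key: int) -> int:
        if key not in self._entries:
            return -1
        self._touch(key)
        return self._entries[key][0]

    def put(self, key: int, value: int) -> None:
        if self._capacity == 0:
            return
        if key in self._entries:
            self._touch(key)
            self._entries[key] = (value, self._entries[key][1])
            return
        if len(self._entries) == self._capacity:
            # Evict the least recently used key inside the min-frequency bucket.
            old_key, _ = self._buckets[self._minfreq].popitem(last=False)
            del self._entries[old_key]
        self._entries[key] = (value, 1)
        self._buckets[1][key] = None
        self._minfreq = 1
The behavior is the same O(1) design; the OrderedDict version is shorter, at the cost of hiding the linked-list mechanics that the hand-rolled version makes explicit.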
Visualization
Explanation
Each key is mapped to the corresponding node (self._node), so we can retrieve the node in O(1) time.
Each frequency freq is mapped to a doubly linked list (self._freq), where all nodes in that DLinkedList have the same frequency, freq. Moreover, each node is always inserted at the head (indicating most recently used).
A minimum frequency self._minfreq is maintained to keep track of the minimum frequency across all nodes in this cache, so that the DLinkedList with the minimum frequency can always be retrieved in O(1) time.
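To make these three structures concrete, here is a hypothetical snapshot of the internal state; the keys and frequencies are made up purely for illustration.
# Hypothetical snapshot after a few operations (illustrative only):
# self._node    = {10: Node(key=10, freq=2), 20: Node(key=20, freq=1), 30: Node(key=30, freq=2)}
# self._freq    = {1: DLinkedList(20),            # the freq-1 bucket
#                  2: DLinkedList(30 <-> 10)}     # the freq-2 bucket, head = most recently used
# self._minfreq = 1                               # eviction would pop the tail of self._freq[1]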
Here is how the algorithm works
get(key)
- Query the node by calling self._node[key].
- Find the frequency by checking node.freq, assigned as f, and query the DLinkedList that this node is in, through calling self._freq[f].
- Pop this node from that list.
- Update the node's frequency, and append the node to the new DLinkedList with frequency f+1.
- If the old DLinkedList is now empty and self._minfreq == f, update self._minfreq to f+1.
- Return node.val.
put(key, value)
- If key is already in the cache, do the same thing as get(key), and update node.val to value.
- Otherwise:
  - If the cache is full, pop the least frequently used element (*).
  - Add the new node to self._node.
  - Add the new node to self._freq[1].
  - Reset self._minfreq to 1.
(*) The least frequently used element is the tail element of the DLinkedList with frequency self._minfreq.
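Before the full code, here is a quick trace of how these rules play out, assuming a capacity of 2 and the LFUCache class implemented below.
cache = LFUCache(2)
cache.put(1, 1)   # key 1 has freq 1, minfreq = 1
cache.put(2, 2)   # key 2 has freq 1, minfreq = 1
cache.get(1)      # -> 1;  key 1 now has freq 2
cache.put(3, 3)   # cache full: evict key 2 (tail of the freq-1 list), minfreq reset to 1
cache.get(2)      # -> -1 (evicted)
cache.get(3)      # -> 3;  key 3 now has freq 2; the freq-1 list is empty, so minfreq becomes 2
cache.put(4, 4)   # cache full: evict key 1 (tail of the freq-2 list, least recently used there)
cache.get(1)      # -> -1 (evicted)
cache.get(3)      # -> 3
cache.get(4)      # -> 4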
Implementation
Below is the implementation, with detailed comments as well.
import collections
class Node:
def __init__(self, key, val):
self.key = key
self.val = val
self.freq = 1
self.prev = self.next = None
class DLinkedList:
""" An implementation of doubly linked list.
Two APIs provided:
append(node): append the node to the head of the linked list.
pop(node=None): remove the referenced node.
If None is given, remove the one from tail, which is the least recently used.
Both operations, apparently, are in O(1) complexity.
"""
def __init__(self):
self._sentinel = Node(None, None) # dummy node
self._sentinel.next = self._sentinel.prev = self._sentinel
self._size = 0
def __len__(self):
return self._size
def append(self, node):
node.next = self._sentinel.next
node.prev = self._sentinel
node.next.prev = node
self._sentinel.next = node
self._size += 1
def pop(self, node=None):
if self._size == 0:
return
if not node:
node = self._sentinel.prev
node.prev.next = node.next
node.next.prev = node.prev
self._size -= 1
return node
class LFUCache:
def __init__(self, capacity):
"""
:type capacity: int
Three things to maintain:
1. a dict, named as `self._node`, for the reference of all nodes given key.
That is, O(1) time to retrieve node given a key.
2. Each frequency has a doubly linked list, stored in `self._freq`, where the key
is the frequency, and the value is an object of `DLinkedList`
3. The min frequency through all nodes. We can maintain this in O(1) time, taking
advantage of the fact that the frequency can only increment by 1. Use the following
two rules:
Rule 1: Whenever we see the size of the DLinkedList of current min frequency is 0,
the min frequency must increment by 1.
Rule 2: Whenever a new (key, value) pair is put in, the min frequency must be 1 (the new node)
"""
self._size = 0
self._capacity = capacity
self._node = dict() # key: Node
self._freq = collections.defaultdict(DLinkedList)
self._minfreq = 0
def _update(self, node):
"""
This is a helper function that is used in the following two cases:
1. when `get(key)` is called; and
2. when `put(key, value)` is called and the key exists.
The common point of these two cases is that:
1. no new node comes in, and
2. the node is visited one more time -> node.freq changes ->
thus the place of this node will change
The logic of this function is:
1. pop the node from the old DLinkedList (with freq `f`)
2. append the node to new DLinkedList (with freq `f+1`)
3. if the old DLinkedList has size 0 and self._minfreq is `f`,
update self._minfreq to `f+1`
All of the above operations take O(1) time.
"""
freq = node.freq
self._freq[freq].pop(node)
if self._minfreq == freq and not self._freq[freq]:
self._minfreq += 1
node.freq += 1
freq = node.freq
self._freq[freq].append(node)
def get(self, key):
"""
Through checking self._node[key], we can get the node in O(1) time.
Just perform self._update, then we can return the value of the node.
:type key: int
:rtype: int
"""
if key not in self._node:
return -1
node = self._node[key]
self._update(node)
return node.val
def put(self, key, value):
"""
If `key` already exists in self._node, we do the same operations as `get`, except
updating the node.val to new value.
Otherwise, the following logic will be performed
1. if the cache reaches its capacity, pop the least frequently used item. (*)
2. add new node to self._node
3. add new node to the DLinkedList with frequency 1
4. reset self._minfreq to 1
(*) How to pop the least frequently used item? Two facts:
1. we maintain the self._minfreq, the minimum possible frequency in cache.
2. All cached items with the same frequency are stored in a single DLinkedList, in
recently-used order (always append at the head)
Consequence? ==> The tail of the DLinkedList with frequency self._minfreq is the least
recently used item among the least frequently used ones; pop it...
:type key: int
:type value: int
:rtype: void
"""
if self._capacity == 0:
return
if key in self._node:
node = self._node[key]
self._update(node)
node.val = value
else:
if self._size == self._capacity:
node = self._freq[self._minfreq].pop()
del self._node[node.key]
self._size -= 1
node = Node(key, value)
self._node[key] = node
self._freq[1].append(node)
self._minfreq = 1
self._size += 1
My slightly modified version
class Node:
def __init__(self, key, val):
self.key = key
self.val = val
self.freq = 1
self.next = self.prev = None
class DLinkedList:
def __init__(self):
self._size = 0
self._dummy = Node(None, None)
self._dummy.next = self._dummy.prev = self._dummy
def __len__(self):return self._size
def append(self, node):
# append to left
node.prev = self._dummy
node.next = self._dummy.next
node.next.prev = node
self._dummy.next = node
self._size += 1
def pop(self, node = None):
if self._size==0:return
if node is None:
node = self._dummy.prev
node.prev.next = node.next
node.next.prev = node.prev
self._size -= 1
return node
class LFUCache:
def __init__(self, capacity: int):
self._capacity = capacity
self._size = 0
self._nodes = {}
self._freq = collections.defaultdict(DLinkedList)
self._minfreq = 0
def get(self, key: int) -> int:
if key not in self._nodes:return -1
node = self._nodes[key]
self._update(node)
return node.val
def put(self, key: int, value: int) -> None:
if self._capacity == 0:return
if key in self._nodes:
node = self._nodes[key]
node.val = value
self._update(node)
return
if self._size == self._capacity:
# pop the least frequently used item
node = self._freq[self._minfreq].pop()
del self._nodes[node.key]
self._size -= 1
# add the new item
node = Node(key, value)
self._nodes[key] = node
self._minfreq = 1
self._freq[1].append(node)
self._size += 1
return
def _update(self, node):
f = node.freq
node.freq += 1
self._freq[f].pop(node)
if self._minfreq == f and not self._freq[f]:
self._minfreq = f+1
self._freq[f+1].append(node)
# Your LFUCache object will be instantiated and called as such:
# obj = LFUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)
Another implementation
class Node:
def __init__(self, key, val):
self.key = key
self.val = val
self.freq = 1
self.prev = self.next = None
class DLinkedList:
def __init__(self):
self.len = 0
self.dummy = Node(-1, -1)
self.dummy.prev = self.dummy.next = self.dummy
def __len__(self):
return self.len
def appendleft(self, node):
self.len += 1
node.prev = self.dummy
node.next = self.dummy.next
self.dummy.next.prev = node
self.dummy.next = node
def pop(self, node=None):
self.len -= 1
if node is None:
node = self.dummy.prev
node.prev.next = node.next
node.next.prev = node.prev
return node
class LFUCache:
def __init__(self, capacity: int):
self.capacity = capacity
self.min_freq = 1
self.LFU = collections.defaultdict(DLinkedList)
self.nodes = {}
def _update(self, node):
curr_freq = node.freq
node = self.LFU[curr_freq].pop(node)
if node.freq == self.min_freq and len(self.LFU[self.min_freq])==0:
self.min_freq += 1
node.freq += 1
self.LFU[node.freq].appendleft(node)
return node
def get(self, key: int) -> int:
if key not in self.nodes:return -1
node = self.nodes[key]
node = self._update(node)
return node.val
def put(self, key: int, value: int) -> None:
if self.capacity == 0:return
if key in self.nodes:
node = self.nodes[key]
node.val = value
self._update(node)
return
if self.capacity == len(self.nodes):
node = self.LFU[self.min_freq].pop()
del self.nodes[node.key]
new_node = Node(key, value)
self.nodes[key] = new_node
self.min_freq = 1
self.LFU[1].appendleft(new_node)
return
# Your LFUCache object will be instantiated and called as such:
# obj = LFUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)
C++ code based on [1]
struct Node {
int key;
int val;
int freq;
list<int>::iterator it;
};
class LFUCache {
private:
int capacity_, min_freq_;
// nodes_: {key:node, ...}
unordered_map<int, Node> nodes_;
// freq_l_: {freq1: key1<->key2...,
// freq2: key3<->key4}
unordered_map<int, list<int>> freq_l_;
void update(Node& node) {
int old_freq = node.freq;
int new_freq = old_freq + 1;
node.freq = new_freq;
freq_l_[old_freq].erase(node.it);
// push to the new_freq list and update the iterator
freq_l_[new_freq].push_front(node.key);
node.it = freq_l_[new_freq].begin();
// update min_freq when necessary
if (min_freq_ == old_freq && freq_l_[old_freq].empty()) {
min_freq_ = new_freq;
}
return;
}
public:
LFUCache(int capacity): capacity_(capacity), min_freq_(1) {
}
int get(int key) {
auto it = nodes_.find(key);
//found
if (it != nodes_.end()) {
update(it->second);
return it->second.val;
}
return -1;
}
void put(int key, int value) {
if (capacity_ == 0) return;
auto it = nodes_.find(key);
// found
if (it != nodes_.end()) {
it->second.val = value;
update(it->second);
return;
}
// not found
// remove the lfu node if reach the capacity
if (nodes_.size() == capacity_) {
auto toBeDelKey = freq_l_[min_freq_].back();
freq_l_[min_freq_].pop_back();
nodes_.erase(toBeDelKey);
}
// add the new node and update min_freq_ to 1
min_freq_ = 1;
freq_l_[1].push_front(key);
Node new_node{key, value, 1,freq_l_[1].begin()};
nodes_[key] = new_node;
return;
}
};
/**
* Your LFUCache object will be instantiated and called as such:
* LFUCache* obj = new LFUCache(capacity);
* int param_1 = obj->get(key);
* obj->put(key,value);
*/
// cacheNode
// + key
// + val
// + freq
// + iterator of the list where we put the node.
// hashmap: nodes_ {key1: node1, key2 : node2, key3 : node3 ...} O(1) to find the node if exists;
// hashmap organizing the keys by frequency:
// { freq2: key1 <-> key4 <-> key5, ... }
// min_freq: used to know which one is the lfu node.
// get: if exists, update and return. if not exist, return -1.
// the update process: update the freq, the iterator `it`, and maybe the min_freq
// put:
// if exist, update
// if not:
// reach the capacity: remove the lfu, add the new node, update min_freq to 1
// not reach the capacity: add the new node, update min_freq to 1.
LRU
1. A Doubly linked Node
class Node:
- key: int
- value: int
- prev: Node
- next: Node
2. A Doubly Linked List
Note: this part could be replaced by OrderedDict; it is implemented by hand here for clarity (an OrderedDict-based sketch follows after the explanation below).
class DLinkedList:
- sentinel: Node
- size: int
+ append(node: Node) -> None
+ pop(node: Node) -> Node
3. Our LRUCache
class LRUCache:
- nodes: dict[key: int, node: Node]
- cache: DLinkedList
+ get(key: int) -> int
+ put(key: int, value: int) -> None
Visualization
Explanation
The LRU cache is very similar to LFU, only simpler. We do not need to maintain any frequency information; a single doubly linked list is enough to track which item is the least recently used.
Whenever an item is used, append it to the left (head) of the doubly linked list. Hence, the least recently used item will always be at the tail of the list.
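As noted in the outline, the DLinkedList can be replaced by an OrderedDict for LRU. Here is a minimal sketch of that shortcut; the class name LRUCacheOD is mine, and move_to_end/popitem provide the same O(1) operations as the hand-rolled list below.
from collections import OrderedDict

class LRUCacheOD:
    def __init__(self, capacity: int):
        self._capacity = capacity
        self._cache = OrderedDict()   # key -> value, most recently used at the end

    def get(self, key: int) -> int:
        if key not in self._cache:
            return -1
        self._cache.move_to_end(key)              # mark as most recently used
        return self._cache[key]

    def put(self, key: int, value: int) -> None:
        if self._capacity == 0:
            return
        if key in self._cache:
            self._cache.move_to_end(key)
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)       # evict the least recently used key

# Example: with capacity 2, put(1,1), put(2,2), get(1) -> 1, put(3,3) evicts key 2, get(2) -> -1.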
Implementation
class Node:
def __init__(self, key=None, val=None):
self.key = key
self.val = val
self.prev = self.next = None
class DLinkedList:
def __init__(self):
self._dummy = Node()
self._dummy.prev = self._dummy.next = self._dummy
self._size = 0
def __len__(self):return self._size
def append(self, node):
# append left
node.next = self._dummy.next
node.prev = self._dummy
self._dummy.next = node
node.next.prev = node
self._size += 1
def pop(self, node=None):
if self._size == 0 :return None
if node is None:
node = self._dummy.prev
# delete this node
node.prev.next = node.next
node.next.prev = node.prev
self._size -= 1
return node
class LRUCache:
def __init__(self, capacity: int):
self._capacity = capacity
self._size = 0
self._nodes = {}
self._cache = DLinkedList()
def _update(self, node):
node = self._cache.pop(node)
self._cache.append(node)
def get(self, key: int) -> int:
if key not in self._nodes:return -1
node = self._nodes[key]
self._update(node)
return node.val
def put(self, key: int, value: int) -> None:
if self._capacity ==0 :return
if key in self._nodes:
node = self._nodes[key]
node.val = value
self._update(node)
return
if self._capacity == self._size:
node = self._cache.pop()
del self._nodes[node.key]
self._size -= 1
node = Node(key, value)
self._nodes[key] = node
self._cache.append(node)
self._size += 1
return
# Your LRUCache object will be instantiated and called as such:
# obj = LRUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)
A second Python version
class Node:
def __init__(self, key, val):
self.key = key
self.val = val
self.next = self.prev = None
class DLinkedList:
def __init__(self):
self.dummy = Node(-1, -1)
self.dummy.next = self.dummy.prev = self.dummy
self.len = 0
def __len__(self):
return self.len
def pop(self, to_be_popped=None):
"""
D <->Last <->D
"""
if to_be_popped is None:
to_be_popped = self.dummy.prev
to_be_popped.next.prev = to_be_popped.prev
to_be_popped.prev.next = to_be_popped.next
self.len -= 1
return to_be_popped
def appendleft(self, node):
self.len += 1
node.next = self.dummy.next
node.prev = self.dummy
self.dummy.next.prev = node
self.dummy.next = node
return
def pop_appendleft(self, node):
self.pop(node)
self.appendleft(node)
class LRUCache:
def __init__(self, capacity: int):
self.capacity = capacity
self.nodes = {}
self.LRU = DLinkedList()
def get(self, key: int) -> int:
if key in self.nodes:
self.LRU.pop_appendleft(self.nodes[key])
return self.nodes[key].val
else:return -1
def put(self, key: int, value: int) -> None:
if key in self.nodes:
node = self.nodes[key]
node.val = value
self.LRU.pop_appendleft(node)
return
if self.capacity == 0:return
if self.capacity == len(self.LRU):
popped_node = self.LRU.pop()
del self.nodes[popped_node.key]
new_node = Node(key, value)
self.nodes[key] = new_node
self.LRU.appendleft(new_node)
return
# Your LRUCache object will be instantiated and called as such:
# obj = LRUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)
C++ code
typedef pair<int, int> ii;
class LRUCache {
int capacity_;
list<ii> cache;
unordered_map<int, list<ii>::iterator> m_;
public:
LRUCache(int capacity): capacity_(capacity) {
}
int get(int key) {
auto it = m_.find(key);
// not found
if (it == m_.end()) return -1;
// find
cache.splice(cache.begin(), cache, it->second);
return cache.front().second;
}
void put(int key, int value) {
auto it = m_.find(key);
// found
if (it != m_.end())
{
cache.splice(cache.begin(), cache, it->second);
cache.front().second = value;
m_[key] = cache.begin();
return;
}
// not found
// reach the capacity
if (cache.size() == capacity_)
{
m_.erase(cache.back().first);
cache.pop_back();
}
cache.emplace_front(key, value);
m_[key] = cache.begin();
}
};
/**
* Your LRUCache object will be instantiated and called as such:
* LRUCache* obj = new LRUCache(capacity);
* int param_1 = obj->get(key);
* obj->put(key,value);
*/
Another version using two sentinel nodes (head and tail)
class LRUCache {
struct Node {
Node * prev;
Node * next;
int value;
int key;
Node(int x = 0, int k = 0) : value(x), key(k), prev(nullptr), next(nullptr) {}
};
unordered_map<int, Node*> cache;
Node * head;
Node * tail;
int capacity;
public:
LRUCache(int n) : capacity(n) {
head = new Node();
tail = new Node();
head->next = tail;
tail->prev = head;
}
int get(int key) {
if(cache.find(key) == cache.end()) return -1;
// key found update(node);
Node * node = cache[key];
update(node); // remove and send to head
return node->value;
}
void put(int key, int value) {
// Found:
if(cache.find(key) != cache.end()) {
cache[key]->value = value;
Node * node = cache[key];
update(node);
return;
}
// Not found, insert node
// If not enough capacity: remove node from tail
if(cache.size() >= capacity) evict(tail->prev);
Node * newNode = new Node(value, key);
cache[key] = newNode;
moveToHead(newNode);
}
private:
void removeNode(Node * node) {
node->prev->next = node->next;
node->next->prev = node->prev;
}
void evict(Node * node) {
removeNode(node);
cache.erase(node->key);
delete node;
}
void moveToHead(Node * node) {
node->next = head->next;
node->prev = head;
head->next->prev = node;
head->next = node;
}
void update(Node * node) {
removeNode(node);
moveToHead(node);
}
};
C++ using std::list
using PII = pair<int, int>;
class LRUCache {
public:
unordered_map<int, list<PII>::iterator> nodes;
list<PII> LRU;
int capacity;
int size;
LRUCache(int capacity): capacity(capacity), size(0){
}
int get(int key) {
if (nodes.find(key) == nodes.end()) return -1;
int value = nodes[key] -> second;
// delete old one
LRU.erase(nodes[key]);
// insert the recently used one to front
LRU.emplace_front(key, value);
nodes[key] = LRU.begin();
return value;
}
void put(int key, int value) {
if (capacity == 0) return;
// key exist
if (nodes.find(key) != nodes.end()){
LRU.erase(nodes[key]);
LRU.emplace_front(key, value);
nodes[key] = LRU.begin();
return;
}
// key does not exist
if (size == capacity){
nodes.erase(LRU.rbegin() -> first);
LRU.pop_back();
size--;
}
LRU.emplace_front(key, value);
size++;
nodes[key] = LRU.begin();
}
};
/**
* Your LRUCache object will be instantiated and called as such:
* LRUCache* obj = new LRUCache(capacity);
* int param_1 = obj->get(key);
* obj->put(key,value);
*/
binarysearch: Last Value Map
using PII = pair<int, int>;
class LastValueMap {
public:
unordered_map<int, list<PII>::iterator> nodes;
list<PII> LRU;
LastValueMap() {
}
void set(int key, int value) {
if (nodes.find(key) != nodes.end()){
LRU.erase(nodes[key]);
}
LRU.emplace_front(key, value);
nodes[key] = LRU.begin();
}
void remove(int key) {
if (nodes.find(key) != nodes.end()){
LRU.erase(nodes[key]);
nodes.erase(key);
}
}
int getLast() {
return LRU.begin() -> second;
}
};
Reference
[1] http://zxi.mytechroad.com/blog/hashtable/leetcode-460-lfu-cache/