LRU and LFU

Jimmy (xiaoke) Shen
12 min read · Apr 2, 2020

For this kind of design problem, I feel that C++ is relatively easier because it offers more ready-to-use data structures, such as the doubly linked list (std::list). In this article I am going to use Python to implement these design problems, with some C++ versions as well.

LRU is relatively easier than LFU. However, I found a post about an LFU implementation that is very clear, so I'd like to share that post first and then follow a similar approach to design and implement LRU.

LFU

Excellent implementation from LeetCode

Please read the original post. I am putting it here because I really like this one.

What is my data structure?

1. A Doubly linked Node

class Node:
+ key: int
+ value: int
+ freq: int
+ prev: Node
+ next: Node

2. A Doubly Linked List

Note: This part could be replaced by an OrderedDict; I implemented it by hand for clarity (a minimal OrderedDict sketch is given right after this outline).

class DLinkedList:
- sentinel: Node
+ size: int
+ append(node: Node) -> None
+ pop(node: Node) -> Node

3. Our LFUCache

class LFUCache:
- node: dict[key: int, node: Node]
- freq: dict[freq: int, lst: DLinkedList]
- minfreq: int
+ get(key: int) -> int
+ put(key: int, value: int) -> None
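As promised in the note above, here is a minimal sketch of how collections.OrderedDict could stand in for the hand-written DLinkedList of one frequency bucket. The class name FreqList and its key-based append/pop signatures are my own illustration rather than part of the original post; the rest of the design stays the same.

from collections import OrderedDict

class FreqList:
    """One frequency bucket: entries kept in recently-used order, oldest first."""
    def __init__(self):
        self._od = OrderedDict()

    def __len__(self):
        return len(self._od)

    def append(self, key, node):
        # newly used keys go to the right end (most recently used)
        self._od[key] = node

    def pop(self, key=None):
        if not self._od:
            return None
        if key is None:
            # evict the oldest (least recently used) entry of this bucket
            _, node = self._od.popitem(last=False)
            return node
        return self._od.pop(key)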

Visualization

Explanation

Each key maps to its corresponding node (self._node), so we can retrieve the node in O(1) time.

Each frequency freq is mapped to a doubly linked list (self._freq), where all nodes in that DLinkedList have the same frequency freq. Moreover, each node is always inserted at the head (indicating it is the most recently used).

A minimum frequency self._minfreq is maintained to keep track of the minimum frequency across all nodes in this cache, so that the DLinkedList with the minimum frequency can always be retrieved in O(1) time.

Here is how the algorithm works

get(key)

  1. query the node by calling self._node[key]
  2. find the frequency by checking node.freq, assigned as f, and query the DLinkedList that this node is in, through calling self._freq[f]
  3. pop this node
  4. update the node’s frequency and append the node to the DLinkedList with frequency f+1
  5. if the old DLinkedList (frequency f) is now empty and self._minfreq == f, update self._minfreq to f+1.
  6. return node.val

put(key, value)

  • If key is already in the cache, do the same thing as get(key), and update node.val to value
  • Otherwise:
  1. if the cache is full, pop the least frequently used element (*)
  2. add new node to self._node
  3. add new node to self._freq[1]
  4. reset self._minfreq to 1

(*) The least frequently used element is the tail element of the DLinkedList with frequency self._minfreq.
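To make these rules concrete, here is a short trace with capacity 2 (this is essentially the example from the LeetCode 460 problem statement); each line shows the frequency lists from head to tail and the current minfreq:

# put(1, 1)    freq 1: [1]                  minfreq = 1
# put(2, 2)    freq 1: [2, 1]               minfreq = 1
# get(1) -> 1  freq 1: [2], freq 2: [1]     minfreq = 1
# put(3, 3)    cache is full: evict the tail of the minfreq list, i.e. key 2
#              freq 1: [3], freq 2: [1]     minfreq = 1
# get(2) -> -1 (evicted)
# get(3) -> 3  freq 2: [3, 1]               minfreq = 2  (the freq-1 list became empty)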

Implementation

Below is the implementation, with detailed comments as well.

import collections


class Node:
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.freq = 1
        self.prev = self.next = None


class DLinkedList:
    """An implementation of a doubly linked list.

    Two APIs provided:

    append(node): append the node to the head of the linked list.
    pop(node=None): remove the referenced node.
                    If None is given, remove the one from the tail, which is
                    the least recently used.

    Both operations, apparently, are of O(1) complexity.
    """
    def __init__(self):
        self._sentinel = Node(None, None)  # dummy node
        self._sentinel.next = self._sentinel.prev = self._sentinel
        self._size = 0

    def __len__(self):
        return self._size

    def append(self, node):
        node.next = self._sentinel.next
        node.prev = self._sentinel
        node.next.prev = node
        self._sentinel.next = node
        self._size += 1

    def pop(self, node=None):
        if self._size == 0:
            return

        if not node:
            node = self._sentinel.prev

        node.prev.next = node.next
        node.next.prev = node.prev
        self._size -= 1

        return node


class LFUCache:
    def __init__(self, capacity):
        """
        :type capacity: int

        Three things to maintain:

        1. A dict, named `self._node`, holding the reference of all nodes given a key.
           That is, O(1) time to retrieve a node given a key.

        2. Each frequency has a doubly linked list, stored in `self._freq`, where the key
           is the frequency and the value is an object of `DLinkedList`.

        3. The min frequency across all nodes. We can maintain this in O(1) time, taking
           advantage of the fact that the frequency can only increment by 1. Use the
           following two rules:

           Rule 1: Whenever we see that the size of the DLinkedList of the current min
                   frequency is 0, the min frequency must increment by 1.

           Rule 2: Whenever we put in a new (key, value), the min frequency must be 1
                   (the new node).
        """
        self._size = 0
        self._capacity = capacity

        self._node = dict()  # key: Node
        self._freq = collections.defaultdict(DLinkedList)
        self._minfreq = 0

    def _update(self, node):
        """
        This is a helper function used in the following two cases:

        1. when `get(key)` is called; and
        2. when `put(key, value)` is called and the key exists.

        The common point of these two cases is that:

        1. no new node comes in, and
        2. the node is visited one more time -> node.freq changes ->
           thus the place of this node will change.

        The logic of this function is:

        1. pop the node from the old DLinkedList (with freq `f`)
        2. append the node to the new DLinkedList (with freq `f+1`)
        3. if the old DLinkedList has size 0 and self._minfreq is `f`,
           update self._minfreq to `f+1`

        All of the above operations take O(1) time.
        """
        freq = node.freq

        self._freq[freq].pop(node)
        if self._minfreq == freq and not self._freq[freq]:
            self._minfreq += 1

        node.freq += 1
        freq = node.freq
        self._freq[freq].append(node)

    def get(self, key):
        """
        Through checking self._node[key], we can get the node in O(1) time.
        Just perform self._update, then we can return the value of the node.

        :type key: int
        :rtype: int
        """
        if key not in self._node:
            return -1

        node = self._node[key]
        self._update(node)
        return node.val

    def put(self, key, value):
        """
        If `key` already exists in self._node, we do the same operations as `get`, except
        updating node.val to the new value.

        Otherwise, the following logic will be performed:

        1. if the cache reaches its capacity, pop the least frequently used item (*)
        2. add the new node to self._node
        3. add the new node to the DLinkedList with frequency 1
        4. reset self._minfreq to 1

        (*) How to pop the least frequently used item? Two facts:

        1. we maintain self._minfreq, the minimum possible frequency in the cache.
        2. all nodes with the same frequency are stored in one DLinkedList, in
           recently-used order (always append at the head).

        Consequence ==> the tail of the DLinkedList with frequency self._minfreq is the
        least recently used one among the least frequently used ones; pop it.

        :type key: int
        :type value: int
        :rtype: void
        """
        if self._capacity == 0:
            return

        if key in self._node:
            node = self._node[key]
            self._update(node)
            node.val = value
        else:
            if self._size == self._capacity:
                node = self._freq[self._minfreq].pop()
                del self._node[node.key]
                self._size -= 1

            node = Node(key, value)
            self._node[key] = node
            self._freq[1].append(node)
            self._minfreq = 1
            self._size += 1
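A quick way to sanity-check the class is to run it on the example from the LeetCode 460 problem statement (expected return values in the comments):

cache = LFUCache(2)
cache.put(1, 1)
cache.put(2, 2)
print(cache.get(1))   # 1
cache.put(3, 3)       # evicts key 2, the least frequently used
print(cache.get(2))   # -1
print(cache.get(3))   # 3
cache.put(4, 4)       # keys 1 and 3 both have freq 2; key 1 is least recently used, so it is evicted
print(cache.get(1))   # -1
print(cache.get(3))   # 3
print(cache.get(4))   # 4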

My slightly modified version

import collections


class Node:
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.freq = 1
        self.next = self.prev = None


class DLinkedList:
    def __init__(self):
        self._size = 0
        self._dummy = Node(None, None)
        self._dummy.next = self._dummy.prev = self._dummy

    def __len__(self):
        return self._size

    def append(self, node):
        # append to the left (right after the dummy head)
        node.prev = self._dummy
        node.next = self._dummy.next
        node.next.prev = node
        self._dummy.next = node
        self._size += 1

    def pop(self, node=None):
        if self._size == 0:
            return
        if node is None:
            node = self._dummy.prev
        node.prev.next = node.next
        node.next.prev = node.prev
        self._size -= 1
        return node


class LFUCache:
    def __init__(self, capacity: int):
        self._capacity = capacity
        self._size = 0
        self._nodes = {}
        self._freq = collections.defaultdict(DLinkedList)
        self._minfreq = 0

    def get(self, key: int) -> int:
        if key not in self._nodes:
            return -1
        node = self._nodes[key]
        self._update(node)
        return node.val

    def put(self, key: int, value: int) -> None:
        if self._capacity == 0:
            return
        if key in self._nodes:
            node = self._nodes[key]
            node.val = value
            self._update(node)
            return
        if self._size == self._capacity:
            # pop the least frequently used item
            node = self._freq[self._minfreq].pop()
            del self._nodes[node.key]
            self._size -= 1
        # add the new item
        node = Node(key, value)
        self._nodes[key] = node
        self._minfreq = 1
        self._freq[1].append(node)
        self._size += 1
        return

    def _update(self, node):
        f = node.freq
        node.freq += 1
        self._freq[f].pop(node)
        if self._minfreq == f and not self._freq[f]:
            self._minfreq = f + 1
        self._freq[f + 1].append(node)


# Your LFUCache object will be instantiated and called as such:
# obj = LFUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)

Another implementation

import collections


class Node:
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.freq = 1
        self.prev = self.next = None


class DLinkedList:
    def __init__(self):
        self.len = 0
        self.dummy = Node(-1, -1)
        self.dummy.prev = self.dummy.next = self.dummy

    def __len__(self):
        return self.len

    def appendleft(self, node):
        self.len += 1
        node.prev = self.dummy
        node.next = self.dummy.next
        self.dummy.next.prev = node
        self.dummy.next = node

    def pop(self, node=None):
        self.len -= 1
        if node is None:
            node = self.dummy.prev
        node.prev.next = node.next
        node.next.prev = node.prev
        return node


class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.min_freq = 1
        self.LFU = collections.defaultdict(DLinkedList)
        self.nodes = {}

    def _update(self, node):
        curr_freq = node.freq
        node = self.LFU[curr_freq].pop(node)
        if node.freq == self.min_freq and len(self.LFU[self.min_freq]) == 0:
            self.min_freq += 1
        node.freq += 1
        self.LFU[node.freq].appendleft(node)
        return node

    def get(self, key: int) -> int:
        if key not in self.nodes:
            return -1
        node = self.nodes[key]
        node = self._update(node)
        return node.val

    def put(self, key: int, value: int) -> None:
        if self.capacity == 0:
            return
        if key in self.nodes:
            node = self.nodes[key]
            node.val = value
            self._update(node)
            return
        if self.capacity == len(self.nodes):
            node = self.LFU[self.min_freq].pop()
            del self.nodes[node.key]
        new_node = Node(key, value)
        self.nodes[key] = new_node
        self.min_freq = 1
        self.LFU[1].appendleft(new_node)
        return


# Your LFUCache object will be instantiated and called as such:
# obj = LFUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)

C++ code based on [1]

struct Node {
    int key;
    int val;
    int freq;
    list<int>::iterator it;
};

class LFUCache {
private:
    int capacity_, min_freq_;
    // nodes_: {key: node, ...}
    unordered_map<int, Node> nodes_;
    // freq_l_: {freq1: key1<->key2...,
    //           freq2: key3<->key4}
    unordered_map<int, list<int>> freq_l_;

    void update(Node& node) {
        int old_freq = node.freq;
        int new_freq = old_freq + 1;
        node.freq = new_freq;
        freq_l_[old_freq].erase(node.it);
        // push to the new_freq list and update the iterator
        freq_l_[new_freq].push_front(node.key);
        node.it = freq_l_[new_freq].begin();

        // update min_freq_ when necessary
        if (min_freq_ == old_freq && freq_l_[old_freq].empty()) {
            min_freq_ = new_freq;
        }
        return;
    }

public:
    LFUCache(int capacity): capacity_(capacity), min_freq_(1) {
    }

    int get(int key) {
        auto it = nodes_.find(key);
        // found
        if (it != nodes_.end()) {
            update(it->second);
            return it->second.val;
        }
        return -1;
    }

    void put(int key, int value) {
        if (capacity_ == 0) return;
        auto it = nodes_.find(key);
        // found
        if (it != nodes_.end()) {
            it->second.val = value;
            update(it->second);
            return;
        }
        // not found
        // remove the LFU node if we have reached the capacity
        if (nodes_.size() == capacity_) {
            auto toBeDelKey = freq_l_[min_freq_].back();
            freq_l_[min_freq_].pop_back();
            nodes_.erase(toBeDelKey);
        }
        // add the new node and update min_freq_ to 1
        min_freq_ = 1;
        freq_l_[1].push_front(key);
        Node new_node{key, value, 1, freq_l_[1].begin()};
        nodes_[key] = new_node;
        return;
    }
};
/**
 * Your LFUCache object will be instantiated and called as such:
 * LFUCache* obj = new LFUCache(capacity);
 * int param_1 = obj->get(key);
 * obj->put(key,value);
 */
// Design notes:
// Node
//   + key
//   + val
//   + freq
//   + iterator of the list where we put the key.
// hashmap nodes_: {key1: node1, key2: node2, key3: node3, ...} -> O(1) to find the node if it exists.
// hashmap freq_l_, organizing the keys by frequency:
//   {freq2: key1 <-> key4 <-> key5, ...}
// min_freq_: used to know which list holds the LFU node.
//
// get: if the key exists, update it and return its value; if not, return -1.
// The update process: update the freq, the iterator, and maybe min_freq_.
// put:
//   if the key exists, update it;
//   if not:
//     at capacity: remove the LFU node, add the new node, update min_freq_ to 1;
//     under capacity: add the new node, update min_freq_ to 1.

LRU

1. A Doubly linked Node

class Node:
- key: int
- value: int
- prev: Node
- next: Node

2. A Doubly Linked List

Note: This part could be replaced by an OrderedDict; I implemented it by hand for clarity (a minimal OrderedDict-based LRU sketch follows this outline).

class DLinkedList:
- sentinel: Node
- size: int
+ append(node: Node) -> None
+ pop(node: Node) -> Node

3. Our LRUCache

class LRUCache:
- nodes: dict[key: int, node: Node]
- cache: DLinkedList
+ get(key: int) -> int
+ put(key: int, value: int) -> None
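As mentioned in the note above, for LRU the whole cache can also be written directly on top of an OrderedDict. This is only an alternative sketch (the class name LRUCacheOD is mine); the hand-written doubly linked list version follows below.

from collections import OrderedDict

class LRUCacheOD:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()   # key -> value, most recently used at the right end

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)          # mark as most recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used entry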

Visualization

Explanation

The LRU cache is very similar to the LFU cache, and it is simpler: we do not need to maintain any frequency information, and a single doubly linked list is enough to track which item is the least recently used.

When an item is used, we append it to the left (head) of the doubly linked list, so the least recently used item is always at the tail.

Implementation

class Node:
    def __init__(self, key=None, val=None):
        self.key = key
        self.val = val
        self.prev = self.next = None


class DLinkedList:
    def __init__(self):
        self._dummy = Node()
        self._dummy.prev = self._dummy.next = self._dummy
        self._size = 0

    def __len__(self):
        return self._size

    def append(self, node):
        # append left (right after the dummy head)
        node.next = self._dummy.next
        node.prev = self._dummy
        self._dummy.next = node
        node.next.prev = node
        self._size += 1

    def pop(self, node=None):
        if self._size == 0:
            return None
        if node is None:
            node = self._dummy.prev
        # unlink this node
        node.prev.next = node.next
        node.next.prev = node.prev
        self._size -= 1
        return node


class LRUCache:
    def __init__(self, capacity: int):
        self._capacity = capacity
        self._size = 0
        self._nodes = {}
        self._cache = DLinkedList()

    def _update(self, node):
        node = self._cache.pop(node)
        self._cache.append(node)

    def get(self, key: int) -> int:
        if key not in self._nodes:
            return -1
        node = self._nodes[key]
        self._update(node)
        return node.val

    def put(self, key: int, value: int) -> None:
        if self._capacity == 0:
            return
        if key in self._nodes:
            node = self._nodes[key]
            node.val = value
            self._update(node)
            return
        if self._capacity == self._size:
            node = self._cache.pop()
            del self._nodes[node.key]
            self._size -= 1
        node = Node(key, value)
        self._nodes[key] = node
        self._cache.append(node)
        self._size += 1
        return


# Your LRUCache object will be instantiated and called as such:
# obj = LRUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)
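Again, a quick sanity check on the example from the LeetCode 146 problem statement (expected return values in the comments):

cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
print(cache.get(1))   # 1
cache.put(3, 3)       # evicts key 2
print(cache.get(2))   # -1
cache.put(4, 4)       # evicts key 1
print(cache.get(1))   # -1
print(cache.get(3))   # 3
print(cache.get(4))   # 4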

Python version 2

class Node:
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.next = self.prev = None


class DLinkedList:
    def __init__(self):
        self.dummy = Node(-1, -1)
        self.dummy.next = self.dummy.prev = self.dummy
        self.len = 0

    def __len__(self):
        return self.len

    def pop(self, to_be_popped=None):
        """
        dummy <-> ... <-> last <-> dummy
        If no node is given, pop the last (least recently used) one.
        """
        if to_be_popped is None:
            to_be_popped = self.dummy.prev
        to_be_popped.next.prev = to_be_popped.prev
        to_be_popped.prev.next = to_be_popped.next
        self.len -= 1
        return to_be_popped

    def appendleft(self, node):
        self.len += 1
        node.next = self.dummy.next
        node.prev = self.dummy
        self.dummy.next.prev = node
        self.dummy.next = node
        return

    def pop_appendleft(self, node):
        self.pop(node)
        self.appendleft(node)


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.nodes = {}
        self.LRU = DLinkedList()

    def get(self, key: int) -> int:
        if key in self.nodes:
            self.LRU.pop_appendleft(self.nodes[key])
            return self.nodes[key].val
        else:
            return -1

    def put(self, key: int, value: int) -> None:
        if key in self.nodes:
            node = self.nodes[key]
            node.val = value
            self.LRU.pop_appendleft(node)
            return
        if self.capacity == 0:
            return
        if self.capacity == len(self.LRU):
            popped_node = self.LRU.pop()
            del self.nodes[popped_node.key]
        new_node = Node(key, value)
        self.nodes[key] = new_node
        self.LRU.appendleft(new_node)
        return


# Your LRUCache object will be instantiated and called as such:
# obj = LRUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)

C++ code

typedef pair<int, int> ii;

class LRUCache {
    int capacity_;
    list<ii> cache;
    unordered_map<int, list<ii>::iterator> m_;
public:
    LRUCache(int capacity): capacity_(capacity) {
    }

    int get(int key) {
        auto it = m_.find(key);
        // not found
        if (it == m_.end()) return -1;
        // found: splice moves the existing node to the front in O(1)
        // and does not invalidate the stored iterators
        cache.splice(cache.begin(), cache, it->second);
        return cache.front().second;
    }

    void put(int key, int value) {
        auto it = m_.find(key);
        // found
        if (it != m_.end()) {
            cache.splice(cache.begin(), cache, it->second);
            cache.front().second = value;
            m_[key] = cache.begin();
            return;
        }
        // not found
        // reached the capacity: evict the least recently used (back) entry
        if (cache.size() == capacity_) {
            m_.erase(cache.back().first);
            cache.pop_back();
        }
        cache.emplace_front(key, value);
        m_[key] = cache.begin();
    }
};
/**
 * Your LRUCache object will be instantiated and called as such:
 * LRUCache* obj = new LRUCache(capacity);
 * int param_1 = obj->get(key);
 * obj->put(key,value);
 */

Another version using dedicated head and tail sentinel nodes

class LRUCache {
    struct Node {
        Node* prev;
        Node* next;
        int value;
        int key;
        Node(int x = 0, int k = 0) : value(x), key(k), prev(nullptr), next(nullptr) {}
    };
    unordered_map<int, Node*> cache;
    Node* head;
    Node* tail;
    int capacity;
public:
    LRUCache(int n) : capacity(n) {
        head = new Node();
        tail = new Node();
        head->next = tail;
        tail->prev = head;
    }

    int get(int key) {
        if (cache.find(key) == cache.end()) return -1;
        // key found
        Node* node = cache[key];
        update(node);  // remove and send to head
        return node->value;
    }

    void put(int key, int value) {
        // found: update the value and move to the head
        if (cache.find(key) != cache.end()) {
            cache[key]->value = value;
            Node* node = cache[key];
            update(node);
            return;
        }
        // not found: insert a new node
        // if there is not enough capacity, remove the node before the tail
        if (cache.size() >= capacity) evict(tail->prev);
        Node* newNode = new Node(value, key);
        cache[key] = newNode;
        moveToHead(newNode);
    }
private:
    void removeNode(Node* node) {
        node->prev->next = node->next;
        node->next->prev = node->prev;
    }

    void evict(Node* node) {
        removeNode(node);
        cache.erase(node->key);
        delete node;
    }

    void moveToHead(Node* node) {
        node->next = head->next;
        node->prev = head;
        head->next->prev = node;
        head->next = node;
    }

    void update(Node* node) {
        removeNode(node);
        moveToHead(node);
    }
};

C++ version using std::list

using PII = pair<int, int>;

class LRUCache {
public:
    unordered_map<int, list<PII>::iterator> nodes;
    list<PII> LRU;
    int capacity;
    int size;

    LRUCache(int capacity): capacity(capacity), size(0) {
    }

    int get(int key) {
        if (nodes.find(key) == nodes.end()) return -1;
        int value = nodes[key]->second;
        // delete the old position
        LRU.erase(nodes[key]);
        // insert the recently used one at the front
        LRU.emplace_front(key, value);
        nodes[key] = LRU.begin();
        return value;
    }

    void put(int key, int value) {
        if (capacity == 0) return;
        // key exists
        if (nodes.find(key) != nodes.end()) {
            LRU.erase(nodes[key]);
            LRU.emplace_front(key, value);
            nodes[key] = LRU.begin();
            return;
        }
        // key does not exist
        if (size == capacity) {
            nodes.erase(LRU.rbegin()->first);
            LRU.pop_back();
            size--;
        }
        LRU.emplace_front(key, value);
        size++;
        nodes[key] = LRU.begin();
    }
};
/**
 * Your LRUCache object will be instantiated and called as such:
 * LRUCache* obj = new LRUCache(capacity);
 * int param_1 = obj->get(key);
 * obj->put(key,value);
 */

binarysearch.com: Last Value Map

using PII = pair<int, int>;

class LastValueMap {
public:
    // key -> iterator into the list; the most recently set entry is at the front
    unordered_map<int, list<PII>::iterator> nodes;
    list<PII> LRU;

    LastValueMap() {
    }

    void set(int key, int value) {
        // if the key already exists, drop its old position first
        if (nodes.find(key) != nodes.end()) {
            LRU.erase(nodes[key]);
        }
        LRU.emplace_front(key, value);
        nodes[key] = LRU.begin();
    }

    void remove(int key) {
        if (nodes.find(key) != nodes.end()) {
            LRU.erase(nodes[key]);
            nodes.erase(key);
        }
    }

    int getLast() {
        // value of the most recently set key
        return LRU.begin()->second;
    }
};

Reference

[1] http://zxi.mytechroad.com/blog/hashtable/leetcode-460-lfu-cache/
