C++ In Practice - Combined Data Structures

Advanced O(1) Data Structures (System Design Primitives)

These problems are not merely algorithmic; they are micro-architecture design challenges. They require composing STL containers to bypass their individual limitations (e.g., combining a Hash Map's lookup with a Linked List's ordering).


1. LFU Cache (Least Frequently Used)

Requirement: O(1) get and put. Evict least frequent; tie-break with least recent.

Architecture:

  • Frequencies: A Hash Map mapping Frequency -> DoublyLinkedList<Key>.
  • Positions: A Hash Map mapping Key -> Iterator (pointing inside the lists above).
  • Values: A Hash Map mapping Key -> {Value, Frequency}.
  • MinFreq: An integer tracking the minimum frequency in the system to allow O(1) eviction.

C++11 Implementation Strategy: We use std::list for the doubly linked list. The critical trick is maintaining minFreq. When a key is accessed, we move it from freq list to freq + 1 list. If freq list becomes empty and freq == minFreq, we increment minFreq.

#include <unordered_map>
#include <list>
#include <iterator>

class LFUCache {
    int capacity;
    int minFreq;
    
    // Key -> {Value, Freq}
    std::unordered_map<int, std::pair<int, int>> keyValFreq; 
    
    // Freq -> List of Keys (Front is MRU, Back is LRU for that freq)
    std::unordered_map<int, std::list<int>> freqList; 
    
    // Key -> Iterator to the node in freqList
    std::unordered_map<int, std::list<int>::iterator> keyIter;

public:
    LFUCache(int capacity) : capacity(capacity), minFreq(0) {}

    int get(int key) {
        if (keyValFreq.find(key) == keyValFreq.end()) return -1;
        
        // Update frequency
        updateFreq(key);
        return keyValFreq[key].first;
    }

    void put(int key, int value) {
        if (capacity == 0) return;

        if (keyValFreq.find(key) != keyValFreq.end()) {
            keyValFreq[key].first = value;
            updateFreq(key);
            return;
        }

        if (keyValFreq.size() >= capacity) {
            // Evict LFU (and LRU if tie)
            // The back of the list is the LRU for that frequency
            int evictKey = freqList[minFreq].back();
            freqList[minFreq].pop_back();
            keyValFreq.erase(evictKey);
            keyIter.erase(evictKey);
        }

        // Insert new key
        minFreq = 1;
        keyValFreq[key] = {value, 1};
        freqList[1].push_front(key);
        keyIter[key] = freqList[1].begin();
    }

private:
    void updateFreq(int key) {
        int currFreq = keyValFreq[key].second;
        
        // Remove from current freq list
        freqList[currFreq].erase(keyIter[key]);
        
        // Update minFreq if necessary
        if (freqList[currFreq].empty()) {
            freqList.erase(currFreq); // Cleanup empty list
            if (minFreq == currFreq) minFreq++;
        }
        
        // Move to next freq list
        int nextFreq = currFreq + 1;
        keyValFreq[key].second = nextFreq;
        freqList[nextFreq].push_front(key);
        keyIter[key] = freqList[nextFreq].begin();
    }
};

2. Insert Delete GetRandom O(1)

Requirement: Strict O(1) average time.

Challenge: Arrays allow O(1) random access but O(N) deletion (shifting). Hash Maps allow O(1) deletion but no random access.

Solution: The "Swap-and-Pop" Idiom.

  1. Store values in std::vector (for random index access).
  2. Store Value -> Index in std::unordered_map.
  3. Delete: Swap the target element with the last element of the vector. Update the map index for the moved element. Pop the back.
#include <vector>
#include <unordered_map>
#include <cstdlib> // for rand()

class RandomizedSet {
    std::vector<int> nums;
    std::unordered_map<int, int> valToIndex;

public:
    RandomizedSet() {}

    bool insert(int val) {
        if (valToIndex.count(val)) return false;
        
        nums.push_back(val);
        valToIndex[val] = nums.size() - 1;
        return true;
    }

    bool remove(int val) {
        if (!valToIndex.count(val)) return false;

        int indexToRemove = valToIndex[val];
        int lastVal = nums.back();

        // Swap Logic:
        // 1. Move lastVal to the hole
        nums[indexToRemove] = lastVal;
        valToIndex[lastVal] = indexToRemove;

        // 2. Remove the last element
        nums.pop_back();
        valToIndex.erase(val);
        
        return true;
    }

    int getRandom() {
        return nums[rand() % nums.size()];
    }
};

3. All Oone Data Structure

Requirement: Increment/Decrement Key count, GetMax, GetMin in O(1).

Architecture:

  • Buckets: A Doubly Linked List where each node represents a Count and contains a Set of keys with that count. Sorted by count.
  • Mapping: unordered_map<Key, Iterator_to_Bucket>.

Implementation Nuance: When incrementing, move key from CurrentBucket to NextBucket. If NextBucket (count+1) doesn't exist, create it. std::list::insert inserts before, so to insert after, use std::next(it).

#include <unordered_map>
#include <unordered_set>
#include <list>
#include <string>

class AllOne {
    struct Bucket {
        int count;
        std::unordered_set<std::string> keys;
    };

    std::list<Bucket> buckets;
    std::unordered_map<std::string, std::list<Bucket>::iterator> bucketOfKey;

public:
    void inc(std::string key) {
        // If key doesn't exist, insert into count=1 bucket
        if (!bucketOfKey.count(key)) {
            if (buckets.empty() || buckets.front().count != 1) {
                buckets.push_front({1, {}});
            }
            buckets.front().keys.insert(key);
            bucketOfKey[key] = buckets.begin();
            return;
        }

        // Key exists, move to next bucket
        auto curr = bucketOfKey[key];
        auto next = std::next(curr);
        
        if (next == buckets.end() || next->count != curr->count + 1) {
            next = buckets.insert(next, {curr->count + 1, {}});
        }
        
        next->keys.insert(key);
        bucketOfKey[key] = next;
        
        curr->keys.erase(key);
        if (curr->keys.empty()) buckets.erase(curr);
    }

    void dec(std::string key) {
        if (!bucketOfKey.count(key)) return;

        auto curr = bucketOfKey[key];
        auto prev = buckets.end(); 
        if (curr != buckets.begin()) prev = std::prev(curr);

        bucketOfKey.erase(key); // Assuming removal unless moved

        if (curr->count > 1) {
            if (curr == buckets.begin() || prev->count != curr->count - 1) {
                prev = buckets.insert(curr, {curr->count - 1, {}});
            }
            prev->keys.insert(key);
            bucketOfKey[key] = prev;
        }

        curr->keys.erase(key);
        if (curr->keys.empty()) buckets.erase(curr);
    }

    std::string getMaxKey() {
        return buckets.empty() ? "" : *buckets.back().keys.begin();
    }

    std::string getMinKey() {
        return buckets.empty() ? "" : *buckets.front().keys.begin();
    }
};

4. Time Based Key-Value Store

Requirement: set(key, val, time), get(key, time). Get returns value at time_prev <= time.

Logic: Since set calls are strictly chronological (usually), the storage for each key is sorted. Use upper_bound.

#include <unordered_map>
#include <vector>
#include <string>
#include <algorithm>

class TimeMap {
    // Key -> Vector of {Timestamp, Value}
    std::unordered_map<std::string, std::vector<std::pair<int, std::string>>> store;

public:
    void set(std::string key, std::string value, int timestamp) {
        store[key].push_back({timestamp, value});
    }

    std::string get(std::string key, int timestamp) {
        if (!store.count(key)) return "";

        const auto& vec = store[key];

        // Binary Search: Find first element > timestamp
        // Compare pair with int? Need custom comparator or dummy pair.
        // Easiest: Use a dummy pair with MAX string so strictly >
        auto it = std::upper_bound(vec.begin(), vec.end(), 
            std::make_pair(timestamp, std::string(1, (char)127)));

        if (it == vec.begin()) return "";
        
        // Move back one step to get <= timestamp
        return std::prev(it)->second;
    }
};

5. Snapshot Array

Requirement: set(index, val), snap(), get(index, snap_id).

Optimization: Storing a full copy on snap() is O(N) memory/time. Too slow. We store Deltas.

Structure: vector<vector<pair<snap_id, value>>>.

Logic: get performs binary search on the history of that specific index.

#include <vector>
#include <algorithm>
#include <map>

class SnapshotArray {
    int snapId = 0;
    // Index -> List of {SnapID, Value}
    std::vector<std::vector<std::pair<int, int>>> history;

public:
    SnapshotArray(int length) {
        history.resize(length);
        // Initialize all indices with 0 at snap -1 or handle in get
        for(auto& h : history) {
            h.push_back({-1, 0});
        }
    }

    void set(int index, int val) {
        // Optimization: If value at current snapId already exists, overwrite it.
        // Otherwise, push new entry.
        if (history[index].back().first == snapId) {
            history[index].back().second = val;
        } else {
            history[index].push_back({snapId, val});
        }
    }

    int snap() {
        return snapId++;
    }

    int get(int index, int snap_id) {
        const auto& vec = history[index];
        // Find iterator to first element with ID > snap_id
        auto it = std::upper_bound(vec.begin(), vec.end(), std::make_pair(snap_id, 2000000000));
        
        // Return the element just before it (closest snap_id <= requested)
        return std::prev(it)->second;
    }
};

Summary of Advanced Structures

Problem STL Primitives Key Technique Complexity
LFU Cache unordered_map + list Iterator Map. Direct access to list nodes. O(1)
Random Set unordered_map + vector Swap-and-Pop. O(1)
All Oone list<Bucket> + unordered_map Bucket List. Move keys between buckets. O(1)
Time Map unordered_map + vector Binary Search (upper_bound). O(\log N_{key})
Snapshot vector<vector<pair>> Delta Storage + Binary Search. O(\log N_{snaps})

Here are 5 additional "Architectural Class" problems. These problems, like the LFU Cache or Snapshot Array, require composing standard STL containers to create a custom data structure with strict time complexity requirements.

1. Range Module (Dynamic Interval Management)

Mechanism: Track a set of half-open intervals [left, right). Support adding a range, removing a range, and querying if a range is fully tracked. All operations should be roughly O(\log N).

Real-world Analogy: Memory Management Unit (MMU) tracking allocated memory pages.

Implementation Strategy:

  • Data Structure: std::map<int, int> mapping Start -> End.
  • Invariant: No two intervals in the map overlap or touch. If we insert [1, 5) and [5, 10), they merge into [1, 10).
  • Logic:
    • Add: Use upper_bound to find the first interval that starts after our new range starts. Move iterator backward to check for overlap. Merge overlapping intervals into one, deleting the subsumed ones.
    • Remove: Similar to Add, but "cuts" holes in existing intervals. This might result in one interval splitting into two.
    • Query: upper_bound to find the interval starting \le query.start. Check if it covers query.end.
#include <map>
#include <algorithm>

class RangeModule {
    // Map: Start -> End
    std::map<int, int> intervals;

public:
    void addRange(int left, int right) {
        auto it = intervals.upper_bound(left);
        // Check previous interval: does it overlap or touch?
        if (it != intervals.begin()) {
            auto prev = std::prev(it);
            if (prev->second >= left) {
                left = prev->first;
                right = std::max(right, prev->second);
                intervals.erase(prev);
            }
        }

        // Check next intervals: do they overlap?
        while (it != intervals.end() && it->first <= right) {
            right = std::max(right, it->second);
            it = intervals.erase(it);
        }
        
        intervals[left] = right;
    }

    bool queryRange(int left, int right) {
        auto it = intervals.upper_bound(left);
        if (it == intervals.begin()) return false;
        // Check the interval strictly to the left of our query start
        return std::prev(it)->second >= right;
    }

    void removeRange(int left, int right) {
        auto it = intervals.upper_bound(left);
        if (it != intervals.begin()) {
            auto prev = std::prev(it);
            if (prev->second > left) {
                // Determine if we need to split the previous interval or just trim right
                int originalEnd = prev->second;
                prev->second = left; // Trim
                if (originalEnd > right) {
                    intervals[right] = originalEnd; // Split part
                    return; 
                }
            }
        }

        while (it != intervals.end() && it->first < right) {
            if (it->second <= right) {
                // Fully covered, remove
                it = intervals.erase(it);
            } else {
                // Partially covered, trim left
                int end = it->second;
                intervals.erase(it);
                intervals[right] = end;
                break;
            }
        }
    }
};

2. Design In-Memory File System

Mechanism: Support ls, mkdir, addContent, readContent.

Architecture: A Trie-like tree structure, but nodes are Directories or Files.

Key Challenge: Clean separation of File vs Directory logic.

Implementation Strategy:

  • Struct: Node contains bool isFile, string content, and map<string, Node*> children.
  • Map: Using std::map automatically keeps the ls output sorted lexicographically (a common requirement).
  • Traversal: A helper function findNode(path) splits the string by / and walks the tree.
#include <vector>
#include <string>
#include <map>
#include <sstream>
#include <algorithm>

class FileSystem {
    struct Node {
        bool isFile = false;
        std::string content = "";
        std::map<std::string, Node*> children;
    };
    
    Node* root;

    std::vector<std::string> split(const std::string& path) {
        std::vector<std::string> parts;
        std::stringstream ss(path);
        std::string item;
        while (std::getline(ss, item, '/')) {
            if (!item.empty()) parts.push_back(item);
        }
        return parts;
    }

public:
    FileSystem() {
        root = new Node();
    }

    std::vector<std::string> ls(std::string path) {
        auto parts = split(path);
        Node* curr = root;
        for (const auto& part : parts) curr = curr->children[part];

        if (curr->isFile) return {parts.back()};
        
        std::vector<std::string> res;
        for (const auto& pair : curr->children) res.push_back(pair.first);
        return res;
    }

    void mkdir(std::string path) {
        auto parts = split(path);
        Node* curr = root;
        for (const auto& part : parts) {
            if (!curr->children.count(part)) curr->children[part] = new Node();
            curr = curr->children[part];
        }
    }

    void addContentToFile(std::string filePath, std::string content) {
        auto parts = split(filePath);
        Node* curr = root;
        for (const auto& part : parts) {
            if (!curr->children.count(part)) curr->children[part] = new Node();
            curr = curr->children[part];
        }
        curr->isFile = true;
        curr->content += content;
    }
};

3. Dinner Plate Stacks (Infinite Stacks with PopAtStack)

Mechanism: You have an infinite row of stacks. push goes to the leftmost non-full stack. pop comes from the rightmost non-empty stack. popAtStack(index) pops from a specific stack.

Real-world Analogy: Warehouse shelving.

Implementation Strategy:

  • Container: vector<stack<int>> stores the actual data.
  • Availability Tracker: set<int> available (Min-Heap behavior) stores indices of stacks that have space.
  • Efficiency:
    • push: *available.begin() gives the leftmost index. O(\log N).
    • pop: vector.back(). O(1).
    • popAtStack: Direct access. Add index to available set since it now has space. O(\log N).
#include <vector>
#include <stack>
#include <set>

class DinnerPlates {
    int capacity;
    std::vector<std::stack<int>> stacks;
    std::set<int> available; // Indices of non-full stacks

public:
    DinnerPlates(int capacity) : capacity(capacity) {}

    void push(int val) {
        if (available.empty()) {
            // No holes, create new stack
            available.insert(stacks.size());
            stacks.push_back({});
        }

        int idx = *available.begin(); // Leftmost available
        stacks[idx].push(val);

        if (stacks[idx].size() == capacity) {
            available.erase(idx); // Full now
        }
    }

    int pop() {
        // Remove empty stacks from the back
        while (!stacks.empty() && stacks.back().empty()) {
            // Also remove from available set if it was there
            // (Wait, if it's empty, it MUST be in available, but we are deleting the stack entirely)
            available.erase(stacks.size() - 1);
            stacks.pop_back();
        }
        
        if (stacks.empty()) return -1;
        return popAtStack(stacks.size() - 1);
    }

    int popAtStack(int index) {
        if (index >= stacks.size() || stacks[index].empty()) return -1;

        int val = stacks[index].top();
        stacks[index].pop();
        
        // Since we popped, this stack definitely has space now
        available.insert(index);
        
        return val;
    }
};

4. Design Text Editor (Cursor Management)

Mechanism: addText(text), deleteText(k), cursorLeft(k), cursorRight(k).

Architecture: The "Two-Stack" approach (often called a Gap Buffer concept).

  • Left Stack: Stores characters to the left of the cursor.
  • Right Stack: Stores characters to the right of the cursor.
  • Movement: Moving left means popping from Left Stack and pushing to Right Stack.

C++11 Optimization: Instead of std::stack (which is hard to print/iterate for result), use std::string or std::deque acting as stacks. string is contiguous and better for the "return last 10 chars" requirement common in this problem.

#include <string>
#include <algorithm>
#include <deque>

class TextEditor {
    std::deque<char> left;  // Chars to left of cursor
    std::deque<char> right; // Chars to right of cursor (reversed logic or simply push_front)

public:
    void addText(std::string text) {
        for (char c : text) left.push_back(c);
    }

    int deleteText(int k) {
        int count = 0;
        while (k > 0 && !left.empty()) {
            left.pop_back();
            k--;
            count++;
        }
        return count;
    }

    std::string cursorLeft(int k) {
        while (k > 0 && !left.empty()) {
            right.push_front(left.back());
            left.pop_back();
            k--;
        }
        return getPreview();
    }

    std::string cursorRight(int k) {
        while (k > 0 && !right.empty()) {
            left.push_back(right.front());
            right.pop_front();
            k--;
        }
        return getPreview();
    }

    std::string getPreview() {
        // Return last 10 chars from left
        std::string res = "";
        int count = 10;
        auto it = left.rbegin();
        while (count > 0 && it != left.rend()) {
            res += *it;
            it++;
            count--;
        }
        std::reverse(res.begin(), res.end());
        return res;
    }
};

5. Leaderboard / Rank System (Order Statistic Tree)

Mechanism: addScore(player, score), getRank(score), getScoreAtIndex(k).

Challenge: STL std::set / std::map does not support "get index of element" or "get element at index" in O(\log N). std::distance is O(N).

Solution:

  1. Ideal (Production): Policy Based Data Structure (PBDS) in GCC, or Augmenting a BST (store subtree size).
  2. Didactic (Standard STL): We cannot achieve pure O(\log N) for both add and getRank using only std::set.
    • Alternative: Use std::vector + std::sort (O(N \log N)) or Fenwick Tree (if scores are integers in reasonable range).
    • Wait, we can approximate: If specific rank isn't required, but "Top K" is, we use min-heap.
    • Design Choice: If the range of scores is small (e.g., 0-100), use count[score].
    • Approximation for this exercise: Let's implement a system that maintains a Gap Ranking using a Hash Map + set of scores.

Scenario: Design a system to get the K-th largest score in O(\log N) and update scores in O(\log N). This is technically impossible with standard C++ STL. However, we can design a Tweet Counts or Rate Limiter style system instead.

Let's Pivot to: Design a Rate Limiter (Sliding Window Log) Mechanism: Allow limit requests in the last window seconds. Architecture: unordered_map<user, deque<timestamp>>. Logic:

  1. Get user's deque.
  2. Pop front while timestamp < current - window.
  3. Return deque.size() < limit.
#include <unordered_map>
#include <deque>

class RateLimiter {
    std::unordered_map<int, std::deque<int>> userRequests;
    int timeWindow;
    int maxRequests;

public:
    RateLimiter(int window, int maxReq) : timeWindow(window), maxRequests(maxReq) {}

    bool shouldAllow(int userId, int timestamp) {
        std::deque<int>& q = userRequests[userId];

        // 1. Clean up old requests
        while (!q.empty() && q.front() <= timestamp - timeWindow) {
            q.pop_front();
        }

        // 2. Check capacity
        if (q.size() < maxRequests) {
            q.push_back(timestamp);
            return true;
        }

        return false;
    }
};

Summary of New Problems

Problem Key Components Complexity Constraint
Range Module std::map (Interval Merging) O(\log N) ops.
File System Trie (map children) O(L) where L is path length.
Dinner Plates vector<stack> + set Push O(\log N), Pop O(1).
Text Editor deque (Two Stacks) Cursor Move O(K), Insert O(1).
Rate Limiter map + deque Amortized O(1) (Clean up).

C++ In Practice - Combined Data Structures | Bijup.com