C++ STL Unordered Hash Table Containers

All four standard unordered associative containers (std::unordered_set, std::unordered_multiset, std::unordered_map, std::unordered_multimap) share a unified underlying data structure: a Hash Table utilizing Separate Chaining.

C++11 standard requirements regarding iterator invalidation dictate this architecture. Open addressing (probing) is prohibited because inserting an element could shift existing elements, invalidating pointers and references.

Under the Hood Implementation (Core)

Bucket Array: A heap-allocated contiguous array of pointers (Node** _M_buckets). The size of this array is the bucket_count.

Linked List Nodes: Each element is wrapped in a dynamically allocated node, identical to a singly-linked list node (std::forward_list).

struct Node {
    Node* _M_next;
    size_t _M_hash_code; // Cached to speed up rehashing and equivalence checks
    T _M_payload;
};

Hash Function (std::hash<Key>): Transforms the key into an integer (size_t). The bucket index is calculated via modulo arithmetic: index = hash_code % bucket_count.
Equality Predicate (std::equal_to<Key>): Resolves hash collisions. If two keys land in the same bucket, the linked list is traversed, and the equality predicate compares keys to find the exact match.
Rehashing: The container tracks a load_factor (size / bucket_count). When this exceeds the max_load_factor (default 1.0), the container allocates a new bucket array (typically 2\times size or the next prime number), recalculates the target bucket for every existing node, and rewires the pointers. This is a severe O(N) operation.

Memory Characteristics Extremely fragmented. Every element requires a distinct heap allocation. Cache locality for table lookups is abysmal due to pointer chasing (first to the bucket array, then out to the node, then down the chain).

`std::unordered_set`

Under the Hood Implementation Wraps the hash table. The payload T is exactly the Key. Rejects duplicate keys. If a hash collision occurs, traverses the bucket's chain checking for strict equality. If equality is found, the insertion aborts.

Data Types Node payload is const Key.

ASCII Illustration

 Bucket Array
 +----+
0| NUL|
 +----+      +-------------------+
1| *-------> | Hash: 17, Key: 'A'| -> NUL
 +----+      +-------------------+
2| NUL|
 +----+      +-------------------+    +-------------------+
3| *-------> | Hash: 11, Key: 'C'| -> | Hash: 43, Key: 'K'| -> NUL
 +----+      +-------------------+    +-------------------+
              (Collision chain)

Big O Complexity

Search/Find: Average O(1). Worst-case O(N) (all elements hash to the same bucket).
Insert: Average O(1). Worst-case O(N).
Erase: Average O(1). Worst-case O(N).

Typical Usage Scenarios O(1) membership testing. Deduplication of streams where ordering is irrelevant. Filtering lists of blocked IP addresses or unique hardware IDs.

[!WARNING]

Interview Trap: Checking for Existence

Never use banned_macs.count(rogue) > 0 if you only need boolean existence. count() on a std::unordered_set has to iterate through all duplicates (if it were a multiset, though safe here, it's bad habit). More importantly, count() returns an integer, whereas find() != end() is the idiomatically correct and strictly O(1) way to check existence in C++.

In C++20, banned_macs.contains(rogue) was introduced, but in C++11, stick to find() != end().

`std::unordered_multiset`

Under the Hood Implementation Wraps the hash table. Allows duplicate keys. The standard guarantees that equivalent elements are stored contiguously within the same bucket's linked list.

Data Types Node payload is const Key.

ASCII Illustration

 Bucket Array
 +----+
0| NUL|
 +----+      +-------------------+    +-------------------+
1| *-------> | Hash: 17, Key: 'A'| -> | Hash: 17, Key: 'A'| -> NUL
 +----+      +-------------------+    +-------------------+
2| NUL|       (Duplicates linked adjacently)
 +----+

Big O Complexity

Search/Find: Average O(1). Worst-case O(N).
Insert: Average O(1). Worst-case O(N).
Erase (by Key): Average O(C) where C is the number of duplicates. Worst-case O(N).

Typical Usage Scenarios Frequency counting. Tracking multiple occurrences of independent events that share identical hashes, requiring rapid insertion without tree-balancing overhead.

`std::unordered_map`

Under the Hood Implementation Wraps the hash table. The payload T is a std::pair<const Key, Value>. The Hash and KeyEqual functions operate strictly on T.first.

Data Types Node payload is std::pair<const Key, Value>.

ASCII Illustration

 Bucket Array
 +----+
0| *-------> [Hash: 8, Key: "ID1", Val: 100] -> NUL
 +----+      
1| NUL|
 +----+      
2| *------->[Hash: 2, Key: "ID3", Val: 200] ->[Hash: 10, Key: "ID4", Val: 400] -> NUL
 +----+

Big O Complexity

Search/Find: Average O(1). Worst-case O(N).
Insert: Average O(1). Worst-case O(N).
Erase: Average O(1). Worst-case O(N).
operator[]: Average O(1). Default-constructs a value if the key is missing.

Typical Usage Scenarios High-performance lookup tables. Caching layers (e.g., memorization). Symbol tables in compilers. Any dictionary requirement where bounded latency (worst-case O(N)) is acceptable in exchange for drastically superior average throughput compared to std::map.

[!WARNING]

Trap: operator[] vs find()

A massive red flag is using if (my_map[key] == value) to check if a key exists!

operator[] inserts a default-constructed element into the hash map if the key does not exist. This silently inflates your hash map size and can trigger catastrophic O(N) rehashes during what should be a simple read.

Always use auto it = my_map.find(key); if (it != my_map.end()) { ... } for read-only existence checks.

[!TIP]

Interview Secret Weapon: Preempting Rehashes

Most candidates know to reserve() vectors. Few know that std::unordered_map also has reserve(N)! If you are ingesting a known array of size N into a hash map, call map.reserve(N) first. This sets the bucket count high enough that you will never trigger the expensive O(N) rehash/rewire operation during insertions.

[!NOTE]

Interview Trap: Hashing Custom Structs & Pairs

Standard C++ does not provide a default hash for std::pair<T, U> or custom structs! If you try std::unordered_map<std::pair<int, int>, int>, it will stubbornly refuse to compile. You must either provide a custom Hash Functor or use a std::map (which only requires operator<).

`std::unordered_multimap`

Under the Hood Implementation Wraps the hash table. Payload is std::pair<const Key, Value>. Permits duplicate keys. Lacks operator[] and .at(). Retrieval requires equal_range to return the bounding iterators of the duplicate block within the bucket chain.

Data Types Node payload is std::pair<const Key, Value>.

Big O Complexity

Search (equal_range): Average O(1) to find the first element. O(C) to traverse duplicates. Worst-case O(N).
Insert: Average O(1). Worst-case O(N).
Erase: Average O(C). Worst-case O(N).

Typical Usage Scenarios One-to-many mappings where search speed is prioritized over key ordering. Example: Mapping a single database foreign key to multiple row indices in memory.

Chapter 4: Unordered Associative Containers Examples

Unordered Map: Reserving Memory and Standard Usage

#include <unordered_map>
#include <string>

void unordered_map_operations() {
    std::unordered_map<std::string, int> user_sessions;

    // CRITICAL: Prevent rehashing. 
    // If the expected number of elements is known, reserve allocates 
    // the bucket array upfront. The required bucket count is automatically 
    // calculated as: ceil(expected_elements / max_load_factor).
    user_sessions.reserve(10000);

    // O(1) average case insertion
    user_sessions["session_001"] = 3600;
    user_sessions.emplace("session_002", 7200);

    // O(1) average case lookup
    auto it = user_sessions.find("session_001");
    if (it != user_sessions.end()) {
        // it->first is const std::string
        // it->second is mutable int
        it->second += 60;
    }
}

Unordered Set: Custom Hashing for User-Defined Structs C++11 does not automatically generate hash functions for custom structs. A custom functor is mandatory.

#include <unordered_set>
#include <cstdint>

struct MacAddress {
    uint8_t bytes[6];

    // Equality predicate mandatory for collision resolution
    bool operator==(const MacAddress& other) const {
        for(int i=0; i<6; ++i) {
            if (bytes[i] != other.bytes[i]) return false;
        }
        return true;
    }
};

// Custom Hash Functor
struct MacHash {
    size_t operator()(const MacAddress& mac) const {
        // Bitwise operations to combine bytes into a size_t hash.
        // FNV-1a or MurmurHash algorithms are typically implemented here for production.
        size_t hash = 0;
        for (int i = 0; i < 6; ++i) {
            hash ^= (static_cast<size_t>(mac.bytes[i]) << (i * 8));
        }
        return hash;
    }
};

void unordered_set_operations() {
    // Explicitly define the Type, Hash, and KeyEqual parameters
    std::unordered_set<MacAddress, MacHash> banned_macs;

    // reserve() dictates bucket count to avoid O(N) rehashes in critical paths
    banned_macs.reserve(256);

    MacAddress rogue = {0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E};
    banned_macs.insert(rogue); // O(1) insertion
    
    // O(1) membership check
    if (banned_macs.find(rogue) != banned_macs.end()) {
        // execute block protocol
    }
}