C++ STL Unordered Hash Table Containers
All four standard unordered associative containers (std::unordered_set, std::unordered_multiset, std::unordered_map, std::unordered_multimap) share a unified underlying data structure: a Hash Table utilizing Separate Chaining.
C++11 standard requirements regarding iterator invalidation dictate this architecture. Open addressing (probing) is prohibited because inserting an element could shift existing elements, invalidating pointers and references.
Under the Hood Implementation (Core)
- Bucket Array: A heap-allocated contiguous array of pointers (
Node** _M_buckets). The size of this array is thebucket_count. - Linked List Nodes: Each element is wrapped in a dynamically allocated node, identical to a singly-linked list node (
std::forward_list).struct Node { Node* _M_next; size_t _M_hash_code; // Cached to speed up rehashing and equivalence checks T _M_payload; }; - Hash Function (
std::hash<Key>): Transforms the key into an integer (size_t). The bucket index is calculated via modulo arithmetic:index = hash_code % bucket_count. - Equality Predicate (
std::equal_to<Key>): Resolves hash collisions. If two keys land in the same bucket, the linked list is traversed, and the equality predicate compares keys to find the exact match. - Rehashing: The container tracks a
load_factor(size / bucket_count). When this exceeds themax_load_factor(default 1.0), the container allocates a new bucket array (typically 2\times size or the next prime number), recalculates the target bucket for every existing node, and rewires the pointers. This is a severe O(N) operation.
Memory Characteristics Extremely fragmented. Every element requires a distinct heap allocation. Cache locality for table lookups is abysmal due to pointer chasing (first to the bucket array, then out to the node, then down the chain).
std::unordered_set
Under the Hood Implementation
Wraps the hash table. The payload T is exactly the Key. Rejects duplicate keys. If a hash collision occurs, traverses the bucket's chain checking for strict equality. If equality is found, the insertion aborts.
Data Types
Node payload is const Key.
ASCII Illustration
Bucket Array
+----+
0| NUL|
+----+ +-------------------+
1| *-------> | Hash: 17, Key: 'A'| -> NUL
+----+ +-------------------+
2| NUL|
+----+ +-------------------+ +-------------------+
3| *-------> | Hash: 11, Key: 'C'| -> | Hash: 43, Key: 'K'| -> NUL
+----+ +-------------------+ +-------------------+
(Collision chain)
Big O Complexity
- Search/Find: Average O(1). Worst-case O(N) (all elements hash to the same bucket).
- Insert: Average O(1). Worst-case O(N).
- Erase: Average O(1). Worst-case O(N).
Typical Usage Scenarios O(1) membership testing. Deduplication of streams where ordering is irrelevant. Filtering lists of blocked IP addresses or unique hardware IDs.
[!WARNING]
Interview Trap: Checking for Existence
Never use
banned_macs.count(rogue) > 0if you only need boolean existence.count()on astd::unordered_sethas to iterate through all duplicates (if it were a multiset, though safe here, it's bad habit). More importantly,count()returns an integer, whereasfind() != end()is the idiomatically correct and strictly O(1) way to check existence in C++.In C++20,
banned_macs.contains(rogue)was introduced, but in C++11, stick tofind() != end().
std::unordered_multiset
Under the Hood Implementation Wraps the hash table. Allows duplicate keys. The standard guarantees that equivalent elements are stored contiguously within the same bucket's linked list.
Data Types
Node payload is const Key.
ASCII Illustration
Bucket Array
+----+
0| NUL|
+----+ +-------------------+ +-------------------+
1| *-------> | Hash: 17, Key: 'A'| -> | Hash: 17, Key: 'A'| -> NUL
+----+ +-------------------+ +-------------------+
2| NUL| (Duplicates linked adjacently)
+----+
Big O Complexity
- Search/Find: Average O(1). Worst-case O(N).
- Insert: Average O(1). Worst-case O(N).
- Erase (by Key): Average O(C) where C is the number of duplicates. Worst-case O(N).
Typical Usage Scenarios Frequency counting. Tracking multiple occurrences of independent events that share identical hashes, requiring rapid insertion without tree-balancing overhead.
std::unordered_map
Under the Hood Implementation
Wraps the hash table. The payload T is a std::pair<const Key, Value>. The Hash and KeyEqual functions operate strictly on T.first.
Data Types
Node payload is std::pair<const Key, Value>.
ASCII Illustration
Bucket Array
+----+
0| *-------> [Hash: 8, Key: "ID1", Val: 100] -> NUL
+----+
1| NUL|
+----+
2| *------->[Hash: 2, Key: "ID3", Val: 200] ->[Hash: 10, Key: "ID4", Val: 400] -> NUL
+----+
Big O Complexity
- Search/Find: Average O(1). Worst-case O(N).
- Insert: Average O(1). Worst-case O(N).
- Erase: Average O(1). Worst-case O(N).
operator[]: Average O(1). Default-constructs a value if the key is missing.
Typical Usage Scenarios
High-performance lookup tables. Caching layers (e.g., memorization). Symbol tables in compilers. Any dictionary requirement where bounded latency (worst-case O(N)) is acceptable in exchange for drastically superior average throughput compared to std::map.
[!WARNING]
Trap:
operator[]vsfind()A massive red flag is using
if (my_map[key] == value)to check if a key exists!
operator[]inserts a default-constructed element into the hash map if the key does not exist. This silently inflates your hash map size and can trigger catastrophic O(N) rehashes during what should be a simple read.Always use
auto it = my_map.find(key); if (it != my_map.end()) { ... }for read-only existence checks.
[!TIP]
Interview Secret Weapon: Preempting Rehashes
Most candidates know to
reserve()vectors. Few know thatstd::unordered_mapalso hasreserve(N)! If you are ingesting a known array of size N into a hash map, callmap.reserve(N)first. This sets the bucket count high enough that you will never trigger the expensive O(N) rehash/rewire operation during insertions.
[!NOTE]
Interview Trap: Hashing Custom Structs & Pairs
Standard C++ does not provide a default hash for
std::pair<T, U>or custom structs! If you trystd::unordered_map<std::pair<int, int>, int>, it will stubbornly refuse to compile. You must either provide a custom Hash Functor or use astd::map(which only requiresoperator<).
std::unordered_multimap
Under the Hood Implementation
Wraps the hash table. Payload is std::pair<const Key, Value>. Permits duplicate keys. Lacks operator[] and .at(). Retrieval requires equal_range to return the bounding iterators of the duplicate block within the bucket chain.
Data Types
Node payload is std::pair<const Key, Value>.
Big O Complexity
- Search (
equal_range): Average O(1) to find the first element. O(C) to traverse duplicates. Worst-case O(N). - Insert: Average O(1). Worst-case O(N).
- Erase: Average O(C). Worst-case O(N).
Typical Usage Scenarios One-to-many mappings where search speed is prioritized over key ordering. Example: Mapping a single database foreign key to multiple row indices in memory.
Chapter 4: Unordered Associative Containers Examples
Unordered Map: Reserving Memory and Standard Usage
#include <unordered_map>
#include <string>
void unordered_map_operations() {
std::unordered_map<std::string, int> user_sessions;
// CRITICAL: Prevent rehashing.
// If the expected number of elements is known, reserve allocates
// the bucket array upfront. The required bucket count is automatically
// calculated as: ceil(expected_elements / max_load_factor).
user_sessions.reserve(10000);
// O(1) average case insertion
user_sessions["session_001"] = 3600;
user_sessions.emplace("session_002", 7200);
// O(1) average case lookup
auto it = user_sessions.find("session_001");
if (it != user_sessions.end()) {
// it->first is const std::string
// it->second is mutable int
it->second += 60;
}
}
Unordered Set: Custom Hashing for User-Defined Structs C++11 does not automatically generate hash functions for custom structs. A custom functor is mandatory.
#include <unordered_set>
#include <cstdint>
struct MacAddress {
uint8_t bytes[6];
// Equality predicate mandatory for collision resolution
bool operator==(const MacAddress& other) const {
for(int i=0; i<6; ++i) {
if (bytes[i] != other.bytes[i]) return false;
}
return true;
}
};
// Custom Hash Functor
struct MacHash {
size_t operator()(const MacAddress& mac) const {
// Bitwise operations to combine bytes into a size_t hash.
// FNV-1a or MurmurHash algorithms are typically implemented here for production.
size_t hash = 0;
for (int i = 0; i < 6; ++i) {
hash ^= (static_cast<size_t>(mac.bytes[i]) << (i * 8));
}
return hash;
}
};
void unordered_set_operations() {
// Explicitly define the Type, Hash, and KeyEqual parameters
std::unordered_set<MacAddress, MacHash> banned_macs;
// reserve() dictates bucket count to avoid O(N) rehashes in critical paths
banned_macs.reserve(256);
MacAddress rogue = {0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E};
banned_macs.insert(rogue); // O(1) insertion
// O(1) membership check
if (banned_macs.find(rogue) != banned_macs.end()) {
// execute block protocol
}
}