C++ STL Unordered (Hash Table) Containers
All four standard unordered associative containers (std::unordered_set, std::unordered_multiset, std::unordered_map, std::unordered_multimap) share a unified underlying data structure: a Hash Table utilizing Separate Chaining.
C++11 standard requirements regarding iterator invalidation dictate this architecture. Open addressing (probing) is prohibited because inserting an element could shift existing elements, invalidating pointers and references.
Under the Hood Implementation (Core)
- Bucket Array: A heap-allocated contiguous array of pointers (
Node** _M_buckets). The size of this array is thebucket_count. - Linked List Nodes: Each element is wrapped in a dynamically allocated node, identical to a singly-linked list node (
std::forward_list).struct Node { Node* _M_next; size_t _M_hash_code; // Cached to speed up rehashing and equivalence checks T _M_payload; }; - Hash Function (
std::hash<Key>): Transforms the key into an integer (size_t). The bucket index is calculated via modulo arithmetic:index = hash_code % bucket_count. - Equality Predicate (
std::equal_to<Key>): Resolves hash collisions. If two keys land in the same bucket, the linked list is traversed, and the equality predicate compares keys to find the exact match. - Rehashing: The container tracks a
load_factor(size / bucket_count). When this exceeds themax_load_factor(default 1.0), the container allocates a new bucket array (typically 2\times size or the next prime number), recalculates the target bucket for every existing node, and rewires the pointers. This is a severe O(N) operation.
Memory Characteristics Extremely fragmented. Every element requires a distinct heap allocation. Cache locality for table lookups is abysmal due to pointer chasing (first to the bucket array, then out to the node, then down the chain).
std::unordered_set
Under the Hood Implementation
Wraps the hash table. The payload T is exactly the Key. Rejects duplicate keys. If a hash collision occurs, traverses the bucket's chain checking for strict equality. If equality is found, the insertion aborts.
Data Types
Node payload is const Key.
ASCII Illustration
Bucket Array
+----+
0| NUL|
+----+ +-------------------+
1| *-------> | Hash: 17, Key: 'A'| -> NUL
+----+ +-------------------+
2| NUL|
+----+ +-------------------+ +-------------------+
3| *-------> | Hash: 11, Key: 'C'| -> | Hash: 43, Key: 'K'| -> NUL
+----+ +-------------------+ +-------------------+
(Collision chain)
Big O Complexity
- Search/Find: Average O(1). Worst-case O(N) (all elements hash to the same bucket).
- Insert: Average O(1). Worst-case O(N).
- Erase: Average O(1). Worst-case O(N).
Typical Usage Scenarios O(1) membership testing. Deduplication of streams where ordering is irrelevant. Filtering lists of blocked IP addresses or unique hardware IDs.
std::unordered_multiset
Under the Hood Implementation Wraps the hash table. Allows duplicate keys. The standard guarantees that equivalent elements are stored contiguously within the same bucket's linked list.
Data Types
Node payload is const Key.
ASCII Illustration
Bucket Array
+----+
0| NUL|
+----+ +-------------------+ +-------------------+
1| *-------> | Hash: 17, Key: 'A'| -> | Hash: 17, Key: 'A'| -> NUL
+----+ +-------------------+ +-------------------+
2| NUL| (Duplicates linked adjacently)
+----+
Big O Complexity
- Search/Find: Average O(1). Worst-case O(N).
- Insert: Average O(1). Worst-case O(N).
- Erase (by Key): Average O(C) where C is the number of duplicates. Worst-case O(N).
Typical Usage Scenarios Frequency counting. Tracking multiple occurrences of independent events that share identical hashes, requiring rapid insertion without tree-balancing overhead.
std::unordered_map
Under the Hood Implementation
Wraps the hash table. The payload T is a std::pair<const Key, Value>. The Hash and KeyEqual functions operate strictly on T.first.
Data Types
Node payload is std::pair<const Key, Value>.
ASCII Illustration
Bucket Array
+----+
0| *-------> [Hash: 8, Key: "ID1", Val: 100] -> NUL
+----+
1| NUL|
+----+
2| *------->[Hash: 2, Key: "ID3", Val: 200] ->[Hash: 10, Key: "ID4", Val: 400] -> NUL
+----+
Big O Complexity
- Search/Find: Average O(1). Worst-case O(N).
- Insert: Average O(1). Worst-case O(N).
- Erase: Average O(1). Worst-case O(N).
operator[]: Average O(1). Default-constructs a value if the key is missing.
Typical Usage Scenarios
High-performance lookup tables. Caching layers (e.g., memorization). Symbol tables in compilers. Any dictionary requirement where bounded latency (worst-case O(N)) is acceptable in exchange for drastically superior average throughput compared to std::map.
std::unordered_multimap
Under the Hood Implementation
Wraps the hash table. Payload is std::pair<const Key, Value>. Permits duplicate keys. Lacks operator[] and .at(). Retrieval requires equal_range to return the bounding iterators of the duplicate block within the bucket chain.
Data Types
Node payload is std::pair<const Key, Value>.
Big O Complexity
- Search (
equal_range): Average O(1) to find the first element. O(C) to traverse duplicates. Worst-case O(N). - Insert: Average O(1). Worst-case O(N).
- Erase: Average O(C). Worst-case O(N).
Typical Usage Scenarios One-to-many mappings where search speed is prioritized over key ordering. Example: Mapping a single database foreign key to multiple row indices in memory.
Chapter 4: Unordered Associative Containers Examples
Unordered Map: Reserving Memory and Standard Usage
#include <unordered_map>
#include <string>
void unordered_map_operations() {
std::unordered_map<std::string, int> user_sessions;
// CRITICAL: Prevent rehashing.
// If the expected number of elements is known, reserve allocates
// the bucket array upfront. The required bucket count is automatically
// calculated as: ceil(expected_elements / max_load_factor).
user_sessions.reserve(10000);
// O(1) average case insertion
user_sessions["session_001"] = 3600;
user_sessions.emplace("session_002", 7200);
// O(1) average case lookup
auto it = user_sessions.find("session_001");
if (it != user_sessions.end()) {
// it->first is const std::string
// it->second is mutable int
it->second += 60;
}
}
Unordered Set: Custom Hashing for User-Defined Structs C++11 does not automatically generate hash functions for custom structs. A custom functor is mandatory.
#include <unordered_set>
#include <cstdint>
struct MacAddress {
uint8_t bytes[6];
// Equality predicate mandatory for collision resolution
bool operator==(const MacAddress& other) const {
for(int i=0; i<6; ++i) {
if (bytes[i] != other.bytes[i]) return false;
}
return true;
}
};
// Custom Hash Functor
struct MacHash {
size_t operator()(const MacAddress& mac) const {
// Bitwise operations to combine bytes into a size_t hash.
// FNV-1a or MurmurHash algorithms are typically implemented here for production.
size_t hash = 0;
for (int i = 0; i < 6; ++i) {
hash ^= (static_cast<size_t>(mac.bytes[i]) << (i * 8));
}
return hash;
}
};
void unordered_set_operations() {
// Explicitly define the Type, Hash, and KeyEqual parameters
std::unordered_set<MacAddress, MacHash> banned_macs;
// reserve() dictates bucket count to avoid O(N) rehashes in critical paths
banned_macs.reserve(256);
MacAddress rogue = {0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E};
banned_macs.insert(rogue); // O(1) insertion
// O(1) membership check
if (banned_macs.find(rogue) != banned_macs.end()) {
// execute block protocol
}
}