C++ In Practice - Hash Maps

HashMaps, Two Pointers, and Array Manipulation (C++11 Deep Dive)

We are using rigorous C++11 implementations, focusing on memory management, STL specifics (std::unordered_map vs std::map), and low-level efficiency.


1. Hash Table Fundamentals (std::unordered_map)

In C++11, std::unordered_map is the direct equivalent of a generic HashMap.

Under the Hood:

  • Data Structure: An array of "buckets". Each bucket is typically a linked list (Chaining) to handle collisions.
  • Hash Function: Uses std::hash<Key> to map keys to size_t hash codes.
  • Index Calculation: bucket_index = hash_code % bucket_count.
  • Rehashing: When load_factor() (elements / buckets) exceeds max_load_factor() (default 1.0), the container allocates a larger bucket array (usually approx 2x), re-calculates indices for all elements, and moves them. This is an O(N) operation.

Comparison:

  • std::map: Red-Black Tree. Sorted keys. O(log N) operations.
  • std::unordered_map: Hash Table. Unsorted. O(1) average, O(N) worst-case (all keys collide).

ASCII Memory Layout (Chaining):

[ std::unordered_map Memory Layout ]
+-------------------+
| Bucket 0 (nullptr)|
+-------------------+
| Bucket 1          |---> [ Pair<K,V> ] -> [ Pair<K,V> ] -> nullptr
+-------------------+
| Bucket 2 (nullptr)|
+-------------------+
| ...               |
+-------------------+

2. Two Sum (Unsorted)

Problem: Find two numbers that add up to target. Pattern: HashMap Lookup. Goal: O(N) Time, O(N) Space.

C++11 Implementation Details:

  • Use std::unordered_map<int, int> to store {value, index}.
  • Use find() to check existence. find() returns end() if not found.
  • Check complement existence before inserting current element to avoid using the same element twice (e.g., target 6, current 3).
#include <vector>
#include <unordered_map>
#include <stdexcept>

std::vector<int> twoSum(const std::vector<int>& nums, int target) {
    // Map: Key = number value, Value = index in vector
    std::unordered_map<int, int> seen;
    // Reserve to prevent rehashing overhead if N is large
    seen.reserve(nums.size()); 

    for (int i = 0; i < nums.size(); ++i) {
        int complement = target - nums[i];
        
        // Check if complement exists
        auto it = seen.find(complement);
        if (it != seen.end()) {
            // Found: return indices {complement_index, current_index}
            return {it->second, i};
        }
        
        // Insert current number and index
        seen[nums[i]] = i;
    }
    
    return {}; // Or throw exception if solution guaranteed
}

3. Two Sum II (Sorted Input)

Pattern: Converging Pointers. Goal: O(N) Time, O(1) Space.

C++11 Implementation Details:

  • Input is const std::vector<int>&.
  • Pointers are integer indices left and right.
  • No memory allocation required.
#include <vector>

std::vector<int> twoSumSorted(const std::vector<int>& numbers, int target) {
    int left = 0;
    int right = static_cast<int>(numbers.size()) - 1;

    while (left < right) {
        // Compute sum. Caution: potential integer overflow if numbers are large.
        // For standard interview questions, int is usually safe.
        long long sum = static_cast<long long>(numbers[left]) + numbers[right];

        if (sum == target) {
            // Problem often asks for 1-based indices
            return {left + 1, right + 1};
        }
        
        if (sum < target) {
            left++; // Need larger sum
        } else {
            right--; // Need smaller sum
        }
    }
    return {};
}

4. 3Sum

Pattern: Sorting + Two Pointers. Goal: O(N²) Time, O(1) Extra Space (excluding output).

Critical Logic:

  1. Sort: Brings duplicates together and allows Two Pointer logic.
  2. Dedup I: Outer loop skips identical starting numbers.
  3. Dedup II: Inner while-loop skips identical left/right numbers after a match is found.
#include <vector>
#include <algorithm>

std::vector<std::vector<int>> threeSum(std::vector<int>& nums) {
    std::vector<std::vector<int>> result;
    if (nums.size() < 3) return result;

    // 1. Sort O(N log N)
    std::sort(nums.begin(), nums.end());

    // 2. Iterate
    for (int i = 0; i < nums.size() - 2; ++i) {
        // Skip duplicates for the first element
        if (i > 0 && nums[i] == nums[i - 1]) continue;

        int left = i + 1;
        int right = nums.size() - 1;
        int target = -nums[i];

        while (left < right) {
            int sum = nums[left] + nums[right];

            if (sum == target) {
                result.push_back({nums[i], nums[left], nums[right]});
                
                // Skip duplicates for the second element
                while (left < right && nums[left] == nums[left + 1]) left++;
                // Skip duplicates for the third element
                while (left < right && nums[right] == nums[right - 1]) right--;

                left++;
                right--;
            } else if (sum < target) {
                left++;
            } else {
                right--;
            }
        }
    }
    return result;
}

5. Group Anagrams

Pattern: HashMap with Sorted Key. Goal: O(N * K log K) Time, O(N * K) Space (N=num strings, K=max string length).

Under the Hood:

  • Key: The sorted version of the string (canonical form).
  • Value: std::vector<std::string> storing all anagrams.
  • C++11 Move Semantics: Crucial here to avoid copying strings repeatedly when pushing into the vector.
#include <vector>
#include <string>
#include <unordered_map>
#include <algorithm>

std::vector<std::vector<std::string>> groupAnagrams(std::vector<std::string>& strs) {
    std::unordered_map<std::string, std::vector<std::string>> map;

    for (const auto& s : strs) { // O(N)
        std::string key = s;
        std::sort(key.begin(), key.end()); // O(K log K)
        map[key].push_back(s); // Copy string into vector
    }

    std::vector<std::vector<std::string>> result;
    result.reserve(map.size());
    
    for (auto& pair : map) {
        // Move the vector from the map to result to avoid deep copy
        result.push_back(std::move(pair.second));
    }

    return result;
}

6. Product of Array Except Self

Pattern: Prefix and Suffix Arrays (Accumulation). Goal: O(N) Time, O(1) Extra Space (output array doesn't count).

Logic:

  • output[i] = (product of all left of i) * (product of all right of i).
  • Pass 1: Compute prefix products directly into output.
  • Pass 2: Compute suffix products on the fly and multiply into output.
#include <vector>

std::vector<int> productExceptSelf(const std::vector<int>& nums) {
    int n = nums.size();
    std::vector<int> output(n);

    // Pass 1: Prefix Products
    // output[i] contains product of nums[0]...nums[i-1]
    output[0] = 1;
    for (int i = 1; i < n; ++i) {
        output[i] = output[i - 1] * nums[i - 1];
    }

    // Pass 2: Suffix Products integrated
    // R tracks the product of elements to the right
    int R = 1;
    for (int i = n - 1; i >= 0; --i) {
        output[i] = output[i] * R;
        R *= nums[i];
    }

    return output;
}

7. Contains Duplicate

Pattern: HashSet. Goal: O(N) Time, O(N) Space.

Alternative: std::sort + std::unique (O(N log N) Time, O(1) Space). Here is the O(N) Time approach using std::unordered_set.

#include <vector>
#include <unordered_set>

bool containsDuplicate(const std::vector<int>& nums) {
    std::unordered_set<int> seen;
    seen.reserve(nums.size()); // Optimization to avoid rehashes

    for (int num : nums) {
        // insert returns a pair<iterator, bool>. 
        // second is false if element already existed.
        if (!seen.insert(num).second) {
            return true;
        }
    }
    return false;
}

8. Top K Frequent Elements

Pattern: HashMap + Min-Heap (Priority Queue). Goal: O(N log K) Time, O(N) Space.

Logic:

  1. Count frequencies using unordered_map.
  2. Maintain a Min-Heap of size K.
  3. If a new element has higher frequency than the heap top, pop top and push new.
  4. Heap eventually contains the K largest frequent elements.
#include <vector>
#include <unordered_map>
#include <queue>

std::vector<int> topKFrequent(const std::vector<int>& nums, int k) {
    // 1. Frequency Map: O(N)
    std::unordered_map<int, int> countMap;
    for (int n : nums) countMap[n]++;

    // 2. Min-Heap: Stores pair<frequency, number>
    // Ordered by frequency (first element of pair)
    using Item = std::pair<int, int>; 
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> minHeap;

    // 3. Process Map: O(M log K) where M is unique elements
    for (auto& pair : countMap) {
        minHeap.push({pair.second, pair.first});
        if (minHeap.size() > k) {
            minHeap.pop(); // Remove element with smallest frequency
        }
    }

    // 4. Extract results
    std::vector<int> result;
    while (!minHeap.empty()) {
        result.push_back(minHeap.top().second);
        minHeap.pop();
    }
    return result;
}

Summary Table

Problem STL Container Complexity (Time/Space) Key Insight
Two Sum unordered_map O(N) / O(N) Store {val, index} to find complement.
Two Sum II None (Raw Pointers) O(N) / O(1) Sorted input allows converging pointers.
3Sum vector (Sort) O(N²) / O(1) Fix one, solve Two Sum II for rest. Skip dupes.
Anagrams unordered_map O(NK log K) / O(NK) Sorted string is the unique key.
Prod. Except Self vector O(N) / O(1) Prefix pass, then Suffix pass.
Top K unordered_map, priority_queue O(N log K) / O(N) Min-Heap keeps track of heavy hitters.