C++ In Practice - Trie Implementation

Trie (Prefix Tree) in C++11

A Trie is a tree optimized for string retrieval. Unlike a Binary Search Tree (O(L \log N)), a Trie processes a string in O(L) time, independent of the number of stored words (N).

In C++, Tries require manual memory management. Because Tries allocate thousands of small nodes, they are prone to memory fragmentation and cache misses. In high-performance scenarios, a "Static Trie" (allocated in a single std::vector) is preferred, but for didactic purposes, the dynamic pointer-based approach is standard.

Data Structure Design

The Node:

  • Array vs. Map: For standard problems (lowercase English), use TrieNode* children[26]. It guarantees O(1) transitions. std::unordered_map adds hashing overhead.
  • Destruction: C++ requires a destructor to recursively free memory, or the use of std::unique_ptr.
#include <vector>
#include <string>

struct TrieNode {
    // Array for 'a'-'z'. 
    // Use raw pointers for teaching; use smart pointers for production.
    TrieNode* children[26];
    bool isEnd;

    TrieNode() {
        isEnd = false;
        // Efficiently zero out the array
        for (int i = 0; i < 26; ++i) children[i] = nullptr;
    }

    ~TrieNode() {
        for (int i = 0; i < 26; ++i) {
            if (children[i]) delete children[i];
        }
    }
};

1. Basic Implementation (Insert, Search, StartsWith)

Complexity: Time O(L) | Space O(N \cdot L).

class Trie {
private:
    TrieNode* root;

public:
    Trie() {
        root = new TrieNode();
    }

    // RAII: Clean up memory when Trie goes out of scope
    ~Trie() {
        delete root;
    }

    void insert(std::string word) {
        TrieNode* node = root;
        for (char c : word) {
            int index = c - 'a';
            if (!node->children[index]) {
                node->children[index] = new TrieNode();
            }
            node = node->children[index];
        }
        node->isEnd = true;
    }

    bool search(std::string word) {
        TrieNode* node = traverse(word);
        return node != nullptr && node->isEnd;
    }

    bool startsWith(std::string prefix) {
        return traverse(prefix) != nullptr;
    }

private:
    TrieNode* traverse(const std::string& s) {
        TrieNode* node = root;
        for (char c : s) {
            int index = c - 'a';
            if (!node->children[index]) {
                return nullptr;
            }
            node = node->children[index];
        }
        return node;
    }
};

2. Advanced: Word Search II (Backtracking + Trie)

Problem: Find all words from a dictionary in a 2D board.

Optimization:

  1. Embed the Word: Store the full string at the leaf node of the Trie. This removes the need to pass a string path during recursion, saving massive copying overhead.
  2. Pruning: Once a word is found, remove the "End of Word" marker (or the node itself) to prevent finding duplicate instances of the same word.
#include <vector>
#include <string>

class Solution {
    struct TrieNode {
        TrieNode* children[26] = {nullptr};
        std::string* word = nullptr; // Pointer to the actual word string
        
        ~TrieNode() {
            for(auto c : children) if(c) delete c;
        }
    };

    void insert(TrieNode* root, std::string& s) {
        TrieNode* curr = root;
        for (char c : s) {
            int idx = c - 'a';
            if (!curr->children[idx]) curr->children[idx] = new TrieNode();
            curr = curr->children[idx];
        }
        curr->word = &s; // Store pointer to string
    }

public:
    std::vector<std::string> findWords(std::vector<std::vector<char>>& board, 
                                       std::vector<std::string>& words) {
        TrieNode root; // Stack allocated root, auto-cleans up children
        for (auto& w : words) insert(&root, w);

        std::vector<std::string> result;
        int m = board.size();
        int n = board[0].size();

        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < n; ++j) {
                dfs(board, i, j, &root, result);
            }
        }
        return result;
    }

private:
    void dfs(std::vector<std::vector<char>>& board, int r, int c, 
             TrieNode* node, std::vector<std::string>& result) {
        
        char ch = board[r][c];
        if (ch == '#' || !node->children[ch - 'a']) return;

        TrieNode* nextNode = node->children[ch - 'a'];
        
        // Found a word?
        if (nextNode->word) {
            result.push_back(*nextNode->word);
            nextNode->word = nullptr; // Avoid duplicates
        }

        // Backtracking
        board[r][c] = '#'; // Mark visited
        
        int dirs[4][2] = {{0,1}, {0,-1}, {1,0}, {-1,0}};
        for (auto& d : dirs) {
            int nr = r + d[0];
            int nc = c + d[1];
            if (nr >= 0 && nr < board.size() && nc >= 0 && nc < board[0].size()) {
                dfs(board, nr, nc, nextNode, result);
            }
        }

        board[r][c] = ch; // Restore
    }
};

3. Autocomplete (Design Problem)

Problem: Input characters one by one, return top 3 hot sentences.

Architecture:

  • Trie: Stores sentences. Each node needs to know the "Top 3" sentences passing through it (Cache) or we traverse down to find them.
  • HashMap: To update frequencies of full sentences quickly.
  • Optimization: Storing the top 3 results in every TrieNode trades space for O(1) query time.
#include <unordered_map>
#include <vector>
#include <string>
#include <algorithm>

class AutocompleteSystem {
    struct TrieNode {
        std::unordered_map<char, TrieNode*> children;
        std::unordered_map<std::string, int> counts; // Sentence -> Freq
    };

    TrieNode* root;
    TrieNode* curr;
    std::string inputStr;

public:
    AutocompleteSystem(std::vector<std::string>& sentences, std::vector<int>& times) {
        root = new TrieNode();
        curr = root;
        inputStr = "";
        
        for (size_t i = 0; i < sentences.size(); ++i) {
            insert(sentences[i], times[i]);
        }
    }

    void insert(const std::string& s, int time) {
        TrieNode* node = root;
        for (char c : s) {
            if (!node->children.count(c)) node->children[c] = new TrieNode();
            node = node->children[c];
            node->counts[s] += time; // Update frequency at every step
        }
    }

    std::vector<std::string> input(char c) {
        if (c == '#') {
            insert(inputStr, 1);
            inputStr = "";
            curr = root;
            return {};
        }

        inputStr += c;
        if (curr && curr->children.count(c)) {
            curr = curr->children[c];
        } else {
            curr = nullptr;
            return {};
        }

        // Extract candidates from the current node
        // In a real system, you'd maintain a pre-sorted list in the node
        // instead of sorting a map every time.
        std::vector<std::pair<int, std::string>> candidates;
        for (auto& p : curr->counts) {
            candidates.push_back({p.second, p.first});
        }

        auto comp = [](const std::pair<int, std::string>& a, const std::pair<int, std::string>& b) {
            return a.first > b.first || (a.first == b.first && a.second < b.second);
        };

        // Partial sort to get top 3
        int k = std::min((int)candidates.size(), 3);
        std::partial_sort(candidates.begin(), candidates.begin() + k, candidates.end(), comp);

        std::vector<std::string> res;
        for (int i = 0; i < k; ++i) res.push_back(candidates[i].second);
        return res;
    }
};

STL Alternative: std::set (Ordered)

While C++ has no Trie, std::set<string> keeps strings sorted lexicographically. You can perform a pseudo-prefix search using lower_bound.

Example: set = {"apple", "app", "apricot", "banana"} Query Prefix: "apr"

  1. lower_bound("apr") to points to "apricot".
  2. Check if iterator valid and string starts with "apr".

Limitations:

  • Complexity: O(L log N). (Trie is O(L)).
  • If N (dictionary size) is huge, Trie wins. If N is small, std::set is cleaner code.

Summary: Trie vs. Others

Feature Trie unordered_map<string, T> set<string>
Exact Search O(L) O(L) (Avg) O(L \log N)
Prefix Search O(L) O(N \cdot L) (Scan all) O(L \log N)
Space High (Nodes) Medium (Strings) Medium (Strings)
Autocomplete Excellent Poor Decent (via Iterators)

C++ In Practice - Trie Implementation | Bijup.com