Your Progress

15/30 completed

50% complete. Continue learning to master DSA!

Hash Tables and Hash Maps

Hash tables are powerful data structures that provide fast data retrieval using key-value pairs. They use a hash function to map keys to array indices, enabling constant-time access in the average case.

Key Takeaways

  • Hash tables provide O(1) average-case time complexity for search, insert, and delete operations
  • They use a hash function to map keys to array indices
  • Collision resolution techniques include chaining and open addressing
  • Hash tables are used in databases, caches, symbol tables, and associative arrays
  • The efficiency of a hash table depends on the quality of its hash function and load factor

What is a Hash Table?

A hash table (also called a hash map) is a data structure that implements an associative array or dictionary. It allows you to store and retrieve values based on keys rather than numeric indices. The key is processed by a hash function, which computes an index into an array of buckets or slots where the corresponding value is stored.

Key characteristics of hash tables:

  • Fast access: O(1) average time complexity for lookups, insertions, and deletions
  • Unordered: Elements are not stored in any particular sequence
  • Dynamic: Can grow or shrink as needed
  • Key-value storage: Values are accessed using unique keys

How Hash Tables Work

The core of a hash table is the hash function. This function takes a key as input and returns an integer that serves as the index in the underlying array where the corresponding value is stored.

The Hashing Process

1. Create a key

Start with a key (e.g., a string "apple")

2. Apply hash function

The hash function converts the key into a hash code (e.g., hash("apple") → 42)

3. Map to array index

The hash code is mapped to an array index (e.g., 42 % array_size → 2)

4. Store value

The value is stored at that index in the array


array[0] → null
array[1] → null
array[2] → {"apple": "red fruit"}
array[3] → null
array[4] → null
...
                

Hash Functions

A good hash function should:

  • Be fast to compute
  • Minimize collisions by distributing keys uniformly across the array
  • Generate the same output for the same input consistently
  • Use all the information provided in the key

Example of a simple hash function for strings:

function hashString(str, tableSize) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash += str.charCodeAt(i);
  }
  return hash % tableSize;
}

Collision Resolution

A collision occurs when two different keys hash to the same index. Since this is inevitable with a good hash function (due to the pigeonhole principle), hash tables need strategies to handle collisions.

Chaining

In chaining, each array index points to a linked list of entries that hash to the same index. When a collision occurs, the new key-value pair is simply added to the end of the linked list.

Advantages:

  • Simple implementation
  • Hash table never fills up
  • Less sensitive to hash function quality

Disadvantages:

  • Requires extra memory for linked list pointers
  • Poor cache locality
  • Traversal time for worst case can be O(n)

Open Addressing

In open addressing, all elements are stored in the hash table itself. When a collision occurs, the algorithm looks for the next available slot according to a probing sequence.

Common probing techniques:

  • Linear Probing: Check the next slot sequentially
  • Quadratic Probing: Check slots at quadratic distances
  • Double Hashing: Use a second hash function to determine the step size

Considerations:

  • Better cache performance
  • Sensitive to load factor (should stay below 0.7)
  • Requires special handling for deletions
  • More vulnerable to clustering

Hash Table Implementation

JavaScript Implementation (with Chaining)

class HashTable {
  constructor(size = 53) {
    this.keyMap = new Array(size);
  }

  _hash(key) {
    let total = 0;
    const WEIRD_PRIME = 31;
    for (let i = 0; i < Math.min(key.length, 100); i++) {
      const char = key[i];
      const value = char.charCodeAt(0) - 96;
      total = (total * WEIRD_PRIME + value) % this.keyMap.length;
    }
    return total;
  }

  set(key, value) {
    const index = this._hash(key);
    if (!this.keyMap[index]) {
      this.keyMap[index] = [];
    }
    this.keyMap[index].push([key, value]);
    return index;
  }

  get(key) {
    const index = this._hash(key);
    if (!this.keyMap[index]) return undefined;
    
    for (let i = 0; i < this.keyMap[index].length; i++) {
      if (this.keyMap[index][i][0] === key) {
        return this.keyMap[index][i][1];
      }
    }
    return undefined;
  }

  keys() {
    const keysArr = [];
    for (let i = 0; i < this.keyMap.length; i++) {
      if (this.keyMap[i]) {
        for (let j = 0; j < this.keyMap[i].length; j++) {
          if (!keysArr.includes(this.keyMap[i][j][0])) {
            keysArr.push(this.keyMap[i][j][0]);
          }
        }
      }
    }
    return keysArr;
  }

  values() {
    const valuesArr = [];
    for (let i = 0; i < this.keyMap.length; i++) {
      if (this.keyMap[i]) {
        for (let j = 0; j < this.keyMap[i].length; j++) {
          if (!valuesArr.includes(this.keyMap[i][j][1])) {
            valuesArr.push(this.keyMap[i][j][1]);
          }
        }
      }
    }
    return valuesArr;
  }
}

Python Implementation (with Open Addressing)

class HashTable:
    def __init__(self, size=7):
        self.size = size
        self.count = 0
        self.keys = [None] * size
        self.values = [None] * size
        self.deleted = [False] * size
    
    def _hash(self, key):
        if isinstance(key, int):
            return key % self.size
        
        hash_value = 0
        for char in str(key):
            hash_value = (hash_value * 31 + ord(char)) % self.size
        return hash_value
    
    def _get_probe_index(self, key, i):
        # Linear probing
        return (self._hash(key) + i) % self.size
    
    def _should_resize(self):
        load_factor = self.count / self.size
        return load_factor > 0.7
    
    def _resize(self):
        old_keys = self.keys
        old_values = self.values
        old_deleted = self.deleted
        old_size = self.size
        
        self.size = old_size * 2
        self.keys = [None] * self.size
        self.values = [None] * self.size
        self.deleted = [False] * self.size
        self.count = 0
        
        for i in range(old_size):
            if old_keys[i] is not None and not old_deleted[i]:
                self.put(old_keys[i], old_values[i])
    
    def put(self, key, value):
        if self._should_resize():
            self._resize()
        
        i = 0
        while i < self.size:
            index = self._get_probe_index(key, i)
            
            # Empty slot or tombstone
            if self.keys[index] is None or self.deleted[index]:
                self.keys[index] = key
                self.values[index] = value
                self.deleted[index] = False
                self.count += 1
                return
            
            # Update existing key
            if self.keys[index] == key and not self.deleted[index]:
                self.values[index] = value
                return
            
            i += 1
        
        # If we get here, the table is full
        self._resize()
        self.put(key, value)
    
    def get(self, key):
        i = 0
        while i < self.size:
            index = self._get_probe_index(key, i)
            
            # Empty slot - key not found
            if self.keys[index] is None:
                return None
            
            # Found key
            if self.keys[index] == key and not self.deleted[index]:
                return self.values[index]
            
            i += 1
        
        # Key not found after checking all possible positions
        return None
    
    def remove(self, key):
        i = 0
        while i < self.size:
            index = self._get_probe_index(key, i)
            
            # Empty slot - key not found
            if self.keys[index] is None:
                return False
            
            # Found key
            if self.keys[index] == key and not self.deleted[index]:
                self.deleted[index] = True
                self.count -= 1
                return True
            
            i += 1
        
        # Key not found after checking all possible positions
        return False

Time and Space Complexity

OperationAverage CaseWorst Case
InsertO(1)O(n)
DeleteO(1)O(n)
SearchO(1)O(n)

The worst-case time complexity of O(n) occurs when all keys hash to the same index, creating a long chain or requiring many probes. However, with a good hash function and appropriate resizing, hash tables achieve O(1) time complexity for operations in the average case.

Note: The space complexity of a hash table is O(n), where n is the number of key-value pairs stored. When using open addressing, the actual space used might be higher due to the need to maintain a low load factor.

Applications of Hash Tables

Database Indexing

Hash tables are used to create indexes in databases for fast data retrieval when the exact key is known.

Caching

Web browsers, content delivery networks (CDNs), and other systems use hash tables to cache data for quick access.

Symbol Tables

Compilers and interpreters use hash tables to store variables, function names, and other identifiers.

Duplicate Detection

Hash tables can quickly identify duplicate elements in large datasets by checking if an element is already present.

Associative Arrays

Many programming languages implement dictionaries or associative arrays using hash tables (e.g., JavaScript objects, Python dictionaries).

Counting Frequencies

Hash tables are ideal for counting the frequency of items in a collection, such as words in a document or elements in an array.

Common Interview Questions

1. Two Sum

Given an array of integers and a target sum, find two numbers that add up to the target.

function twoSum(nums, target) {
  const map = new Map();
  
  for (let i = 0; i < nums.length; i++) {
    const complement = target - nums[i];
    
    if (map.has(complement)) {
      return [map.get(complement), i];
    }
    
    map.set(nums[i], i);
  }
  
  return null;
}

2. First Non-Repeating Character

Find the first non-repeating character in a string.

function firstNonRepeatingChar(s) {
  const charCount = {};
  
  // Count character occurrences
  for (const char of s) {
    charCount[char] = (charCount[char] || 0) + 1;
  }
  
  // Find first character with count 1
  for (let i = 0; i < s.length; i++) {
    if (charCount[s[i]] === 1) {
      return i;
    }
  }
  
  return -1;
}

Advantages and Disadvantages

Advantages

  • Fast lookups, insertions, and deletions (average O(1))
  • Flexible keys (any hashable object can be a key)
  • Dynamic size (can grow or shrink as needed)
  • Widely implemented in programming languages
  • Essential for many algorithms and applications

Disadvantages

  • Not suitable for ordered data
  • Performance degrades with high collision rate
  • Poor hash functions can lead to clustering
  • May require more memory than other data structures
  • Worst-case performance can be O(n)

Conclusion

Hash tables are powerful data structures that provide fast access to data using keys. With their average O(1) time complexity for lookups, insertions, and deletions, they are essential tools in a programmer's toolkit. Understanding how hash tables work, including hash functions and collision resolution strategies, is crucial for writing efficient code and solving a wide range of programming problems.

Remember

The efficiency of a hash table depends heavily on the quality of its hash function and the chosen collision resolution strategy. When implemented correctly, hash tables provide an optimal balance of speed and flexibility for key-value storage.

Next Steps

Now that you understand hash tables, you might want to explore these related topics:

Related Tutorials

Related Tutorials

Arrays

Master the fundamental data structure for sequential storage.

Learn more

Linked Lists

Learn about dynamic, pointer-based data structures.

Learn more

Trees

Explore hierarchical data structures for efficient operations.

Learn more