Your Progress
50% complete. Continue learning to master DSA!
Hash Tables and Hash Maps
Hash tables are powerful data structures that provide fast data retrieval using key-value pairs. They use a hash function to map keys to array indices, enabling constant-time access in the average case.
Key Takeaways
- Hash tables provide O(1) average-case time complexity for search, insert, and delete operations
- They use a hash function to map keys to array indices
- Collision resolution techniques include chaining and open addressing
- Hash tables are used in databases, caches, symbol tables, and associative arrays
- The efficiency of a hash table depends on the quality of its hash function and load factor
What is a Hash Table?
A hash table (also called a hash map) is a data structure that implements an associative array or dictionary. It allows you to store and retrieve values based on keys rather than numeric indices. The key is processed by a hash function, which computes an index into an array of buckets or slots where the corresponding value is stored.
Key characteristics of hash tables:
- Fast access: O(1) average time complexity for lookups, insertions, and deletions
- Unordered: Elements are not stored in any particular sequence
- Dynamic: Can grow or shrink as needed
- Key-value storage: Values are accessed using unique keys
How Hash Tables Work
The core of a hash table is the hash function. This function takes a key as input and returns an integer that serves as the index in the underlying array where the corresponding value is stored.
The Hashing Process
1. Create a key
Start with a key (e.g., a string "apple")
2. Apply hash function
The hash function converts the key into a hash code (e.g., hash("apple") → 42)
3. Map to array index
The hash code is mapped to an array index (e.g., 42 % array_size → 2)
4. Store value
The value is stored at that index in the array
array[0] → null array[1] → null array[2] → {"apple": "red fruit"} array[3] → null array[4] → null ...
Hash Functions
A good hash function should:
- Be fast to compute
- Minimize collisions by distributing keys uniformly across the array
- Generate the same output for the same input consistently
- Use all the information provided in the key
Example of a simple hash function for strings:
function hashString(str, tableSize) { let hash = 0; for (let i = 0; i < str.length; i++) { hash += str.charCodeAt(i); } return hash % tableSize; }
Collision Resolution
A collision occurs when two different keys hash to the same index. Since this is inevitable with a good hash function (due to the pigeonhole principle), hash tables need strategies to handle collisions.
Chaining
In chaining, each array index points to a linked list of entries that hash to the same index. When a collision occurs, the new key-value pair is simply added to the end of the linked list.
Advantages:
- Simple implementation
- Hash table never fills up
- Less sensitive to hash function quality
Disadvantages:
- Requires extra memory for linked list pointers
- Poor cache locality
- Traversal time for worst case can be O(n)
Open Addressing
In open addressing, all elements are stored in the hash table itself. When a collision occurs, the algorithm looks for the next available slot according to a probing sequence.
Common probing techniques:
- Linear Probing: Check the next slot sequentially
- Quadratic Probing: Check slots at quadratic distances
- Double Hashing: Use a second hash function to determine the step size
Considerations:
- Better cache performance
- Sensitive to load factor (should stay below 0.7)
- Requires special handling for deletions
- More vulnerable to clustering
Hash Table Implementation
JavaScript Implementation (with Chaining)
class HashTable { constructor(size = 53) { this.keyMap = new Array(size); } _hash(key) { let total = 0; const WEIRD_PRIME = 31; for (let i = 0; i < Math.min(key.length, 100); i++) { const char = key[i]; const value = char.charCodeAt(0) - 96; total = (total * WEIRD_PRIME + value) % this.keyMap.length; } return total; } set(key, value) { const index = this._hash(key); if (!this.keyMap[index]) { this.keyMap[index] = []; } this.keyMap[index].push([key, value]); return index; } get(key) { const index = this._hash(key); if (!this.keyMap[index]) return undefined; for (let i = 0; i < this.keyMap[index].length; i++) { if (this.keyMap[index][i][0] === key) { return this.keyMap[index][i][1]; } } return undefined; } keys() { const keysArr = []; for (let i = 0; i < this.keyMap.length; i++) { if (this.keyMap[i]) { for (let j = 0; j < this.keyMap[i].length; j++) { if (!keysArr.includes(this.keyMap[i][j][0])) { keysArr.push(this.keyMap[i][j][0]); } } } } return keysArr; } values() { const valuesArr = []; for (let i = 0; i < this.keyMap.length; i++) { if (this.keyMap[i]) { for (let j = 0; j < this.keyMap[i].length; j++) { if (!valuesArr.includes(this.keyMap[i][j][1])) { valuesArr.push(this.keyMap[i][j][1]); } } } } return valuesArr; } }
Python Implementation (with Open Addressing)
class HashTable: def __init__(self, size=7): self.size = size self.count = 0 self.keys = [None] * size self.values = [None] * size self.deleted = [False] * size def _hash(self, key): if isinstance(key, int): return key % self.size hash_value = 0 for char in str(key): hash_value = (hash_value * 31 + ord(char)) % self.size return hash_value def _get_probe_index(self, key, i): # Linear probing return (self._hash(key) + i) % self.size def _should_resize(self): load_factor = self.count / self.size return load_factor > 0.7 def _resize(self): old_keys = self.keys old_values = self.values old_deleted = self.deleted old_size = self.size self.size = old_size * 2 self.keys = [None] * self.size self.values = [None] * self.size self.deleted = [False] * self.size self.count = 0 for i in range(old_size): if old_keys[i] is not None and not old_deleted[i]: self.put(old_keys[i], old_values[i]) def put(self, key, value): if self._should_resize(): self._resize() i = 0 while i < self.size: index = self._get_probe_index(key, i) # Empty slot or tombstone if self.keys[index] is None or self.deleted[index]: self.keys[index] = key self.values[index] = value self.deleted[index] = False self.count += 1 return # Update existing key if self.keys[index] == key and not self.deleted[index]: self.values[index] = value return i += 1 # If we get here, the table is full self._resize() self.put(key, value) def get(self, key): i = 0 while i < self.size: index = self._get_probe_index(key, i) # Empty slot - key not found if self.keys[index] is None: return None # Found key if self.keys[index] == key and not self.deleted[index]: return self.values[index] i += 1 # Key not found after checking all possible positions return None def remove(self, key): i = 0 while i < self.size: index = self._get_probe_index(key, i) # Empty slot - key not found if self.keys[index] is None: return False # Found key if self.keys[index] == key and not self.deleted[index]: self.deleted[index] = True self.count -= 1 return True i += 1 # Key not found after checking all possible positions return False
Time and Space Complexity
Operation | Average Case | Worst Case |
---|---|---|
Insert | O(1) | O(n) |
Delete | O(1) | O(n) |
Search | O(1) | O(n) |
The worst-case time complexity of O(n) occurs when all keys hash to the same index, creating a long chain or requiring many probes. However, with a good hash function and appropriate resizing, hash tables achieve O(1) time complexity for operations in the average case.
Note: The space complexity of a hash table is O(n), where n is the number of key-value pairs stored. When using open addressing, the actual space used might be higher due to the need to maintain a low load factor.
Applications of Hash Tables
Database Indexing
Hash tables are used to create indexes in databases for fast data retrieval when the exact key is known.
Caching
Web browsers, content delivery networks (CDNs), and other systems use hash tables to cache data for quick access.
Symbol Tables
Compilers and interpreters use hash tables to store variables, function names, and other identifiers.
Duplicate Detection
Hash tables can quickly identify duplicate elements in large datasets by checking if an element is already present.
Associative Arrays
Many programming languages implement dictionaries or associative arrays using hash tables (e.g., JavaScript objects, Python dictionaries).
Counting Frequencies
Hash tables are ideal for counting the frequency of items in a collection, such as words in a document or elements in an array.
Common Interview Questions
1. Two Sum
Given an array of integers and a target sum, find two numbers that add up to the target.
function twoSum(nums, target) { const map = new Map(); for (let i = 0; i < nums.length; i++) { const complement = target - nums[i]; if (map.has(complement)) { return [map.get(complement), i]; } map.set(nums[i], i); } return null; }
2. First Non-Repeating Character
Find the first non-repeating character in a string.
function firstNonRepeatingChar(s) { const charCount = {}; // Count character occurrences for (const char of s) { charCount[char] = (charCount[char] || 0) + 1; } // Find first character with count 1 for (let i = 0; i < s.length; i++) { if (charCount[s[i]] === 1) { return i; } } return -1; }
Advantages and Disadvantages
Advantages
- Fast lookups, insertions, and deletions (average O(1))
- Flexible keys (any hashable object can be a key)
- Dynamic size (can grow or shrink as needed)
- Widely implemented in programming languages
- Essential for many algorithms and applications
Disadvantages
- Not suitable for ordered data
- Performance degrades with high collision rate
- Poor hash functions can lead to clustering
- May require more memory than other data structures
- Worst-case performance can be O(n)
Conclusion
Hash tables are powerful data structures that provide fast access to data using keys. With their average O(1) time complexity for lookups, insertions, and deletions, they are essential tools in a programmer's toolkit. Understanding how hash tables work, including hash functions and collision resolution strategies, is crucial for writing efficient code and solving a wide range of programming problems.
Remember
The efficiency of a hash table depends heavily on the quality of its hash function and the chosen collision resolution strategy. When implemented correctly, hash tables provide an optimal balance of speed and flexibility for key-value storage.
Next Steps
Now that you understand hash tables, you might want to explore these related topics:
Related Tutorials
Related Tutorials
Arrays
Master the fundamental data structure for sequential storage.
Learn moreLinked Lists
Learn about dynamic, pointer-based data structures.
Learn moreTrees
Explore hierarchical data structures for efficient operations.
Learn more