Your Progress

23/30 completed

77% complete. Continue learning to master DSA!

String Algorithms

String algorithms are specialized techniques for processing, searching, and manipulating text data. These algorithms are fundamental in search engines, text editors, bioinformatics, and many other applications.

Key Takeaways

Pattern matching algorithms like KMP, Rabin-Karp, and Boyer-Moore offer more efficient alternatives to brute force string search
String algorithms have applications in text editors, plagiarism detection, bioinformatics, and search engines
Most efficient string matching algorithms have linear or near-linear time complexity
Preprocessing techniques can significantly improve string search performance
Different algorithms excel in different scenarios based on pattern length, alphabet size, and expected text characteristics

Introduction to String Algorithms

String algorithms form a specialized branch of algorithms that deal with text processing. They tackle problems like finding patterns within text, comparing string similarity, and efficiently manipulating string data.

These algorithms are essential in a wide range of applications:

Search engines: Finding relevant documents based on search terms
Text editors: Implementing find and replace functionality
Bioinformatics: Processing DNA sequences and finding patterns
Spell checkers: Identifying and suggesting corrections for misspelled words
Data compression: Identifying repeated patterns to compress text
Plagiarism detection: Finding similarities between documents

Pattern Matching Algorithms

Pattern matching is one of the most common string operations, involving finding occurrences of a pattern string within a larger text.

1. Naive String Matching

The simplest approach is to check every possible position in the text for a match with the pattern.

function naiveStringMatch(text, pattern) {
  const n = text.length;
  const m = pattern.length;
  const positions = [];
  
  // Check each possible starting position in the text
  for (let i = 0; i <= n - m; i++) {
    let j;
    
    // Check if pattern matches at current position
    for (j = 0; j < m; j++) {
      if (text[i + j] !== pattern[j]) {
        break;
      }
    }
    
    // If we made it through the entire pattern, we found a match
    if (j === m) {
      positions.push(i);
    }
  }
  
  return positions;
}

// Example
const text = "ABABDABACDABABCABAB";
const pattern = "ABABC";
console.log(naiveStringMatch(text, pattern)); // [10]

Time Complexity:

Worst case: O(n*m) where n is the text length and m is the pattern length
Inefficient for large texts or patterns

2. Knuth-Morris-Pratt (KMP) Algorithm

KMP improves on the naive approach by avoiding redundant comparisons when a mismatch occurs. It uses a preprocessed array (called the "failure function" or "partial match table") to skip portions of the text.

The key insight is that when a mismatch occurs, we already know some information about the text from previous comparisons.

function kmpSearch(text, pattern) {
  if (pattern.length === 0) return [];
  
  // Preprocessing: Build the LPS (Longest Prefix Suffix) array
  const lps = computeLPSArray(pattern);
  const n = text.length;
  const m = pattern.length;
  const positions = [];
  
  let i = 0; // Index for text
  let j = 0; // Index for pattern
  
  while (i < n) {
    // Current characters match
    if (pattern[j] === text[i]) {
      i++;
      j++;
    }
    
    // Found a complete match
    if (j === m) {
      positions.push(i - j);
      j = lps[j - 1]; // Look for the next match
    }
    // Characters don't match
    else if (i < n && pattern[j] !== text[i]) {
      if (j !== 0) {
        j = lps[j - 1]; // Partial match, skip comparisons using LPS
      } else {
        i++; // No match at all, move to next character in text
      }
    }
  }
  
  return positions;
}

function computeLPSArray(pattern) {
  const m = pattern.length;
  const lps = Array(m).fill(0);
  
  let len = 0;
  let i = 1;
  
  while (i < m) {
    if (pattern[i] === pattern[len]) {
      len++;
      lps[i] = len;
      i++;
    } else {
      if (len !== 0) {
        len = lps[len - 1];
      } else {
        lps[i] = 0;
        i++;
      }
    }
  }
  
  return lps;
}

Time Complexity:

Preprocessing: O(m)
Matching: O(n)
Overall: O(n+m)

3. Rabin-Karp Algorithm

Rabin-Karp uses hashing to find pattern matches. Instead of checking every character, it compares hash values of the pattern and substrings of the text.

Key features:

Uses rolling hash function to efficiently compute hash values
Particularly effective for multiple pattern search
Hash collisions require additional character-by-character verification

Time Complexity:

Average case: O(n+m)
Worst case: O(n*m) (when many hash collisions occur)

Advanced String Algorithms

Boyer-Moore Algorithm

Often the fastest string matching algorithm in practice, using two heuristics to skip characters:

Bad character rule: Skip alignments where text character doesn't match pattern
Good suffix rule: Skip to the next occurrence of already matched suffix

Aho-Corasick Algorithm

Efficient for matching multiple patterns simultaneously:

Builds a finite state machine from the patterns
Can find all occurrences of all patterns in O(n + m + z) time
Where z is the total number of pattern occurrences

Suffix Trees & Arrays

Powerful data structures for complex string operations:

Allow for O(m) substring search regardless of text size
Support finding longest common substring, all repeating patterns, etc.
Space-intensive but extremely versatile

Levenshtein Distance

Measures the difference between two strings:

Counts minimum number of single-character edits required to transform one string into another
Used in spell checking, DNA sequence analysis, and plagiarism detection
Implemented using dynamic programming

Common String Manipulation Techniques

String Reversal

// JavaScript string reversal
function reverseString(str) {
  return str.split('').reverse().join('');
}

// Manual implementation
function reverseStringManual(str) {
  let result = '';
  for (let i = str.length - 1; i >= 0; i--) {
    result += str[i];
  }
  return result;
}

Checking for Palindromes

function isPalindrome(str) {
  // Remove non-alphanumeric characters and convert to lowercase
  const cleanStr = str.replace(/[^A-Za-z0-9]/g, '').toLowerCase();
  
  // Two-pointer approach
  let left = 0;
  let right = cleanStr.length - 1;
  
  while (left < right) {
    if (cleanStr[left] !== cleanStr[right]) {
      return false;
    }
    left++;
    right--;
  }
  
  return true;
}

Finding Anagrams

function areAnagrams(str1, str2) {
  // Remove spaces and convert to lowercase
  const normalize = str => str.replace(/s/g, '').toLowerCase();
  
  const s1 = normalize(str1);
  const s2 = normalize(str2);
  
  // Quick check: if lengths differ, not anagrams
  if (s1.length !== s2.length) return false;
  
  // Count character frequencies
  const charCount = {};
  
  for (let char of s1) {
    charCount[char] = (charCount[char] || 0) + 1;
  }
  
  for (let char of s2) {
    if (!charCount[char]) return false;
    charCount[char]--;
  }
  
  return true;
}

Real-World Applications

Application	Algorithm/Technique	Why It's Used
Search Engines	Suffix Trees, Inverted Indices	Fast lookups for billions of documents
Spell Checkers	Levenshtein Distance, Tries	Identify similar words within edit distance
DNA Sequence Analysis	Suffix Arrays, KMP, Boyer-Moore	Find patterns in genomic sequences
Plagiarism Detection	Rabin-Karp, Document Fingerprinting	Identify shared text between documents
Data Compression	Huffman Coding, Lempel-Ziv	Identify and encode repeated patterns

Conclusion

String algorithms are fundamental tools in computer science with wide-ranging applications. From basic string manipulation to complex pattern matching, these algorithms power many of the technologies we use daily.

Remember

When working with string algorithms:

Choose the algorithm based on your specific needs and constraints
Consider preprocessing costs vs. search efficiency tradeoffs
Be aware of worst-case scenarios for each algorithm
Specialized data structures like tries and suffix trees can dramatically improve performance

Next Steps

To continue your journey with string algorithms, explore these related topics:

Dynamic Programming

Learn how DP applies to string problems

Tries

Explore tree structures for efficient string operations

Hash Tables

Understand hashing used in string algorithms

Introduction to DSA

Basic Data Structures

Advanced Data Structures

Searching Algorithms

Sorting Algorithms

Advanced Algorithms

Your Progress

String Algorithms

Key Takeaways

Introduction to String Algorithms

Pattern Matching Algorithms

1. Naive String Matching

Time Complexity:

2. Knuth-Morris-Pratt (KMP) Algorithm

Time Complexity:

3. Rabin-Karp Algorithm

Time Complexity:

Advanced String Algorithms

Boyer-Moore Algorithm

Aho-Corasick Algorithm

Suffix Trees & Arrays

Levenshtein Distance

Common String Manipulation Techniques

String Reversal

Checking for Palindromes

Finding Anagrams

Real-World Applications

Conclusion

Remember

Next Steps

Dynamic Programming

Tries

Hash Tables

Related Tutorials

Related Tutorials

Dynamic Programming

Hash Tables

Tries