Your Progress

23/30 completed

77% complete. Continue learning to master DSA!

String Algorithms

String algorithms are specialized techniques for processing, searching, and manipulating text data. These algorithms are fundamental in search engines, text editors, bioinformatics, and many other applications.

Key Takeaways

  • Pattern matching algorithms like KMP, Rabin-Karp, and Boyer-Moore offer more efficient alternatives to brute force string search
  • String algorithms have applications in text editors, plagiarism detection, bioinformatics, and search engines
  • Most efficient string matching algorithms have linear or near-linear time complexity
  • Preprocessing techniques can significantly improve string search performance
  • Different algorithms excel in different scenarios based on pattern length, alphabet size, and expected text characteristics

Introduction to String Algorithms

String algorithms form a specialized branch of algorithms that deal with text processing. They tackle problems like finding patterns within text, comparing string similarity, and efficiently manipulating string data.

These algorithms are essential in a wide range of applications:

  • Search engines: Finding relevant documents based on search terms
  • Text editors: Implementing find and replace functionality
  • Bioinformatics: Processing DNA sequences and finding patterns
  • Spell checkers: Identifying and suggesting corrections for misspelled words
  • Data compression: Identifying repeated patterns to compress text
  • Plagiarism detection: Finding similarities between documents

Pattern Matching Algorithms

Pattern matching is one of the most common string operations, involving finding occurrences of a pattern string within a larger text.

1. Naive String Matching

The simplest approach is to check every possible position in the text for a match with the pattern.

function naiveStringMatch(text, pattern) {
  const n = text.length;
  const m = pattern.length;
  const positions = [];
  
  // Check each possible starting position in the text
  for (let i = 0; i <= n - m; i++) {
    let j;
    
    // Check if pattern matches at current position
    for (j = 0; j < m; j++) {
      if (text[i + j] !== pattern[j]) {
        break;
      }
    }
    
    // If we made it through the entire pattern, we found a match
    if (j === m) {
      positions.push(i);
    }
  }
  
  return positions;
}

// Example
const text = "ABABDABACDABABCABAB";
const pattern = "ABABC";
console.log(naiveStringMatch(text, pattern)); // [10]

Time Complexity:

  • Worst case: O(n*m) where n is the text length and m is the pattern length
  • Inefficient for large texts or patterns

2. Knuth-Morris-Pratt (KMP) Algorithm

KMP improves on the naive approach by avoiding redundant comparisons when a mismatch occurs. It uses a preprocessed array (called the "failure function" or "partial match table") to skip portions of the text.

The key insight is that when a mismatch occurs, we already know some information about the text from previous comparisons.

function kmpSearch(text, pattern) {
  if (pattern.length === 0) return [];
  
  // Preprocessing: Build the LPS (Longest Prefix Suffix) array
  const lps = computeLPSArray(pattern);
  const n = text.length;
  const m = pattern.length;
  const positions = [];
  
  let i = 0; // Index for text
  let j = 0; // Index for pattern
  
  while (i < n) {
    // Current characters match
    if (pattern[j] === text[i]) {
      i++;
      j++;
    }
    
    // Found a complete match
    if (j === m) {
      positions.push(i - j);
      j = lps[j - 1]; // Look for the next match
    }
    // Characters don't match
    else if (i < n && pattern[j] !== text[i]) {
      if (j !== 0) {
        j = lps[j - 1]; // Partial match, skip comparisons using LPS
      } else {
        i++; // No match at all, move to next character in text
      }
    }
  }
  
  return positions;
}

function computeLPSArray(pattern) {
  const m = pattern.length;
  const lps = Array(m).fill(0);
  
  let len = 0;
  let i = 1;
  
  while (i < m) {
    if (pattern[i] === pattern[len]) {
      len++;
      lps[i] = len;
      i++;
    } else {
      if (len !== 0) {
        len = lps[len - 1];
      } else {
        lps[i] = 0;
        i++;
      }
    }
  }
  
  return lps;
}

Time Complexity:

  • Preprocessing: O(m)
  • Matching: O(n)
  • Overall: O(n+m)

3. Rabin-Karp Algorithm

Rabin-Karp uses hashing to find pattern matches. Instead of checking every character, it compares hash values of the pattern and substrings of the text.

Key features:

  • Uses rolling hash function to efficiently compute hash values
  • Particularly effective for multiple pattern search
  • Hash collisions require additional character-by-character verification

Time Complexity:

  • Average case: O(n+m)
  • Worst case: O(n*m) (when many hash collisions occur)

Advanced String Algorithms

Boyer-Moore Algorithm

Often the fastest string matching algorithm in practice, using two heuristics to skip characters:

  • Bad character rule: Skip alignments where text character doesn't match pattern
  • Good suffix rule: Skip to the next occurrence of already matched suffix

Aho-Corasick Algorithm

Efficient for matching multiple patterns simultaneously:

  • Builds a finite state machine from the patterns
  • Can find all occurrences of all patterns in O(n + m + z) time
  • Where z is the total number of pattern occurrences

Suffix Trees & Arrays

Powerful data structures for complex string operations:

  • Allow for O(m) substring search regardless of text size
  • Support finding longest common substring, all repeating patterns, etc.
  • Space-intensive but extremely versatile

Levenshtein Distance

Measures the difference between two strings:

  • Counts minimum number of single-character edits required to transform one string into another
  • Used in spell checking, DNA sequence analysis, and plagiarism detection
  • Implemented using dynamic programming

Common String Manipulation Techniques

String Reversal

// JavaScript string reversal
function reverseString(str) {
  return str.split('').reverse().join('');
}

// Manual implementation
function reverseStringManual(str) {
  let result = '';
  for (let i = str.length - 1; i >= 0; i--) {
    result += str[i];
  }
  return result;
}

Checking for Palindromes

function isPalindrome(str) {
  // Remove non-alphanumeric characters and convert to lowercase
  const cleanStr = str.replace(/[^A-Za-z0-9]/g, '').toLowerCase();
  
  // Two-pointer approach
  let left = 0;
  let right = cleanStr.length - 1;
  
  while (left < right) {
    if (cleanStr[left] !== cleanStr[right]) {
      return false;
    }
    left++;
    right--;
  }
  
  return true;
}

Finding Anagrams

function areAnagrams(str1, str2) {
  // Remove spaces and convert to lowercase
  const normalize = str => str.replace(/s/g, '').toLowerCase();
  
  const s1 = normalize(str1);
  const s2 = normalize(str2);
  
  // Quick check: if lengths differ, not anagrams
  if (s1.length !== s2.length) return false;
  
  // Count character frequencies
  const charCount = {};
  
  for (let char of s1) {
    charCount[char] = (charCount[char] || 0) + 1;
  }
  
  for (let char of s2) {
    if (!charCount[char]) return false;
    charCount[char]--;
  }
  
  return true;
}

Real-World Applications

ApplicationAlgorithm/TechniqueWhy It's Used
Search EnginesSuffix Trees, Inverted IndicesFast lookups for billions of documents
Spell CheckersLevenshtein Distance, TriesIdentify similar words within edit distance
DNA Sequence AnalysisSuffix Arrays, KMP, Boyer-MooreFind patterns in genomic sequences
Plagiarism DetectionRabin-Karp, Document FingerprintingIdentify shared text between documents
Data CompressionHuffman Coding, Lempel-ZivIdentify and encode repeated patterns

Conclusion

String algorithms are fundamental tools in computer science with wide-ranging applications. From basic string manipulation to complex pattern matching, these algorithms power many of the technologies we use daily.

Remember

When working with string algorithms:

  • Choose the algorithm based on your specific needs and constraints
  • Consider preprocessing costs vs. search efficiency tradeoffs
  • Be aware of worst-case scenarios for each algorithm
  • Specialized data structures like tries and suffix trees can dramatically improve performance

Next Steps

To continue your journey with string algorithms, explore these related topics:

Related Tutorials

Related Tutorials

Dynamic Programming

Learn how DP is applied to many string problems like longest common subsequence.

Learn more

Hash Tables

Understand the data structure that powers efficient string matching algorithms.

Learn more

Tries

Explore tree-based data structures optimized for string operations.

Learn more