Longest Common Subsequence

From Algorithmist

Jump to: navigation, search

Longest Common Subsequence is the problem of finding the longest common subsequence of two sequences of items. This is used in the "diff" file comparison utility.

The solution utilizes dynamic programming.

Contents

[edit] Overview

The problem is usually defined as:

Given two sequence of items, find the longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous. For example, in the string abcdefg, "abc", "abg", "bdf", "aeg" are all subsequences.

A naive exponential algorithm is to notice that a string of length n has O(2n) different subsequences, so we can take the shorter string, and test each of its subsequences for presence in the other string, greedily.

[edit] Recursive solution

We can try to solve the problem in terms of smaller subproblems. We are given two strings x and y, of length n and m respectively. We solve the problem of finding the longest common subsequence of x = x1...n and y = y1...m by taking the best of the three possible cases:

  1. The longest common subsequence of the strings x1...n - 1 and y1...m
  2. The longest common subsequence of the strings x1...n and y1...m - 1
  3. If xn is the same as ym, the longest common subsequence of the strings x1...n - 1 and y1...m - 1, followed by the common last character.

It is easy to construct a recursive solution from this:

func lcs(x,y)
   if ( length(x)=0 or length(y)=0 ) 
     return ""

   best = lcs(x[1,n-1],y[1,m])

   if ( length(best) < length(lcs(x[1,n],y[1,m-1])) )
      best = lcs(x[1,n],y[1,m-1])

   if ( x[n] = y[m] and length(best) < length(lcs(x[1,n-1],y[1,m-1]) 1 )
      best = lcs(x[1,n-1],y[1,m-1])   x[n]

   return best

[edit] Dynamic programming

Obviously, this is still not very efficient. But because the subproblems are repeated, we can use memoization. An even more (slightly) efficient way, which avoids the overhead of function calls, is to order the computation in such a way that whenever the results of subproblems are needed, they have already been computed, and can simply be looked up in a table. This is called Dynamic Programming.

In this case, we find lcs(x1..i,y1..j) for every i and j, starting from smaller ones, storing the results in an array at index (i,j) as we go along.

func lcs(x,y)
   n = length( x ), m = length( y )
   for i from 0 to n
      for j from 0 to m
         if ( i is 0 or j is 0 )
            table[i,j] = ""
            if ( x[i] == y[j] ) table[i,j] = x[i]
         else
            /* Sentinel */
            table[i,j] = table[i-1,j]
            if ( length( table[i,j] ) < length( table[i,j-1] ) )
               table[i,j] = table[i,j-1];
            if ( x[i] = y[j] and length( table[i,j] ) < length( table[i-1,j-1] )   1 )
               table[i,j] = table[i-1,j-1]   x[i];
   return table[n][m]

Notice how closely it parallels the recursive solution above, while entirely eliminating recursive calls. This "small" change makes the difference between exponential time and polynomial time.

[edit] further reading

Personal tools