This type of problem can be solved easily using Dynamic Programming. In this problem, you are generally given two sequences and you need to find the length of the common subsequence present in them. These type of problem finds usage in bioinformatics.
A sub-sequence may be defined as the sequence that appear in relatively same order but not necessarily continuous. E.g.”abd” and “abc” are subsequences of “abdcfgrhd” .
Procedure
- Make a (len_substr_x * len_substr_y ) matrix of zeroes.
- If the x[i] == y[j] of the substrings match. Increment the value of the value of matrix at (i, j) by 1.
- Else fill the matrix at (i, j) with the value of
largest adjacent matrix cell. - The value at ( len_substr_x – 1, len_substr_y – 1) gives us the length of largest substring.
Code
###################### longest common subsequence ######################## def solve(x,y): len_x = len(x) len_y = len(y) # creating a Matrix mat = [[0] * (len_y) for i in range(len_x)] sequence = [] for i in range(len_x): for j in range(len_y): if x[i] == y [j]: mat[i][j] = mat[i-1][j-1] + 1 sequence.append(x[i]) else: mat[i][j] = max(mat[i - 1][j], mat[i][j - 1]) return mat[len_x-1][len_y-1] if __name__ == "__main__": x = 'abrxrxr' y = 'abtdrrxxr' print(solve(x,y))