Hire a web Developer and Designer to upgrade and boost your online presence with cutting edge Technologies

Wednesday, January 19, 2011

Write an Algorithm To calculate distance between two text strings

Give an algorithm that calculates the distance between two text strings (only operations you can have are: delete, add, and change, one by one).



1)
The distance of two strings, like "car" and "cat", is 1. So all we need to do is, compare two strings one character by one character and get the distance value. The complexity is O(max(m,n)), m, n are the length of each string.

2)
Let's think about the string as a vector with coordinates represented by each character ASCII value i.e.
1st string (str1): (x1 x2 ... xm)
where
x1 = str1[0] x2 = str1[1] ... xm = str1[m-1]
2nd string (str2): (y1 y2 ... yn)
y1 == str2[0] y2 == str2[1] ... yn=str2[n-1]
Assume that m<n.
In this case the distance beween str1 and str2:
D(str1 str2) = [(x1-y1)^2 + (x2-y2)^2 + ... + (xm-ym)^2 + y(m+1)^2 + ....
+ y(n-1)^2 + yn^2]^(1/2)




3)
Stanard approach is dynamic programming with complexity O(N*M).
Let strings are A[1..N] and B[1..M]. We use array DP[0..N][0..M]. DP[i][j] stores the answer for prefixes of length i and j. DP[N][M] is the answer.

DP[i][j] = i+j (if i = 0 or j = 0)
DP[i][j] = (S1[i]==S2[j]) ? 1+DP[i-1][j-1] : 1+min(DP[i-1][j] DP[i][j-1]) (otherwise)

 




4)
The distance of two strings like "car" and "cat" is 1. So all we need to do is compare two strings one character by one character and get the distance value. The complexity is O(max(m n)) m n are the length of each string.


5)
In python -

# Input is string s1 and string s2

# Check which string is bigger (compares ascii value)
if s1 > s2:
b s = s1 s2
else :
b s = s2 s1

# Covert string to list
l1 l2 = list(b) list(s)

# Check lenght of string
if len(l1) < len(l2):
looplength = len(l1)
else:
looplength = len(l2)

# Run loop to min lenght - O(n)
for i in range(looplenght):
distance = ord(l1[i]) - ord(l2[i]) # ord() converts char to Ascii value
if distance != 0:
# We have our distance from the first non-matching characters
print " s is the disatance" distance
break




6)  

No comments:

Post a Comment