Percentage of similarity between texts?

7

Is there a method in Java that tells you how similar two strings are, as a percentage?

An abstract algorithm would also work.

    
asked by anonymous 20.10.2014 / 11:20

1 answer

9

Levenshtein distance is one of the best-known algorithms for this:

Java:

public class Levenshtein {

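    public static int distance(String a, String b) {
        // The comparison is case-insensitive.
        a = a.toLowerCase();
        b = b.toLowerCase();
        // i == 0: costs[j] = lev(0, j) = j. The array is reused for every row i.
        int[] costs = new int[b.length() + 1];
        for (int j = 0; j < costs.length; j++)
            costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            // j == 0: lev(i, 0) = i; nw = lev(i - 1, 0), the diagonal neighbour of cell (i, 1)
            costs[0] = i;
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                // Cheapest of: deletion or insertion (+1), or match (+0) / substitution (+1)
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]),
                        a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j]; // the old lev(i - 1, j) is the diagonal for the next column
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }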

    public static void main(String [] args) {
        String [] data = { "kitten", "sitting", "saturday", "sunday", "rosettacode", "raisethysword" };
        for (int i = 0; i < data.length; i += 2)
            System.out.println("distance(" + data[i] + ", " + data[i+1] + ") = " + distance(data[i], data[i+1]));
    }
}

Output:

distance(kitten, sitting) = 3
distance(saturday, sunday) = 3
distance(rosettacode, raisethysword) = 8

If you want the percentage difference, divide the distance by the length of the longer string and multiply by 100. Subtract that value from 100 to get the percentage of similarity.

Source:

link
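The percentage calculation described above could be sketched in Java roughly as follows. This is only an illustration: the class name Similarity and the helper similarityPercent are hypothetical names, not part of the original answer, and normalizing by the longer string is an assumption that matches what the JavaScript demonstration does.

```java
public class Similarity {

    // Two-row Levenshtein distance, case-insensitive, same approach as the answer's code.
    public static int distance(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        int[] costs = new int[b.length() + 1];
        for (int j = 0; j < costs.length; j++) {
            costs[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            costs[0] = i;
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]),
                        a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }

    // Hypothetical helper: similarity in percent, normalized by the longer string.
    // Two empty strings are treated as identical, which also avoids dividing by zero.
    public static double similarityPercent(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100.0;
        }
        return 100.0 * (longest - distance(a, b)) / longest;
    }

    public static void main(String[] args) {
        String[] data = { "kitten", "sitting", "saturday", "sunday" };
        for (int i = 0; i < data.length; i += 2) {
            System.out.println("similarity(" + data[i] + ", " + data[i + 1] + ") = "
                    + similarityPercent(data[i], data[i + 1]) + "%");
        }
    }
}
```

For "kitten" and "sitting" the distance is 3 and the longer string has 7 characters, so the similarity comes out at about 57%. Returning a double rather than a rounded int leaves the rounding decision to the caller.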

JavaScript Demonstration:

function levenshtein(str1, str2) {
  var m = str1.length,
      n = str2.length,
      d = [],
      i, j;

  if (!m) return n;
  if (!n) return m;

  for (i = 0; i <= m; i++) d[i] = [i];
  for (j = 0; j <= n; j++) d[0][j] = j;

  for (j = 1; j <= n; j++) {
    for (i = 1; i <= m; i++) {
      if (str1[i-1] == str2[j-1]) d[i][j] = d[i - 1][j - 1];
      else d[i][j] = Math.min(d[i-1][j], d[i][j-1], d[i-1][j-1]) + 1;
    }
  }
  return d[m][n];
}


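function calc() {
  var t1 = document.getElementById('t1');
  var t2 = document.getElementById('t2');
  var r1 = document.getElementById('r1');
  var r2 = document.getElementById('r2');
  var d  = levenshtein( t1.value, t2.value );
  var longest = Math.max( t1.value.length, t2.value.length );
  r1.innerHTML = d;
  // Two empty inputs are identical; this also avoids dividing by zero.
  r2.innerHTML = ( longest ? 100 - Math.floor( 100 * d / longest ) : 100 ) + '%';
}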
Word 1:<br>
<input id="t1" type="text" onKeyUp="calc()"><br>
Word 2:<br>
<input id="t2" type="text" onKeyUp="calc()"><br>
Distance:<br>
<div id="r1">?</div>
Similarity:<br>
<div id="r2">?</div>
    
20.10.2014 / 11:36