Wednesday, September 16, 2015

Fuzzy match of spoken words using string distance algorithms - The Leveshtein Distance

Conversational computing uses speech to interact with a computing system in a natural way. The key components to a conversational computing system are:
  1. Speech recognition
  2. Natural language understanding 
  3. Processing 
  4. Natural language generation 
  5. Text to speech 
There are several speech recognition services available to convert voice to text in a browser, such as the HTML5 Web Speech APIs or IBM’s Watson Speech APIs. For unconstrained speech recognition, the result of the recognition is open ended and does not always match a word in the expected vocabulary for the natural language understanding and processing components. We will look at the Levenshtein Distance (LD) algorithm (https://en.wikipedia.org/wiki/Levenshtein_distance) that can be used to perform a fuzzy match of the recognized word with a vocabulary.

There are two cases that we will consider for finding a recognized word in a vocabulary. Note that misspelled words are not a concern when using speech recognition systems, since the recognizer always returns correctly spelled words. However, LD is effective in matching misspelled words with a dictionary.

Plural words


To handle plurals of words in a vocabulary, both the singular and plural versions of the word need to be included. However, this greatly increases the size and complexity of the vocabulary, and hence the number of possible word combinations that have to be considered.

For example, if the vocabulary has the word ‘phone’, but the user either says ‘phones’ or the recognizer returns ‘phones’, the word would not be matched. However, fuzzy matching can be accomplished by using a string distance algorithm, such as the Levenshtein Distance (LD). The distance between two words is the number of deletions, insertions, or substitutions required to transform one word into the other.

To convert ‘phone’ to ‘phones’, there is one insertion. Thus, the distance is 1.

A distance of 0 means that the words are identical. The greater the distance, the more different the words are.

Similar words


Suppose that the speech recognizer returned the word ‘photo’. To convert ‘phone’ to ‘photo’ requires 2 substitutions resulting in a distance of 2.

If the word ‘photo' is in the vocabulary, then after cycling through all words in the vocabulary, ‘photo’ would be returned since it would have a distance of 0. However, if ‘photo’ is not in the vocabulary, then it may be reasonable to interpret the recognized word ‘photo’ as ‘phone’. The higher the distance, the less similar the matched words are.

NaturalNode (https://github.com/NaturalNode/natural) is an NPL library for Node.js that calculates the Levenshtein Distance. Using the examples above,

    var natural = require('natural');
    console.log(natural.LevenshteinDistance(“phone”,"phones"));
    console.log(natural.LevenshteinDistance(“phone”,”photo"));

Output

    1
    2

Which are the same results we obtained above.

By default, deletions, insertions and substitutions all have a cost of 1. However, the cost of each operation can be changed. If a substitution cost was 2, then the above would return

    console.log(natural.LevenshteinDistance("phone","phones",{
      insertion_cost: 1,
      deletion_cost: 1,
      substitution_cost: 2
    })); 
    console.log(natural.LevenshteinDistance("phone","photo",{
      insertion_cost: 1,
      deletion_cost: 1,
      substitution_cost: 2
    }));


Output

    1
    4

The results show that adding an 's' to a word to make it plural returns a distance of 1, while similar words phone and photo that require substitution returns a distance of 4.

Summary


We have seen how the Levenshtein Distance algorithm can be used to do a fuzzy match of plural words and similar words by calculating the distance between the two words. The lower the distance, the more similar the words are. This enables mis-recognized words from the speech recognition system to be matched to the closest word in a desired vocabulary.