Problem:
An anagram is a type of word play, the result of rearranging the letters of a word or phrase to produce a new word or phrase, using all the original letters exactly. For example: “deposit”, “dopiest”, “posited” and “topside” are anagrams. Given a dictionary of English words, find all sets of anagrams.
The simplest solution to this would be to take a word and search the entire dictionary till we find all the anagrams. Considering a dictionary of 230,000 words this would require a search of 230,000 x 230,000 = 52,900 x 106 which is not feasible considering the time it requires.
This problem is best solved by using a signature based method. This involves
- Selecting a signature.
- Collecting words with the same signature.
Consider we have a dictionary:
dictionary: pans pots opt snap stop tops
1) Selecting a signature
A signature is a key which is common to all the elements belonging to a class. In our example of anagrams, we can use a signature based on sorting the characters in each word in an alphabetical order. So for our example, we get,
dictionary: pans pots opt snap stop tops
signature: anps opst opt anps opst opst
2) Collecting words with the same signature
To do this, we sort our dictionary using the signatures as the key. So for our example, we get,
dictionary: pans snap opt pots stop tops
signature: anps anps opt opst opst opst
3) We have now grouped together all the anagrams in the dictionary.
anagrams: pans snap, opt, pots stop tops.
This algorithm can be best visualized using Tom Cargill’s Hand Waving technique: sort this way(with a horizontal wave of the hand) and then that way(a vertical wave).
Practically Used:
In the late 1970′s, Bell Labs deployed a “user-operated directory assistance” program that allowed employees to look up a number in a company telephone directory using a standard push-button telephone. To find the number of the designer of the system, Mike Lesk, one pressed “LESK*M*” (that is, “5375*6*”) and the system spoke his number. One problem that arises in such systems is that different names may have the same push-button encoding; when this happens in Lesk’s system, it asks the user for more information. Given a large file of names, how would you locate these false matches?
Have you ever used such signature based algorithms to solve your problems?
[Updated] Phil from Programming Praxis had posted this problem in his post. You can also find the implementation in different languages.
Related Posts:
Tags: programming


In “In our example of anagrams, we can use a signature based on
>>>sorting the words in an alphabetical order. <<<<"
i think it should read: sorting the **characters within the words** in alphabetical order.
[Reply]
Kevin Reply:
February 12th, 2010 at 11:08 am
You are correct. Thanks.
[Reply]
I solved this problem at my blog, Programming Praxis, which provides a collection of exercises, updated weekly, for the education and enjoyment of the savvy programmer.
[Reply]
Kevin Reply:
February 12th, 2010 at 4:13 pm
You sure did Phil. I was trying my hand at a Python implementation. Your post definitely will help me
[Reply]
John Bentley described this anagram search algorithm in his “Programming Pearls” book (http://netlib.bell-labs.com/cm/cs/pearls/, first edition published in 1985).
[Reply]
Kevin Reply:
February 13th, 2010 at 7:53 am
That is from where I have read about this problem
[Reply]