Build a recursive word finding algorithm with Python - Part 2

by Dan Duda

Requirements

  • Python 3.x
  • PyCharm Community (optional)
  • A text file of English words
  • Basic Python knowledge

Goto Part 1

Introduction

In the last tutorial we setup our project to read in a list of words from a text file and then prompt the user for a word. We then print if the word was found in the list or not. In this tutorial we’ll create an algorithm to check all the combinations of letters that we enter to see what words, if any, we can make from them.

Recursive functions are functions that call themselves. In theory any recursive algorithm can be written non-recursively using just loops but depending on the scenario it can be much more complex to write. The downside of recursive functions is that they can take up a lot more memory depending on the scenario. Each recursive call to the function means that execution must leave the current function call by adding another call onto the stack in memory. Depending on how deep the recursion goes you could end up with a stack overflow error. A recursive function will have a base case and a recursive case. The base case causes the recursive function to end (stop calling itself) whereas the recursive case will call itself again. We can visualize recursion using a simple example of calculating factorials. A factorial is the product of all positive integers less than or equal to n, where n is a non-negative integer, and is denoted by n!. So for example:

1
5! = 5 X 4 X 3 X 2 X 1 = 120

To construct a recursive function to solve this we first need to determine the “base case”. In this scenario the base case will be when n reaches 1. All other values should continue to call itself.

1
2
3
4
5
6
7
8
9
10
def factorial(n):
if n > 1:
# recursive case
return n * factorial(n - 1)
else:
# base case
return 1


print(factorial(5))

If we run this we get the answer 120.

Let’s visualize the code running for 4!.


For our word search algorithm we’ll start by creating an empty list called “found_words” to store the words that are found. Then we’ll loop over the letters that we’re searching one at a time. However, for each letter we’ll then loop over all the remaining letters adding each letter to our first letter, one at a time, each time checking against our “words” list to see if the combination of letters makes a word. If they do, then we add the word to our “found_words” list. Execution ends when there are no more remaining letters. At this point the function returns to the calling function (which in this case is itself), and so on and so on until it reaches the first call. Once all paths have returned we are done and can display the list of words that were found.

So, say we have the letters “abc”. We’re going to “kickoff” the algorithm by looping over those letters one by one. For each letter we’re going to call our function asking, give me all the words that start with this letter and use these remaining letters. So, the function will need to know two things, the combination of letters we’re testing so far and the remaining pool of letters. The remaining letters are the letters left over after we remove the current letter we’re searching on. The function then calls itself by looping over the remaining letters and appending the next letter onto to the current letter. Our “base case” is when there are no more letters.

1
2
3
4
5
6
7
8
9
10
11
12
Loop over “abc”.
Find(‘a’, ‘bc’)
Find(‘ab’, ‘c’)
Find(‘abc, ‘’) – now the func can return as there are no more letters
‘c’ was the only letter in loop so return again
Find(‘b’, ‘ac)
Find(‘ba’, ‘c’)
Find(‘bac’, ‘’) – again, no more letters so return
Return
Find(‘c’, ‘ab’)
Find(‘ca’, ‘b’)
Find(‘cab’, ‘’) – ‘cab’ is a word so add to list

For our function I’m going to ignore words less than 3 letters long since most word games ignore them. There will be 3 parameters to our function, the current word part that we want to check, the remaining pool of letters, and a list containing the words we’ve found so far. Our function will continue to call itself as long as there are letter remaining to check.

1
2
3
4
5
6
7
8
9
def find_words(word_part, remaining_letters, found_words):
if len(word_part) >= 3 and word_part not in found_words and word_part in words:
found_words.append(word_part)

for i in range(len(remaining_letters)):
remaining_letters_copy = remaining_letters.copy()
next_letter = remaining_letters_copy[i]
del remaining_letters_copy[i]
self.search_words(word_part + next_letter, remaining_letters_copy, found_words)

The first thing we do in our function is check whether the current word part is a word and if so we add it to our found_words list. We ignore words less than 3 letters long and we check if the word is already in our found_words list and if so we ignore it. The last part “and word_part in words” checks if the word exists in our word list. If it does we add it to our found_words list.

The second part of our function is the for loop. We loop of the remaining letters and recursivley call our function again. The remaining_letters parameter should be passed in as a list. This will enable us to use the list copy method on it. Inside the loop we make a copy of the remaining letters and remove the current letter. If we didn’t make a copy of the remaining letters first then removing the letter would remove it for each call to our function which isn’t what we want. Each recurisve call whould have its own copy of the remaining letters to work with. We then call our find_words function again passing in the current letter appended on to the previous word part and the new remaining letters as well as the list of found words.

Putting it all together our program should now look like the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def find_words(word_part, remaining_letters, found_words):
if len(word_part) >= 3 and word_part not in found_words and word_part in words:
found_words.append(word_part)

for i in range(len(remaining_letters)):
remaining_letters_copy = remaining_letters.copy()
next_letter = remaining_letters_copy[i]
del remaining_letters_copy[i]
self.search_words(word_part + next_letter, remaining_letters_copy, found_words)


f = open('words_alpha.txt', "r")
words = [l.rstrip() for l in f.readlines()]
f.close()

found = []
s = input('Enter a string: ')
while s != '':
letters = list(s.lower())
find_words('', letters, found)

print(found)

found.clear()
s = input('Enter a string: ')

We first get input from the user which is the string of letters making sure to convert to lower case since our word list is lower case. We convert that string to a list and store in letters. We then call our find_words function passing in an empty string for the word_part and the letters as the remaining letters. The reason we pass in an empty string is because our function will start by looping over the remaining letters which on the first call will be all the letters. We also pass in an empty list “found” to store any found words. We then print the list of found words that were found. We clear the list of found words before the next run otherwise we’d be adding on words to the last run.

When inputing “abc” and “book” I get the following:

1
2
3
4
Enter a string: abc
['abc', 'bac', 'cab']
Enter a string: book
['abc', 'bac', 'cab', 'boo', 'book', 'boko', 'kob']

If using the word list from “Part 1” then we get a lot of words that aren’t neccessarily everyday words. But this is okay for our purposes. Entering more than 4 letters might cause the program to appear to be not responding. Even on my fairly fast machine using the input of “cabinet” took over a minute to run. Obviously our algorithm does not seem very efficient. Let’s look at how to improve this in the next tutorial.

Goto Part 3