pyword Featured

An efficient Boggle solver using a trie

Connor Newton

Mar 8, 2023 • 5 min read

I like word games - classics such as Scrabble and Boggle, and contemporary ones such as Bananagrams and Wordle are often played or discussed ewith my family and friends. Playing these games over recent months I have observed and become interested in the distinct tactics between players and whether these could be quantified and analysed.

For example, I'm not bad (*brushes shoulder*) at Boggle, thanks to an apparent gift for identifying three letter words (each worth 1 point), while my typical opponents (suck it, Sónia, Mum!) identify fewer, longer words which are arguably worth less proportional to the time taken to identify them.

Similarly my Bananagrams grids tend to be more tightly packed collections of shorter words than loose, sprawling layouts of far more interesting and impressive ones.

Perhaps, I thought, there could be patterns emerging, and if so how do these different playing styles align with what is actually available to play or score? Is it true that there aren't enough 4, 5 and 6+ words on the Boggle grid to make them worth looking for, or do I just suck at finding them?

And so in pursuit of knowledge, and more importantly advantage in future games, writing a solver for one of these games seemed like a good first step. Conveniently, the notion for such a solver had already occurred to me...

The rules of Boggle

For the uninitiated, are quite simple. Letters on the 4x4 game grid are randomized and a three minute game timer started.

Unlike a simple word search where consecutive letters of a word follow a straight line, words in Boggle can be made up of a path of letters that weaves in many directions as long as consecutive letters are adjacent (including diagonally) and no letters are used more than once.

In the English version there is a special character qu which when used counts as the two letters under the condition that both must be used and in that order. This is interesting as this two-letter character informs the appropriate implementation of the trie for this application.

For scoring - longer words are worth more points. And any words found by more than one player are worth nothing. Bummer.

# Letters	Score
< 3	0
3-4	1
5	2
6	3
7	5
8+	11

A dictionary trie

A trie, or prefix tree, encodes the data for the keys it contains in the edges between nodes. The key associated with a node in the tree is defined by its position in the tree and is derived by traversing the tree from root to leaves and concatenating the values from the edges traversed.

This means that nodes in the trie have children associated with keys with a common prefix. It is this property that makes the trie useful for optimising word search problems when it is used to represent the dictionary of valid words.

@dataclass
class Trie(MutableSet[Sequence[str]]):
  next: Dict[str, Trie] = field(default_factory=dict)
  ok: bool = False
  
  def add(self, key: Sequence[str]) -> None:
    ...

Node structure for trie in Python

next:
  d:
    next:
      i:
        next:
          g:
            ok: yes
      o:
        ok: yes
        next:
          g:
            ok: yes
      u:
        next:
          g:
            ok: yes
  qu:  # [1]
  
    next:
      o:
        ok: yes

Trie data describing words "do", "dog", "dig", "dug" and "quo".

A boolean ok value associated with each node marks nodes where the prefix is a valid element in the set described by the trie. This is necessary in a dictionary trie where not all prefixes are valid words. In this trie, "do", "dog", "dig" and "dug" are all valid words whereas their mutual prefix "d" is not.

The structure of the trie can be used to efficiently prune word searches by traversing its edges corresponding to the characters encountered while exploring the game space. If there is no edge corresponding to a character that would be appendend during the search, then any words prefixed by such a result can never be a solution and thus the path can be pruned.

^{[1] Note how the qu character prefixing the word quo has informed the Python implementation. There is no char type in Python, only str which also satisfies Iterable[str] . Iterating "quo" will yield only "q", "u" and "o" and never "qu" meaning that the elements of the Set represented by Trie must be Sequence[str] rather than just str as one might expect.}

Solving the game

Potential prefixes can be generated by initiating a search from each position in the game grid and traversing according to the game rules:

Adjacent letters have x and y position values +/- 1 of the current position provided that position is on the grid.
Letters (i.e. positions) may not be re-used. This can be enforced by keeping the current prefix as a sequence of positions as a property of the search cursor.

Paths representing impossible prefixes in the dictionary are pruned by keeping the trie node representing the most recent character in the path as the other property of the search cursor. A traversal is only valid if the adjacent letter is a child of the cursor trie node.

Worked example

Consider the dictionary trie

next:
  d:
    next:
      i:
        next:
          d:
            ok: yes
          e:
            ok: yes
            next:
              d:
                ok: yes

Trie data describing words "did", "die" and "died".

and 2x2 game grid

    x=0 x=1
y=0   d   i
y=1   e   d

The search is seeded with cursors representing positions where the letter is the first character of a possible prefix in the dictionary. Since all words in the dictionary begin with d only those letters on the grid are eligible giving cursors

[(0,0)] .d
[(1,1)] .d

.d is not marked as ok and so while neither cursor represents a path of a valid words the search continues from each. Note that despite referring to the same trie node, the cursors are distinct given their paths.

From both positions the only valid traversal is to the i giving

[(0,0),(1,0)] .d.i
[(1,1),(1,0)] .d.i

and again yielding no valid words.

Iterating once more yields the first valid words

[(0,0),(1,0),(1,1)] .d.i.d
[(0,0),(1,0),(0,1)] .d.i.e
[(1,1),(1,0),(0,0)] .d.i.d
[(1,1),(1,0),(0,1)] .d.i.e

and notably distinct paths to the same valid word, though this does not score additional points in the game.

One more iteration yields

[(0,0),(1,0),(0,1),(1,1)] .d.i.e.d
[(1,1),(1,0),(0,1),(0,0)] .d.i.e.d

and the conclusion of the search.

Reference algorithm

Char = str
Position = Tuple[int, int]
Path = Sequence[Position]
Cursor = Tuple[Path, Trie]


class Grid(MutableMapping[Position, Char]):
    def __init__(self, size: Tuple[int, int] = (4, 4)) -> None:
        self.size = size
        ...
    
    def adj(self, to: Position) -> Iterator[Tuple[Position, Char]]: ...  


def solve(tr: Trie, g: Grid) -> Iterator[Path]:
    # Seed search with valid prefixes starting with each letter.
    s: List[Cursor] = [  # Stack
        ((c,), u) for c in g if (u := tr.next.get(g[c])) is not None
    ]
    
    while s:
        p, u = s.pop()
        
        # Solution if is a word.
        if u.ok:
            yield p
            
        # Extend search with possible nexts from unused adjacents.
        for c, w in g.adj(p[-1]):
            if c not in p and (v := u.next.get(w)) is not None:
                s.append(((*p, c), v))

A complete implemenation is available in my pyword repository on GitHub, along with solvers for some other word search games.