if{then}else

An efficient Boggle solver using a trie

Connor Newton — Wed, 08 Mar 2023 21:36:29 GMT

I like word games. Classics such as Scrabble and Boggle, and contemporary ones such as Bananagrams and Wordle, are often played or discussed with my family and friends. Playing these games over recent months, I have noticed the distinct tactics different players use and become interested in whether these could be quantified and analysed.

For example, I'm not bad (*brushes shoulder*) at Boggle, thanks to an apparent gift for identifying three letter words (each worth 1 point), while my typical opponents (suck it, Sónia, Mum!) identify fewer, longer words which are arguably worth less proportional to the time taken to identify them.

Similarly, my Bananagrams grids tend to be tightly packed collections of shorter words rather than loose, sprawling layouts of far more interesting and impressive ones.

Perhaps, I thought, there could be patterns emerging, and if so how do these different playing styles align with what is actually available to play or score? Is it true that there aren't enough 4, 5 and 6+ letter words on the Boggle grid to make them worth looking for, or do I just suck at finding them?

And so in pursuit of knowledge, and more importantly advantage in future games, writing a solver for one of these games seemed like a good first step. Conveniently, the notion for such a solver had already occurred to me...

The rules of Boggle

For the uninitiated, the rules are quite simple. Letters on the 4x4 game grid are randomized and a three minute game timer is started.

Unlike a simple word search where consecutive letters of a word follow a straight line, words in Boggle can be made up of a path of letters that weaves in many directions as long as consecutive letters are adjacent (including diagonally) and no letters are used more than once.

In the English version there is a special character, qu, which counts as two letters when used, on the condition that both are used and in that order. This is interesting because this two-letter character informs the appropriate implementation of the trie for this application.

For scoring - longer words are worth more points. And any words found by more than one player are worth nothing. Bummer.

# Letters	Score
< 3	0
3-4	1
5	2
6	3
7	5
8+	11
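The scoring table translates into a small function. This is only a sketch and not code from the solver: the name score is made up here, and it assumes the word is passed as a plain string, so that qu already counts as two letters.

```python
def score(word: str) -> int:
    """Return the Boggle score for a word, following the table above."""
    n = len(word)
    if n < 3:
        return 0
    if n <= 4:
        return 1
    if n == 5:
        return 2
    if n == 6:
        return 3
    if n == 7:
        return 5
    return 11  # 8 or more letters


print([score(w) for w in ("do", "dog", "quote", "quizzes", "question")])
# [0, 1, 2, 5, 11]
```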

A dictionary trie

A trie, or prefix tree, encodes the data for the keys it contains in the edges between nodes. The key associated with a node in the tree is defined by its position in the tree and is derived by traversing the tree from root to leaves and concatenating the values from the edges traversed.

This means that nodes in the trie have children associated with keys with a common prefix. It is this property that makes the trie useful for optimising word search problems when it is used to represent the dictionary of valid words.

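from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, MutableSet, Sequence


@dataclass
class Trie(MutableSet[Sequence[str]]):
  next: Dict[str, Trie] = field(default_factory=dict)
  ok: bool = False

  def add(self, key: Sequence[str]) -> None:
    ...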

Node structure for trie in Python

next:
  d:
    next:
      i:
        next:
          g:
            ok: yes
      o:
        ok: yes
        next:
          g:
            ok: yes
      u:
        next:
          g:
            ok: yes
  qu:  # [1]
    next:
      o:
        ok: yes

Trie data describing words "do", "dog", "dig", "dug" and "quo".

A boolean ok value associated with each node marks nodes where the prefix is a valid element in the set described by the trie. This is necessary in a dictionary trie where not all prefixes are valid words. In this trie, "do", "dog", "dig" and "dug" are all valid words whereas their mutual prefix "d" is not.
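To make the role of ok concrete, here is a minimal sketch of a node with insertion and membership. It is not the implementation from my repository: the class name Node and the __contains__ method are assumptions for illustration, and only the next and ok fields come from the structure described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class Node:
    next: Dict[str, "Node"] = field(default_factory=dict)
    ok: bool = False

    def add(self, key: Sequence[str]) -> None:
        """Walk or create one edge per character, then mark the final node."""
        node = self
        for ch in key:
            node = node.next.setdefault(ch, Node())
        node.ok = True

    def __contains__(self, key: Sequence[str]) -> bool:
        """A key is a member only if its final node is marked ok."""
        node = self
        for ch in key:
            if ch not in node.next:
                return False
            node = node.next[ch]
        return node.ok


root = Node()
for word in (["d", "o"], ["d", "o", "g"], ["d", "i", "g"], ["d", "u", "g"], ["qu", "o"]):
    root.add(word)

print(["d", "o"] in root, ["d"] in root)  # True False
```

The prefix "d" has a node in the tree, but because that node is not marked ok it is not reported as a word.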

The structure of the trie can be used to efficiently prune word searches by traversing the edges corresponding to the characters encountered while exploring the game space. If there is no edge corresponding to a character that would be appended during the search, then no word can begin with the resulting prefix and the path can be pruned.
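The pruning test amounts to following one edge per character and giving up as soon as an edge is missing. The sketch below uses plain nested dicts shaped like the trie data above; the names TRIE and walk are hypothetical and chosen only for this illustration.

```python
from typing import Any, Dict, Optional, Sequence

TrieNode = Dict[str, Any]  # {"ok": bool, "next": {char: TrieNode}}

# Hand-written trie for "do", "dog" and "dig", mirroring the layout above.
TRIE: TrieNode = {
    "ok": False,
    "next": {
        "d": {
            "ok": False,
            "next": {
                "i": {"ok": False, "next": {"g": {"ok": True, "next": {}}}},
                "o": {"ok": True, "next": {"g": {"ok": True, "next": {}}}},
            },
        }
    },
}


def walk(node: Optional[TrieNode], chars: Sequence[str]) -> Optional[TrieNode]:
    """Follow one edge per character; None means the prefix can be pruned."""
    for ch in chars:
        if node is None:
            return None
        node = node["next"].get(ch)
    return node


print(walk(TRIE, "da") is None)  # True: no word starts with "da", so prune
print(walk(TRIE, "di")["ok"])    # False: a live prefix, but not yet a word
print(walk(TRIE, "do")["ok"])    # True: a word, and still a prefix of "dog"
```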

[1] Note how the qu character prefixing the word quo has informed the Python implementation. There is no char type in Python, only str, which also satisfies Iterable[str]. Iterating "quo" will yield only "q", "u" and "o" and never "qu", meaning that the elements of the Set represented by Trie must be Sequence[str] rather than just str as one might expect.
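One way to turn dictionary words into such sequences is a small tokenizer that treats qu as a single character. This is a sketch with a made-up function name, tokens. It assumes a q not followed by u is kept as a lone "q", which would simply never match a grid that only has a qu character.

```python
from typing import List


def tokens(word: str) -> List[str]:
    """Split a word into Boggle characters, treating "qu" as one character."""
    out: List[str] = []
    i = 0
    while i < len(word):
        if word.startswith("qu", i):
            out.append("qu")
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


print(list("quo"))      # ['q', 'u', 'o']
print(tokens("quo"))    # ['qu', 'o']
print(tokens("equip"))  # ['e', 'qu', 'i', 'p']
```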

Solving the game

Potential prefixes can be generated by initiating a search from each position in the game grid and traversing according to the game rules:

Adjacent letters are those whose x and y position values each differ by at most 1 from the current position (excluding the current position itself), provided that position is on the grid.
Letters (i.e. positions) may not be re-used. This can be enforced by keeping the current prefix as a sequence of positions as a property of the search cursor.
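Both rules are short to express in code. The following is a sketch under my own naming (adjacent is not the adj method of the reference algorithm), assuming positions are (x, y) tuples on a grid of the given size.

```python
from typing import Iterator, Tuple

Position = Tuple[int, int]


def adjacent(pos: Position, size: Tuple[int, int] = (4, 4)) -> Iterator[Position]:
    """Yield the on-grid positions within one step of pos, diagonals included."""
    x, y = pos
    w, h = size
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) == (0, 0):
                continue  # the current position is not its own neighbour
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:
                yield (nx, ny)


# Re-use is prevented by filtering against the path walked so far.
path = [(0, 0), (1, 1)]
print(sorted(p for p in adjacent(path[-1]) if p not in path))
```

A corner has three neighbours, an edge position five and an interior position eight, so the branching factor of the search is at most eight before the trie prunes anything.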

Paths representing impossible prefixes in the dictionary are pruned by keeping the trie node representing the most recent character in the path as the other property of the search cursor. A traversal is only valid if the adjacent letter is a child of the cursor trie node.

Worked example

Consider the dictionary trie

next:
  d:
    next:
      i:
        next:
          d:
            ok: yes
          e:
            ok: yes
            next:
              d:
                ok: yes

Trie data describing words "did", "die" and "died".

and 2x2 game grid

    x=0 x=1
y=0   d   i
y=1   e   d

The search is seeded with cursors representing positions where the letter is the first character of a possible prefix in the dictionary. Since all words in the dictionary begin with d only those letters on the grid are eligible giving cursors

[(0,0)] .d
[(1,1)] .d

.d is not marked as ok, so neither cursor represents the path of a valid word, but the search continues from each. Note that despite referring to the same trie node, the cursors are distinct because their paths differ.

From both positions the only valid traversal is to the i giving

[(0,0),(1,0)] .d.i
[(1,1),(1,0)] .d.i

and again yielding no valid words.

Iterating once more yields the first valid words

[(0,0),(1,0),(1,1)] .d.i.d
[(0,0),(1,0),(0,1)] .d.i.e
[(1,1),(1,0),(0,0)] .d.i.d
[(1,1),(1,0),(0,1)] .d.i.e

and notably distinct paths to the same valid word, though this does not score additional points in the game.

One more iteration yields

[(0,0),(1,0),(0,1),(1,1)] .d.i.e.d
[(1,1),(1,0),(0,1),(0,0)] .d.i.e.d

and the conclusion of the search.
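The worked example can be checked with a few lines of Python. This is a self-contained sketch that swaps the Trie and Grid classes for plain dicts, so build and this version of solve are illustrative stand-ins rather than the code from my repository.

```python
from typing import Dict, Iterator, List, Tuple

Position = Tuple[int, int]
Path = Tuple[Position, ...]


def build(words: List[str]) -> dict:
    """Build a nested-dict trie; the "ok" key marks complete words."""
    root: dict = {"next": {}, "ok": False}
    for word in words:
        node = root
        for ch in word:  # there is no "qu" character in this example
            node = node["next"].setdefault(ch, {"next": {}, "ok": False})
        node["ok"] = True
    return root


def solve(trie: dict, grid: Dict[Position, str]) -> Iterator[Path]:
    """Yield every path on the grid that spells a word in the trie."""
    stack = [((c,), trie["next"][grid[c]]) for c in grid if grid[c] in trie["next"]]
    while stack:
        path, node = stack.pop()
        if node["ok"]:
            yield path
        x, y = path[-1]
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                c = (x + dx, y + dy)
                # Off-grid, already used, or no trie edge: prune.
                if c in grid and c not in path and grid[c] in node["next"]:
                    stack.append(((*path, c), node["next"][grid[c]]))


grid = {(0, 0): "d", (1, 0): "i", (0, 1): "e", (1, 1): "d"}
paths = list(solve(build(["did", "die", "died"]), grid))
words = sorted({"".join(grid[c] for c in p) for p in paths})
print(len(paths), words)  # 6 ['did', 'die', 'died']
```

The six paths are the four three-letter and two four-letter cursors listed above, collapsing to three distinct words once duplicates are removed.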

Reference algorithm

from typing import Iterator, List, MutableMapping, Sequence, Tuple

Char = str
Position = Tuple[int, int]
Path = Sequence[Position]
Cursor = Tuple[Path, Trie]


class Grid(MutableMapping[Position, Char]):
    def __init__(self, size: Tuple[int, int] = (4, 4)) -> None:
        self.size = size
        ...
    
    def adj(self, to: Position) -> Iterator[Tuple[Position, Char]]: ...  


def solve(tr: Trie, g: Grid) -> Iterator[Path]:
    # Seed search with valid prefixes starting with each letter.
    s: List[Cursor] = [  # Stack
        ((c,), u) for c in g if (u := tr.next.get(g[c])) is not None
    ]
    
    while s:
        p, u = s.pop()
        
        # Solution if is a word.
        if u.ok:
            yield p
            
        # Extend search with possible nexts from unused adjacents.
        for c, w in g.adj(p[-1]):
            if c not in p and (v := u.next.get(w)) is not None:
                s.append(((*p, c), v))

A complete implementation is available in my pyword repository on GitHub, along with solvers for some other word search games.

Computing earliest-arrival paths in temporal graphs

Connor Newton — Tue, 13 Nov 2018 19:35:22 GMT

(Need a quick reference? Scroll down for the algorithm pseudocode and a Go implementation.)

Shortest path analysis, and variants thereof, is a useful tool many of us use every day, particularly with regard to making travel decisions - "If I leave now, how long will it take to drive to work?" is a question we might ask our sat nav, having overslept slightly and woken up to a 9am meeting invite that we're pretty sure wasn't there when we left yesterday.

Despite being disguised as a time-related question ("how long will it take to get there?") this problem is more easily treated as a distance-related¹ question since the routes that are available tend not to change depending on what time it is.

Dijkstra's shortest path algorithm [1] is a famous example of a solution to such a question. It operates on a graph and can be thought of as pruning edges until those that remain form a tree in which each edge leads to the next node on the shortest path to the destination. This tree may then serve as a useful reference to look up the shortest path to the destination from any node, provided that the graph from which it was derived does not change.

If however, like me, you prefer to take the bus instead, the question might be reframed to be something like: "If I leave now, what is the fastest route to work by public transport?". At first glance this seems like it could be the same problem, but there is a key difference in that a bus departs from specific locations at specific times and once it has departed, it cannot be caught. In other words, the routes that are available to take change as time progresses. A graph that describes this kind of behaviour is not 'static' - instead it is described as time variant, dynamic, or temporal.

As a result we cannot apply Dijkstra or other algorithms that operate on static graphs - in a temporal graph these methods are not reliable. Even though the shortest path may go next via one node, delays between opportunities to traverse the edges on this path may mean that the fastest path is via a different one. It is also possible that the shortest path might never be available when time is considered. Instead we must look to a class of algorithms which take into account the properties of temporal edges.

Edges in a temporal graph differ from their static graph counterparts by having the properties of starting time and traversal time. These are the time at which the edge is able to be traversed and how much time will have elapsed by the moment of arriving at the adjacent node. In the context of catching a bus, for example, the starting time could be the scheduled departure time and the traversal time the expected duration to the next stop. Traversal time can be seen as analogous to edge weight in non-temporal graphs, but starting time is fundamentally different in that once it has passed the edge may never be traversed - cue flashback to the trauma of running for, and then missing, a bus by mere seconds.
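As a rough illustration, a temporal edge can be modelled as a small record. Everything here is an assumption made for the sketch: the class name TemporalEdge, its field names, the catchable helper, and the choice of whole minutes since midnight as the unit of time.

```python
from typing import NamedTuple


class TemporalEdge(NamedTuple):
    u: str      # node the edge leaves from
    v: str      # node the edge arrives at
    start: int  # starting time, e.g. the scheduled departure
    dur: int    # traversal time, e.g. the expected journey duration

    @property
    def end(self) -> int:
        """Time of arrival at v when this edge is taken."""
        return self.start + self.dur

    def catchable(self, arrival_at_u: int) -> bool:
        """An edge can only be taken if we are at u no later than its start."""
        return self.start >= arrival_at_u


bus = TemporalEdge("home", "work", start=8 * 60 + 30, dur=25)  # the 08:30 bus
print(bus.end)             # 535, i.e. 08:55
print(bus.catchable(505))  # True: at the stop by 08:25
print(bus.catchable(511))  # False: missed it by a minute
```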

Computing earliest-arrival times in temporal graphs

The authors of [2] propose an algorithm for computing the earliest-arrival time from the start node x to every other node v starting at time t_α and bounded by end time t_ω.

Here is an equivalent pseudocode for [2] (Algorithm 1):

procedure earliest(G, x, tα, tω):

	for each vertex v in G do:
		t[v] ← ∞
	t[x] ← tα
	
	for each edge e = (u, v, t, λ) in edge stream of G do:
		if t + λ ≤ tω and t ≥ t[u] then:
			if t + λ < t[v] then:
				t[v] ← t + λ
		else if t ≥ tω:
			break loop
	
	return t[v] for each v ∈ V

Let's dissect the algorithm.

for each vertex v in G do:
	t[v] ← ∞
t[x] ← tα

In other words, maintain a map of the earliest-arrival time at each destination node, t[v]. Assume that x is reachable by time t_α, so set the earliest-arrival time at x, t[x] = t_α.

t[v] now represents the earliest-arrival time at v. By extension, t[v] is also the earliest-departure at v, since it is not possible to depart from a node you have not yet arrived at. If t[v] = ∞, then v is not reachable from x starting at time t_α ².

for each edge e = (u, v, t, λ) in edge stream of G do:

The algorithm processes what [2] describes as the 'edge stream' representation of the graph. This is simply the set of edges in the graph ordered by starting time, t. If we process the edges in ascending time order, then it is not necessary to store any historical results - the map will be consistent for the start time of every edge.
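If the edges are held in memory, producing the edge stream is a single sort on starting time. A minimal sketch, assuming edges are plain (u, v, t, λ) tuples; the name edge_stream is made up here.

```python
from typing import List, Tuple

Edge = Tuple[str, str, int, int]  # (u, v, t, λ)


def edge_stream(edges: List[Edge]) -> List[Edge]:
    """Return the edges ordered by ascending starting time t."""
    return sorted(edges, key=lambda e: e[2])


timetable = [("B", "C", 4, 1), ("A", "B", 1, 2), ("A", "C", 2, 5)]
print(edge_stream(timetable))
# [('A', 'B', 1, 2), ('A', 'C', 2, 5), ('B', 'C', 4, 1)]
```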

if t + λ ≤ tω and t ≥ t[u] then:

If the edge ending time (starting time, t, plus traversal time λ), t + λ, is outside the scope of the search then ignore it.

If the edge starting time is before the earliest-arrival time of the starting node then it is impossible to reach in time to traverse it, so also ignore it.

if t + λ < t[v] then:
	t[v] ← t + λ

If the edge ending time is before the earliest-arrival time of the ending node, then a new earliest-arrival time has been found.

else if t ≥ tω:
	break loop

Given that we're processing edges in ascending time order, this is just a short-circuit to avoid processing/ignoring the remainder of the edge stream.

return t[v] for each v ∈ V

After all edges have been processed, the resulting map t[v] gives the earliest-arrival time at each v, starting from x at time t_α.
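Put together, the pseudocode is only a few lines of Python. This is my own sketch of it rather than code from [2]: it assumes the edge stream is an iterable of (u, v, t, λ) tuples already sorted by t, and it treats a node missing from the map as having an earliest-arrival time of ∞.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, int, int]  # (u, v, t, λ)


def earliest_times(stream: Iterable[Edge], x: str, t_alpha: int, t_omega: int) -> Dict[str, int]:
    """Earliest-arrival time from x for every node reachable within the window."""
    t = {x: t_alpha}  # a missing key stands for ∞
    for u, v, start, dur in stream:
        end = start + dur
        if end <= t_omega and u in t and start >= t[u]:
            if v not in t or end < t[v]:
                t[v] = end
        elif start >= t_omega:
            break  # every later edge starts later still
    return t


# A tiny timetable, already in ascending order of starting time.
stream = [("A", "B", 1, 2), ("A", "C", 2, 5), ("B", "C", 2, 1), ("B", "C", 4, 1)]
print(earliest_times(stream, "A", 0, 10))  # {'A': 0, 'B': 3, 'C': 5}
```

The direct edge from A to C arrives at 7, but changing at B arrives at 5. The B to C edge starting at 2 is ignored because B is not reached until 3.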

Computing earliest-arrival paths in temporal graphs

In most applications where knowing the earliest-arrival time is useful, knowing the path taken to realise that arrival-time is also required. [2], however, does not describe steps for constructing the earliest-arrival path.

Constructing the earliest-arrival path is possible without changing the structure of the algorithm, by maintaining the earliest-arrival path alongside the earliest-arrival time for each destination node.

Here is an extended pseudocode including earliest-arrival path construction:

procedure earliest(G, x, tα, tω):

	for each vertex v in G do:
		t[v] ← ∞
	t[x] ← tα
	p[x] ← {} // Empty list
	
	for each edge e = (u, v, t, λ) in edge stream of G do:
		if t + λ ≤ tω and t ≥ t[u] then:
			if t + λ < t[v] then:
				t[v] ← t + λ
				p[v] ← append(p[u], u)
		else if t ≥ tω:
			break loop
	
	return t[v], append(p[v], v) for each v ∈ V

And a brief explanation of why this works:

p[x] ← {} // Empty list

p[x] represents the nodes prior to x in the earliest-arrival path. Since x is the starting node, no nodes have been visited beforehand to get to x.

if t + λ < t[v] then:
	p[v] ← append(p[u], u)

Every time a new earliest-arrival time for v is discovered, we know that time t[v] is the edge ending time t + λ. It follows that the new earliest-arrival path preceding v, p[v] is the earliest-arrival path preceding u, p[u] plus u itself.

return append(p[v], v) for each v ∈ V

It should be clear that the complete earliest-arrival path from x to v is the earliest-arrival path preceding v, p[v] plus v itself.
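The extended pseudocode can be sketched in Python in the same way. As before this is illustrative rather than definitive: the function name earliest_paths is my own, the edge stream is assumed to be (u, v, t, λ) tuples sorted by t, and unreachable nodes are simply absent from the result.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, int, int]  # (u, v, t, λ)


def earliest_paths(
    stream: Iterable[Edge], x: str, t_alpha: int, t_omega: int
) -> Dict[str, Tuple[int, List[str]]]:
    """Earliest-arrival time and path from x for every reachable node."""
    t: Dict[str, int] = {x: t_alpha}  # a missing key stands for ∞
    p: Dict[str, List[str]] = {x: []}  # nodes visited before reaching each node
    for u, v, start, dur in stream:
        end = start + dur
        if end <= t_omega and u in t and start >= t[u]:
            if v not in t or end < t[v]:
                t[v] = end
                p[v] = p[u] + [u]  # builds a new list, so paths never share storage
        elif start >= t_omega:
            break
    return {v: (t[v], p[v] + [v]) for v in t}


stream = [("A", "B", 1, 2), ("A", "C", 2, 5), ("B", "C", 2, 1), ("B", "C", 4, 1)]
for node, (when, path) in earliest_paths(stream, "A", 0, 10).items():
    print(node, when, path)
# A 0 ['A']
# B 3 ['A', 'B']
# C 5 ['A', 'B', 'C']
```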

Computing earliest-arrival paths in temporal graphs - a Go implementation

The following is an implementation in Go, the structure of which is inspired by similar algorithms from the excellent Gonum scientific and numerical library [3].

package path

import (
	"gonum.org/v1/gonum/graph"
	"gonum.org/v1/gonum/graph/temporal"
)

type Earliest struct {
	from  graph.Node
	at    uint64
	until uint64
	nodes map[int64]struct {
		earliest uint64
		via      []int64
	}
}

func (e *Earliest) set(v graph.Node, t uint64, p []int64) {
	e.nodes[v.ID()] = struct {
		earliest uint64
		via      []int64
	}{
		t,
		p,
	}
}

func EarliestArrivalFrom(g graph.TemporalStream, from graph.Node, at uint64, until uint64) Earliest {
	earliest := Earliest{
		from:  from,
		at:    at,
		until: until,
		nodes: make(map[int64]struct {
			earliest uint64
			via      []int64
		}),
	}
	earliest.set(from, at, []int64{})
	s := g.LineStream()
	for s.Next() {
		l := s.TemporalLine()
		u := l.From()
		uid := u.ID()
		eu, ok := earliest.nodes[uid]
		tl := l.At()
		dtl := l.Duration()
		if !ok {
			continue
		}
		if tl+dtl <= until && tl >= eu.earliest {
			v := l.To()
			vid := v.ID()
			ev, ok := earliest.nodes[vid]
			if !ok || tl+dtl < ev.earliest {
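				// Copy the prefix first: append may otherwise write into a backing
				// array shared with the paths of other nodes.
				via := make([]int64, len(eu.via), len(eu.via)+1)
				copy(via, eu.via)
				earliest.set(v, tl+dtl, append(via, uid))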
			}
		} else if tl >= until {
			break
		}
	}
	return earliest
}

¹ Here 'distance' refers to the time-distance cost of traversing a length of road with a particular speed limit.

² I recommend avoiding floating-point representations of time (ain't nobody got time for debugging issues due to rounding errors), so get creative with representing infinity in the language of your choice e.g. in my Go implementation I use an unsigned type to represent discrete time and consider v not in map[v]uint to mean infinity.

[1] Dijkstra's shortest path algorithm

[2] Wu et al, Path Problems in Temporal Graphs

[3] Gonum (github.com)