games

Solving Wordle with Algorithms

Over the past three weeks, I spent a lot of time with Wordle. Most of this time was spent figuring out different ways to model the game, both conceptually and with code. I explored how it varies from other strategy games, like chess, and what you are doing when you are solving Wordle.

In the process, I wrote a number of different algorithms for solving it, some more human and others less so, and I want to share it with you.

If you don’t already know, Wordle is a 5-letter English-word guessing game where players attempt to uncover a 5-letter word in 6 guesses or fewer. After each guess, the game provides feedback for each letter using a color-coded system:

  • Green indicates the letter is in the target word in the same position.

  • Yellow indicates the letter is in the target word but another position.

  • Gray indicates the letter is not in the target word at all.

Here’s an example from a recent game that played out in 4 guesses.

I am not on social media so I was very late to learn about this game. It had its fifteen minutes last year, but it still has a lot of players who play it regularly. It’s a wonderfully simple game with a great design that guards against player fatigue by limiting games to one puzzle per day.

Wordle doesn’t interest me that much as a player. I don’t really like word games but I do like language and I really like data. So, I quickly became engrossed in the data behind Wordle. I learned a lot of interesting things about this 5-letter subspace of the English language. For example: on average, after your first guess you have reduced the possible answers by a factor of 25! One guess filters out 96% of words on average. How crazy is that?

Here’s another one. Using a stable algorithm there is no way to solve the following 5 words in under 5 guesses without getting lucky: EATER, PENAL, STILL, TRUSS, and UNTIL. All other words can be solved (using only skill) in 4 guesses or under with the majority in only 3.

How it Differs from Strategy Games

If you know anything about chess, you know that computers — and good players — project several moves in advance, running through all the possible countermoves to find the optimal move in any given turn. This is possible because chess is a game where the objective never changes. The players don’t have to ask the chess board for feedback on the state of the game after each move. It’s readily apparent.

Wordle is different. Players do have to rely on the game to provide feedback after each move. This means that a chess-inspired solution that goes into the future and backtracks until it finds the optimal move is not a legitimate strategy for solving Wordle.

Programmers talk a lot about partitioning when they talk about games like Wordle. Partitioning is taking a set of something and reducing it down into non-overlapping subsets. In this case, those subsets would be can’t be and might be.

How this is done in Wordle is based on the feedback you get from the gameplay. For all the previously identified green, yellow, and gray letters you could assess each possible remaining 5-letter word:

  • Does it have green letters in the right spot? If so, put it in might be.

  • Does it have yellow letters AND are they in new positions? If so, put it in might be. (This is crucial because yellow letters don’t just tell you which letters are in the word, they tell you where they aren’t positioned.)

  • Does it have gray letters? If so, put it in can’t be.

This is good for computers but humans don’t work like this, at least not most of us. Partitioning is exclusionary, but the way humans solve Wordle is more about imagination. We are more likely to think:

What words could possibly work here?

Unfortunately, there’s no scenario where imagination is going to beat exclusionary tactics in a game of Wordle. In chess, computer-assisted humans are still the best chess teams, but this will never be true for Wordle.

In programming an algorithm to solve Wordle, I found myself conflicted about this. I knew the best way to do it is by following exclusionary tactics, but I also wanted to see what an imaginary algorithm might look like. Clearly, it’s impossible to have a computer think like a human, or imagine like a human, but I was hoping I could at least take some cues from the way we humans approach a game like Wordle.

What You Should Know About Wordle’s Lexicons

When I first started this obsession, I loaded up a dictionary of words that all modern personal computers have tucked away in the filesystem somewhere and I filtered that list down to only the 5-letter words. Soon after, however, I realized that due to some clever web-sniffing, other programmers had already collected all the valid Wordle words, even going into the future!

I’ll spare you the details, but apparently the programmers of Wordle didn’t think it was important to try to keep the future words a secret, and so the game downloads a long list of words, and decides what the current word is based on the player’s local date.

The Wordle lexicon is bifurcated between valid answers and valid guesses. The first list includes possible answers. It is 2315 words long and includes commonly used 5-letter words. The second list is 10637 words long and only includes words that are valid as guesses due to their obscurity (I’m looking at you, “ycond”). Altogether, that means that 12952 words are available for guessing up until the answer when only 2315 words are possible.

With these two word sets, I wrote one solver to solve Wordle inclusively, more like a human does, and another solver to solve it exclusively, more like a chess-solver.

The First Guess Makes a Remarkable Difference

Whether you are human or machine, the success of the game hinges on the quality of the first guess. The first guess is always blind, so you might as well start with the same word every time. Deciding what that word is, however, has created more than a little disagreement in the corners of the web where Wordlers lurk.

Many think SALET is the best starting word. The NYTimes’ Wordle Bot purports CRANE to be the best, which I find dubious and you’ll see why soon.

There are several ways to figure this out. The two that seemed the most distinct to me are to sort words based on their letter frequency and to sort words based on how well they partition the answer list. The first method is easier to grasp and more natural to the way humans think. The second is a little more obscure but I promise it’s not hard to understand.

Sorting Words Based on Letter Frequency

When looking at letter frequency, it’s not sufficient to think about relative letter frequency across all words. All that matters is letter frequency across 5-letter words, and more specifically, Wordle’s 2315 answers. Going further, the frequency of letters in specific positions is more important than letter frequency overall.

I crunched the numbers — and letters — and here’s what I found:

  • S is the most common starting letter, followed by C and B.

  • A is the most common 2nd letter followed by O and R.

  • A is also the most common 3rd letter followed by I and O.

  • E is the most common 4th letter followed by N and S.

  • E is also the most common 5th letter followed by Y and T.

From this, we can assign a word score for all the words in both answer sets, and here are the fifty highest scoring words in order:

SAINE, SOARE, SAICE, SLANE, SOILY, SLATE, SOAVE, SAMEY, SAUCE, SLICE, SHALE, SAVEY, SAUTE, SHARE, SOUCE, SHINE, SUITE, CRANE, SEITY, SLATY, SAINT, SLADE, SOAPY, SLIER, SONCE, SHONE, STANE, BRANE, SALET, SLAKE, SHIRE, SAURY, SAUCY, COATE, SEAMY, SLAVE, SHALY, SPANE, SHITE, SANER, CRINE, SNARE, STALE, CRATE, SARGE, SHORE, SUAVE, SOOEY, CRISE, SLIDE

According to this approach, the best possible starting word is SAINE.

Sorting Words Based on Their Partitioning Rate

There is an entire field in math dedicated to problems like this and it got its start in the mid 1900s with the most important masters thesis ever written. That thesis was written by Claude Shannon and it gave birth to the information age. That field is called information theory.

I’m not going to overload you with obscurity here, but I can tell you that as it relates to Wordle, the information contained in each guess is a measure of how much it reduces the total possible answer set.

If a guess partitions the possible answers to half the size it was before, then it is said to have 1-bit of information. If it partitions the answer set by a factor of 4, it’s said to have 2-bits (because 2²=4), and so on.

On average, starting words have around 4.5 bits of information which is pretty astounding when you think about it. This means that on average a single 5-letter word reduces the possible answers to a twenty-fifth of what it was to start! (2^4.5 ≅25). That leaves only 92 words on average!

The best possible starting word then is the single word that reduces the answer set, on average, the most. Depending on the target word, some starting words are going to perform better than others but we can’t know this ahead of time. The best bet is to go with the word that has the best average partitioning rate.

To find that word, I ran through all the possible starting words (12952 words from both word sets) and applied them to each of the answer words (2315 words), measuring how much that starting guess reduced the remaining answers based on green, yellow and gray feedback.

All Wordle words sorted by their first-guess partitioning rates.

Allow me to introduce you to the word with the highest partitioning rate: ROATE.

On average, ROATE has 5.3 bits of information. This means it will reduce the remaining answers by a factor of 38 or 97.4%. Wow, ROATE, that’s amazing!

On average it will filter out 2255 words, leaving just 60 possible answers. At its best, it can filter a few target words down to just one answer (but all words can do this for at least 1 other word). At worst, when the target word is SHINY, it still reduces down to 195 words. That’s 3.6 bits of information and a reduction factor of 11.87.

In case you’re wondering, the word with the worst partitioning rate…the word you should never consider starting with (and I doubt you would), is IMMIX. With only 3 unique letters this word on average has just a little more than 1-bit of information and leaves 969 words remaining. That’s borderline useless. Giving credit where credit is due if the target word is AXIOM, then IMMIX filters out all other words immediately. At least it has that going for it.

So, what should I start with? Should I start with SAINE or should I go with ROATE?

Well, don’t hate me, but it turns out that neither of these words is the best starting word when you consider the efficiency of the remaining guesses.

They both do well for starting guesses, but subsequent guesses based on remaining available letters are decidedly less efficient than starting with words like SALET and SLATE. Both of these words are high in both letter frequency (#6 and #29 respectively) and on partitioning rate (#23 and #27 respectively). Of the top 50 in both relative letter frequency and partitioning, they performed the best in my computations.

I mentioned earlier that the NYTimes’ Wordle Bot declares CRANE to be the best starting word. CRANE underperformed on both dimensions and in my final word path computations, so I can’t be sure why they think that’s the optimal starting word. It might have something to do with the fact that Wordle Bot only uses around 4500 words to reduce front-end computational demands.

From here on, we will move forward with SLATE because in the computations I’ve done, SALET led to fewer words solved in the highest number of guesses (6), but SLATE resulted in a lower overall average guess rate (3.676).

After the First Guess

After reducing the 2315 answer possibilities to 71 on average there are two different ways to proceed.

First, we can continue guessing letters we haven’t yet guessed until the count of remaining answers has been reduced to 1. This is a pretty good way to win in 6 guesses, but it’s not guaranteed, and it’s certainly not the best way to win in the fewest number of guesses because it ignores data hidden in the remaining answers.

By going this route there is a 25% chance of solving the puzzle in 3 guesses and an 80% chance of solving in 4 guesses. 10 words won’t be solved at all and 57 take the full 6 guesses. Here’s the distribution in this method.

A more sophisticated way to proceed involves collecting the letters in the remaining answers and then subtracting out all the letters we have guessed already. This provides us with a list of letters we should target in subsequent guesses to continue reducing the answer list as quickly as possible.

This is more focused than if we simply target letters we haven’t guessed yet. We gather that list, and we weight each letter based on how many words in the remaining answers it appears in, and then we find a word that has the highest score based on those individual letter weights.

Again, we’re asking an inclusive question: what words can we come up with that use the most number of these letters?

To make this clearer, let’s look at the target word HOUND.

  • After guessing SLATE there are 0 green, 0 yellow, and 5 gray letters, which leaves 221 words.

  • After guessing CRONY there is 1 green {N}, 1 yellow {O}, and 3 more gray letters {C,R,Y}, which leaves 9 words: BOUND, POUND, FOUND, DOING, MOUND, GOING, WOUND, HOUND, OWING.

As you can see this presents a problem for winning the game. 6 of those words have the same last 4 letters and different starting letters. If we try to guess our way through this, our partitioning rate is somewhere around 12% which is roughly a fourth of a bit of information in each guess. It’s awful.

By gathering the letters we can see that there are 7 Ds, 6 Us, 3 Gs, 3 Is, 2 Ws, 2 Hs, 1 B, 1 F, 1 P, and 1 M. If we switch our strategy to primarily targeting these letters, even if we have to waste a space on a previously used letter, then our partitioning rate goes way up.

The word HUMID in this case has all 5 of these letters and a partitioning rate of 89% (just over 3-bits of information) which is all we need to get down to the last word: HOUND.

By going this route, on average, we have a 40% chance of solving the word in 3 guesses and a 90% chance of solving it in 4. All puzzles are winnable and only 12 words require the full 6 guesses. The overall average of this method is 3.676 guesses per word.

Hopefully, the histograms here make it pretty clear that the second method has significantly brought down the upper end of the distribution.

Finding a Lower Bound on What is Possible

After going as far as I could with this inclusive-inspired model, I wanted to know how the results from a chess-solver solution would turn out. To determine this optimal word path for all 2315 answers, I implemented an algorithm that searches through every possible branch at each step and returns the shortest path to the target word.

However, I had to account for luck otherwise the shortest word path would be just guessing the target word in 1 guess every time and that would be no fun. But how do we avoid getting lucky?

To do this, I decided a valid solution must conform to the following conditions:

  1. It will start with the same word every time, in this case, SLATE.

  2. It will never land on the correct answer prematurely. This method will not guess a word from the answer list until that list has been pruned down to only 1 possible answer. We only want to guess the answer once we’ve partitioned out every other possibility. Our final guess should be 100% certain.

These stringent conditions work great for all but 8 words. For words like TRUSS and TRUST, the available guess words will run out before the answer list gets to 1. This is natural when you recognize that both words are spelled with the same 4 letters. At some point, guessing new words doesn’t help and it has to pick one of the remaining two words. Other word pairs like this are PANEL /PENAL, STILL /STILT, and UNLIT / UNTIL.

Here are the results:

An overwhelming number of words can be solved with just 3 guesses. The average number of guesses is 3.003. While I have done my best to account for lucky answers, I will admit that each choice made between the first guess and the final answer is not exactly statistically optimal based on letter frequency or partitioning rates.

Nevertheless, it’s pretty astounding for appreciating how short the word paths of most words are.

Let’s look at an example using the target word MEALY which has a path of SLATE>GORMY>MEALY.

  • After guessing SLATE there is 1 green {A}, 3 yellow {E,L} and 2 gray letters {M,Y} . This one guess leaves only 8 possible answers. 2307 words are gone just like that!

  • After guessing GORMY there are 1 more green {Y}, 1 more yellow {M}, and 3 more grays {G,O,R}. This leaves only 1 possible answer: MEALY.

OK, that was an easy one. Let’s try the worst target for SLATE in terms of 1st guess partitioning. That would be CRONY.

  • After guessing SLATE there are 0 green, 0 yellow, and 5 gray letters{C,N,O,R,Y}. There are still 221 words remaining, which is not great.

  • The next guess would be YCORD, for which there are 2 green {O,N}, 2 yellow {C,Y}, and 1 gray letters {R}. That’s all that is needed to filter the answers down to CRONY.

For the 5 words that require 5 guesses, here are their word paths:

  • SLATE>BUMPH>ZINCO>WATER>EATER

  • SLATE>DHIKR>FOGOU>PANEL>PENAL

  • SLATE>DHOBI>MUZZY>SPILT>STILL

  • SLATE>YUCCH>POORI>TRUST>TRUSS

  • SLATE>VUGGY>HUMBO>UNLIT>UNTIL

All the insights in this post were derived from my work on the following projects: