Wordle inferred from Twitter shared results.

This repository started exclusively as an effort to 'solve' the daily Wordle using only scores shared on Twitter. I have since also added an analysis of the most popular Wordle openers, again using Twitter data.

The Notebooks:

More details on solving Wordle with tweets are in the Jupyter notebooks:

Predict from Tweets

Create Lookup Dictionaries

Discussion

The TwitterWordle class here can predict the Wordle of the day from public tweets alone. This is an alternative, and maybe slightly simpler, approach to Ben Hamner's excellent Kaggle project.
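
To make "only from public tweets" concrete, here is a minimal sketch of pulling the score lines out of a shared result. It is not the code in TwitterWordle.py: the function name `parse_tweet`, the 0/1/2 encoding, and the rejection rules are assumptions chosen for illustration.

```python
import re

# Map each emoji square to a compact code: 0 = miss, 1 = wrong position, 2 = correct.
SQUARES = {
    "\u2b1b": "0",      # black square (dark mode miss)
    "\u2b1c": "0",      # white square (light mode miss)
    "\U0001f7e8": "1",  # yellow square
    "\U0001f7e6": "1",  # blue square (high-contrast mode)
    "\U0001f7e9": "2",  # green square
    "\U0001f7e7": "2",  # orange square (high-contrast mode)
}

HEADER = re.compile(r"Wordle (\d+) ([1-6X])/6")
ROW = re.compile("^[" + "".join(SQUARES) + "]{5}$")


def parse_tweet(text):
    """Return (wordle_number, [score lines]) or None if the tweet is unusable."""
    headers = HEADER.findall(text)
    if len(headers) != 1:
        # No result, or several results pasted into one tweet.
        return None
    number = int(headers[0][0])
    rows = []
    for line in text.splitlines():
        # Drop emoji variation selectors that some clients append.
        line = line.strip().replace("\ufe0f", "")
        if ROW.match(line):
            rows.append("".join(SQUARES[ch] for ch in line))
    if not rows or len(rows) > 6:
        return None
    return number, rows
```

Each returned row is one score line such as `"01000"`; the guessed letters never appear in the tweet, which is why the rest of the approach has to work from patterns alone.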

I can't decide if this is the coolest way to "solve" wordle or the dumbest. I do think it's a fun problem to try and extract a signal from what seems like noisy data.

By default, the code now runs in a mode that doesn't reveal the answer. Instead, it prints a SHA256 hash of the answer and compares it to a hashed answer dictionary to verify that it's correct (assuming the NY Times doesn't change the deterministic pre-shuffled order). Similarly, the plot will, by default, hash the words on the x-axis.
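
The spoiler-free check can be sketched roughly as follows. This is an illustration, not the repository's code: the helper names and the layout of the hashed table (Wordle number as a string key, hex digest as the value) are assumptions, and the real hashed_lookup.json may be organized differently.

```python
import hashlib


def hash_word(word):
    """Return the SHA-256 hex digest of a lowercased word."""
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()


def check_prediction(predicted, wordle_number, hashed_answers):
    """Compare a prediction against a hashed answer table.

    Returns (digest, verdict) where verdict is True/False, or None when
    the table has no entry for that Wordle number.
    """
    digest = hash_word(predicted)
    expected = hashed_answers.get(str(wordle_number))
    verdict = None if expected is None else digest == expected
    return digest, verdict


# Toy table built on the fly (hypothetical layout, not the real file).
toy_table = {"1": hash_word("crane")}

digest, verdict = check_prediction("crane", 1, toy_table)
print(digest[:12], verdict)
```

Only digests are printed or compared, so a reader can confirm that a prediction was right without ever seeing the word.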

Storing tweets is complicated, so it's best to download the data from Kaggle; I'm not hosting any tweets in this repository. If you have a Developer account and API keys, there is also a helper function that uses searchtweets to download them.

Major differences from above Kaggle notebook:

No simulations of hypothetical games. I do use a similar word commonality lookup dictionary, and the same word frequency data.
No cosine similarity or comparison of specific (e.g. penultimate) guess. Only the list, and to some extent count, of all tweeted wordle score lines is needed.
Slightly different filtering of bad tweets. Upon further reflection, the code now rejects obviously bad tweets. The Kaggle data set does some light filtering. In general, a minimum count threshold eliminates most of the few fake or spurious scores that were posted, and the penalty term is small enough that even if the data set has a bunch of retweets of a Tetris pattern, the algorithm still converges accurately. However, non-English Wordles and people tweeting out multiple variants at once are both easy to detect, so I'm now removing these.
100% accuracy. This algorithm has 100% accuracy on Wordles 210-233. (The original project initially failed on 223, though it was successful on later reruns after some fixes; it also failed on 231, 236, and 249.) With the restricted target list, TwitterWordle is 100% accurate (so far).
- Note: Starting on Feb 15, the NY Times removed a few target words from the official wordle list, such as 'papal' and 'agora.' Some people continue to tweet results from the unaltered list, presumably cached/saved versions. So, on Wordle 247, using the full 12,000+ dictionary, TwitterWordle did fail to solve correctly. However, the default mode is using the smaller target dictionary.
By default, the code only considers the known 2315 possible wordles. The Kaggle project doesn't give the wordle list special treatment, and runs simulations considering all 12K words as possible answers. While my wordlebot has rolled its own dictionary, I used the actual wordle list here.
- Use the keyword argument use_limited_targets = False to load precomputed dictionaries across the full 12972 word list, and the Create Lookup dictionary notebook can generate this larger set of dictionaries.
- The code still solves with (almost, see above) 100% accuracy using all 12K+ words; I solve the dataframe using the full list at the end of this notebook.
- As of Feb 15, 2022, I use a revised dictionary that reflects the changes made by the New York Times.
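
The core idea in the list above (only the set of tweeted score lines, plus a minimum count, is needed) can be illustrated with a simplified sketch. This is not the TwitterWordle implementation: it ignores word frequency and the penalty term, recomputes patterns instead of using precomputed lookup dictionaries, and every name in it (`score_guess`, `rank_candidates`, `min_count`) is hypothetical.

```python
from collections import Counter


def score_guess(guess, answer):
    """Wordle feedback as a string: 2 = correct, 1 = wrong position, 0 = miss."""
    result = ["0"] * len(guess)
    remaining = Counter()  # answer letters not matched exactly
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "2"
        else:
            remaining[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "0" and remaining[g] > 0:
            result[i] = "1"
            remaining[g] -= 1
    return "".join(result)


def possible_patterns(answer, guesses):
    """Every score line that some allowed guess could produce for this answer."""
    return {score_guess(g, answer) for g in guesses}


def rank_candidates(observed, targets, guesses, min_count=2):
    """Rank targets by how many sufficiently common observed lines they cannot explain."""
    kept = [p for p, n in observed.items() if n >= min_count]
    ranking = []
    for answer in targets:
        reachable = possible_patterns(answer, guesses)
        misses = sum(1 for p in kept if p not in reachable)
        ranking.append((misses, answer))
    return sorted(ranking)


# Toy data: a five-word guess list, two possible targets, and tweeted
# score-line counts that include one spurious pattern seen only once.
guesses = ["crane", "slate", "crate", "trace", "brake"]
targets = ["crate", "brake"]
observed = Counter({"22222": 10, "22202": 4, "00222": 3, "12212": 2, "11111": 1})

print(rank_candidates(observed, targets, guesses))
```

In the toy data the one-off pattern is dropped by the count threshold, and the target that can explain every remaining score line ranks first. A real run would use the full guess and target lists rather than five words.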

The solve method of the main class returns an image of the top candidate scores. Here is Wordle 223:

