bryant1410/readmesfix

Fix GitHub's Markdown headings

Because I'm tired of running into broken READMEs!

GitHub changed the way ATX headers are parsed in Markdown files. This suddenly broke the headings in many repos' READMEs, and although time has passed, many are still broken.
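To make the problem concrete: a heading written without a space after the # characters (for example #Title) no longer renders as a heading, while # Title does. The following is a minimal sketch of one way to repair such lines, not necessarily how readmesfix.py does it. The names BROKEN_HEADING and fix_headings are hypothetical, and the sketch assumes that skipping fenced code blocks is enough to avoid touching code.

```python
import re

# Hypothetical pattern (an assumption, not taken from readmesfix.py):
# up to 3 leading spaces, then 1-6 '#', immediately followed by a
# character that is neither whitespace nor another '#'.
BROKEN_HEADING = re.compile(r"^( {0,3}#{1,6})(?=[^\s#])")


def fix_headings(markdown: str) -> str:
    """Insert the missing space after the '#' run of ATX headings."""
    fixed_lines = []
    in_fence = False
    for line in markdown.split("\n"):
        if line.lstrip().startswith(("```", "~~~")):
            # Toggle on fence delimiters so '#comment' lines inside
            # fenced code blocks are left alone.
            in_fence = not in_fence
        elif not in_fence:
            line = BROKEN_HEADING.sub(r"\1 ", line)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)
```

A pattern this simple will also rewrite lines such as #1 or #hashtag that were never meant to be headings, so any real fix should be reviewed before it is submitted as a pull request.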

vmarkovtsev created a dataset (CC BY-NC 4.0) containing the repos with more than 50 stars that contain READMEs broken in this way. So I created this script to iterate through the list and create a PR to fix each of them.

Set up

Caution: this script creates pull requests automatically. Use it carefully to avoid spamming other people's repositories.

The script works on Python 3.6+. To install its dependencies:

pip install -r requirements.txt

To run it, first create a Personal Access Token with the repo:public_repo scope so that the script can fork projects and create pull requests. Then:

export GITHUB_ACCESS_TOKEN=<YOUR ACCESS TOKEN>
./readmesfix.py
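As an illustration of how a script can pick up the token exported above, here is a minimal sketch that reads GITHUB_ACCESS_TOKEN from the environment and fails early when it is missing. The helper name read_token is hypothetical, and this is not necessarily how readmesfix.py handles the variable.

```python
import os


def read_token(env=os.environ):
    """Return the GitHub token from the environment, or exit with a hint.

    `read_token` is a hypothetical helper for illustration only.
    """
    token = env.get("GITHUB_ACCESS_TOKEN", "").strip()
    if not token:
        raise SystemExit(
            "GITHUB_ACCESS_TOKEN is not set; export it before running the script."
        )
    return token
```

Failing before any repository is cloned avoids doing work that cannot end in a pull request.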

It will start processing each repo in the file (one per line) by cloning it, finding its Markdown files, checking whether they need to be fixed, forking the repo and creating a pull request. Keep GitHub API rate limiting in mind: avoid changing the script to run much faster, since that could overwhelm the API.
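The "finding its Markdown files" step can be sketched as follows. This is an assumption-laden illustration rather than the script's own code: the function name find_markdown_files and the set of recognized extensions are hypothetical.

```python
from pathlib import Path

# Assumed set of extensions; the real script may recognize others.
MARKDOWN_SUFFIXES = {".md", ".markdown"}


def find_markdown_files(repo_dir):
    """List Markdown files under a cloned repo, skipping the .git directory."""
    root = Path(repo_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in MARKDOWN_SUFFIXES
        and ".git" not in path.relative_to(root).parts
    )
```

Each returned path can then be read and checked for broken headings before deciding whether the repo needs a fork and a pull request.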

To use a dataset other than the default, top_broken.tsv:

./readmesfix.py --dataset dataset_file
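For readers who want to see how such an option is typically wired up, here is a minimal sketch using argparse. The flag name and the default file come from this README; the use of argparse and the function name parse_args are assumptions, not a description of readmesfix.py itself.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; `parse_args` is a hypothetical helper."""
    parser = argparse.ArgumentParser(
        description="Create pull requests that fix broken Markdown headings."
    )
    parser.add_argument(
        "--dataset",
        default="top_broken.tsv",  # default named in this README
        help="file listing the repos to process, one per line",
    )
    return parser.parse_args(argv)
```

Calling parse_args([]) yields the default dataset, while parse_args(["--dataset", "dataset_file"]) selects the given file.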

Testing

To test this script:

python -m unittest discover