Skip to content

Commit

Permalink
Refresh check-spelling
Browse files Browse the repository at this point in the history
  • Loading branch information
jsoref committed Mar 21, 2024
1 parent e0c9f4a commit f8f73e2
Show file tree
Hide file tree
Showing 8 changed files with 272 additions and 1,663 deletions.
1 change: 1 addition & 0 deletions .github/actions/spelling/README.md
Expand Up @@ -12,6 +12,7 @@ File | Purpose | Format | Info
[line_forbidden.patterns](line_forbidden.patterns) | Patterns to flag in checked lines | perl regular expression (order matters, first match wins) | [patterns](https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples%3A-patterns)
[expect.txt](expect.txt) | Expected words that aren't in the dictionary | one word per line (sorted, alphabetically) | [expect](https://github.com/check-spelling/check-spelling/wiki/Configuration#expect)
[advice.md](advice.md) | Supplement for GitHub comment when unrecognized words are found | GitHub Markdown | [advice](https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples%3A-advice)
[block-delimiters.list](block-delimiters.list) | Define block begin/end markers to ignore lines of text | line with _literal string_ for **start** followed by line with _literal string_ for **end** | [block ignore](https://github.com/check-spelling/check-spelling/wiki/Feature%3A-Block-Ignore#status)

Note: you can replace any of these files with a directory by the same name (minus the suffix)
and then include multiple files inside that directory (with that suffix) to merge multiple files together.
15 changes: 15 additions & 0 deletions .github/actions/spelling/block-delimiters.list
@@ -0,0 +1,15 @@
# Public Keys
-----BEGIN PUBLIC KEY-----
-----END PUBLIC KEY-----

# RSA Private Key
-----BEGIN RSA PRIVATE KEY-----
-----END RSA PRIVATE KEY-----

# GPG Public Key
-----BEGIN PGP PUBLIC KEY BLOCK-----
-----END PGP PUBLIC KEY BLOCK-----

# Certificates
-----BEGIN CERTIFICATE-----
-
96 changes: 66 additions & 30 deletions .github/actions/spelling/candidate.patterns
Expand Up @@ -8,7 +8,7 @@
^.*\b[Cc][Ss][Pp][Ee][Ll]{2}:\s*[Dd][Ii][Ss][Aa][Bb][Ll][Ee]-[Ll][Ii][Nn][Ee]\b

# patch hunk comments
^\@\@ -\d+(?:,\d+|) \+\d+(?:,\d+|) \@\@ .*
^@@ -\d+(?:,\d+|) \+\d+(?:,\d+|) @@ .*
# git index header
index (?:[0-9a-z]{7,40},|)[0-9a-z]{7,40}\.\.[0-9a-z]{7,40}

Expand All @@ -26,7 +26,7 @@ index (?:[0-9a-z]{7,40},|)[0-9a-z]{7,40}\.\.[0-9a-z]{7,40}
# data url in quotes
([`'"])data:(?:[^ `'"].*?|)(?:[A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,}).*\g{-1}
# data url
data:[-a-zA-Z=;:/0-9+]*,\S*
\bdata:[-a-zA-Z=;:/0-9+]*,\S*

# https/http/file urls
(?:\b(?:https?|ftp|file)://)[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]
Expand Down Expand Up @@ -74,7 +74,7 @@ magnet:[?=:\w]+
# AWS SNS
\bsns\.[-0-9a-z]+.amazonaws\.com/[-\w/&#%_?:=]*
# AWS VPC
#vpc-\w+
vpc-\w+

# While you could try to match `http://` and `https://` by using `s?` in `https?://`, sometimes there
# YouTube url
Expand Down Expand Up @@ -152,6 +152,9 @@ themes\.googleusercontent\.com/static/fonts/[^/\s"]+/v\d+/[^.]+.
# GHSA
GHSA(?:-[0-9a-z]{4}){3}

# GitHub actions
\buses:\s+[-\w.]+/[-\w./]+@[-\w.]+

# GitLab commit
\bgitlab\.[^/\s"]*/\S+/\S+/commit/[0-9a-f]{7,16}#[0-9a-f]{40}\b
# GitLab merge requests
Expand Down Expand Up @@ -210,7 +213,7 @@ accounts\.binance\.com/[a-z/]*oauth/authorize\?[-0-9a-zA-Z&%]*
# medium link
\blink\.medium\.com/[a-zA-Z0-9]+
# medium
\bmedium\.com/\@?[^/\s"]+/[-\w]+
\bmedium\.com/@?[^/\s"]+/[-\w]+

# microsoft
\b(?:https?://|)(?:(?:download\.visualstudio|docs|msdn2?|research)\.microsoft|blogs\.msdn)\.com/[-_a-zA-Z0-9()=./%]*
Expand Down Expand Up @@ -275,7 +278,7 @@ slack://[a-zA-Z0-9?&=]+
[0-9a-f]{32}\@o\d+\.ingest\.sentry\.io\b

# Twitter markdown
\[\@[^[/\]:]*?\]\(https://twitter.com/[^/\s"')]*(?:/status/\d+(?:\?[-_0-9a-zA-Z&=]*|)|)\)
\[@[^[/\]:]*?\]\(https://twitter.com/[^/\s"')]*(?:/status/\d+(?:\?[-_0-9a-zA-Z&=]*|)|)\)
# Twitter hashtag
\btwitter\.com/hashtag/[\w?_=&]*
# Twitter status
Expand Down Expand Up @@ -330,7 +333,7 @@ ipfs://[0-9a-zA-Z]{3,}
[^"\s]+/gitweb/\S+;h=[0-9a-f]+

# HyperKitty lists
/archives/list/[^@/]+\@[^/\s"]*/message/[^/\s"]*/
/archives/list/[^@/]+@[^/\s"]*/message/[^/\s"]*/

# lists
/thread\.html/[^"\s]+
Expand All @@ -348,7 +351,7 @@ ipfs://[0-9a-zA-Z]{3,}
\bopen\.spotify\.com/embed/playlist/\w+

# Mastodon
\bmastodon\.[-a-z.]*/(?:media/|\@)[?&=0-9a-zA-Z_]*
\bmastodon\.[-a-z.]*/(?:media/|@)[?&=0-9a-zA-Z_]*

# scastie
\bscastie\.scala-lang\.org/[^/]+/\w+
Expand Down Expand Up @@ -390,13 +393,13 @@ ipfs://[0-9a-zA-Z]{3,}
(?:\\(?:u00|x)1[Bb]|\x1b|\\u\{1[Bb]\})\[\d+(?:;\d+|)m

# URL escaped characters
\%[0-9A-F][A-F](?=[A-Za-z])
%[0-9A-F][A-F](?=[A-Za-z])
# lower URL escaped characters
\%[0-9a-f][a-f](?=[a-z]{2,})
%[0-9a-f][a-f](?=[a-z]{2,})
# IPv6
\b(?:[0-9a-fA-F]{0,4}:){3,7}[0-9a-fA-F]{0,4}\b
# c99 hex digits (not the full format, just one I've seen)
#0x[0-9a-fA-F](?:\.[0-9a-fA-F]*|)[pP]
0x[0-9a-fA-F](?:\.[0-9a-fA-F]*|)[pP]
# Punycode
\bxn--[-0-9a-z]+
# sha
Expand All @@ -423,7 +426,7 @@ sha\d+:[0-9]*[a-f]{3,}[0-9a-f]*
# uuid:
\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b
# hex digits including css/html color classes:
(?:[\\0][xX]|\\u|[uU]\+|#x?|\%23)[0-9_a-fA-FgGrR]*?[a-fA-FgGrR]{2,}[0-9_a-fA-FgGrR]*(?:[uUlL]{0,3}|[iu]\d+)\b
(?:[\\0][xX]|\\u|[uU]\+|#x?|%23)[0-9_a-fA-FgGrR]*?[a-fA-FgGrR]{2,}[0-9_a-fA-FgGrR]*(?:[uUlL]{0,3}|[iu]\d+)\b
# integrity
integrity=(['"])(?:\s*sha\d+-[-a-zA-Z=;:/0-9+]{40,})+\g{-1}

Expand All @@ -441,7 +444,7 @@ integrity=(['"])(?:\s*sha\d+-[-a-zA-Z=;:/0-9+]{40,})+\g{-1}
Name\[[^\]]+\]=.*

# IServiceProvider / isAThing
#\b(?:I|isA)(?=(?:[A-Z][a-z]{2,})+\b)
\b(?:I|isA)(?=(?:[A-Z][a-z]{2,})+(?:[A-Z]|\b))

# crypt
(['"])\$2[ayb]\$.{56}\g{-1}
Expand All @@ -455,18 +458,25 @@ Name\[[^\]]+\]=.*
# scala modules
("[^"]+"\s*%%?\s*){2,3}"[^"]+"

# Intel intrinsics
_mm_\w+

# Input to GitHub JSON
content: (['"])[-a-zA-Z=;:/0-9+]*=\g{-1}

# This does not cover multiline strings, if your repository has them,
# you'll want to remove the `(?=.*?")` suffix.
# The `(?=.*?")` suffix should limit the false positives rate
# printf
%(?:(?:(?:hh?|ll?|[jzt])?[diuoxn]|l?[cs]|L?[fega]|p)(?=[a-z]{2,})|(?:X|L?[FEGA]|p)(?=[a-zA-Z]{2,}))(?=[_a-zA-Z]+\b)(?!%)(?=.*?['"])
%(?:(?:(?:hh?|ll?|[jzt])?[diuoxn]|l?[cs]|L?[fega]|p)(?=[a-z]{2,})|(?:X|L?[FEGA]|p)(?=[a-zA-Z]{2,}))(?!%)(?=[_a-zA-Z]+(?!%)\b)(?=.*?['"])

# Alternative printf
# %s
%(?:s(?=[a-z]{2,}))(?!%)(?=[_a-zA-Z]+(?!%)\b)(?=.*?['"])

# Python string prefix / binary prefix
# Note that there's a high false positive rate, remove the `?=` and search for the regex to see if the matches seem like reasonable strings
(?<!')\b(?:B|BR|Br|F|FR|Fr|R|RB|RF|Rb|Rf|U|UR|Ur|b|bR|br|f|fR|fr|r|rB|rF|rb|rf|u|uR|ur)'(?=[A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,})
(?<!['"])\b(?:B|BR|Br|F|FR|Fr|R|RB|RF|Rb|Rf|U|UR|Ur|b|bR|br|f|fR|fr|r|rB|rF|rb|rf|u|uR|ur)['"](?=[A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,})

# Regular expressions for (P|p)assword
\([A-Z]\|[a-z]\)[a-z]+
Expand All @@ -483,7 +493,7 @@ content: (['"])[-a-zA-Z=;:/0-9+]*=\g{-1}
# javascript replace regex
\.replace\(/[^/\s"]{3,}/[gim]*\s*,
# assign regex
= /[^*]*?(?:[a-z]{3,}|[A-Z]{3,}|[A-Z][a-z]{2,}).*/
= /[^*].*?(?:[a-z]{3,}|[A-Z]{3,}|[A-Z][a-z]{2,}).*/[gi]?(?=\W|$)
# perl regex test
[!=]~ (?:/.*/|m\{.*?\}|m<.*?>|m([|!/@#,;']).*?\g{-1})

Expand All @@ -493,6 +503,9 @@ content: (['"])[-a-zA-Z=;:/0-9+]*=\g{-1}
# perl run
perl(?:\s+-[a-zA-Z]\w*)+

# C network byte conversions
(?:\d|\bh)to(?!ken)(?=[a-z])|to(?=[adhiklpun]\()

# Go regular expressions
regexp?\.MustCompile\(`[^`]*`\)

Expand All @@ -506,7 +519,7 @@ regexp?\.MustCompile\(`[^`]*`\)
sed 's/(?:[^/]*?[a-zA-Z]{3,}[^/]*?/){2}

# node packages
(["'])\@[^/'" ]+/[^/'" ]+\g{-1}
(["'])@[^/'" ]+/[^/'" ]+\g{-1}

# go install
go install(?:\s+[a-z]+\.[-@\w/.]+)+
Expand Down Expand Up @@ -542,17 +555,20 @@ customObjectInstantitationMethod
\.fa-[-a-z0-9]+

# bearer auth
(['"])Bear[e][r] .*?\g{-1}
(['"])[Bb]ear[e][r] .*?\g{-1}

# bearer auth
\b[Bb]ear[e][r]:? [-a-zA-Z=;:/0-9+.]+

# basic auth
(['"])Basic [-a-zA-Z=;:/0-9+]{3,}\g{-1}
(['"])[Bb]asic [-a-zA-Z=;:/0-9+]{3,}\g{-1}

# base64 encoded content
([`'"])[-a-zA-Z=;:/0-9+]+=\g{-1}
#([`'"])[-a-zA-Z=;:/0-9+]{3,}=\g{-1}
# base64 encoded content in xml/sgml
>[-a-zA-Z=;:/0-9+]+=</
#>[-a-zA-Z=;:/0-9+]{3,}=</
# base64 encoded content, possibly wrapped in mime
(?:^|[\s=;:?])[-a-zA-Z=;:/0-9+]{50,}(?:[\s=;:?]|$)
#(?:^|[\s=;:?])[-a-zA-Z=;:/0-9+]{50,}(?:[\s=;:?]|$)

# encoded-word
=\?[-a-zA-Z0-9"*%]+\?[BQ]\?[^?]{0,75}\?=
Expand All @@ -566,14 +582,14 @@ customObjectInstantitationMethod
# systemd mode
systemd.*?running in system mode \([-+].*\)$

# Lorem
# Update Lorem based on your content (requires `ge` and `w` from https://github.com/jsoref/spelling; and `review` from https://github.com/check-spelling/check-spelling/wiki/Looking-for-items-locally )
# grep '^[^#].*lorem' .github/actions/spelling/patterns.txt|perl -pne 's/.*i..\?://;s/\).*//' |tr '|' "\n"|sort -f |xargs -n1 ge|perl -pne 's/^[^:]*://'|sort -u|w|sed -e 's/ .*//'|w|review -
# Warning, while `(?i)` is very neat and fancy, if you have some binary files that aren't proper unicode, you might run into:
## Operation "substitution (s///)" returns its argument for non-Unicode code point 0x1C19AE (the code point will vary).
## You could manually change `(?i)X...` to use `[Xx]...`
## or you could add the files to your `excludes` file (a version after 0.0.19 should identify the file path)
# Lorem
(?:\w|\s|[,.])*\b(?i)(?:amet|consectetur|cursus|dolor|eros|ipsum|lacus|libero|ligula|lorem|magna|neque|nulla|suscipit|tempus)\b(?:\w|\s|[,.])*
# ... Operation "substitution (s///)" returns its argument for non-Unicode code point 0x1C19AE (the code point will vary).
# ... You could manually change `(?i)X...` to use `[Xx]...`
# ... or you could add the files to your `excludes` file (a version after 0.0.19 should identify the file path)
(?:(?:\w|\s|[,.])*\b(?i)(?:amet|consectetur|cursus|dolor|eros|ipsum|lacus|libero|ligula|lorem|magna|neque|nulla|suscipit|tempus)\b(?:\w|\s|[,.])*)

# Non-English
[a-zA-Z]*[ÀÁÂÃÄÅÆČÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæčçèéêëìíîïðñòóôõöøùúûüýÿĀāŁłŃńŅņŒœŚśŠšŜŝŸŽžź][a-zA-Z]{3}[a-zA-ZÀÁÂÃÄÅÆČÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæčçèéêëìíîïðñòóôõöøùúûüýÿĀāŁłŃńŅņŒœŚśŠšŜŝŸŽžź]*|[a-zA-Z]{3,}[ÀÁÂÃÄÅÆČÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæčçèéêëìíîïðñòóôõöøùúûüýÿĀāŁłŃńŅņŒœŚśŠšŜŝŸŽžź]|[ÀÁÂÃÄÅÆČÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæčçèéêëìíîïðñòóôõöøùúûüýÿĀāŁłŃńŅņŒœŚśŠšŜŝŸŽžź][a-zA-Z]{3,}
Expand All @@ -591,17 +607,26 @@ systemd.*?running in system mode \([-+].*\)$
# latex (check-spelling >= 0.0.22)
#\\\w{2,}\{

# American Mathematical Society (AMS) / Doxygen
TeX/AMS

# File extensions
\*\.[+\w]+,

# eslint
"varsIgnorePattern": ".+"

# Windows short paths
[/\\][^/\\]{5,6}~\d{1,2}[/\\]

# cygwin paths
/cygdrive/[a-zA-Z]/(?:Program Files(?: \(.*?\)| ?)(?:/[-+.~\\/()\w ]+)*|[-+.~\\/()\w])+

# in check-spelling@v0.0.22+, printf markers aren't automatically consumed
# printf markers
#(?<!\\)\\[nrt](?=[a-z]{2,})
# alternate markers if you run into latex and friends
#(?<!\\)\\[nrt](?=[a-z]{2,})(?=.*['"`])
# alternate printf markers if you run into latex and friends
(?<!\\)\\[nrt](?=[a-z]{2,})(?=.*['"`])

# apache
a2(?:en|dis)
Expand Down Expand Up @@ -630,13 +655,24 @@ W/"[^"]+"
# Compiler flags (linker)
,-B

# libraries
\blib(?!rar(?:ies|y))(?=[a-z])

# WWNN/WWPN (NAA identifiers)
\b(?:0x)?10[0-9a-f]{14}\b|\b(?:0x|3)?[25][0-9a-f]{15}\b|\b(?:0x|3)?6[0-9a-f]{31}\b

# iSCSI iqn (approximate regex)
\biqn\.[0-9]{4}-[0-9]{2}(?:[\.-][a-z][a-z0-9]*)*\b

# curl arguments
\b(?:\\n|)curl(?:\s+-[a-zA-Z]{1,2}\b)*(?:\s+-[a-zA-Z]{3,})(?:\s+-[a-zA-Z]+)*
\b(?:\\n|)curl(?:\.exe|)(?:\s+-[a-zA-Z]{1,2}\b)*(?:\s+-[a-zA-Z]{3,})(?:\s+-[a-zA-Z]+)*
# set arguments
\bset(?:\s+-[abefimouxE]{1,2})*\s+-[abefimouxE]{3,}(?:\s+-[abefimouxE]+)*
\b(?:bash|sh|set)(?:\s+-[abefimouxE]{1,2})*\s+-[abefimouxE]{3,}(?:\s+-[abefimouxE]+)*
# tar arguments
\b(?:\\n|)g?tar(?:\.exe|)(?:(?:\s+--[-a-zA-Z]+|\s+-[a-zA-Z]+|\s[ABGJMOPRSUWZacdfh-pr-xz]+\b)(?:=[^ ]*|))+
# tput arguments -- https://man7.org/linux/man-pages/man5/terminfo.5.html -- technically they can be more than 5 chars long...
\btput\s+(?:(?:-[SV]|-T\s*\w+)\s+)*\w{3,5}\b
# macOS temp folders
/var/folders/\w\w/[+\w]+/(?:T|-Caches-)/
# github runner temp folders
/home/runner/work/_temp/[-_/a-z0-9]+
14 changes: 11 additions & 3 deletions .github/actions/spelling/excludes.txt
Expand Up @@ -6,11 +6,10 @@
(?:^|/)go\.sum$
(?:^|/)package(?:-lock|)\.json$
(?:^|/)Pipfile$
(?:^|/)projectcalico\.mdx$
(?:^|/)pyproject.toml
(?:^|/)requirements(?:-dev|-doc|-test|)\.txt$
(?:^|/)vendor/
(?:|$^ 91.67% - excluded 11/12)(?:^|/)felixconfig\.mdx$
/config/vocabularies/CalicoDocs/[^/]+$
\.a$
\.ai$
\.all-contributorsrc$
Expand Down Expand Up @@ -65,7 +64,6 @@
\.qm$
\.s$
\.sig$
\.snap$
\.so$
\.svgz?$
\.sys$
Expand All @@ -84,6 +82,16 @@
\.zip$
^\.github/actions/spelling/
^\Q.github/workflows/spelling.yml\E$
^\Qcalico-cloud/reference/resources/felixconfig.mdx\E$
^\Qcalico-cloud_versioned_docs/version-19-1/reference/resources/felixconfig.mdx\E$
^\Qcalico-enterprise/reference/resources/felixconfig.mdx\E$
^\Qcalico-enterprise_versioned_docs/version-3.16/reference/resources/felixconfig.mdx\E$
^\Qcalico-enterprise_versioned_docs/version-3.17/reference/resources/felixconfig.mdx\E$
^\Qcalico-enterprise_versioned_docs/version-3.18-2/reference/resources/felixconfig.mdx\E$
^\Qcalico-enterprise_versioned_docs/version-3.18/reference/resources/felixconfig.mdx\E$
^\Qcalico-enterprise_versioned_docs/version-3.19-1/reference/resources/felixconfig.mdx\E$
^\Qcalico_versioned_docs/version-3.25/reference/resources/felixconfig.mdx\E$
^\Qcalico_versioned_docs/version-3.26/reference/resources/felixconfig.mdx\E$
^\Qstatic/.nojekyll\E$
^\Qstatic/css/swagger-ui.css\E$
^\Qstatic/json/calico-api-swagger.json\E$
Expand Down

0 comments on commit f8f73e2

Please sign in to comment.