Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check spelling should ignore language keywords based on file type #2

Open
MaT1g3R opened this issue Apr 20, 2020 · 3 comments
Open
Labels
enhancement New feature or request

Comments

@MaT1g3R
Copy link

MaT1g3R commented Apr 20, 2020

No description provided.

@MaT1g3R
Copy link
Author

MaT1g3R commented Apr 20, 2020

Maybe we could have a directory for allowed words, then each of the file could look like this:

ft: *.py
words:
repr
addi
...........

@jsoref
Copy link
Member

jsoref commented Apr 22, 2020

Working approach today

To define multiple language sets, one can:

.github/workflows/spelling.yml:

name: Spell checking
on:
  push:
    branches:
      - "**"
    tags-ignore:
      - "**"
  schedule:
    # * is a special character in YAML so you have to quote this string
    - cron: '5 * * * *'

jobs:
  spell-check:
    name: Spell checker
    runs-on: ubuntu-latest
    continue-on-error: true
    strategy:
      matrix:
        language: [python, java]
    steps:
    - uses: actions/checkout@v2.0.0
      with:
        fetch-depth: 2
    - uses: check-spelling/check-spelling@0.0.16-alpha
      env:
        bucket: .github/actions/spelling
        project: ${{ matrix.language }}
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  • .github/actions/spelling/common/:
    • advice.txt:
      You can include advice explaining how to update things and when to put items into common/ instead of into the specific directories.
    • dictionary.txt:
      Your global dictionary
    • excludes.txt
      Your global excludes
    • patterns.txt:
      Your global patterns
    • ⚠️ You probably do not want to include whitelist in common/.
  • .github/actions/spelling/python/:
    • advice.txt:
      symlink to ../common/advice.txt
    • dictionary/:
      • dictionary.txt:
        symlink to ../../common/dictionary.txt
      • python.txt:
        Python dictionary
    • excludes/:
      • excludes.txt:
        symlink to ../../common/excludes.txt
      • python.txt:
        Rules to exclude non python files
    • only.txt:
      \.py$
      
    • whitelist/:
      • whitelist.txt:
        Python whitelist
  • .github/actions/spelling/java/:
    • advice.txt:
      symlink to ../common/advice.txt
    • dictionary/:
      • dictionary.txt:
        symlink to ../../common/dictionary.txt
      • java.txt:
        Java dictionary
    • excludes/:
      • excludes.txt:
        symlink to ../../common/excludes.txt
    • only.txt:
      \.java$
      
      • java.txt:
        Rules to exclude non java files
    • whitelist/:
      • whitelist.txt:
        Java whitelist

The downside to this is that you will get distinct reports (i.e. potentially multiple comments) for spelling errors from different kinds of files.
I think, generally, a developer is only writing changes in one language at a time.
Also, there's a limit that I've hit where if there are more than ~10 (?) items to report, the rest aren't tagged, I suspect that this would mean each category would get its own 10 😃 .

-- I'd love to get some implementation experience here. I made a simple version of the above as a Proof of Concept. It does work. The next step would be properly setting up dictionaries for the programming languages...

It is possible for an action to pass data off to another stage, so it might be possible for me to add support for coalescing reports.

If one wanted to be truly adventurous, the workflow file allows scoping jobs to changed files.

Thoughts

Organization common config

GitHub itself has the concept of a magical per organization .github repository.
There's a request to be able to use that for actions.
-- Afaiu, that isn't actually supported yet.

Per organization shared rules

Because of the way configurations work for this action, you could just add an extra uses: actions/checkout@v2.0.0 to retrieve common elements into another directory and rely on symlink(s) to pull in those items.

Support for only

I'm aware that I have excludes but not includes. I'd been trying to figure out the right way to approach mixing them which is why I hadn't addressed it.

What I've done is a simple exclude excludes first followed by drop anything not included by only. That's fairly easy to reason through. If people need something more complicated, they could include some negative lookaheads in their excludes, or not put some things in their excludes and then rely on more complicated only rules.

Offhand, I think this is probably the most important item to help make the workaround I describe at the top work. And it's pretty easy for me to write, so, I'll probably look into it this week today. -- 🌟 implemented in 0.0.16-alpha.

Docker / Build speed

I'm currently using docker. I need to experiment w/ not using docker to see how it works. If it works well, I might move out of docker. Otherwise, I might try building a docker image publishing it into the github docker repository and having the action use that. -- 🌟 As of 0.0.16 this code uses Node.js instead, it saved ~30s (which for tiny repos was the vast majority of the time).

Complicated config files

I'd like to avoid having to build/maintain a dictionary for each file I open. I know that I could cache it, but I really don't want to have to try to invent my own caching algorithm.

I also want to make it easy for people to add to the word list files w/o having to worry about things like indentation.

I've seen problems w/ other consumers involving line endings:

  • one consumer had a user whose editor flipped the line endings on the whitelist which in an older version resulted in the whitelist being effectively discarded
  • another project had .gitconfig which effectively did that globally

I've since added support for all line endings so that specific problem isn't a thing.

I really don't want to have to worry about tabs vs. spaces or various indentation rules.

I'd rather avoid implementing my own linter for my config file.

I have added some basic linting -- one to report when patterns (regular expressions) are broken -- because that's pretty easy to mess up regular expressions and the results can be very confusing.

@jsoref
Copy link
Member

jsoref commented Aug 14, 2020

I've promoted the preceding comment to the wiki as Configuration: Multiple programming languages. The features for this are all in 0.0.16-alpha. (It's silly for me to repeatedly update a comment in an issue ....)

The rest of the request should be addressed by Feature: Area dictionaries which I'm starting to flesh out -- I've created some of the basis for it in 0.0.17-alpha (which I hope to release real-soon-now), I think the initial area dictionaries will come within the next couple of releases.

Some disclaimers:

  • I'm still not certain about how I want to version area dictionaries.
    I've only started thinking about versioning a main dictionary.

@jsoref jsoref added the enhancement New feature or request label May 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants