Support ignoring multi-line blocks (e.g. code snippets in markdown files) #9
Unfortunately, there's currently no provision to handle multi-line things. I'm still thinking about how to do it :-(. My concern is that people don't necessarily ignore binary files, which means there's a risk of the tool having to pull the entire file into memory. I think to address this, I'll probably have to switch to a mode where I build my own state machine :-(. |
Ah, ok, that makes sense! Thanks for your speedy response too. I can work around it by just adding all problematic words to |
Yeah... agreed. You can cheat by adding a regexp for So far, I've cheated by manually checking such files and then excluding the files. But that isn't great. |
Thanks, I'll use that in some places. This seems to work for single-line backtick escaping: In some of the examples you put a |
If you mean things like:
It assumes that a url is of the form: It would not catch:
It's as opposed to: \bdocs\.google\.com/[a-z]+/d/(?:e/|)[0-9a-zA-Z_-]+/ To avoid tripping over:
|
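A minimal sketch for trying the `docs.google.com` pattern quoted above, using Python's `re` module. The sample URLs and the `find_doc_urls` helper are made up for illustration, and the check-spelling tool itself may evaluate patterns differently.

```python
import re

# Pattern copied from the comment above; everything else here is illustrative.
DOCS_URL = re.compile(r"\bdocs\.google\.com/[a-z]+/d/(?:e/|)[0-9a-zA-Z_-]+/")


def find_doc_urls(text):
    """Return every substring of text that the pattern would mask."""
    return DOCS_URL.findall(text)


# Hypothetical URLs, not taken from the thread.
plain = "see https://docs.google.com/document/d/1AbC_dEf-123/edit for details"
with_e = "form: https://docs.google.com/forms/d/e/1FAIp/viewform"
no_slash = "docs.google.com/document/d/abc"

print(find_doc_urls(plain))
print(find_doc_urls(with_e))
print(find_doc_urls(no_slash))
```

The optional `(?:e/|)` group lets the same pattern cover both the plain `/d/<id>/` form and the `/d/e/<id>/` form, and the trailing `/` means an ID with nothing after it is left alone.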
Ah yes ok, was wondering why that didn't need escaping, or if it was signifying the beginning of the regex. |
I've added some extra comments to the entries in the wiki. Let me know if they're helpful. I suppose I could actually include some examples of how they break (as I did above).... |
Hey @LukeStorry and @jsoref, Thanks for the great work. This is not really my domain, otherwise I'd offer to help. |
I'm still very much thinking about how I want to implement this feature since it's really the ability to parse arbitrary file content, and I also want to make sure I don't mess up line numbering (the current version doesn't worry about line numbering much, but 0.0.18-alpha which I hope to release shortly will be reporting each thing it finds w/ lines and columns, so not messing up line numbering will be even more important then).

One approach would be for me to just import someone else's parser. I haven't actively looked, but I'm fairly pessimistic about the odds that I'd find one I liked.

Handling markdown snippets separately from other files today

@LukeStorry: fwiw, there is one approach that you could try today... Note that what I'm writing below is entirely untested; there's a version in the wiki which actually worked.

Use:

spelling workflow w/ matrix
name: Spell checking
on:
  pull_request_target:
  push:
  issue_comment:
  pull_request_review_comment:
jobs:
  spell-check:
    name: "Spell Checker"
    runs-on: ubuntu-latest
    continue-on-error: true
    strategy:
      matrix:
        area: ["code", "markdown"]
    steps:
    - name: checkout-merge
      if: "contains(github.event_name, 'pull_request')"
      uses: actions/checkout@v2.0.0
      with:
        ref: refs/pull/${{github.event.pull_request.number}}/merge
    - name: checkout
      if: "!contains(github.event_name, 'pull_request')"
      uses: actions/checkout@v2.0.0
    - uses: check-spelling/check-spelling@prerelease
      with:
        config: ".github/actions/spelling/${{ matrix.area }}"
        experimental_apply_changes_via_bot: 1

Follow the advice in the wiki for how to set up symlinks (note that

Markdown patterns

In the
You'll want to actively use

General thoughts on improving adoption

@pudgereyem: thanks for trying. I'm currently in the process of making the adoption easier. Looking at your blog repo, you could try using: https://github.com/check-spelling/spell-check-this/blob/prerelease/.github/workflows/spelling.yml

talk to the bot to update expect.txt

Specifically, if you make a PR to the default branch which has the above workflow, it'll let you automatically accept its suggestions for

automatic updates for no newline at eof

Looking at your repository, I think I want it to automatically fix no-newline-at-eof -- the code already exists in the repository, so doing it would be pretty harmless. Unix tools (including git) really hate files that are missing the newline-at-eof, but Windows tools tend to make it really hard to understand how to fix this.

automatic updates for excludes

I also want to set it up to be able to automatically update

Building a list of corrections

The other thing that people would probably benefit from is my little Google Sheet that I use to calculate replacements... Google Sheets is really good at identifying real typos and suggesting real replacements. -- Eventually, I intend to offer something like that via check-spelling: Suggest corrections, but that's really a long ways away (long after I handle multi-line blocks). |
@jsoref thanks for the quick response.
Cool, I'll give it a try later! Btw, can you give me an example of a word that you'd add to
Yeah, agreed. I missed that.
What's the intended workflow for the Google Sheet? I pasted the misspelled words into the All the best, |
n.b. I've copied most of the content from this comment into https://github.com/check-spelling/check-spelling/wiki/Configuration#allow and https://github.com/check-spelling/check-spelling/wiki/Configuration#expect

Allow

(see area dictionaries for other examples):
... They're really words, just not in the ancient base dictionary. They might not be used today in your project, but there's no reason for the spell checker to complain to a contributor tomorrow because it's foreseeable that they might be. – I'll try to update the wiki to clarify this at some point. Fwiw, this month I've finally started working on a way to collect things like this in a vaguely systematic way. (I made a different draft last summer but didn't like where it went.)

Expect

Some arbitrary strings that are in test files that aren't really words. They should be removed if the tests are changed/removed.

Allow vs Expect

Roughly, if it's a proper noun of some sort or exists in the real world outside the project, it's a good candidate for

Note

The bot doesn't really care. You could put everything into

The second tab of the sheet tries to describe the workflow. |
I have a block of text with a python The same happens with Suggestions? |
You could add Similarly check-spelling splits on certain case transitions, so it sees |
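For readers wondering what "splits on certain case transitions" can look like, here is a rough illustration of one common way to break an identifier into word-like pieces. This is an assumption for illustration only: the exact rules check-spelling applies are not stated in this thread, and `split_on_case` is a made-up helper, not part of the tool.

```python
import re


def split_on_case(token):
    """Split an identifier at lower-to-upper boundaries.

    A run of capitals is kept together as an acronym, except that its last
    capital is handed to a following lowercase word. Digits and punctuation
    are dropped.
    """
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token)


print(split_on_case("dataFrame"))
print(split_on_case("getHTTPResponse"))
```

Under a rule like this, each piece is checked against the dictionary on its own, which is why a compound identifier can be flagged even when the whole token looks fine to a human.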
I went with the option of adding them to
🙈 |
That doesn't ignore the word, it ignores the line containing the word. If you only want to ignore the word, you'd use |
Fwiw, I'm starting to play with Feature: Block Ignore. I ran into some pain involving partial line error reporting, so my initial draft will basically ignore the entire contents of a begin line through the entire contents of an end line. I'll play with it for a while. I'm not committing to shipping with this feature in the next version, but at least I am starting to iterate on this problem... My initial version is a pure string marker as opposed to a regular expression... I'm not wed to that design choice (and I don't like some of my initial choices -- the file format will almost certainly change....) |
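A minimal sketch of the idea described in this comment: ignore everything from a line containing a begin marker through a line containing an end marker, using plain string markers, while reading one line at a time and keeping line numbers intact. The function name and the blank-line placeholder are assumptions for illustration; this is not the Block Ignore implementation or its file format.

```python
from typing import Iterable, Iterator


def blank_ignored_blocks(lines: Iterable[str], begin: str, end: str) -> Iterator[str]:
    """Yield lines with ignored blocks replaced by empty lines.

    Emitting exactly one output line per input line keeps reported line
    numbers aligned with the original file, and working on an iterator
    avoids pulling the whole file into memory.
    """
    inside = False
    for line in lines:
        if inside:
            if end in line:
                inside = False   # the end line itself is still ignored
            yield ""
        elif begin in line:
            inside = True        # the begin line itself is ignored too
            yield ""
        else:
            yield line


fence = "\x60" * 3  # three backticks, written this way to keep this example fence-safe
sample = [
    "Section 4 - Encryption",
    fence + "python",
    "def encrypt(msg):",
    fence,
    "The encrypt function takes one parameter...",
]
masked = list(blank_ignored_blocks(sample, fence, fence))
for number, text in enumerate(masked, start=1):
    print(number, repr(text))
```

Because whole lines are dropped, this sidesteps the partial-line reporting problem mentioned above, at the cost of also ignoring any prose that shares a line with a marker.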
I'm trying to add an exclusion for backticked code snippets so the contents don't get spell checked.
I've tried both
\x60{1,3}[\s\S]+?\x60{1,3}
and\
[\n.]+?`` and variations thereof, but can't seem to get the action to ignore the snippets. For example:
Section 4 - Encryption
Here is a sample encrypton function:
The
encrypt
function takes one parameter...
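A small sketch of why a cross-line pattern such as `\x60{1,3}[\s\S]+?\x60{1,3}` may never hide a fenced block. It assumes patterns are applied to one line at a time, which is what the maintainer's reply about having no provision for multi-line content implies; the sample text and variable names are invented for the demonstration.

```python
import re

# First pattern from the issue body; \x60 is the backtick character.
SNIPPET = re.compile(r"\x60{1,3}[\s\S]+?\x60{1,3}")

fence = "\x60" * 3
doc = (
    "Here is a sample function:\n"
    + fence + "\n"
    + "def encrypt(data):\n"
    + "    return data\n"
    + fence + "\n"
    + "The \x60encrypt\x60 function takes one parameter...\n"
)

# Applied to the whole text, the pattern can span the fenced block.
whole_text = SNIPPET.sub("", doc)

# Applied line by line (the assumed behaviour), it only sees one line at a
# time, so the code between the fences is never part of any match.
per_line = "\n".join(SNIPPET.sub("", line) for line in doc.splitlines())

print("def encrypt" in whole_text)
print("def encrypt" in per_line)
```

The single-line inline snippet is removed in both cases, which matches the earlier observation that single-line backtick escaping works while multi-line blocks do not.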