Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name the files for what they act on #10

Open
briandfoy opened this issue Mar 14, 2021 · 10 comments
Open

Name the files for what they act on #10

briandfoy opened this issue Mar 14, 2021 · 10 comments
Labels
enhancement New feature or request
Projects

Comments

@briandfoy
Copy link

briandfoy commented Mar 14, 2021

It would be easier to me to keep track of which file does what it the filename included the thing it operated on and what sort of entry it expects"

  • allow -> allow_words
  • reject -> remove_words
  • patterns -> exclude_word_patterns
  • expect -> allow_words
  • excludes -> exclude_file_patterns
  • only -> allow_file_patterns

And, it reads like allow and expect do the same thing.

You might already be doing this, because the action seems to looks for more specific filenames than the ones you advertise, so maybe you only need to update the read-me.

@jsoref
Copy link
Member

jsoref commented Mar 14, 2021

Sorry, I should have moved master forward to 0.0.17 -- fixed.


I could definitely migrate to a _words or _file_patterns suffix -- I do support some previous file names, so another migration wouldn't be particularly problematic.

Note: only.txt doesn't do what you're thinking. Consider:

  • test/case.txt
  • test/favicon.ico
  • source/file.txt
  • images/apple.png
  • images/pear.gif
  • images/README.txt

excludes.txt:

\.png$
\.gif$
\.ico$

only.txt:

^test/

The excludes file will filter the files to consider to:

  • test/case.txt
  • test/favicon.ico
  • source/file.txt
  • images/apple.png
  • images/pear.gif
  • images/README.txt

i.e.

  • test/case.txt
  • source/file.txt
  • images/README.txt

And then only is applied and the result is:

  • test/case.txt

(corpus - excludes) & only => files_to_check


Allow is really a way to add words to the Dictionary. Reject is a way to remove words from the Dictionary.

(Dictionary + Allow) - Reject => active_dictionary


  1. files_to_check
  2. filter lines by patterns => lines as seen by tool
  3. lines as seen by tool => words split by tool (including some fix-ups for fancy quotation marks, but mostly just dropping all characters that aren't letters or quotation marks) => pending_words
  4. pending_words - active_dictionary => hopefully_expected
  5. hopefully_expected - expect => words to complain about
  6. expect - hopefully_expected => words that can be removed from expect (mentioned only if it complains)

I'm definitely working to improve things, and hopefully this discussion will result in improvements (if only to the documentation, but likely to the implementation).


Fwiw, I'm going to give a short presentation talking about Check Spelling on Thursday to https://github.com/keptn/keptn, my current slides:

https://docs.google.com/presentation/d/13U8a9ibqp7B1UrE8HWk7PsnZLImIwgp06vopLdmcX-s/edit?usp=sharing

@jsoref
Copy link
Member

jsoref commented Mar 14, 2021

The easiest names would be:

  • dictionary_additions (allow)
  • dictionary_removals (reject)
  • file_ignore (excludes)
  • file_exclusive (only) -- I'm not a huge fan of this as it's somewhat close to excludes, but..
  • line_pattern_masks (patterns)
  • expected_words (expect)

How do those sound?

@briandfoy
Copy link
Author

Yeah, those would be fine.

@jsoref
Copy link
Member

jsoref commented Mar 14, 2021

I need to think about this a little longer before I commit to them. The first two sets are pretty reasonable, the last two don't really fit a pattern (kind + action) which is frustrating. While I'm willing to make changes, I try not to publish too many iterations.

I'm currently looking into switching back to using the node12 GitHub host. It's the only one with API access that I need to store data which would help w/:
bot based updates

@jsoref
Copy link
Member

jsoref commented Apr 8, 2021

I'm thinking word_expectations instead for the last one -- I've placed my current thoughts here: https://github.com/check-spelling/check-spelling/wiki/Feature:-Easier-to-understand-filenames

At the very least, I'm going to add support for advice.md (unless I pick a different base name).

Does anyone have a preferred file extension to mean "this file contains regular expressions"? I might go with .filter or .p5re or .pcre or .re.

@briandfoy
Copy link
Author

I have this idea that you shouldn't separate these into exact or pattern matches. Just use pattern matches for everything, and use the gitignore patterns.

I wouldn't give these files an extension at all.

@jsoref
Copy link
Member

jsoref commented Apr 11, 2021

Would you suggest using https://metacpan.org/pod/Parse::Gitignore?

I could definitely use gitignore format for file_ignore and file_exclusive (I hadn't spent the time looking into a CPAN module for it). I was also worried about how long it'd take to get my tool running. (I'm not sure about the best practices for third party modules in Perl land...)

The dictionary / dictionary_additions / word_expectations files are specifically words, so that's still a different family.

The reason I like word_expectations is that it's a way to temporarily say "yes, I know we're using this non dictionary word, when we remove it, we can stop allowing it". If I switch to patterns, then it's pretty hard to say "hey, you aren't using this pattern anymore, you should probably get rid of it so you don't accidentally let something in later". Obviously I can technically keep track of pattern hits, but it's harder to explain...

dictionary_removal currently works on patterns
line_masks works on patterns
-- these two obviously can't use .gitignore as the format, and I feel the second one benefits from being Perl regular expressions.

@briandfoy
Copy link
Author

Text::Gitignore is better.

@jsoref
Copy link
Member

jsoref commented Apr 11, 2021

Fwiw, I'm going to be releasing 0.0.18-alpha shortly (hopefully) w/ work that I started last summer. It takes me a long time to get happy with changes -- I like to think them through/experiment before committing to things.

My most recent adventure (where I didn't experiment before implementing) was trying to rename the default branch to main -- which broke things (not surprisingly, but... sigh). -- The older default branch will get its own deprecation warning with the 0.0.18 release (saying "please upgrade to either @main or a pinned version).

I suspect that this change will be in a 0.0.19 or 0.0.20.

For people curious about semantic-versioning, I'm not absolutely committed to doing so, but, currently I add features and deprecation warnings, I don't think I've actually removed anything (other than some slight bug fixes, which I don't really consider API breaks -- although I do highlight them in the release notes). When I start removing support for older filenames/file formats/configurations, I'll either bump the minor or the major version.

@jsoref jsoref added the enhancement New feature or request label May 4, 2021
@jsoref
Copy link
Member

jsoref commented Aug 24, 2021

I've made my first draft of this feature:

b573bad

It looks like this feature is one of the ones that confirms a medium version bump. I'm going from 0.0.19 to 0.1.0.

I'm not quite sure when I'll get there, but it was a big item on my todo list.

Thankfully, it turned out to be pretty easy to implement.

Sadly, it will require a lot of testing, and I'm going to want to convert a lot of internal consumers so that they don't drown under warnings.

I've updated https://github.com/check-spelling/check-spelling/wiki/Possible-features to mostly match the current status of things including adding gitignore to my todo list.

Fwiw, I suspect as part of going from 0.x.y to 1.0.0 that I will drop a bunch of deprecated features. This could potentially include dropping support for the older file names here. But, I'm definitely going to need to experiment. And I might actually try one release where the current names aren't deprecated (in fact, I probably will). So it's possible that 0.1.0 will support these names and the old names w/ no deprecation warnings and a 0.1.1 would start warning about the old names. -- I still need to experiment a bit.

@jsoref jsoref added this to In progress in 0.1.0 Aug 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
0.1.0
In progress
Development

No branches or pull requests

2 participants