Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add codespell support (config, workflow to detect/not fix) and make it fix few typos #26315

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

yarikoptic
Copy link
Contributor

More about codespell: https://github.com/codespell-project/codespell .

I personally introduced it to dozens if not hundreds of projects already and so far only positive feedback.

CI workflow has 'permissions' set only to 'read' so also should be safe.

  • There were numerous commits where codespell was used "manually" to fixup some of the typos.

  • CI: use codespell, flake8 to check code before merging? #13694 still contemplates idea of running codespell on PRs

  • An early attempt to introduce codespell CI was abandoned: BLD: Codespell for NumPy CI #19594 -- IMHO now is better time!

  • I did lapack_lite last in the separate commits if there is desire to not touch it (although I saw commits fixing typos in it as well).

  • I did some variable renames to take them away from "close to a typo" form, might be worth having a look and suggesting alternative names.

  • I did fix older changelogs as I think there is no point to cherish typos in them, but they could also be ignored.

@charris
Copy link
Member

charris commented Apr 19, 2024

Please don't touch the changelogs, they match the title used in the corresponding PR, typos and all.

@@ -262,7 +262,7 @@ conflict of interests include, but are not limited to:
All members of the Council shall disclose to the rest of the Council any
conflict of interest they may have. Members with a conflict of interest
in a particular issue may participate in Council discussions on that
issue, but must recuse themselves from voting on the issue.
issue, but must recurse themselves from voting on the issue.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big fan of a adding automated spell-checking if we'll have to ignore it making mistakes like this...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rright -- I even ran into this particular one before -- recuse is indeed rare and I failed to spot it in review, sorry about that. I will whitelist it.

@yarikoptic
Copy link
Contributor Author

Thanks for quick feedback @charris and @ngoldbaum !

I

  • rebased (apparently there were new typos in main)
  • ran with current master of codespell to ensure picking up as many typos as possible (thought that typos detected by CI were due to mine being outdated but it just needed rebase)
  • whitelisted some more per feedback above (do not touch changelogs, recuse is ok so we might recuse into folders at some point as a side-effect ;-) )
  • adjusted some patches to make lint satisfied (one change with spaces around = was for "old" code, but triggered to be checked since I changed the line)

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand, this found many typos. On the other, it has definite opinions about variable names in tests, not all are valid. This is kind of like black in that it operates on the entire code base, and like black seems to be over-strict.

>>> newfp = np.memmap(filename, dtype='float32', mode='r', shape=(3,4))
>>> newfp
>>> fp_new = np.memmap(filename, dtype='float32', mode='r', shape=(3,4))
>>> fp_new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did codespell do this? It seems unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, I did that in 639693e :

commit 639693ee4fa82173f5444e76cfec47704d655ea6
Author: Yaroslav Halchenko <debian@onerussian.com>
Date:   Fri Apr 19 10:02:05 2024 -0400

    DOC: adjust variables in example to be fp_{kind} instead of {k}fp and fp{k}
    
    I think it makes them more consistent (always a suffix) and also more
    readable since more descriptive of the taken action.
    
    This is triggered to avoid "looks-like-a-typo" hits  in codespell for fpr and likely
    others

>>> fpr = np.memmap(filename, dtype='float32', mode='r', shape=(3,4))
>>> fpr.flags.writeable
>>> fp_ro = np.memmap(filename, dtype='float32', mode='r', shape=(3,4))
>>> fp_ro.flags.writeable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above... too close to for typo etc.

@@ -816,7 +816,7 @@ def test_einsum_fixedstridebug(self):
tp = np.tensordot(A, B, axes=(0, 0))
assert_equal(es, tp)
# The following is the original test case from the bug report,
# made repeatable by changing random arrays to aranges.
# made repeatable by changing random arrays to arranges.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, adding arange and aranges to whitelist

arro = np.zeros((4, 4))
arr = arro[::-1, ::-1]
arr0 = np.zeros((4, 4))
arr = arr0[::-1, ::-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above and specifically commit 8afd1bf

padd = np.repeat(np.min(arr), 513)
rarr = np.concatenate((arr, padd))
arr_ = np.repeat(np.min(arr), 513)
rarr = np.concatenate((arr, arr_))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broadcasti = self.setall(i)
assert broadcasti == [i] * self.nlanes
broadcast_i = self.setall(i)
assert broadcast_i == [i] * self.nlanes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -18,7 +18,7 @@
# communal Conda bin directory. DLLs from NumPy's dependencies must also be
# collected to capture MKL, OpenBlas, OpenMP, etc.
from PyInstaller.utils.hooks import conda_support
datas = conda_support.collect_dynamic_libs("numpy", dependencies=True)
_ = conda_support.collect_dynamic_libs("numpy", dependencies=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to have a left-side at all

@@ -37,7 +37,7 @@ def _make_along_axis_idx(arr_shape, indices, axis):
shape_ones = (1,) * indices.ndim
dest_dims = list(range(axis)) + [None] + list(range(axis+1, indices.ndim))

# build a fancy index, consisting of orthogonal aranges, with the
# build a fancy index, consisting of orthogonal arranges, with the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert

pyproject.toml Outdated
# Skip all shortish (up to 5) ACRONYMS
ignore-regex = '"worl"|https://citeseerx\.ist\.psu\.edu|\b([A-Z][a-z]{1,9}|[A-Z]{1,5})\b|.*codespell-ignore.*|/seh/'
# some common hits and some "questionable" and false positives (rare words)
ignore-words-list = 'nd,nin,numers,numer,inout,ccompiler,fo,CCompiler,dout,ans,ser,ot,asign,siz,alow,fro,delimitor,alo,dum,desig,recuse,implementor'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be folded into a multiline statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha -- it seems to can. Splitting into multiple lines

@@ -334,7 +334,7 @@ eliminating their use internally and two new C-API functions,

have been added together with a complementary flag,
``NPY_ARRAY_WRITEBACKIFCOPY``. Using the new functionality also requires that
some flags be changed when new arrays are created, to wit:
some flags be changed when new arrays are created:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave these files out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipping and dropping manual 21d95b2

@yarikoptic
Copy link
Contributor Author

commented on above review by @mattip , dropped a commit, rebased the latter ones, reran codespell (so now no changes to release notes), force pushing now.

@yarikoptic
Copy link
Contributor Author

uff, main got new typos since previous time, rebasing/rerunning again!

Used
  codespellit list-hits-sorted | sed -n -e '/6 /,9999p' | tac | f2 | tr '\n' ','
to find and sort
… fp{k}

I think it makes them more consistent (always a suffix) and also more
readable since more descriptive of the taken action.

This is triggered to avoid "looks-like-a-typo" hits  in codespell for fpr and likely
others
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w -i 3 -C 2",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
… automagically but ignoring the failure due to ambigous typos

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w || :",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
…us typos

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w -i 3 -C 2",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
Funny thing is that it does not catch it on 'main' for some reason
but catches here... it is likely because

  python tools/linter.py --branch origin/main

works on the diff only (too fast on main) and thus whenever those
spaces were introduced, it did not check and never flipped since then.
I changed that line, and it was checked and flagged.
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants