Fix all mypy typechecking errors, add a lot of type annotations #1399

oprypin · 2023-11-01T00:17:31Z

Closes Full type annotations #1389

Functions that don't have any type annotations aren't checked by mypy, they are skipped for now in this commit

waylan

I got much less that half way through this and keep seeing the same issues reoccurring and gave up. See my comments below and please make the adjustments everywhere, not just where I've noted them.

markdown/core.py

waylan · 2023-11-01T13:00:57Z

markdown/blockprocessors.py

@@ -426,7 +426,9 @@ def get_items(self, block: str) -> list[str]:
                if not items and self.TAG == 'ol':
                    # Detect the integer value of first list item
                    INTEGER_RE = re.compile(r'(\d+)')
-                    self.STARTSWITH = INTEGER_RE.match(m.group(1)).group()
+                    int_match = INTEGER_RE.match(m.group(1))
+                    assert int_match is not None


Not sure what you are trying to accomplish here. Does this make it possible for Markdown content to cause an error? If so, that would be unacceptable.

With lines 429-431 reverted:

$ mypy markdown markdown/blockprocessors.py:429: error: Item "None" of "Match[str] | None" has no attribute "group" [union-attr] Found 1 error in 1 file (checked 33 source files)

The code before:

self.STARTSWITH = RE.match(string).group()

where RE.match(string) can be None, and None.group() is an error

The code after:

int_match = RE.match(string) assert int_match is not None self.STARTSWITH = int_match.group()

The old code doesn't care to check for the None case where it would violently error with AttributeError. mypy doesn't let it slide and exposes a potential bug.

The new code directly checks for it and errors with a clearer message.

There is no change regarding which situations an error does or does not happen.

TBD: It is worth checking whether this bug case can actually happen (in current code already). In any case we should be happy that mypy flagged this.

So, now we know about a bug which we didn't know about before. That is good. But this is not the way to address it. Under no circumstances should source text cause Markdown to raise an error. In fact, that is one of the primary goals of the project as documented on the home page. Therefore, this is not the proper way to fix the bug. The error needs to be silenced and some reasonable text needs to be included in the output (depending on the conditions under which the issue arises).

waylan · 2023-11-01T13:15:48Z

markdown/core.py

-                output_file = writer(output, errors="xmlcharrefreplace")
-                output_file.write(html)
+                output_writer = writer(output, errors="xmlcharrefreplace")
+                output_writer.write(html)


Not sure why this change was made at all. Note that the Contributing Guide states:

Legacy code which does not follow the guidelines should only be updated if and when other changes (bug fix, feature addition, etc.) are being made to that section of code. While new features should be given names that follow modern Python naming conventions, existing names should be preserved to avoid backward incompatible changes.

I realize that in this instance the variable name is not exposed outside of this method so there is no concern over a backward incompatible change, but the general principle remains. Please, let's refrain from making unnecessary changes just for personal preference.

100% of code body changes in this pull request are in response to error messages produced by mypy. Don't need to tell me this regarding unrelated changes because I'm always the first one to say this as well.

This particular change is because mypy doesn't like that these two differently-typed values share the same variable name.

With lines 425-429 reverted:

$ mypy markdown markdown/core.py:427: error: Incompatible types in assignment (expression has type "StreamReader", variable has type "StreamReaderWriter") [assignment] Found 1 error in 1 file (checked 33 source files)

I embrace Python not being strongly typed. If that means making changes like this, then no thank you.

markdown/core.py

waylan · 2023-11-01T13:22:43Z

markdown/core.py

-def markdownFromFile(**kwargs: Any):
+def markdownFromFile(
+    *,
+    input: str | BinaryIO | None = None,
+    output: str | BinaryIO | None = None,
+    encoding: str | None = None,
+    **kwargs: Any
+) -> None:


I don't recall why the specific keyword parameters were removed some years ago, but why did you feel the need to add them back in? I'm not opposed to it if there is a good reason related to this scope of this PR, but I would like to here the reason.

However, I am more concerned about why you added position arguments (*)?

The function's signature doesn't change, I'm just formalizing the de-facto signature. The * is to start named-only parameters, not positional-only parameters. They were also named-only before.

waylan · 2023-11-01T13:22:49Z

markdown/extensions/admonition.py

-            sibling = self.current_sibling
+            prev_sibling = self.current_sibling
            block, the_rest = self.detab(block, self.content_indent)
            self.current_sibling = None
            self.content_indent = 0
-            return sibling, block, the_rest
+            return prev_sibling, block, the_rest


Again, we are making unnecessary out-of-scope changes.

This change was made in response to mypy errors. Again this is because mypy doesn't like that a variable name is reused with different types in different places. Which is really a limitation of mypy, most other type checkers would deal with this perfectly fine.

With lines 78-82 reverted:

$ mypy markdown markdown/extensions/admonition.py:84: error: Incompatible types in assignment (expression has type "Element | None", variable has type "Element") [assignment] markdown/extensions/admonition.py:87: error: Incompatible types in assignment (expression has type "None", variable has type "Element") [assignment] markdown/extensions/admonition.py:101: error: Incompatible types in assignment (expression has type "Element | None", variable has type "Element") [assignment] markdown/extensions/admonition.py:113: error: Incompatible types in assignment (expression has type "None", variable has type "Element") [assignment]

waylan · 2023-11-01T13:23:46Z

markdown/extensions/admonition.py

@@ -143,6 +148,7 @@ def run(self, parent, blocks):
                p.text = title
                p.set('class', self.CLASSNAME_TITLE)
        else:
+            assert sibling is not None


Can this raise an error based on Markdown input? If so, this is unacceptable.

The code already has a problem where sibling might be None and sibling.tag will be an error. mypy correctly detects and reports it, and requires us to make an explicit assertion, which indicates that we either know that this will never actually happen, or perhaps don't care if it does but then the error will be more explicit.

TBD: It is worth checking whether this bug case can actually happen (in current code already). In any case we should be happy that mypy flagged this.

With line 151 reverted:

$ mypy markdown markdown/extensions/admonition.py:152: error: Item "None" of "Element | None" has no attribute "tag" [union-attr] markdown/extensions/admonition.py:152: error: Item "None" of "Element | None" has no attribute "text" [union-attr] markdown/extensions/admonition.py:153: error: Item "None" of "Element | None" has no attribute "text" [union-attr] markdown/extensions/admonition.py:154: error: Item "None" of "Element | None" has no attribute "text" [union-attr] markdown/extensions/admonition.py:155: error: Argument 1 to "SubElement" has incompatible type "Element | None"; expected "Element" [arg-type] markdown/extensions/admonition.py:158: error: Incompatible types in assignment (expression has type "Element | None", variable has type "Element") [assignment]

waylan · 2023-11-01T13:25:40Z

markdown/extensions/codehilite.py

        """ Find code blocks and store in `htmlStash`. """
        blocks = root.iter('pre')
        for block in blocks:
            if len(block) == 1 and block[0].tag == 'code':
                local_config = self.config.copy()
+                text = block[0].text
+                assert text is not None


Another assert....

Another place where the code currently assumes that something is present there and violently errors if there's no match. mypy doesn't let it slide.

TBD: It is worth checking whether this bug case can actually happen (in current code already). In any case we should be happy that mypy flagged this.

With lines 275-278 reverted:

$ mypy markdown markdown/extensions/codehilite.py:276: error: Argument 1 to "code_unescape" of "HiliteTreeprocessor" has incompatible type "str | None"; expected "str" [arg-type] Found 1 error in 1 file (checked 33 source files)

waylan · 2023-11-01T13:26:01Z

markdown/extensions/def_list.py


        raw_block = blocks.pop(0)
        m = self.RE.search(raw_block)
+        assert m is not None


another assert...

Another place where the code currently assumes that the match will happen. mypy doesn't let it slide. I see here that it is 100% safe to assume that the match actually happens, based on the outcome of test. But it is OK that mypy forces us to write out this assumption explicitly, both so the writer double-checks this and that the reader doesn't need to think that maybe this is a bug.

With line 47 reverted:

$ mypy markdown markdown/extensions/def_list.py:48: error: Item "None" of "Match[str] | None" has no attribute "start" [union-attr] markdown/extensions/def_list.py:49: error: Item "None" of "Match[str] | None" has no attribute "end" [union-attr] markdown/extensions/def_list.py:56: error: Item "None" of "Match[str] | None" has no attribute "group" [union-attr] markdown/extensions/def_list.py:58: error: Item "None" of "Match[str] | None" has no attribute "group" [union-attr] Found 4 errors in 1 file (checked 33 source files)

waylan · 2023-11-01T13:26:48Z

markdown/extensions/def_list.py

-        if not terms and sibling is None:
-            # This is not a definition item. Most likely a paragraph that
-            # starts with a colon at the beginning of a document or list.
-            blocks.insert(0, raw_block)
-            return False
-        if not terms and sibling.tag == 'p':
-            # The previous paragraph contains the terms
-            state = 'looselist'
-            terms = sibling.text.split('\n')
-            parent.remove(sibling)
-            # Acquire new sibling
-            sibling = self.lastChild(parent)
-        else:
-            state = 'list'
+        state = 'list'
+        if not terms:
+            if sibling is None:
+                # This is not a definition item. Most likely a paragraph that
+                # starts with a colon at the beginning of a document or list.
+                blocks.insert(0, raw_block)
+                return False
+            if sibling.tag == 'p':
+                # The previous paragraph contains the terms
+                state = 'looselist'
+                assert sibling.text is not None
+                terms = sibling.text.split('\n')
+                parent.remove(sibling)
+                # Acquire new sibling
+                sibling = self.lastChild(parent)


This appears to be out-of-scope as well.

This is again in response to mypy errors, unfortunately these kind of gymnastics are necessary sometimes to make it happy.

With lines 61-75 reverted:

$ mypy markdown markdown/extensions/def_list.py:66: error: Item "None" of "Element | None" has no attribute "tag" [union-attr] markdown/extensions/def_list.py:69: error: Item "None" of "Element | None" has no attribute "text" [union-attr] markdown/extensions/def_list.py:69: error: Item "None" of "str | Any | None" has no attribute "split" [union-attr] markdown/extensions/def_list.py:70: error: Argument 1 to "remove" of "Element" has incompatible type "Element | None"; expected "Element" [arg-type]

oprypin · 2023-11-01T15:27:59Z

See my comments below and please make the adjustments everywhere, not just where I've noted them.

The adjustments would basically mean reverting the changes, but almost none of them can be reverted, because then mypy errors appear.

So this would just mean giving up trying to check the code with mypy.
Maybe there is some misconception, but the point of running the actual mypy checks is that it also checks function bodies and can complain about potential problems until you write code in a way that makes it clear to mypy that there is no problem.

oprypin · 2023-11-01T16:11:43Z

Extracted simpler changes to Add and improve type annotations #1401

waylan · 2023-11-01T17:11:01Z

Maybe there is some misconception

Python not being strongly typed if a feature which I embrace. I will continue to write my code that way. Anything which forces me to write strongly typed code in a language which is not strongly typed is unwelcome.

I see type annotations as documentation. The code dictates the annotations, not the other way around. Of course, over time as updates and changes are made to the code, the annotations could get out-of-sync with the code. Therefore, it is reasonable to include a type checker in the CI to ensure that doesn't happen. However, the type checker should only be checking the already defined annotations. There is no need to type check an unannotated private variable which is never used outside of a single method (for example).

If mypy cannot serve that purpose, then it is the wrong tool for the job.

waylan · 2023-11-01T17:58:51Z

I am going to close this in favor of #1401. I suspect that mypy is not the right fit for this project. However, if it can be configured to fit, then this can be reopened. Or, if a different tool is proposed, that should be in a new PR.

oprypin · 2023-11-01T18:01:22Z

I'll want to sync this branch to latest if #1401 is merged, and then leave this for posterity. Not sure if I'll be able to do that if it's closed.

oprypin · 2023-11-01T18:14:40Z

Yeah mypy is pretty bad!
Sadly other checkers would ask for even more code changes :D

waylan · 2023-11-01T18:37:50Z

Okay, I'll leave it open for now.

oprypin · 2023-11-02T01:12:24Z

By disabling a large amount of mypy checks, I reduced this to a much smaller change #1402 that is based on #1401.
It will ensure the type annotations don't fall into disrepair.

facelessuser · 2023-11-02T01:17:41Z

I think if you are using types, using mypy is very important to keep things from getting broken. I also agree that there is nothing wrong with limiting the checks and/or doing per line ignores for particularly tricky things that can be resolved later if there is interest.

waylan · 2023-11-02T16:08:03Z

I have taken a step back and taken a fresh look at the overall matter of annotations and type checking. I think perhaps I dove in with a few misunderstandings and miscomprehensions. Sorry about that; especially if I have led anyone down a path of doing work I would never approve. That was not my intention.

So to be clear. upon reflection, the following are my general expectations.

This library should always be a dynamically typed library. I am opposed to this library being forcibly strongly typed. That probably means we will never have a py.typed file (or whatever it is called) and that should not be a goal to work towards.
The only purpose of annotations is to serve as documentation. The code dictates the annotations, not the other way around. In other words, we do not edit code to fit the annotations, we edit the annotations to fit the code. Of course, occasionally, the specific format of code may prevent defining annotations. In that case, an edit to the format (but not functionality) of the code is acceptable to allow defining annotations (the recent edits to some NamedTuple objects is one such example of this).

Of course, type checking may occasionally reveal an otherwise unknown bug in the code. And it would only be reasonable for those bugs to be fixed. However, the fix should always be framed as a fix to the now known bug, not a change to the code to match the type annotations. In other words, I would expect a test case which fails before the fix is applied and succeeds after regardless of any type annotations.
Of course, the type annotations should remain in sync with the code. I would expect some tooling to check that on the CI server. Presumably, we would add another linter like tool. The tool should never error or warn on missing annotations. The only task of the tool is to verify that the type annotations which exist agree with the code. If any fail, then a PR would not be accepted until an appropriate fix is made. Of course, an appropriate fix could mean the annotation needs to be updated, or it could mean the code needs to be updated. Is this an intentional change to the type of an object or did the type get changed inadvertently?

There are a few things we will need to do to meet the above goals.

Item 1 above requires no action expect to perhaps remove some proposed changes.

Item 2 is mostly complete. Thanks for the hard work of everyone involved.

Item 3 seems to be the problem. Originally I thought that mypy could meet our needs. After all, the page entitled Using mypy with an existing codebase in their documentation talks about how the tool can be used to progressively work through your code. However, on closer inspection, I see that the idea is to work through the code to gradually make it strongly typed. In fact, the suggestion is to do so before any annotations are added. That seems to suggest that perhaps mypy can only check annotations if the code is strongly typed. That is not a useful tool for our purpose. Unfortunately, I do not have any answers for how we can address this item. It occurs to me that it might not actually be possible to accomplish what I am proposing here.

oprypin · 2023-11-02T16:49:11Z

I'll want to sync this branch to latest if #1401 is merged, and then leave this for posterity. Not sure if I'll be able to do that if it's closed.

I did the merge, can be closed now :D

oprypin added 3 commits November 1, 2023 01:11

Fix all mypy typechecking errors, enable the check

b79d59d

Functions that don't have any type annotations aren't checked by mypy, they are skipped for now in this commit

Add a lot of type annotations and fixes

e743883

Add py.typed file

5db9b6f

waylan requested changes Nov 1, 2023

View reviewed changes

Merge remote-tracking branch 'origin/master' into annot

905b46f

oprypin mentioned this pull request Nov 1, 2023

Language Server does not find extension attribute: Cannot access member "Meta" for type "Markdown" Member "Meta" is unknown #1383

Open

waylan added the rejected The pull request is rejected for the stated reasons. label Nov 1, 2023

oprypin added 3 commits November 2, 2023 17:37

Merge remote-tracking branch 'origin/master' into annot

de64ef2

Revert extension-only attributes

c9121d7

Fix

70676a3

waylan closed this Nov 2, 2023

waylan mentioned this pull request Nov 8, 2023

Full type annotations #1389

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix all mypy typechecking errors, add a lot of type annotations #1399

Fix all mypy typechecking errors, add a lot of type annotations #1399

oprypin commented Nov 1, 2023

waylan left a comment

waylan Nov 1, 2023

oprypin Nov 1, 2023

waylan Nov 1, 2023

waylan Nov 1, 2023

oprypin Nov 1, 2023

waylan Nov 1, 2023

waylan Nov 1, 2023

oprypin Nov 1, 2023

waylan Nov 1, 2023

oprypin Nov 1, 2023

waylan Nov 1, 2023

oprypin Nov 1, 2023

waylan Nov 1, 2023

oprypin Nov 1, 2023

waylan Nov 1, 2023

oprypin Nov 1, 2023

waylan Nov 1, 2023

oprypin Nov 1, 2023

oprypin commented Nov 1, 2023

oprypin commented Nov 1, 2023 •

edited

waylan commented Nov 1, 2023 •

edited

waylan commented Nov 1, 2023

oprypin commented Nov 1, 2023

oprypin commented Nov 1, 2023

waylan commented Nov 1, 2023

oprypin commented Nov 2, 2023 •

edited

facelessuser commented Nov 2, 2023

waylan commented Nov 2, 2023

oprypin commented Nov 2, 2023

Fix all mypy typechecking errors, add a lot of type annotations #1399

Fix all mypy typechecking errors, add a lot of type annotations #1399

Conversation

oprypin commented Nov 1, 2023

waylan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oprypin commented Nov 1, 2023

oprypin commented Nov 1, 2023 • edited

waylan commented Nov 1, 2023 • edited

waylan commented Nov 1, 2023

oprypin commented Nov 1, 2023

oprypin commented Nov 1, 2023

waylan commented Nov 1, 2023

oprypin commented Nov 2, 2023 • edited

facelessuser commented Nov 2, 2023

waylan commented Nov 2, 2023

oprypin commented Nov 2, 2023

oprypin commented Nov 1, 2023 •

edited

waylan commented Nov 1, 2023 •

edited

oprypin commented Nov 2, 2023 •

edited