Psycopg analysis #3051

erictraut · 2022-02-16T01:51:18Z

Documented analysis of psycopg library and recommendations for changes in pyright. Posting as a PR to encourage feedback in the form of PR comments.

This PR won't be merged into the main branch. Its branch will be deleted after the discussion is complete and we settle on a final plan.


Akuli

I like this overall! Just one nit.

Akuli · 2022-02-16T11:15:56Z

specs/psycopg_analysis.md

+
+Today, Pyright treats unannotated symbols in a "py.typed" library as "Unknown" if type checking is enabled. When typeCheckingMode is "off", it falls back to its inference logic. This effectively "punishes" users who want to use Pyright and Pylance for static type checking. I recommend that we change this behavior and always fall back on type inference logic when annotations are not present. This change will unfortunately reduce the visibility of missing annotations, so I worry that it will slow efforts to improve type completeness and consistency across the Python ecosystem, but I think it represents a pragmatic tradeoff.
+
+As a mitigation for this lack of visibility, we might want to modify Pyright and Pylance to display "ambiguous types" in a way that differentiates them from "unambiguous types". For example, we could prepend a `~` character to indicate that the type originated from a "py.typed" library and was inferred in a way that might be ambiguous.


I'm not convinced that most users will care about consistency so much that a prefix is useful. For example, mypy prepends a * prefix in front of inferred types, and nobody seems to find it useful: python/mypy#10076

I see one place where it can matter. Should type errors be reported for ambiguous types? A fair number of false positives get produced if you let pyright infer types on tensorflow, mostly because of its dynamic magic. A common basic one: `x + y` is a type error for tensors, since math operations are patched on.

It'd be ideal for me if ambiguous types had hover/suggestions but did not trigger errors. Although if it's possible to disable this inference with useLibraryCodeForTypes I could also just continue to set that to False.

My concern goes away given that this is restricted to py.typed. I would expect a library that claims to be typed to have minimal false positives from ambiguous types, or to be willing to improve them at a reasonable pace. If a library has a lot of dynamic magic in its public API and doesn't type hint/stub it appropriately, then I think it's wrongly claiming to be "py.typed".
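The `x + y` case mentioned above can be illustrated with a toy class. This is a minimal sketch, not TensorFlow's actual mechanism: `Tensor` and `_add` are hypothetical names, and the only point is that an operator attached after the class body is invisible to inference that reads the class definition alone.

```python
class Tensor:
    """Hypothetical stand-in for a tensor type; its body defines no __add__."""

    def __init__(self, value: float) -> None:
        self.value = value


def _add(self: "Tensor", other: "Tensor") -> "Tensor":
    return Tensor(self.value + other.value)


# The operator is patched on at import time, outside the class body.
# A checker that infers from the class body alone may not see it and
# may report `x + y` as an unsupported operation.
setattr(Tensor, "__add__", _add)

x = Tensor(1.0)
y = Tensor(2.0)
z = x + y  # works at runtime because the lookup happens on the patched class
```

The code runs without error, which is why a static report on `x + y` would count as a false positive here.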

hmc-cs-mdrissi · 2022-02-17T04:54:37Z

specs/psycopg_analysis.md

+
+Today, Pyright treats unannotated symbols in a "py.typed" library as "Unknown" if type checking is enabled. When typeCheckingMode is "off", it falls back to its inference logic. This effectively "punishes" users who want to use Pyright and Pylance for static type checking. I recommend that we change this behavior and always fall back on type inference logic when annotations are not present. This change will unfortunately reduce the visibility of missing annotations, so I worry that it will slow efforts to improve type completeness and consistency across the Python ecosystem, but I think it represents a pragmatic tradeoff.


Will this continue to be controlled by useLibraryCodeForTypes? I have that set to False, as inferred types for tensorflow produce a good number of false positives that have caused a fair amount of annoyance.

This document is specifically talking about "py.typed" libraries. The useLibraryCodeForTypes setting does not apply to "py.typed" libraries. Since tensorflow is not currently "py.typed", useLibraryCodeForTypes would still apply.

dlax · 2022-02-18T10:04:41Z

specs/psycopg_analysis.md

+The Python package `psycopg` is a "py.typed" package with inlined type annotations.
+
+Users of Pylance reported that they were not receiving completion suggestions for certain symbols imported from psycopg. Running "pyright --verifytypes" on the library revealed that 349 of 1472 public symbols exported from the library were lacking type annotations and were therefore treated by Pyright’s logic as an `Unknown` type.


Many things are not specific to psycopg; e.g., while trying to run `pyright --verifytypes` on click (tried quickly, because this raises a lot of "errors"), I found a lot of similar issues.

Which makes me wonder if it'd be worth having a "primer" similar to https://github.com/hauntsaninja/mypy_primer for verifytypes.

(Note: I'm not a user of pyright myself, mostly coming from psycopg perspective.)
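To make "public symbols lacking type annotations" concrete, here is a rough sketch that lists public functions with missing annotations in a namespace. It is not how `pyright --verifytypes` works; the helper `missing_annotations`, the `fake_library` namespace, and the leading-underscore privacy rule are all assumptions made for illustration.

```python
import inspect
from types import SimpleNamespace


def annotated(a: int, b: int) -> int:
    return a + b


def unannotated(a, b):
    return a + b


def _private_helper(a, b):
    return a - b


def missing_annotations(namespace) -> list[str]:
    """Return names of public functions with an unannotated parameter or return."""
    missing = []
    for name, obj in vars(namespace).items():
        # Simplification: a leading underscore marks a symbol as private.
        if name.startswith("_") or not inspect.isfunction(obj):
            continue
        sig = inspect.signature(obj)
        params_ok = all(
            p.annotation is not inspect.Parameter.empty
            for p in sig.parameters.values()
        )
        return_ok = sig.return_annotation is not inspect.Signature.empty
        if not (params_ok and return_ok):
            missing.append(name)
    return sorted(missing)


# Hypothetical "library" namespace used only for this illustration.
fake_library = SimpleNamespace(
    annotated=annotated,
    unannotated=unannotated,
    _private_helper=_private_helper,
)

report = missing_annotations(fake_library)
```

Only `unannotated` ends up in `report`: the private helper is skipped and `annotated` is complete.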

Bibo-Joshi

Thank you very much @erictraut for taking the time to write this up!

I agree with the main points of your analysis except for a few questions that I left below. However, I'm not sure what your intended goal of this discussion is. In the Discussion and Recommendations paragraphs, you mainly address Pyright and Pylance and changes to be made in these. But this thread was mainly triggered by python/typing#1058 (reply in thread), where the punchline is

[…] I would prefer that your tools [Pyright] "features" don't become a spec.

So what I had hoped and expected after your comment python/typing#1058 (reply in thread) was a proposal on how to adjust the guide at https://typing.readthedocs.io/en/latest/source/libraries.html.

IMO the major points of both your analysis and the thread at python/typing#1058 are

- there are some situations where types can be inferred with little risk of ambiguity
- ambiguous type hints are often more helpful than no type hints at all
- rules for type inference are not standardized

Putting these all together, a guide for a py.typed lib could IMO look somewhat like this:

- Definition of what the public interface is (as currently)
- Definition of how to unambiguously annotate the public interface (as currently)
- Explanation of ambiguous types and their limitations. This should include the punchlines:
  1. A type checker should aim to infer a reasonable type hint even for unannotated symbols. So if the inference is "reasonably unambiguous" (to be judged by the lib's author), one can omit annotations.
  2. Examples of cases where type inference is likely to be unambiguous (i.e. the cases analysed in the thread).
  3. Explanation that type inference is not standardized and hence has some pitfalls, including inconsistency across different checkers as well as speed.

Additionally, a thought that I had while reading your analysis was: Why is type inference not standardized? TBH, I'm not much involved in the development of Python itself and I can only assume that this just hasn't developed yet. If there has been discussion on this and standardizing type inference was rejected for specific reasons, I'd be happy to hear those.

In case standardizing type inference is something one could aim for in the future (would probably require a PEP?), I had the idea of a "benchmark"/set of test cases. This would be a collection of code snippets accompanied by the types that a type checker should infer from those snippets. A type checker could then include this benchmark in the unit tests.
Such a benchmark would surely have to evolve over time, given that the typing system is still evolving. It also wouldn't have to be complete from the beginning, but special cases could be added step by step. A type checker could e.g. say "compatible with type inference standard X.Y".

What are your thoughts on this idea?
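A rough sketch of what such a benchmark could look like as data plus a small harness. Everything here is hypothetical: `InferenceCase`, `run_benchmark`, and the listed cases are invented for illustration, and `toy_infer` merely executes the snippet and reports the runtime type, which is not what a static type checker does.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InferenceCase:
    """One benchmark entry: a snippet, a symbol in it, and the expected type."""

    snippet: str
    symbol: str
    expected: str


# Hypothetical cases; a real suite would have to be agreed on by checker authors.
CASES = [
    InferenceCase("x = 3", "x", "int"),
    InferenceCase("x = -1.5", "x", "float"),
    InferenceCase('x = "Bob"', "x", "str"),
    InferenceCase("x = True", "x", "bool"),
]


def run_benchmark(
    infer: Callable[[str, str], str], cases: list[InferenceCase]
) -> list[InferenceCase]:
    """Return the cases where the inferred type differs from the expected one."""
    return [c for c in cases if infer(c.snippet, c.symbol) != c.expected]


def toy_infer(snippet: str, symbol: str) -> str:
    """Stand-in for a checker: runs the snippet and reports the runtime type."""
    scope: dict[str, object] = {}
    exec(snippet, scope)
    return type(scope[symbol]).__name__


failures = run_benchmark(toy_infer, CASES)
```

A checker would plug its own `infer` callable into `run_benchmark`, and an empty `failures` list would mean it agrees with every case in the suite.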

Bibo-Joshi · 2022-02-19T12:27:21Z

specs/psycopg_analysis.md

+- an integer literal token, optionally preceded by a `-` token (inferred to be `int`)
+- a float literal token, optionally preceded by a `-` token (inferred to be `float`)
+- a str literal token (inferred to be `str`)
+- a bytes literal token (inferred to be `bytes`)
+- False or True token (inferred to be `bool`)


How should Literal be handled in this context? I'm asking, because this already came up in python/typing#913 (comment).

I would agree that with self.name = "Bob", the inferred type should be str instead of Literal["Bob"] and that for the latter case it should be explicitly annotated as self.name: Literal["Bob"] = "Bob". If this is the canonical convention, then this paragraph should probably mention it.

One can even extend the question to Final and ClassVar, both of which I personally would handle analogously
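A minimal sketch of the two spellings discussed here. The class names are hypothetical, and the annotation is placed at class level only so that it can be inspected at runtime with `typing.get_type_hints`; what a given type checker infers for the unannotated case is the open question, not something this code decides.

```python
from typing import Literal, get_type_hints


class InferredPerson:
    def __init__(self) -> None:
        # No annotation: under the convention suggested above, a checker
        # would infer the declared type `str`, not `Literal["Bob"]`.
        self.name = "Bob"


class LiteralPerson:
    # Explicit annotation: the declared type is exactly Literal["Bob"].
    name: Literal["Bob"] = "Bob"


hints = get_type_hints(LiteralPerson)
```

Only `LiteralPerson` carries a declared type at runtime; `InferredPerson` leaves the decision entirely to the checker's inference.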

Bibo-Joshi · 2022-02-19T12:27:48Z

specs/psycopg_analysis.md

+It is probably safe to assume that all Python type checkers will infer the same types for this pattern if the following conditions are met:
+
+1. Only one assignment is made to the variable and the assignment is not conditional (e.g. not within an `if` or `try` statement).
+2. The RHS of the assignment is one of the literal expression forms mentioned in the previous section.


same question as above
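The two quoted conditions, combined with the literal forms listed earlier, can be sketched with the standard `ast` module. This is an illustration under simplifying assumptions, not Pyright's implementation: it only looks at module-level names, and `literal_type`, `unambiguous_symbols`, and the sample `SOURCE` are hypothetical.

```python
import ast
from typing import Optional

SOURCE = '''
MAX_RETRIES = 3
OFFSET = -1.5
NAME = "psycopg"
MAGIC = b"PG"
DEBUG = False
if DEBUG:
    LEVEL = 10
TWICE = 1
TWICE = 2
COMPUTED = len(NAME)
'''


def literal_type(node: ast.expr) -> Optional[str]:
    """Return the type name for the literal forms listed above, else None."""
    # A leading `-` is only accepted in front of an int or float literal.
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        inner = node.operand
        if isinstance(inner, ast.Constant) and type(inner.value) in (int, float):
            return type(inner.value).__name__
        return None
    if isinstance(node, ast.Constant) and type(node.value) in (
        int, float, str, bytes, bool
    ):
        return type(node.value).__name__
    return None


def unambiguous_symbols(source: str) -> dict[str, str]:
    """Map names to inferred types when both quoted conditions hold."""
    tree = ast.parse(source)
    # Count every assignment to each name anywhere in the module,
    # including ones nested inside `if` or `try` blocks.
    counts: dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            counts[node.id] = counts.get(node.id, 0) + 1
    result: dict[str, str] = {}
    # Direct children of the module body are the unconditional statements.
    for stmt in tree.body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
            continue
        target = stmt.targets[0]
        if not isinstance(target, ast.Name) or counts[target.id] != 1:
            continue
        inferred = literal_type(stmt.value)
        if inferred is not None:
            result[target.id] = inferred
    return result


inferred = unambiguous_symbols(SOURCE)
```

`LEVEL` is excluded because its assignment is conditional, `TWICE` because it is assigned twice, and `COMPUTED` because its right-hand side is not one of the literal forms.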

Bibo-Joshi · 2022-02-19T12:42:14Z

specs/psycopg_analysis.md

+### Other cases
+
+The cases discussed above cover all but 49 of the "symbols with unknown types" in `psycopg`. The remainder can be categorized as follows.
+
+Instance Variables
+
+- (1) initialized with enum value
+- (1) initialized with member access expression that accesses a property


I assume that you don't intend to handle these cases differently than is done currently?

    class Foo:
        enum_value = SomeEnum.MEMBER
        property_value = SomeClass().some_property

Here I would argue that enum_value can safely be inferred to be SomeEnum and, if SomeClass.some_property has an explicit type annotation, then property_value can safely be inferred to be of that type. The second case is ofc less clear than the first one.
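A self-contained version of this example, with hypothetical definitions for `SomeEnum` and `SomeClass` filled in so that it runs. It only shows that the information needed for both inferences is present in the source; whether a given checker uses it is up to that checker.

```python
import enum
from typing import get_type_hints


class SomeEnum(enum.Enum):
    MEMBER = 1


class SomeClass:
    @property
    def some_property(self) -> int:
        # The explicit return annotation is what makes the type recoverable.
        return 42


class Foo:
    enum_value = SomeEnum.MEMBER
    property_value = SomeClass().some_property


# The property's declared return type, read from its getter function.
property_return = get_type_hints(SomeClass.some_property.fget)["return"]
```

The enum case needs only the enum class itself, while the property case depends on following the member access to the getter's annotation, which matches the observation that the second case is less clear-cut.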

erictraut · 2022-02-19T13:23:44Z

@Bibo-Joshi, I don't think it's feasible to standardize type inference rules, nor do I think it's wise to attempt. As I mention in the doc, significant innovation has occurred with respect to type inference, and we're likely to see additional innovation over time. Also, type inference logic reflects the philosophies and design priorities of individual type checkers. Some type checkers value "looseness" (fewer false positives) over "strictness" (catching all potential type violations at the expense of some false positives). You can see from my document how difficult it would be to agree on the type inference rules for even the most basic of circumstances. Specifying the "correct" inference behavior in every case would take years of effort.

If the intent of such an effort is to eliminate type ambiguity, I would argue that it would be a wasted effort because we already have a well-defined way to eliminate such ambiguity: type annotations.

erictraut · 2022-02-26T15:37:43Z

Thanks for all the input. I've implemented the recommendations with a few small variations based on input. I'm closing this PR because it's no longer necessary.

Documented analysis of psycopg library and recommendations for change…

d4a5ea9

…s in pyright. Posting as a PR to encourage feedback in the form of PR comments.


erictraut closed this Feb 26, 2022

erictraut deleted the psycopg_analysis branch June 12, 2023 15:50
