Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Psycopg analysis #3051

Closed
wants to merge 1 commit into from
Closed

Psycopg analysis #3051

wants to merge 1 commit into from

Conversation

erictraut
Copy link
Collaborator

Documented analysis of psycopg library and recommendations for changes in pyright. Posting as a PR to encourage feedback in the form of PR comments.

This PR won't be merged into the main branch. Its branch will be deleted after the discussion is complete and we settle on a final plan.

…s in pyright. Posting as a PR to encourage feedback in the form of PR comments.
Copy link

@Akuli Akuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this overall! Just one nit.


Today, Pyright treats unannotated symbols in a "py.typed" library as "Unknown" if type checking is enabled. When typeCheckingMode is "off", it falls back to its inference logic. This effectively "punishes" users who want to use Pyright and Pylance for static type checking. I recommend that we change this behavior and always fall back on type inference logic when annotations are not present. This change will unfortunately reduce the visibility of missing annotations, so I worry that it will slow efforts to improve type completeness and consistency across the Python ecosystem, but I think it represents a pragmatic tradeoff.

As a mitigation for this lack of visibility, we might want to modify Pyright and Pylance to display "ambiguous types" in a way that differentiates them from "unambiguous types". For example, we could prepend a `~` character to indicate that the type originated from a "py.typed" library and was inferred in a way that might be ambiguous.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that most users will care about consistency so much that a prefix is useful. For example, mypy prepends a * prefix in front of inferred types, and nobody seems to find it useful: python/mypy#10076

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see one place where it can matter. Should type errors be reported for ambiguous types? A fair amount of false positives get produced if you let pyright infer types on tensorflow mostly because of it's dynamic magic. A common basic one x + y is a type error for tensors since math operations are patched on.

It'd be ideal for me if ambiguous types had hover/suggestions but did not trigger errors. Although if it's possible to disable this inference with useLibraryCodeForTypes I could also just continue to set that to False.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern goes away given this is restricted to py.typed. I would expect a library that claims to be typed to have minimal false positives from ambiguous errors or be willing to improve them at a reasonable pace. If a library has a lot dynamic magic in public api and doesn't type hint/stub it appropriately then I think it's misclaiming to be "py.typed".

Comment on lines +151 to +152

Today, Pyright treats unannotated symbols in a "py.typed" library as "Unknown" if type checking is enabled. When typeCheckingMode is "off", it falls back to its inference logic. This effectively "punishes" users who want to use Pyright and Pylance for static type checking. I recommend that we change this behavior and always fall back on type inference logic when annotations are not present. This change will unfortunately reduce the visibility of missing annotations, so I worry that it will slow efforts to improve type completeness and consistency across the Python ecosystem, but I think it represents a pragmatic tradeoff.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this continue to be controlled by, useLibraryCodeForTypes? I have that set to False as inferred types for tensorflow produce a good number of false positives that they've caused a fair amount of annoyance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This document is specifically talking about "py.typed" libraries. The useLibraryCodeForTypes setting does not apply to "py.typed" libraries. Since tensorflow is not currently "py.typed", useLibraryCodeForTypes would still apply.

Comment on lines +5 to +7
The Python package `psycopg` is a "py.typed" package with inlined type annotations.

Users of Pylance reported that they were not receiving completion suggestions for certain symbols imported from psycopg. Running "pyright --verifytypes" on the library revealed that 349 of 1472 public symbols exported from the library were lacking type annotations and were therefore treated by Pyright’s logic as an `Unknown` type.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many things are not specific to psycopg; e.g., while trying to run pyright --verifytypes on click (tried quickly, because this raises a lot or "errors"), I found a lot of similar issues.

Which makes me wonder if it'd be worth to have a "primer" similar to https://github.com/hauntsaninja/mypy_primer for this verifytypes?

(Note: I'm not a user of pyright myself, mostly coming from psycopg perspective.)

Copy link

@Bibo-Joshi Bibo-Joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @erictraut for taking the time to write this up!

I agree with the main points of your analysis except for a few questions that I left below. However, I'm not sure what your intended goal of this discussion is. In the Discussion and Recommendations paragraphs, you mainly address Pyright and Pylance and changes to be made in these. But this thread was mainly triggered by python/typing#1058 (reply in thread), where the punchline is

[…] I would prefer that your tools [Pyright] "features" don't become a spec.

So what I had hoped and expected after your comment python/typing#1058 (reply in thread) was a proposal on how to adjust the guide at https://typing.readthedocs.io/en/latest/source/libraries.html.

IMO the major points of both your analysis and the thread at python/typing#1058 are

  • there are some situations where types can be inferred with little risk of ambiguity
  • ambiguous type hints are often more helpful than no type hints at all
  • rules for type inference are not standardized

Putting these all together, I guide on what a py.typed lib could IMO look somewhat like this

  1. Definition of what the public interface is (as is currently)
  2. Definition on how to unambiguously annotate the public interface (as currently)
  3. Explanation of ambiguously types and their limitations. This should include the punshlines
    1. A type checker should aim to infer a reasonable type hint even for not annotated symbols. So if the inference is "reasonably unambiguous" (to be judged by the libs author), one can omit annotations
    2. examples of cases where type inference is likely to be unambiguous (i.e. the cases analysed in the thread)
    3. Explanation that type inference is not standardized and hence this has some culprits, including inconsistencey across different checkers as well as speed

Additionally, a thought that I had while reading your analyis was: Why is type inference not standardized? TBH, I'm little involved in the development of Python itself and I can only assume that this just hasn't developed yet. If there has been discussion on this and standardizing type inteference was rejected for specific reasons, I'd be happy to hear those.

In case standardizing type inference is something one could aim for in the future (would probably require a PEP?), I had the idea of a "benchmark"/set of test cases. This would be a collection of code snippets accompanied by the types that a type checker should infer from those snippets. A type checker could then include this benchmark in the unit tests.
Such a benchmark would surely have to evolve over time, given that the typing system is still evolving. It also wouldn't have to be complete from the beginning, but special cases could be added step by step. A type checker could e.g. say "compatible with type inference standard X.Y".

What are your thoughts on this idea?

Comment on lines +83 to +87
- an integer literal token, optionally preceded by a `-` token (inferred to be `int`)
- a float literal token, optionally preceded by a `-` token (inferred to be `float`)
- a str literal token (inferred to be `str`)
- a bytes literal token (inferred to be `bytes`)
- False or True token (inferred to be `bool`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should Literal be handled in this context? I'm asking, because this already came up in python/typing#913 (comment).

I would agree that with self.name = "Bob", the inferred type should be str instead of Literal["Bob"] and that for the latter case it should be explicitly annotated as self.name: Literal["Bob"] = "Bob". If this is the canonical convention, then this paragraph should probably mention it.

One can even extend the question to Final and ClassVar, both of which I personally would handle analogously

It is probably safe to assume that all Python type checkers will infer the same types for this pattern if the following conditions are met:

1. Only one assignment is made to the variable and the assignment is not conditional (e.g. not within an `if` or `try` statement).
2. The RHS of the assignment is one of the literal expression forms mentioned in the previous section.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as above

Comment on lines +104 to +111
### Other cases

The cases discussed above cover all but 49 of the "symbols with unknown types" in `psycopg`. The remainder can be categorized as follows.

Instance Variables

- (1) initialized with enum value
- (1) initialized with member access expression that accesses a property

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assuge that you don't intend to handle these cases differently than done currently?

class Foo:
    enum_value = SomeEnum.MEMBER
    property_value = SomeClass().some_property

Here I would argue that enum can safely be inferred to be SomeEnum and if SomeClass.some_property has an explicit type annotation, then property_value can safely be inferedd to be of that type. The second case is ofc less clear than the first one.

@erictraut
Copy link
Collaborator Author

@Bibo-Joshi, I don't think it's feasible to standardize type inference rules, nor do I think it's wise to attempt. As I mention in the doc, significant innovation has occurred with respect to type inference, and we're likely to see additional innovation over time. Also, type inference logic reflects the philosophies and design priorities of individual type checkers. Some type checkers value "looseness" (fewer false positives) over "strictness" (catching all potential type violations at the expense of some false positives). You can see from my document how difficult it would be to agree on the type inference rules for even the most basic of circumstances. Specifying the "correct" inference behavior in every case would take years of effort.

If the intent of such an effort is to eliminate type ambiguity, I would argue that it would be a wasted effort because we already have a well-defined way to eliminate such ambiguity: type annotations.

@erictraut
Copy link
Collaborator Author

Thanks for all the input. I've implemented the recommendations with a few small variations based on input. I'm closing this PR because it's no longer necessary.

@erictraut erictraut closed this Feb 26, 2022
@erictraut erictraut deleted the psycopg_analysis branch June 12, 2023 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants