New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add type annotations and refactor to use TokenDict type #6497
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks promising! It would be beneficial to prefix the commit subjects with Auth:
.
e332726
to
0e4e829
Compare
c20301e
to
813327a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will also like to see some rework on how the changes are split into commits, but it might be easier to talk about it in person.
6e6f766
to
30ec45a
Compare
5ce0642
to
ed076e7
Compare
lib/rucio/common/types.py
Outdated
@@ -13,8 +13,12 @@ | |||
# See the License for the specific language governing permissions and | |||
# limitations under the License. | |||
|
|||
from uuid import UUID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you reintroducing UUID
? I thought we agreed that the correct type is str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it from the typeddict. Will only keep it where the tests fails, i.e.:
Found 2 new problems in /home/runner/work/rucio/rucio/lib/rucio/core/account_limit.py
- 1 errors with message """Expression of type "Unknown | UUID" cannot be assigned to declared type "str | None"""".
Candidates: line 293
- 1 errors with message """Argument of type "str | None" cannot be assigned to parameter "__key" of type "UUID" in function "get"
Type "str | None" cannot be assigned to type "UUID"
"str" is incompatible with "UUID"""".
Candidates: line 294
lib/rucio/common/utils.py
Outdated
def ssh_sign(private_key, message): | ||
def ssh_sign(private_key: str, message: Union[str, bytes]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you see any current uses of this function where the message
argument is bytes
?
@@ -27,6 +27,7 @@ | |||
import rucio.core.rse | |||
from rucio.common import exception | |||
from rucio.common.config import config_get_bool | |||
from rucio.common.types import InternalAccount, AccountAttributesDict, AccountDict, AccountUsageModelDict, IdentityDict, UsageDict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use isort
if you don’t want to do the work manually. 😄
signature = ssh_sign(PRIVATE_KEY, 'sign_something_else') | ||
signature = base64.b64decode(ssh_sign(PRIVATE_KEY, 'sign_something_else')) | ||
|
||
result = get_auth_token_ssh(account='root', signature=signature, appid='test', ip='127.0.0.1', vo=vo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not too fond of the situation we ended up in. ssh_sign()
converts bytes
to str
, which you have to convert back to bytes
to use get_auth_token_ssh()
.
rucio/lib/rucio/web/rest/flaskapi/v1/auth.py
Lines 1201 to 1229 in e85183f
# decode the signature which must come in base64 encoded | |
try: | |
signature += '=' * ((4 - len(signature) % 4) % 4) # adding required padding | |
signature = base64.b64decode(signature) | |
except TypeError: | |
return generate_http_error_flask( | |
status_code=401, | |
exc=CannotAuthenticate.__name__, | |
exc_msg=f'Cannot authenticate to account {account} with malformed signature', | |
headers=headers | |
) | |
try: | |
result = get_auth_token_ssh(account, signature, appid, ip, vo=vo) | |
except AccessDenied: | |
return generate_http_error_flask( | |
status_code=401, | |
exc=CannotAuthenticate.__name__, | |
exc_msg=f'Cannot authenticate to account {account} with given credentials', | |
headers=headers | |
) | |
if not result: | |
return generate_http_error_flask( | |
status_code=401, | |
exc=CannotAuthenticate.__name__, | |
exc_msg=f'Cannot authenticate to account {account} with given credentials', | |
headers=headers | |
) |
I wonder if it’s worth propagating the the payload as it is, as an str
, down to the core layer. There, inside get_auth_token_ssh()
, we put the decoding operation and raise let it raise a TypeError
if it fails. The code in the REST layer can then be simplified.
But probably this is going too far, so feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in this follow-up PR: #6649
ed076e7
to
9870978
Compare
9870978
to
a1b8b4b
Compare
@dchristidis I've addressed all your most recent comments and rebased the PR into separate commits in a way that makes sense. I've also had to rebase some stuff from master, so "Compare" shows some of those changes as well, which might make it hard to review the diff. |
lib/rucio/common/utils.py
Outdated
from typing import TYPE_CHECKING, Callable, Optional | ||
from typing import Any, TYPE_CHECKING, Callable, Optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isort
is your friend!
return [row._asdict() for row in session.execute(query)] | ||
return [cast(IdentityDict, row._asdict()) for row in session.execute(query)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not so much a comment for you, but more so for me: understand the necessity to use cast()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what occurs when I don't use cast
:
Found 5 new problems in /home/runner/work/rucio/rucio/lib/rucio/core/account.py
- 1 errors with message """Expression of type "list[Dict[str, Any]]" cannot be assigned to return type "list[IdentityDict]"""".
Candidates: line 248
- 1 errors with message """Expression of type "list[Dict[str, Any]]" cannot be assigned to return type "list[AccountAttributesDict]"""".
Candidates: line 278
- 1 errors with message """Expression of type "Dict[str, Any]" cannot be assigned to return type "UsageDict"
"Dict[str, Any]" is incompatible with "UsageDict"""".
Candidates: line 377
- 1 errors with message """Expression of type "list[dict[str, Any]]" cannot be assigned to return type "list[AccountUsageModelDict]"""".
Candidates: line 398
- 1 errors with message """Expression of type "list[Dict[str, Any]]" cannot be assigned to return type "list[UsageDict]"""".
Candidates: line 424
See here for reference: https://github.com/rdimaio/rucio/actions/runs/8206991793/job/22447307300#step:9:10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Dict[str, Any]
hint must be coming from SQLAlchemy itself. But by using cast()
we effectively lose type checking:
$ cat --number test.py
1 #!/usr/bin/env python3
2
3 from typing import TypedDict, cast
4
5 class Struct(TypedDict):
6 foo: str
7 bar: int
8
9 def test1() -> Struct:
10 return {'foo': 'foo', 'bar': 0}
11
12 def test2() -> Struct:
13 return {'baz': None}
14
15 def test3() -> Struct:
16 return cast(Struct, {'baz': None})
$ mypy test.py
test.py:13: error: Missing keys ("foo", "bar") for TypedDict "Struct" [typeddict-item]
test.py:13: error: Extra key "baz" for TypedDict "Struct" [typeddict-unknown-key]
Found 2 errors in 1 file (checked 1 source file)
Does that mean that we need to avoid the use of _asdict()
and to_dict()
altogether? How annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using cast
, this seems to work fine:
UsageDict(**session.execute(query).one()._asdict())
see test commit for reference: rdimaio@cff7fa2
And pyright report: https://github.com/rdimaio/rucio/actions/runs/8233368875/job/22512678528
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m afraid that I failed to verify that it will raise an error in case of a mismatch:
$ cat --number test.py
1 #!/usr/bin/env python3
2
3 from typing import TypedDict
4
5 from sqlalchemy import select
6 from sqlalchemy.orm import Session
7
8 from rucio.db.sqla.models import Identity
9 from rucio.db.sqla.session import read_session
10
11
12 # This is wrong.
13 class IdentityDict(TypedDict):
14 foo: str
15 bar: int
16
17
18 @read_session
19 def list_identities(*, session: Session) -> list[IdentityDict]:
20 query = select(
21 Identity.identity
22 )
23 return [IdentityDict(**result._asdict()) for result in session.execute(query)]
24
25 l = list_identities()
$ pyright test.py
0 errors, 0 warnings, 0 informations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6. Incompatible TypedDict, passing parameters directly, no extra parameters
#!/usr/bin/env python3
from typing import Any, TypedDict
class IncompatibleDict(TypedDict):
foo: str
bar: int
def print_dict() -> IncompatibleDict:
test_dict = {
"name": "test",
"version": 3
}
d = IncompatibleDict(foo=test_dict['name'], bar=test_dict["version"])
return(d)
l = print_dict()
Pyright doesn't complain
❯ python3 -m pyright lib/rucio/core/test.py ─╯
0 errors, 0 warnings, 0 informations
Mypy complains
❯ python3 -m mypy lib/rucio/core/test.py ─╯
lib/rucio/core/test.py:13: error: Incompatible types (expression has type "object", TypedDict item "foo" has type "str") [typeddict-item]
lib/rucio/core/test.py:13: error: Incompatible types (expression has type "object", TypedDict item "bar" has type "int") [typeddict-item]
Found 2 errors in 1 file (checked 1 source file)
7. Incompatible TypedDict, passing parameters directly, extra parameters
#!/usr/bin/env python3
from typing import Any, TypedDict
class IncompatibleDict(TypedDict):
foo: str
bar: int
def print_dict() -> IncompatibleDict:
test_dict = {
"name": "test",
"version": 3
}
d = IncompatibleDict(foo=test_dict['name'], bar=test_dict["version"], extra="unexpected")
return(d)
l = print_dict()
Pyright complains
❯ python3 -m pyright lib/rucio/core/test.py ─╯
/home/rdm/tech/rucio/rucio/lib/rucio/core/test.py
/home/rdm/tech/rucio/rucio/lib/rucio/core/test.py:13:9 - error: No overloads for "__init__" match the provided arguments
Argument types: (Unknown, Unknown, Literal['unexpected']) (reportCallIssue)
1 error, 0 warnings, 0 informations
Mypy complains
❯ python3 -m mypy lib/rucio/core/test.py ─╯
lib/rucio/core/test.py:13: error: Extra key "extra" for TypedDict "IncompatibleDict" [typeddict-unknown-key]
lib/rucio/core/test.py:13: error: Incompatible types (expression has type "object", TypedDict item "foo" has type "str") [typeddict-item]
lib/rucio/core/test.py:13: error: Incompatible types (expression has type "object", TypedDict item "bar" has type "int") [typeddict-item]
Found 3 errors in 1 file (checked 1 source file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8. Incompatible Pydantic model, passing parameters directly, static type checking
#!/usr/bin/env python3
from pydantic import BaseModel, ConfigDict
class IncompatibleModel(BaseModel):
foo: int
bar: int
def print_class() -> IncompatibleModel:
test_dict = {
"name": "test",
"version": 3
}
return(IncompatibleModel(foo=test_dict['name'], bar=test_dict['version']))
l = print_class()
Pydantic complains here, as we saw earlier. From the point of view of the static type checkers:
Pyright doesn't complain
❯ python3 -m pyright lib/rucio/core/test.py ─╯
0 errors, 0 warnings, 0 informations
Mypy complains
❯ python3 -m mypy lib/rucio/core/test.py ─╯
lib/rucio/core/test.py:13: error: Argument "foo" to "IncompatibleModel" has incompatible type "object"; expected "int" [arg-type]
lib/rucio/core/test.py:13: error: Argument "bar" to "IncompatibleModel" has incompatible type "object"; expected "int" [arg-type]
Found 2 errors in 1 file (checked 1 source file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
9. Incompatible dataclass, passing parameters directly
#!/usr/bin/env python3
from typing import Any, cast
from dataclasses import dataclass, field
@dataclass
class IncompatibleClass:
foo: int
bar: int
def print_class() -> IncompatibleClass:
test_dict = {
"name": "test",
"version": 3
}
d = IncompatibleClass(foo=test_dict['name'], bar=test_dict['version'])
return(d)
l = print_class()
Pyright doesn't complain
❯ python3 -m pyright lib/rucio/core/test.py ─╯
0 errors, 0 warnings, 0 informations
Mypy complains (ambiguously)
❯ python3 -m mypy lib/rucio/core/test.py ─╯
lib/rucio/core/test.py:15: error: Argument "foo" to "IncompatibleClass" has incompatible type "object"; expected "int" [arg-type]
lib/rucio/core/test.py:15: error: Argument "bar" to "IncompatibleClass" has incompatible type "object"; expected "int" [arg-type]
Found 2 errors in 1 file (checked 1 source file)
Mypy complains even if I change foo
to str
- its problem seems to be that the type for foo
and bar
is not explicitly mentioned.
10. Incompatible dataclass, passing parameters directly
#!/usr/bin/env python3
from typing import Any, cast
from dataclasses import dataclass, field
@dataclass
class IncompatibleClass:
foo: int
bar: int
def print_class() -> IncompatibleClass:
name = "test"
version = 3
d = IncompatibleClass(foo=name, bar=version)
return(d)
l = print_class()
Pyright complains
❯ python3 -m pyright lib/rucio/core/test.py ─╯
/home/rdm/tech/rucio/rucio/lib/rucio/core/test.py
/home/rdm/tech/rucio/rucio/lib/rucio/core/test.py:13:31 - error: Argument of type "Literal['test']" cannot be assigned to parameter "foo" of type "int" in function "__init__"
"Literal['test']" is incompatible with "int" (reportArgumentType)
1 error, 0 warnings, 0 informations
Mypy complains
❯ python3 -m mypy lib/rucio/core/test.py ─╯
lib/rucio/core/test.py:13: error: Argument "foo" to "IncompatibleClass" has incompatible type "str"; expected "int" [arg-type]
Found 1 error in 1 file (checked 1 source file)
11. Incompatible TypedDict, passing parameters directly, adding explicitly types for the input arguments
#!/usr/bin/env python3
from typing import Any, TypedDict
class IncompatibleDict(TypedDict):
foo: int
bar: int
def print_dict() -> IncompatibleDict:
name: str = "test"
version: int = 3
d = IncompatibleDict(foo=name, bar=version)
return(d)
l = print_dict()
Pyright complains
❯ python3 -m pyright lib/rucio/core/test.py ─╯
/home/rdm/tech/rucio/rucio/lib/rucio/core/test.py
/home/rdm/tech/rucio/rucio/lib/rucio/core/test.py:11:30 - error: Argument of type "Literal['test']" cannot be assigned to parameter "foo" of type "int" in function "__init__"
"Literal['test']" is incompatible with "int" (reportArgumentType)
1 error, 0 warnings, 0 informations
Mypy complains
❯ python3 -m mypy lib/rucio/core/test.py ─╯
lib/rucio/core/test.py:11: error: Incompatible types (expression has type "str", TypedDict item "foo" has type "int") [typeddict-item]
Found 1 error in 1 file (checked 1 source file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conclusion
- Static type checking: mypy seems to be slightly stricter compared to pyright - I think it would be worth considering we move to mypy.
- Runtime type checking: I think this should be necessary to catch cases where static type checking would not be enough. I opened an issue to discuss this: Runtime type checking, Pydantic, SQLModel #6544
In the long term, I think the best approach would be to use mypy+pydantic and use pydantic models for struct typing. For now, it seems to be fine to use TypedDict
, passing arguments directly, instead of passing them as kwargs, and adding explicit type annotations for the input arguments, so that the static type checkers are able to notice any issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relevant thread #6615
return {'bytes': 0, 'files': 0, 'updated_at': None} | ||
return UsageDict({'bytes': 0, 'files': 0, 'updated_at': None}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied here: #6497 (comment)
from typing import TYPE_CHECKING | ||
from typing import cast, TYPE_CHECKING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you hate sorting? Why do you hate order? 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ isort --diff sort.py
--- /var/home/dchristidis/sort.py:before 2024-03-08 20:42:16.143365
+++ /var/home/dchristidis/sort.py:after 2024-03-08 20:42:28.548160
@@ -1 +1 @@
-from typing import cast, TYPE_CHECKING
+from typing import TYPE_CHECKING, cast
def get_local_account_usage(account, rse_id=None, *, session: "Session"): | ||
def get_local_account_usage(account: InternalAccount, rse_id: Optional[UUID] = None, *, session: "Session") -> list[RSELocalAccountUsageDict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that UUID
is wrong and I would like to get to the bottom of this.
lib/rucio/core/authentication.py
Outdated
from rucio.core.oidc import validate_jwt | ||
from rucio.core.oidc import validate_jwt, token_dictionary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive me, it should be the other way around: the token_dictionary()
in rucio.core.authentication
should be kept and the one in rucio.core.oidc
replaced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes a circular import:
ImportError: cannot import name 'validate_jwt' from partially initialized module 'rucio.core.oidc' (most likely due to a circular import) (/opt/venv/lib64/python3.9/site-packages/rucio/core/oidc.py)
execution for reference: https://github.com/rucio/rucio/actions/runs/8207133900/job/22447782460?pr=6497
a1b8b4b
to
9a56b4d
Compare
There are no usages of ssh_sign where "message" is anything other than a string, and it is best if we always expect it to be a string. Follow-up from this PR: rucio#6497
Follow-up from this PR: rucio#6497
Sorry for leaving this unattended until now. I am breaking down this PR into smaller PRs to help move things forward. Each PR will be for a different aspect of this PR:
|
Follow-up from this PR: rucio#6497
There are no usages of ssh_sign where "message" is anything other than a string, and it is best if we always expect it to be a string. Follow-up from this PR: #6497
There are no usages of ssh_sign where "message" is anything other than a string, and it is best if we always expect it to be a string. Follow-up from this PR: #6497
#6454 - Adding various type annotations, and replacing the method:
with a specific datatype
TokenDict
in order to be more specific with the return types.