Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type annotations and refactor to use TokenDict type #6497

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

rdimaio
Copy link
Contributor

@rdimaio rdimaio commented Feb 15, 2024

#6454 - Adding various type annotations, and replacing the method:

def token_dictionary(token: models.Token):
    return {'token': token.token, 'expires_at': token.expired_at}

with a specific datatype TokenDict in order to be more specific with the return types.

Copy link
Contributor

@dchristidis dchristidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising! It would be beneficial to prefix the commit subjects with Auth: .

lib/rucio/api/authentication.py Outdated Show resolved Hide resolved
lib/rucio/common/types.py Outdated Show resolved Hide resolved
lib/rucio/common/utils.py Outdated Show resolved Hide resolved
lib/rucio/core/account.py Outdated Show resolved Hide resolved
lib/rucio/core/account.py Outdated Show resolved Hide resolved
lib/rucio/core/account_counter.py Outdated Show resolved Hide resolved
lib/rucio/core/account_counter.py Outdated Show resolved Hide resolved
lib/rucio/core/authentication.py Outdated Show resolved Hide resolved
lib/rucio/core/authentication.py Outdated Show resolved Hide resolved
lib/rucio/core/oidc.py Outdated Show resolved Hide resolved
@rdimaio rdimaio force-pushed the patch-6454-type-annotation branch 5 times, most recently from e332726 to 0e4e829 Compare February 19, 2024 18:37
@rdimaio rdimaio force-pushed the patch-6454-type-annotation branch 3 times, most recently from c20301e to 813327a Compare February 22, 2024 14:11
Copy link
Contributor

@dchristidis dchristidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also like to see some rework on how the changes are split into commits, but it might be easier to talk about it in person.

lib/rucio/core/account.py Outdated Show resolved Hide resolved
lib/rucio/core/account.py Outdated Show resolved Hide resolved
lib/rucio/core/account_counter.py Outdated Show resolved Hide resolved
lib/rucio/core/account_limit.py Outdated Show resolved Hide resolved
lib/rucio/core/authentication.py Outdated Show resolved Hide resolved
lib/rucio/core/authentication.py Outdated Show resolved Hide resolved
lib/rucio/core/oidc.py Outdated Show resolved Hide resolved
@rdimaio rdimaio force-pushed the patch-6454-type-annotation branch 3 times, most recently from 6e6f766 to 30ec45a Compare February 28, 2024 15:11
@rdimaio rdimaio force-pushed the patch-6454-type-annotation branch 2 times, most recently from 5ce0642 to ed076e7 Compare February 28, 2024 15:36
@rdimaio rdimaio requested a review from dchristidis March 5, 2024 13:32
@@ -13,8 +13,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from uuid import UUID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you reintroducing UUID? I thought we agreed that the correct type is str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it from the typeddict. Will only keep it where the tests fails, i.e.:

Found 2 new problems in /home/runner/work/rucio/rucio/lib/rucio/core/account_limit.py
  - 1 errors with message """Expression of type "Unknown | UUID" cannot be assigned to declared type "str | None"""".
    Candidates: line 293
  - 1 errors with message """Argument of type "str | None" cannot be assigned to parameter "__key" of type "UUID" in function "get"
                               Type "str | None" cannot be assigned to type "UUID"
                                 "str" is incompatible with "UUID"""".
    Candidates: line 294

def ssh_sign(private_key, message):
def ssh_sign(private_key: str, message: Union[str, bytes]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see any current uses of this function where the message argument is bytes?

@@ -27,6 +27,7 @@
import rucio.core.rse
from rucio.common import exception
from rucio.common.config import config_get_bool
from rucio.common.types import InternalAccount, AccountAttributesDict, AccountDict, AccountUsageModelDict, IdentityDict, UsageDict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use isort if you don’t want to do the work manually. 😄

Comment on lines -149 to 148
signature = ssh_sign(PRIVATE_KEY, 'sign_something_else')
signature = base64.b64decode(ssh_sign(PRIVATE_KEY, 'sign_something_else'))

result = get_auth_token_ssh(account='root', signature=signature, appid='test', ip='127.0.0.1', vo=vo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not too fond of the situation we ended up in. ssh_sign() converts bytes to str, which you have to convert back to bytes to use get_auth_token_ssh().

# decode the signature which must come in base64 encoded
try:
signature += '=' * ((4 - len(signature) % 4) % 4) # adding required padding
signature = base64.b64decode(signature)
except TypeError:
return generate_http_error_flask(
status_code=401,
exc=CannotAuthenticate.__name__,
exc_msg=f'Cannot authenticate to account {account} with malformed signature',
headers=headers
)
try:
result = get_auth_token_ssh(account, signature, appid, ip, vo=vo)
except AccessDenied:
return generate_http_error_flask(
status_code=401,
exc=CannotAuthenticate.__name__,
exc_msg=f'Cannot authenticate to account {account} with given credentials',
headers=headers
)
if not result:
return generate_http_error_flask(
status_code=401,
exc=CannotAuthenticate.__name__,
exc_msg=f'Cannot authenticate to account {account} with given credentials',
headers=headers
)

I wonder if it’s worth propagating the the payload as it is, as an str, down to the core layer. There, inside get_auth_token_ssh(), we put the decoding operation and raise let it raise a TypeError if it fails. The code in the REST layer can then be simplified.

But probably this is going too far, so feel free to ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in this follow-up PR: #6649

@rdimaio rdimaio force-pushed the patch-6454-type-annotation branch from ed076e7 to 9870978 Compare March 7, 2024 17:48
@rdimaio
Copy link
Contributor Author

rdimaio commented Mar 7, 2024

@dchristidis I've addressed all your most recent comments and rebased the PR into separate commits in a way that makes sense.

I've also had to rebase some stuff from master, so "Compare" shows some of those changes as well, which might make it hard to review the diff.

from typing import TYPE_CHECKING, Callable, Optional
from typing import Any, TYPE_CHECKING, Callable, Optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isort is your friend!

Comment on lines -247 to +248
return [row._asdict() for row in session.execute(query)]
return [cast(IdentityDict, row._asdict()) for row in session.execute(query)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so much a comment for you, but more so for me: understand the necessity to use cast().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what occurs when I don't use cast:

Found 5 new problems in /home/runner/work/rucio/rucio/lib/rucio/core/account.py
  - 1 errors with message """Expression of type "list[Dict[str, Any]]" cannot be assigned to return type "list[IdentityDict]"""".
    Candidates: line 248
  - 1 errors with message """Expression of type "list[Dict[str, Any]]" cannot be assigned to return type "list[AccountAttributesDict]"""".
    Candidates: line 278
  - 1 errors with message """Expression of type "Dict[str, Any]" cannot be assigned to return type "UsageDict"
                               "Dict[str, Any]" is incompatible with "UsageDict"""".
    Candidates: line 377
  - 1 errors with message """Expression of type "list[dict[str, Any]]" cannot be assigned to return type "list[AccountUsageModelDict]"""".
    Candidates: line 398
  - 1 errors with message """Expression of type "list[Dict[str, Any]]" cannot be assigned to return type "list[UsageDict]"""".
    Candidates: line 424

See here for reference: https://github.com/rdimaio/rucio/actions/runs/8206991793/job/22447307300#step:9:10

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Dict[str, Any] hint must be coming from SQLAlchemy itself. But by using cast() we effectively lose type checking:

$ cat --number test.py
     1	#!/usr/bin/env python3
     2	
     3	from typing import TypedDict, cast
     4	
     5	class Struct(TypedDict):
     6	    foo: str
     7	    bar: int
     8	
     9	def test1() -> Struct:
    10	    return {'foo': 'foo', 'bar': 0}
    11	
    12	def test2() -> Struct:
    13	    return {'baz': None}
    14	
    15	def test3() -> Struct:
    16	    return cast(Struct, {'baz': None})
$ mypy test.py 
test.py:13: error: Missing keys ("foo", "bar") for TypedDict "Struct"  [typeddict-item]
test.py:13: error: Extra key "baz" for TypedDict "Struct"  [typeddict-unknown-key]
Found 2 errors in 1 file (checked 1 source file)

Does that mean that we need to avoid the use of _asdict() and to_dict() altogether? How annoying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using cast, this seems to work fine:

UsageDict(**session.execute(query).one()._asdict())

see test commit for reference: rdimaio@cff7fa2

And pyright report: https://github.com/rdimaio/rucio/actions/runs/8233368875/job/22512678528

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m afraid that I failed to verify that it will raise an error in case of a mismatch:

$ cat --number test.py 
     1	#!/usr/bin/env python3
     2	
     3	from typing import TypedDict
     4	
     5	from sqlalchemy import select
     6	from sqlalchemy.orm import Session
     7	
     8	from rucio.db.sqla.models import Identity
     9	from rucio.db.sqla.session import read_session
    10	
    11	
    12	# This is wrong.
    13	class IdentityDict(TypedDict):
    14	    foo: str
    15	    bar: int
    16	
    17	
    18	@read_session
    19	def list_identities(*, session: Session) -> list[IdentityDict]:
    20	    query = select(
    21	        Identity.identity
    22	    )
    23	    return [IdentityDict(**result._asdict()) for result in session.execute(query)]
    24	
    25	l = list_identities()
$ pyright test.py 
0 errors, 0 warnings, 0 informations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6. Incompatible TypedDict, passing parameters directly, no extra parameters

#!/usr/bin/env python3
from typing import Any, TypedDict

class IncompatibleDict(TypedDict):
    foo: str
    bar: int

def print_dict() -> IncompatibleDict:
    test_dict = {
        "name": "test",
        "version": 3
    }
    d = IncompatibleDict(foo=test_dict['name'], bar=test_dict["version"])
    return(d)


l = print_dict()

Pyright doesn't complain

❯ python3 -m pyright lib/rucio/core/test.py                                                                                                       ─╯
0 errors, 0 warnings, 0 informations 

Mypy complains

❯ python3 -m mypy lib/rucio/core/test.py                                                                                                          ─╯
lib/rucio/core/test.py:13: error: Incompatible types (expression has type "object", TypedDict item "foo" has type "str")  [typeddict-item]
lib/rucio/core/test.py:13: error: Incompatible types (expression has type "object", TypedDict item "bar" has type "int")  [typeddict-item]
Found 2 errors in 1 file (checked 1 source file)

7. Incompatible TypedDict, passing parameters directly, extra parameters

#!/usr/bin/env python3
from typing import Any, TypedDict

class IncompatibleDict(TypedDict):
    foo: str
    bar: int

def print_dict() -> IncompatibleDict:
    test_dict = {
        "name": "test",
        "version": 3
    }
    d = IncompatibleDict(foo=test_dict['name'], bar=test_dict["version"], extra="unexpected")
    return(d)


l = print_dict()

Pyright complains

❯ python3 -m pyright lib/rucio/core/test.py                                                                                                       ─╯
/home/rdm/tech/rucio/rucio/lib/rucio/core/test.py
  /home/rdm/tech/rucio/rucio/lib/rucio/core/test.py:13:9 - error: No overloads for "__init__" match the provided arguments
    Argument types: (Unknown, Unknown, Literal['unexpected']) (reportCallIssue)
1 error, 0 warnings, 0 informations 

Mypy complains

❯ python3 -m mypy lib/rucio/core/test.py                                                                                                          ─╯
lib/rucio/core/test.py:13: error: Extra key "extra" for TypedDict "IncompatibleDict"  [typeddict-unknown-key]
lib/rucio/core/test.py:13: error: Incompatible types (expression has type "object", TypedDict item "foo" has type "str")  [typeddict-item]
lib/rucio/core/test.py:13: error: Incompatible types (expression has type "object", TypedDict item "bar" has type "int")  [typeddict-item]
Found 3 errors in 1 file (checked 1 source file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8. Incompatible Pydantic model, passing parameters directly, static type checking

#!/usr/bin/env python3
from pydantic import BaseModel, ConfigDict

class IncompatibleModel(BaseModel):
    foo: int
    bar: int

def print_class() -> IncompatibleModel:
    test_dict = {
        "name": "test",
        "version": 3
    }
    return(IncompatibleModel(foo=test_dict['name'], bar=test_dict['version']))

l = print_class()

Pydantic complains here, as we saw earlier. From the point of view of the static type checkers:

Pyright doesn't complain

❯ python3 -m pyright lib/rucio/core/test.py                                                                                                       ─╯
0 errors, 0 warnings, 0 informations 

Mypy complains

❯ python3 -m mypy lib/rucio/core/test.py                                                                                                          ─╯
lib/rucio/core/test.py:13: error: Argument "foo" to "IncompatibleModel" has incompatible type "object"; expected "int"  [arg-type]
lib/rucio/core/test.py:13: error: Argument "bar" to "IncompatibleModel" has incompatible type "object"; expected "int"  [arg-type]
Found 2 errors in 1 file (checked 1 source file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9. Incompatible dataclass, passing parameters directly

#!/usr/bin/env python3
from typing import Any, cast
from dataclasses import dataclass, field

@dataclass
class IncompatibleClass:
    foo: int
    bar: int

def print_class() -> IncompatibleClass:
    test_dict = {
        "name": "test",
        "version": 3
    }
    d = IncompatibleClass(foo=test_dict['name'], bar=test_dict['version'])
    return(d)


l = print_class()

Pyright doesn't complain

❯ python3 -m pyright lib/rucio/core/test.py                                                                                                       ─╯
0 errors, 0 warnings, 0 informations 

Mypy complains (ambiguously)

❯ python3 -m mypy lib/rucio/core/test.py                                                                                                          ─╯
lib/rucio/core/test.py:15: error: Argument "foo" to "IncompatibleClass" has incompatible type "object"; expected "int"  [arg-type]
lib/rucio/core/test.py:15: error: Argument "bar" to "IncompatibleClass" has incompatible type "object"; expected "int"  [arg-type]
Found 2 errors in 1 file (checked 1 source file)

Mypy complains even if I change foo to str - its problem seems to be that the type for foo and bar is not explicitly mentioned.

10. Incompatible dataclass, passing parameters directly

#!/usr/bin/env python3
from typing import Any, cast
from dataclasses import dataclass, field

@dataclass
class IncompatibleClass:
    foo: int
    bar: int

def print_class() -> IncompatibleClass:
    name = "test"
    version = 3
    d = IncompatibleClass(foo=name, bar=version)
    return(d)

l = print_class()

Pyright complains

❯ python3 -m pyright lib/rucio/core/test.py                                                                                                       ─╯
/home/rdm/tech/rucio/rucio/lib/rucio/core/test.py
  /home/rdm/tech/rucio/rucio/lib/rucio/core/test.py:13:31 - error: Argument of type "Literal['test']" cannot be assigned to parameter "foo" of type "int" in function "__init__"
    "Literal['test']" is incompatible with "int" (reportArgumentType)
1 error, 0 warnings, 0 informations 

Mypy complains

❯ python3 -m mypy lib/rucio/core/test.py                                                                                                          ─╯
lib/rucio/core/test.py:13: error: Argument "foo" to "IncompatibleClass" has incompatible type "str"; expected "int"  [arg-type]
Found 1 error in 1 file (checked 1 source file)

11. Incompatible TypedDict, passing parameters directly, adding explicitly types for the input arguments

#!/usr/bin/env python3
from typing import Any, TypedDict

class IncompatibleDict(TypedDict):
    foo: int
    bar: int

def print_dict() -> IncompatibleDict:
    name: str = "test"
    version: int = 3
    d = IncompatibleDict(foo=name, bar=version)
    return(d)


l = print_dict()

Pyright complains

❯ python3 -m pyright lib/rucio/core/test.py                                                                                                       ─╯
/home/rdm/tech/rucio/rucio/lib/rucio/core/test.py
  /home/rdm/tech/rucio/rucio/lib/rucio/core/test.py:11:30 - error: Argument of type "Literal['test']" cannot be assigned to parameter "foo" of type "int" in function "__init__"
    "Literal['test']" is incompatible with "int" (reportArgumentType)
1 error, 0 warnings, 0 informations 

Mypy complains

❯ python3 -m mypy lib/rucio/core/test.py                                                                                                          ─╯
lib/rucio/core/test.py:11: error: Incompatible types (expression has type "str", TypedDict item "foo" has type "int")  [typeddict-item]
Found 1 error in 1 file (checked 1 source file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conclusion

  • Static type checking: mypy seems to be slightly stricter compared to pyright - I think it would be worth considering we move to mypy.
  • Runtime type checking: I think this should be necessary to catch cases where static type checking would not be enough. I opened an issue to discuss this: Runtime type checking, Pydantic, SQLModel #6544

In the long term, I think the best approach would be to use mypy+pydantic and use pydantic models for struct typing. For now, it seems to be fine to use TypedDict, passing arguments directly, instead of passing them as kwargs, and adding explicit type annotations for the input arguments, so that the static type checkers are able to notice any issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relevant thread #6615

Comment on lines -378 to +379
return {'bytes': 0, 'files': 0, 'updated_at': None}
return UsageDict({'bytes': 0, 'files': 0, 'updated_at': None})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied here: #6497 (comment)

Comment on lines -16 to +17
from typing import TYPE_CHECKING
from typing import cast, TYPE_CHECKING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you hate sorting? Why do you hate order? 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ isort --diff sort.py 
--- /var/home/dchristidis/sort.py:before	2024-03-08 20:42:16.143365
+++ /var/home/dchristidis/sort.py:after	2024-03-08 20:42:28.548160
@@ -1 +1 @@
-from typing import cast, TYPE_CHECKING
+from typing import TYPE_CHECKING, cast

Comment on lines -271 to +276
def get_local_account_usage(account, rse_id=None, *, session: "Session"):
def get_local_account_usage(account: InternalAccount, rse_id: Optional[UUID] = None, *, session: "Session") -> list[RSELocalAccountUsageDict]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that UUID is wrong and I would like to get to the bottom of this.

Comment on lines 35 to 36
from rucio.core.oidc import validate_jwt
from rucio.core.oidc import validate_jwt, token_dictionary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgive me, it should be the other way around: the token_dictionary() in rucio.core.authentication should be kept and the one in rucio.core.oidc replaced.

Copy link
Contributor Author

@rdimaio rdimaio Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes a circular import:

ImportError: cannot import name 'validate_jwt' from partially initialized module 'rucio.core.oidc' (most likely due to a circular import) (/opt/venv/lib64/python3.9/site-packages/rucio/core/oidc.py)

execution for reference: https://github.com/rucio/rucio/actions/runs/8207133900/job/22447782460?pr=6497

@rdimaio rdimaio force-pushed the patch-6454-type-annotation branch from a1b8b4b to 9a56b4d Compare March 8, 2024 17:56
rdimaio added a commit to rdimaio/rucio that referenced this pull request Apr 8, 2024
There are no usages of ssh_sign where "message" is anything
other than a string, and it is best if we always expect it
to be a string.

Follow-up from this PR: rucio#6497
rdimaio added a commit to rdimaio/rucio that referenced this pull request Apr 8, 2024
@rdimaio
Copy link
Contributor Author

rdimaio commented Apr 8, 2024

Sorry for leaving this unattended until now.

I am breaking down this PR into smaller PRs to help move things forward. Each PR will be for a different aspect of this PR:

rdimaio added a commit to rdimaio/rucio that referenced this pull request Apr 8, 2024
bari12 pushed a commit that referenced this pull request Apr 17, 2024
bari12 pushed a commit that referenced this pull request Apr 17, 2024
bari12 pushed a commit that referenced this pull request Apr 26, 2024
There are no usages of ssh_sign where "message" is anything
other than a string, and it is best if we always expect it
to be a string.

Follow-up from this PR: #6497
bari12 pushed a commit that referenced this pull request Apr 26, 2024
bari12 pushed a commit that referenced this pull request Apr 26, 2024
bari12 pushed a commit that referenced this pull request Apr 26, 2024
There are no usages of ssh_sign where "message" is anything
other than a string, and it is best if we always expect it
to be a string.

Follow-up from this PR: #6497
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants