Improve type hints for str vs bytes #72

aronbierbaum · 2022-09-02T16:07:00Z

No description provided.

lxml-stubs/cssselect.pyi

lxml-stubs/etree.pyi

aronbierbaum · 2022-09-02T16:12:00Z

lxml-stubs/etree.pyi

+_StrOrBytes = Union[str, bytes]
+_ValueType = Union[str, bytes, QName]
+_InputDictAnyStr = Dict[_StrOrBytes, _StrOrBytes]


Add type aliases for parameters that allow str or bytes on both Python 2 and Python 3.
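As a rough illustration of how aliases like these read at a call site, here is a minimal sketch. The `QName` class below is only a stand-in for `lxml.etree.QName`, and `value_to_text` is a hypothetical helper that is not part of lxml or the stubs. It accepts any of the input types and always returns `str`.

```python
from typing import Union


class QName:
    """Stand-in for lxml.etree.QName (assumption, for illustration only)."""

    def __init__(self, text: str) -> None:
        self.text = text


# Aliases as declared in the diff above.
_StrOrBytes = Union[str, bytes]
_ValueType = Union[str, bytes, QName]


def value_to_text(value: _ValueType) -> str:
    """Hypothetical helper: accept str, bytes or QName, always return str."""
    if isinstance(value, QName):
        return value.text
    if isinstance(value, bytes):
        # Assumes UTF-8 encoded input; real code may need another codec.
        return value.decode("utf-8")
    return value
```

A type checker should accept `value_to_text("a")`, `value_to_text(b"a")` and `value_to_text(QName("a"))`, and reject something like `value_to_text(1)`.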

aronbierbaum · 2022-09-02T16:12:37Z

lxml-stubs/etree.pyi

    ) -> _Element: ...
    def write(
        self,
        file: _FileSource,
-        encoding: _AnyStr = ...,
-        method: _AnyStr = ...,
+        encoding: Optional[_StrOrBytes] = ...,


This should be marked as Optional since None is allowed.

In fact, even the unicode (or str) type itself is allowed as a value, although passing the string name "unicode" is probably a better idea, so your proposal seems sufficient. I'd rather see this deprecated and accept only strings.

Unlike the global tostring function, this only accepts unicode or bytes; passing the str type itself raises a TypeError:

    from lxml import etree

    elm_tree = etree.fromstring("<test/>").getroottree()
    print(type(elm_tree))
    with open("test.xml", "wb") as out_file:
        elm_tree.write(out_file, encoding = str)

    Traceback (most recent call last):
      File "C:\Source\p5\taccs\unicode_xml.py", line 82, in <module>
        elm_tree.write(out_file, encoding = str)
      File "src/lxml/etree.pyx", line 2055, in lxml.etree._ElementTree.write
    TypeError: unbound method str.upper() needs an argument

Then let's start the deprecation by making sure users pass only strings.
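To make the `Optional[_StrOrBytes]` declaration concrete, here is a minimal sketch of a function that accepts `None`, `str` or `bytes` for an encoding name and rejects everything else, including the `str` type itself. `normalize_encoding` is a hypothetical helper written for this illustration; it does not describe how lxml handles the argument internally.

```python
from typing import Optional, Union

_StrOrBytes = Union[str, bytes]


def normalize_encoding(encoding: Optional[_StrOrBytes]) -> Optional[str]:
    """Hypothetical helper: return the encoding name as str, or None."""
    if encoding is None:
        return None
    if isinstance(encoding, bytes):
        # Encoding names are assumed to be plain ASCII.
        return encoding.decode("ascii")
    if isinstance(encoding, str):
        return encoding
    # Anything else, such as the str type object, is rejected explicitly.
    raise TypeError(f"encoding must be str, bytes or None, got {encoding!r}")
```

With this annotation, a type checker should flag `normalize_encoding(str)` before the code runs, which is the kind of call the traceback above came from.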

aronbierbaum · 2022-09-02T16:13:05Z

lxml-stubs/etree.pyi

        pretty_print: bool = ...,
        xml_declaration: Any = ...,
        with_tail: Any = ...,
        standalone: bool = ...,
+        doctype: _StrOrBytes = ...,


Updated for new parameters.

aronbierbaum · 2022-09-08T12:58:20Z

@scoder Who should I contact to get reviews of this pull request?

scoder

Thanks, that's a good improvement.

I like the idea of separating input and output declarations.
https://en.wikipedia.org/wiki/Robustness_principle

scoder · 2022-09-08T15:43:19Z

lxml-stubs/etree.pyi

    ) -> _Element: ...
    def write(
        self,
        file: _FileSource,
-        encoding: _AnyStr = ...,
-        method: _AnyStr = ...,
+        encoding: Optional[_StrOrBytes] = ...,


In fact, even the unicode (or str) type itself is allowed as a value, although passing the string name "unicode" is probably a better idea, so your proposal seems sufficient. I'd rather see this deprecated and accept only strings.

scoder · 2022-09-08T15:49:49Z

lxml-stubs/etree.pyi

-        encoding: _AnyStr = ...,
-        method: _AnyStr = ...,
+        encoding: Optional[_StrOrBytes] = ...,
+        method: str = ...,


Can be unicode in Py2, essentially the same as encoding above (but not None).

Great point, changed to _StrOrBytes. Also updated the tostring overloads.

scoder · 2022-09-08T15:52:57Z

lxml-stubs/etree.pyi

        compression: int = ...,
        exclusive: bool = ...,
+        inclusive_ns_prefixes: List[Union[str, bytes]] = ...,


Why is this different from the declaration below in write_c14n?

Updated to be Iterable[_StrOrBytes] like write_c14n.
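The practical difference between the two declarations can be sketched as follows. `prefixes_as_text` is a hypothetical function, not an lxml API; it only shows that an `Iterable[_StrOrBytes]` parameter accepts lists, tuples and generators alike, whereas a `List[...]` annotation would admit only lists.

```python
from typing import Iterable, List, Union

_StrOrBytes = Union[str, bytes]


def prefixes_as_text(prefixes: Iterable[_StrOrBytes]) -> List[str]:
    """Hypothetical helper: decode any bytes prefixes and return a list of str."""
    return [p.decode("utf-8") if isinstance(p, bytes) else p for p in prefixes]


# All of these satisfy Iterable[_StrOrBytes]:
from_list = prefixes_as_text(["xs", b"xsi"])
from_tuple = prefixes_as_text(("xs", b"xsi"))
from_generator = prefixes_as_text(p for p in ["xs", b"xsi"])
```

One caveat of this choice is that a single `str` is itself an iterable of one-character strings, so a type checker will not catch a bare string passed by mistake.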

scoder · 2022-09-08T15:59:49Z

lxml-stubs/etree.pyi

+    def __setitem__(self, key: _TagName, value: _ValueType) -> None: ...
+    def __delitem__(self, key: _TagName) -> None: ...


Right, good catch.

scoder · 2022-09-08T16:01:01Z

lxml-stubs/etree.pyi

+    _ListAnyStr = Union[List[str], List[bytes]]
+    _DictAnyStr = Union[Dict[str, str], Dict[bytes, bytes]]
+_StrOrBytes = Union[str, bytes]
+_ValueType = Union[str, bytes, QName]


This seems a very generic and unclear name. Isn't it just the same as _TagName?

You are correct that it has the same type as _TagName, but it felt odd using _TagName for both parameters below. I have no problem changing it if you prefer. We could also change this to _ValueType = _TagName so that we maintain readability.

def set(self, key: _TagName, value: _ValueType) -> None: ...

You are right that both are different for the _Attrib dict (and .get()/.set()), but the first really is a _TagName.

scoder · 2022-09-08T16:02:16Z

lxml-stubs/etree.pyi

    def update(
        self,
        sequence_or_dict: Union[
-            _Attrib, Mapping[_AnyStr, _AnyStr], Sequence[Tuple[_AnyStr, _AnyStr]]
+            _Attrib, Mapping[_ValueType, _ValueType], Sequence[Tuple[_ValueType, _ValueType]]


Key and value are not quite the same here, I think. The key would probably accept a QName, i.e. should be _TagName.

Updated to use _TagName for the key and first tuple value.
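A minimal sketch of what the revised argument type might look like, with separate aliases for key and value. The `QName` class is a stand-in for `lxml.etree.QName`, `_UpdateArg` and `pairs_from` are hypothetical names, and the `_Attrib` member of the union is left out to keep the example self-contained.

```python
from collections.abc import Mapping as ABCMapping
from typing import List, Mapping, Sequence, Tuple, Union


class QName:
    """Stand-in for lxml.etree.QName (assumption, for illustration only)."""

    def __init__(self, text: str) -> None:
        self.text = text


_TagName = Union[str, bytes, QName]    # used for keys
_ValueType = Union[str, bytes, QName]  # used for values

# Hypothetical alias for the update() argument, without _Attrib.
_UpdateArg = Union[
    Mapping[_TagName, _ValueType],
    Sequence[Tuple[_TagName, _ValueType]],
]


def pairs_from(arg: _UpdateArg) -> List[Tuple[_TagName, _ValueType]]:
    """Hypothetical helper: flatten either accepted form into key/value pairs."""
    if isinstance(arg, ABCMapping):
        return list(arg.items())
    return list(arg)
```

Even though both aliases currently expand to the same union, keeping two names lets the signature say which position is a tag-like key and which is a value.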

scoder · 2022-09-08T16:05:25Z

lxml-stubs/etree.pyi

@@ -409,7 +429,7 @@ class XMLParser(_FeedParser):
 class HTMLParser(_FeedParser):
    def __init__(
        self,
-        encoding: Optional[_AnyStr] = ...,
+        encoding: Optional[_StrOrBytes] = ...,


Since this appears quite a few times, it seems worth another alias, e.g. _Identifier.

scoder · 2022-09-08T16:11:07Z

lxml-stubs/etree.pyi

+    def end(self, tag: _StrOrBytes) -> None: ...
+    def pi(self, target: _StrOrBytes, data: Optional[_StrOrBytes] = ...) -> Any: ...
+    def start(self, tag: _StrOrBytes, attrib: Dict[_StrOrBytes, _StrOrBytes]) -> None: ...


It's probably worth using _TagName for the tag argument.

Updated start and end to take tag: _TagName.
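For readers unfamiliar with the target interface these methods belong to, here is a small runnable sketch. It uses the standard library's `xml.etree.ElementTree.XMLParser` rather than lxml, so that it runs without extra dependencies. `RecordingTarget` is a hypothetical class, and the local `_TagName` alias is simplified (the alias discussed in this thread also admits `QName`).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union

_StrOrBytes = Union[str, bytes]
_TagName = Union[str, bytes]  # simplified: QName is omitted in this sketch


class RecordingTarget:
    """Hypothetical parser target that records start and end events."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, _TagName]] = []

    def start(self, tag: _TagName, attrib: Dict[_StrOrBytes, _StrOrBytes]) -> None:
        self.events.append(("start", tag))

    def end(self, tag: _TagName) -> None:
        self.events.append(("end", tag))

    def close(self) -> List[Tuple[str, _TagName]]:
        # The parser's close() returns whatever the target's close() returns.
        return self.events


parser = ET.XMLParser(target=RecordingTarget())
parser.feed("<root><child/></root>")
events = parser.close()
```

Here the parser only ever passes `str` tags to the target, but the wider annotation also covers callers that invoke `start()` and `end()` directly with bytes.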

aronbierbaum · 2022-09-09T12:47:41Z

@scoder Could you take another look at this? I believe I have handled most issues, but there are a few outstanding questions.

scoder · 2023-05-04T12:26:40Z

Triggering a fresh test run

scoder · 2024-01-08T10:53:37Z

Triggering a fresh test run

scoder · 2024-01-08T11:05:32Z

I've fixed the dependency-induced test failures in the master branch, but this PR still has a couple of failures of its own. Could you please see if you can resolve them?

aronbierbaum force-pushed the python3_str branch from 7736c43 to 2851617 Compare September 6, 2022 19:41

aronbierbaum force-pushed the python3_str branch from 28132d4 to 80e9ad4 Compare September 8, 2022 17:22

Improve type hints for str vs bytes

1ebe412

aronbierbaum force-pushed the python3_str branch from 80e9ad4 to 1ebe412 Compare September 8, 2022 17:27

scoder closed this May 4, 2023

scoder reopened this May 4, 2023

scoder added 5 commits May 4, 2023 14:29

Merge branch 'master' into python3_str

670a16a

Fix recently added references to _AnyStr

424192c

Fix declaration of _InputDictAnyStr

26ee198

Fix formatting

b3e36fd

Merge branch 'master' into python3_str

fb6ccee

scoder closed this Jan 8, 2024

scoder reopened this Jan 8, 2024
