Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove socket.inet_pton #318

Merged
merged 16 commits into from
Feb 28, 2024
Merged

Conversation

elliotwutingfeng
Copy link
Contributor

@elliotwutingfeng elliotwutingfeng commented Feb 22, 2024

Closes #317

Changes

  • Remove socket.inet_pton which does not reject IPv4 addresses and IPv6 (dual) addresses with leading IPv4 zeroes on macOS.
  • Add test_looks_like_ipv6

Unresolved

I've also run checks on Python 3.8.7 which was affected by the leading zeroes inconsistency in the standard library method ipaddress.IPv4Address (see #292). It turns out ipaddress.IPv6Address is also affected by the same inconsistency (for trailing IPv4 portion).

I would propose copying the implementation of ipaddress.IPv6Address or some other implementation like the one in Go, modifying it to avoid the leading zeroes bug even on older Python versions, and removing portions that we don't need, like checks if input is of str type, though it means we will have to ensure that our implementation is up-to-date with upstream. Upside is we will have more control over parsing bugs.

@elliotwutingfeng
Copy link
Contributor Author

elliotwutingfeng commented Feb 27, 2024

The ipaddress.IPv6Address inconsistency for older patch versions will disappear on its own when Python 3.9 goes EOL.

Copy link
Owner

@john-kurkowski john-kurkowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm loving the shorter code! If we can fix/skip CI, I'm tempted to accept the PR as-is, and call it a day. I'm wary of taking on IP parsing code, when a fix is already available in years-old Python patch releases.

.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
tldextract/remote.py Show resolved Hide resolved
@elliotwutingfeng
Copy link
Contributor Author

elliotwutingfeng commented Feb 28, 2024

GitHub Action setup-python for Windows does not have Python 3.8.12 or later. Should we skip the leading zeroes test for Windows + Python 3.8?

@john-kurkowski
Copy link
Owner

Yeah, skip it. Maybe it can be a single version >= 3.8.12 check? That is, platform-independent?

@@ -38,8 +38,7 @@ def md5(*args: bytes) -> hashlib._Hash:


def get_pkg_unique_identifier() -> str:
"""
Generate an identifier unique to the python version, tldextract version, and python instance.
"""Generate an identifier unique to the python version, tldextract version, and python instance.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what generated these linebreak diffs? I'm just curious, to reduce noise in our tooling. The latest version of black . and ruff format . don't seem to care about this change either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happened here elliotwutingfeng@d7001c6. I was using ruff v0.2.1, but it doesn't seem to be causing it.

@john-kurkowski john-kurkowski marked this pull request as ready for review February 28, 2024 18:47
@john-kurkowski
Copy link
Owner

Thanks for hopping on this!

@john-kurkowski john-kurkowski merged commit 1f77877 into john-kurkowski:master Feb 28, 2024
27 checks passed
@elliotwutingfeng
Copy link
Contributor Author

elliotwutingfeng commented Feb 29, 2024

Python versions where the ipaddress standard library erroneously accepts leading IPv4 zeroes in IPv4 addresses / IPv6 (dual) addresses.

  • Python 3.8.0 to 3.8.11 inclusive
  • Python 3.9.0 to 3.9.4 inclusive

Reference: https://docs.python.org/3/library/ipaddress.html#ipaddress.IPv4Address (referenced hyperlink is for ipaddress.IPv4Address but applies to ipaddress.IPv6Address dual addresses too)

bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this pull request Mar 29, 2024
https://build.opensuse.org/request/show/1163368
by user mia + anag+factory
- Update to 5.1.2:
  * Remove socket.inet_pton, to fix platform-dependent IP parsing
    #gh/john-kurkowski/tldextract#318
  * Use non-capturing groups for IPv4 address detection, for a
    slight speed boost
    #gh/john-kurkowski/tldextract#323
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

platform-dependent parsing of IP addresses
2 participants