urlstd

urlstd is a Python implementation of the WHATWG URL Living Standard.

This library provides the URL class, the URLSearchParams class, and low-level APIs that comply with the URL specification.

Supported APIs

  • URL class

    • class urlstd.parse.URL(url: str, base: Optional[str | URL] = None)
      • canParse: classmethod can_parse(url: str, base: Optional[str | URL] = None) -> bool
      • stringifier: __str__() -> str
      • href: readonly property href: str
      • origin: readonly property origin: str
      • protocol: property protocol: str
      • username: property username: str
      • password: property password: str
      • host: property host: str
      • hostname: property hostname: str
      • port: property port: str
      • pathname: property pathname: str
      • search: property search: str
      • searchParams: readonly property search_params: URLSearchParams
      • hash: property hash: str
      • URL equivalence: __eq__(other: Any) -> bool and equals(other: URL, exclude_fragments: bool = False) -> bool
  • URLSearchParams class

    • class urlstd.parse.URLSearchParams(init: Optional[str | Sequence[Sequence[str | int | float]] | dict[str, str | int | float] | URLRecord | URLSearchParams] = None)
      • size: __len__() -> int
      • append: append(name: str, value: str | int | float) -> None
      • delete: delete(name: str, value: Optional[str | int | float] = None) -> None
      • get: get(name: str) -> str | None
      • getAll: get_all(name: str) -> tuple[str, ...]
      • has: has(name: str, value: Optional[str | int | float] = None) -> bool
      • set: set(name: str, value: str | int | float) -> None
      • sort: sort() -> None
      • iterable<USVString, USVString>: __iter__() -> Iterator[tuple[str, str]]
      • stringifier: __str__() -> str
  • Low-level APIs

  • Compatibility with standard library urllib

    • urlstd.parse.urlparse(urlstring: str, base: str = None, encoding: str = "utf-8", allow_fragments: bool = True) -> urllib.parse.ParseResult

      urlstd.parse.urlparse() is an alternative to urllib.parse.urlparse(). It parses a string representation of a URL using the basic URL parser and returns a urllib.parse.ParseResult.

Basic Usage

To parse a string into a URL:

from urlstd.parse import URL
URL('http://user:pass@foo:21/bar;par?b#c')
# → <URL(href='http://user:pass@foo:21/bar;par?b#c', origin='http://foo:21', protocol='http:', username='user', password='pass', host='foo:21', hostname='foo', port='21', pathname='/bar;par', search='?b', hash='#c')>
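
For comparison, here is a small sketch of how the standard library's urllib.parse.urlparse() splits the same string. It uses only the standard library, so it shows nothing about urlstd beyond the output above. The point is that urllib moves `;par` into a separate params field and returns the port as an integer, whereas the URL object above keeps `/bar;par` in pathname and exposes port as a string.

```python
from urllib.parse import urlparse  # standard library parser, not urlstd

result = urlparse('http://user:pass@foo:21/bar;par?b#c')

# urllib splits the last path segment at ';' and stores the rest in `params`.
print(result.path)    # /bar
print(result.params)  # par
print(result.netloc)  # user:pass@foo:21
print(result.port)    # 21 (an int, not the string '21')
```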

To parse a string into a URL using a base URL:

url = URL('?ﬃ&🌈', base='http://example.org')
url  # → <URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')>
url.search  # → '?%EF%AC%83&%F0%9F%8C%88'
params = url.search_params
params  # → URLSearchParams([('ﬃ', ''), ('🌈', '')])
params.sort()
params  # → URLSearchParams([('🌈', ''), ('ﬃ', '')])
url.search  # → '?%F0%9F%8C%88=&%EF%AC%83='
str(url)  # → 'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
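
The reordering above follows from the URL Standard's rule that sort() compares names by UTF-16 code units rather than by code points, keeping the relative order of equal names. The sketch below reproduces that ordering with the standard library only; `sort_by_code_units` is a hypothetical helper for illustration and is not part of urlstd.

```python
def sort_by_code_units(pairs):
    """Stable sort of (name, value) pairs by the UTF-16 code units of name."""
    # Big-endian UTF-16 bytes compare in the same order as 16-bit code units.
    return sorted(pairs, key=lambda pair: pair[0].encode('utf-16-be'))

# U+FB03 (the "ffi" ligature) and U+1F308 (the rainbow emoji).
pairs = [('\ufb03', ''), ('\U0001f308', '')]

# By code point, U+FB03 sorts before U+1F308 ...
print(sorted(pairs) == pairs)  # True
# ... but the emoji is the surrogate pair D83C DF08, and 0xD83C < 0xFB03.
print(sort_by_code_units(pairs) == pairs[::-1])  # True
```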

To validate a URL string:

from urlstd.parse import URL, URLValidator, ValidityState
URL.can_parse('https://user:password@example.org/')  # → True
URLValidator.is_valid('https://user:password@example.org/')  # → False
validity = ValidityState()
URLValidator.is_valid('https://user:password@example.org/', validity=validity)  # → False
validity.valid  # → False
validity.validation_errors  # → 1
validity.descriptions[0]  # → "invalid-credentials: input includes credentials: 'https://user:password@example.org/' at position 21"
URL.can_parse('file:///C|/demo')  # → True
URLValidator.is_valid('file:///C|/demo')  # → False
validity = ValidityState()
URLValidator.is_valid('file:///C|/demo', validity=validity)  # → False
validity.valid  # → False
validity.validation_errors  # → 1
validity.descriptions[0]  # → "invalid-URL-unit: code point is found that is not a URL unit: U+007C (|) in 'file:///C|/demo' at position 9"

To parse a string into a urllib.parse.ParseResult using a base URL:

import html
from urllib.parse import unquote
from urlstd.parse import urlparse
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='utf-8')
pr  # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
unquote(pr.query)  # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1251')
pr  # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
unquote(pr.query, encoding='windows-1251')  # → 'a&#255;b'
html.unescape('a&#255;b')  # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1252')
pr  # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
unquote(pr.query, encoding='windows-1252')  # → 'aÿb'
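
The three query strings differ because a character that the chosen encoding cannot represent is replaced by an HTML numeric character reference before percent-encoding, which is why the windows-1251 result needs html.unescape(). The sketch below approximates that step with the standard library only. `percent_encode_query` is a hypothetical helper, and it escapes more characters than the URL Standard's query percent-encode set does, so treat it as an illustration for this input rather than a reimplementation of urlstd.

```python
from urllib.parse import quote

def percent_encode_query(text, encoding):
    """Encode text, falling back to &#NNN; for unencodable characters."""
    raw = text.encode(encoding, errors='xmlcharrefreplace')
    # safe='' also escapes '&', '#' and ';' from the character reference.
    return quote(raw, safe='')

text = 'a\u00ffb'  # 'aÿb'
print(percent_encode_query(text, 'utf-8'))         # a%C3%BFb
print(percent_encode_query(text, 'windows-1251'))  # a%26%23255%3Bb
print(percent_encode_query(text, 'windows-1252'))  # a%FFb
```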

Logging

urlstd uses the standard library logging module to report validation errors. Change the log level of the urlstd logger if needed:

import logging
logging.getLogger('urlstd').setLevel(logging.ERROR)
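
To inspect these messages rather than only filter them, a handler can be attached to the same logger. This is a minimal sketch using only the standard library: `ListHandler` is a hypothetical helper, and the level at which urlstd emits its records is not stated here, so the level chosen below is an assumption.

```python
import logging

class ListHandler(logging.Handler):
    """Collects formatted log messages in memory (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))

logger = logging.getLogger('urlstd')
handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # assumed level; adjust as needed

# Stand-in record to show the handler working; real records come from urlstd.
logger.info('example message')
print(handler.messages)  # ['example message']
```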

Dependencies

Installation

  1. Configuring environment variables for icupy (ICU):

    • Windows:

      • Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu). For example, if ICU is installed in C:\icu4c:

        set ICU_ROOT=C:\icu4c

        or in PowerShell:

        $env:ICU_ROOT = "C:\icu4c"
      • To verify settings using icuinfo (64 bit):

        %ICU_ROOT%\bin64\icuinfo

        or in PowerShell:

        & $env:ICU_ROOT\bin64\icuinfo
    • Linux/POSIX:

      • If ICU is installed in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables. For example, if ICU is installed in /usr/local:

        export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
        export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
      • To verify settings using pkg-config:

        $ pkg-config --cflags --libs icu-uc
        -I/usr/local/include -L/usr/local/lib -licuuc -licudata
  2. Installing from PyPI:

    pip install urlstd
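
After installing, one way to confirm that the package is visible to the current interpreter is to query its metadata. This is a sketch using only the standard library; `installed_version` is a hypothetical helper, not something urlstd provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Prints a version string if urlstd is installed, otherwise None.
print(installed_version('urlstd'))
```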

Running Tests

Install dependencies:

pipx install tox
# or
pip install --user tox

To run tests and generate a report:

git clone https://github.com/miute/urlstd.git
cd urlstd
tox -e wpt

See the results in tests/wpt/report.html.

License

MIT License.