Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ID selector syntax #24

Open
SimonSapin opened this issue Apr 1, 2013 · 3 comments
Open

ID selector syntax #24

SimonSapin opened this issue Apr 1, 2013 · 3 comments

Comments

@SimonSapin
Copy link
Contributor

The spec’ed syntax for ID selectors is # followed by an identifier, not any hash token. See the "type" flag on hash tokens in css-syntax.

@Gallaecio
Copy link
Member

I’m not sure if this is a feature request or a bug 😅

@SimonSapin
Copy link
Contributor Author

This is a bug report. The tokenizer code does not match the Selectors specification:

https://drafts.csswg.org/selectors/#id-selectors

An ID selector consists of a “number sign” (U+0023, #) immediately followed by the ID value, which must be a CSS identifier.

The syntax for identifiers has some restrictions. For example it cannot start wit a digit, so 37signals is not a valid identifier.

A # character followed by an identifier is parsed in CSS 2 as a HASH token, in CSS Syntax Level 3 as a <hash-token>. (The two definitions should be equivalent, just written in a different style.) But the hash token is less restrictive than that (because it is also used in e.g. color: #00FF55;).

Some inputs like #37signals do tokenize as a hash token, but are not made of # followed by a valid identifiers (again because of starting with a digit, in this example). So they should not parse as ID selectors. CSS Syntax’s tokenizer algorithm sets a type flag for exactly this purpose.

In Python terms, css_to_xpath("#37signals") should raise a SelectorSyntaxError.

@Gallaecio
Copy link
Member

I’ve been looking into this, and I see that a lot of people has gotten bitten by this when trying to style elements with such IDs, which are valid HTML IDs, but for some reason not valid CSS IDs.

Instead of adjusting our code to the standard, I wonder if it would not be better to keep support for this syntax, and instead indicate this explicitly as a feature of cssselect in https://cssselect.readthedocs.io/en/latest/#supported-selectors. Otherwise, this change can break existing selectors, and require a harder-to-read syntax for these cases.

Unless there’s a good reason not to allow this syntax, i.e. a selector that should work 1 way and because of supporting this syntax it works unexpectedly, I really think it would be best to keep the implementation as it is.

@annbgn I’m sorry to suggest this when you have already worked on a patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants