CSS Selectors Level 4 Support #108

Open
Gallaecio opened this issue Feb 4, 2020 · 14 comments

@Gallaecio
Member

Gallaecio commented Feb 4, 2020

We have a few issues open about adding support for specific parts of the CSS Selectors Level 4 specification already. The aim, however, is to eventually reach complete support.

@Gallaecio changed the title from "CSS4 Support" to "CSS Selectors Level 4 Support" on Feb 4, 2020
@Farhan2316

Hi Gallaecio! I would like to work on this project. I have already worked on CSS Selectors level

@adarshvdeveloper

Hi, I would also like to work on this project.

@Gallaecio
Member Author

I believe the deadline to present a proposal ends in less than 24h, and sending a pre-application pull request so that we can see how you work is a requirement, so the timeline is rather tight. See http://gsoc2020.scrapinghub.com/participate for details.

@annbgn
Contributor

annbgn commented Apr 6, 2021

Hello, @Gallaecio! I'm also interested in contributing to this project as part of Google Summer of Code 2021, but the aim and expected result seem a little vague. In the list of open issues you linked above, there is one issue that was last updated in 2016 and one issue with a closed PR attached, though the issue itself is still open.
Could you please clarify what is supposed to be done in this particular issue?

@Gallaecio
Member Author

At the moment, cssselect has complete support for CSS Selectors Level 3, and partial support for CSS Selectors Level 4. The goal would be to extend CSS Selectors Level 4 support further, as much as possible within the time frame that Google Summer of Code 2021 offers.

There are issues that are quite old because this feature was initially requested a long time ago. It is still relevant, though. As for #66, I see no closed pull request linked; there’s an open one, #96, which once merged will only fix a part of #66.

@annbgn
Contributor

annbgn commented Apr 6, 2021

@Gallaecio, thanks for your reply.
I made a draft proposal: https://docs.google.com/document/d/1HSDdZj6RmXNj7TwQXw0GFVYRRBwzZb0dMtotNqoBbH0/
I'd be glad to have some feedback :)

@Gallaecio
Member Author

Your proposal lists issues and features to implement, but I think it would be great if you could go into more detail about your plan for each of them.

For example, it would be great if you could detail how you plan to translate each CSS Selector Level 4 feature that you plan to add to XPath 1.0, i.e. provide examples of the corresponding CSS Selector Level 4 features, and the matching XPath 1.0 expressions that you want to get cssselect to generate. I wonder if maybe some of the CSS Selector Level 4 features cannot be translated into XPath 1.0.

Also, just assuming that each issue will take 1 week may be overly optimistic. It would be best for you to look into each issue and really estimate how long you think each will take. It may even make sense to split work on some of them into different parts. It’s important to do some initial research now so that your project timeline is as accurate as possible, and pessimistic when in doubt, to make sure your project is successful.

@annbgn
Contributor

annbgn commented Apr 7, 2021

Thank you for the review. I added details and made the timeline less optimistic, as you suggested.

@Gallaecio
Member Author

> June 7-20: solve #66 issue, write tests. Example p:has(strong) -> descendant-or-self::p[name() = 'strong']. Reuse code in #96. Time permits, allow has to be nestable

What specifically do you plan to implement that is not already implemented in #96?

Also, stretch goals make sense, but they should not be a part of the timeline. For example, if nesting support is meant as a stretch goal, to be done only if time allows, then it should not be a part of this week’s work. You can propose stretch goals after the timeline, in a separate section.

> July 19-25: closely compare specs https://www.w3.org/TR/selectors-3/ and https://www.w3.org/TR/selectors-4/ , define priority for differences (might ask mentor’s feedback on chosen priorities), clarify work plan for August. As for now I guess it’s possible to take logical combinators for August.

I think this is something that needs to be done now, not as part of the project, as it is part of defining the scope of the project itself.

About priorities, I have no preference, so I suggest you go with the stuff that you find more achievable, or more interesting for you personally.

@annbgn
Contributor

annbgn commented Apr 7, 2021

Thanks for getting back to me so quickly :)

> What specifically do you plan to implement that is not already implemented in #96?

As far as I understand, GenericTranslator().css_to_xpath('dt:has(+ dt)') raises SelectorSyntaxError because + dt is a relative selector but is parsed as a simple selector, so I'm planning to fix that. I have updated the proposal.

> You can propose stretch goals after the timeline, in a separate section.

Moved all the "time permits" items into a separate section.

> not as part of the project, as it is part of defining the scope of the project itself.

Good point. I've already listed some differences, so I'll choose a couple that are likely achievable.

@Gallaecio
Member Author

Nice job, it’s looking much better.

> Support expressions like dt:has(+ dt) to transform into descendant-or-self::dt/child::dt.

Given the meaning of + in CSS selectors, “next immediate item on the same level”, I suspect that XPath expression may not be the right one. It looks more like the XPath translation of dt > dt instead (or :scope > dt, dt > dt, I’m not 100% sure if dt > dt also implies :scope > dt).

As to how your example expression should be interpreted, I’m not sure either 😅 . I imagine that it’s either as “a dt containing something that is immediately followed by a sibling dt”, or “a dt that is itself immediately followed by a sibling dt”.
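
The two readings can be told apart on a small document. The sketch below uses only Python's standard xml.etree.ElementTree, not cssselect; the sample markup and both helper functions are hypothetical and only illustrate the difference. As far as I can tell, the Selectors Level 4 draft anchors the relative selector inside :has() to the element being tested, which would correspond to the second reading.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "a" is itself followed by a sibling dt, while "c"
# only contains a child (span) that is followed by a sibling dt.
ROOT = ET.fromstring(
    "<dl>"
    "<dt id='a'/><dt id='b'/><dd/>"
    "<dt id='c'><span/><dt id='inner'/></dt>"
    "</dl>"
)


def contains_followed_element(root, tag):
    """Reading 1: `tag` elements with something inside them that is
    immediately followed by a sibling `tag`."""
    found = []
    for candidate in root.iter(tag):
        for parent in candidate.iter():  # the candidate and its descendants
            children = list(parent)
            # A `tag` child at index >= 1 means its previous sibling is
            # immediately followed by a `tag`.
            if any(child.tag == tag for child in children[1:]):
                found.append(candidate.get("id"))
                break
    return found


def followed_by_sibling(root, tag):
    """Reading 2: `tag` elements that are themselves immediately followed
    by a sibling `tag`."""
    found = []
    for parent in root.iter():
        children = list(parent)
        for current, following in zip(children, children[1:]):
            if current.tag == tag and following.tag == tag:
                found.append(current.get("id"))
    return found


print(contains_followed_element(ROOT, "dt"))  # ['c']
print(followed_by_sibling(ROOT, "dt"))        # ['a']
```

A pair of functions like these could serve as a reference when checking whatever XPath the translator ends up generating.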

> July 12-18: investigate #51 issue as it is a little vague, point out exact cases. It also seems that the issue might be unsolvable. If not, use the rest of this week and two following weeks (July 19 - August 1) to implement generic cases instead of logical combinators.

Could you elaborate on what parts of the issue seem vague, or what aspects may be hard or impossible to implement? I may be misreading or missing something, but the :not(a.important[rel]) and :not(a > b) examples look relatively straightforward to me.

> July 19 - August 1: support logical combinators. Logical combinations are :is(), :has(), :not() and :where(). First three already have some implementations, so I have to support :where() by creating a new class with canonical and specificity methods.

I’ve just read about :where, and I’m wondering if it has any usefulness for cssselect, where I think specificity is not really very important, if at all.

That said, I see no reason not to support it, even if it is just for the sake of completion.
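
For context, my understanding of the Selectors Level 4 draft is that :is() and :where() match the same elements and differ only in specificity: :is() takes the specificity of its most specific argument, while :where() always contributes zero. A minimal sketch of that rule, using (ids, classes, types) tuples; the function names are hypothetical and not part of cssselect:

```python
# Specificity as an (ids, classes/attributes/pseudo-classes, types) tuple.
A_IMPORTANT = (0, 1, 1)  # a.important
HASH_MAIN = (1, 0, 0)    # #main


def is_specificity(*arguments):
    """:is() takes the specificity of its most specific argument."""
    return max(arguments)  # tuples compare left to right, like specificity


def where_specificity(*arguments):
    """:where() matches like :is() but always contributes zero."""
    return (0, 0, 0)


print(is_specificity(A_IMPORTANT, HASH_MAIN))     # (1, 0, 0)
print(where_specificity(A_IMPORTANT, HASH_MAIN))  # (0, 0, 0)
```

Because the generated XPath only decides which elements match, a translator could presumably treat :where() exactly like :is() and differ only in the specificity it reports.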

Finally, in addition to describing the work that you plan to achieve as part of this project, and your stretch goals, it would be good to describe the parts of CSS Selectors Level 4 that will not be a part of your project or stretch goals, and why. I imagine some elements simply make no sense for cssselect, some may not be very useful for cssselect, and some may be useful but cannot fit the project timeline.

@annbgn
Contributor

annbgn commented Apr 8, 2021

> It looks more like the XPath translation of dt > dt

Yeah, absolutely right. I accidentally looked at the neighbouring line in the docs and translated the + operator as if it meant "a dt that has a direct child which is also a dt". I changed + to > in the proposal. I guess dt:has(+ dt) translates to descendant-or-self::dt[following-sibling::*[position()=1][name()='dt']]
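
A sketch of how the leading combinator of a relative selector inside :has() could map to an XPath 1.0 predicate. This is an assumption about one possible translation, not what cssselect or #96 generates; AXES and has_to_xpath are hypothetical names, and only single type selectors are handled.

```python
# Hypothetical mapping from a relative selector's leading combinator to an
# XPath 1.0 path evaluated relative to the :has() subject.
AXES = {
    " ": "descendant::{name}",                     # E:has(F)
    ">": "child::{name}",                          # E:has(> F)
    "+": "following-sibling::*[1][self::{name}]",  # E:has(+ F)
    "~": "following-sibling::{name}",              # E:has(~ F)
}


def has_to_xpath(subject, combinator, argument):
    """Build an XPath 1.0 expression for `subject:has(<combinator> argument)`."""
    condition = AXES[combinator].format(name=argument)
    return "descendant-or-self::{}[{}]".format(subject, condition)


print(has_to_xpath("dt", "+", "dt"))
# descendant-or-self::dt[following-sibling::*[1][self::dt]]
print(has_to_xpath("dt", ">", "dt"))
# descendant-or-self::dt[child::dt]
```

The `*[1][self::dt]` form is meant to be equivalent to `*[position()=1][name()='dt']` for documents without namespaces.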

> Could you elaborate on what parts of the issue seem vague, or what aspects may be hard or impossible to implement? I may be misreading or missing something, but the :not(a.important[rel]) and :not(a > b) examples look relatively straightforward to me.

Well, the first example is already implemented and works just fine:

>>> from cssselect import GenericTranslator
>>> GenericTranslator().css_to_xpath(':not(a.important[rel])')
"descendant-or-self::*[not(((@class and contains(concat(' ', normalize-space(@class), ' '), ' important ')) and (@rel)) and (name() = 'a'))]"

The second one, however, is not a simple selector, so it's harder to implement, though I found a translation of :not(a > b) to descendant-or-self::b[..[not(name() = 'a')]] (with this example I can use a TDD approach, haha). The vague part is that I don't know whether I should implement only what was requested in the issue or also other cases of non-simple selectors, so I edited the timeline to implement :not(a > b) and moved support for other relative selectors to the stretch goals.
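
Since a TDD approach is mentioned, a small reference check of what :not(a > b) should select may help when writing tests. The sketch below uses only xml.etree.ElementTree; the sample document and the helper name are hypothetical. It assumes the usual reading of :not(), under which every element that is not a b with an a parent matches, including elements that are not b at all; whether the translation above covers that case is worth verifying.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: b1 is a child of a, b2 is a child of c.
ROOT = ET.fromstring(
    "<root id='r'>"
    "<a id='a1'><b id='b1'/></a>"
    "<c id='c1'><b id='b2'/></c>"
    "</root>"
)


def not_child_of(root, parent_tag, child_tag):
    """Return ids of elements that do NOT match `parent_tag > child_tag`."""
    # ElementTree has no parent pointers, so collect the excluded
    # elements first by walking every `parent_tag` element.
    excluded = {
        id(child)
        for parent in root.iter(parent_tag)
        for child in parent
        if child.tag == child_tag
    }
    return [el.get("id") for el in root.iter() if id(el) not in excluded]


print(not_child_of(ROOT, "a", "b"))  # ['r', 'a1', 'c1', 'b2']
```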

About :where:

It wouldn't be nice if someone ran into cssselect.xpath.ExpressionError: The pseudo-class :where() is unknown, and it isn't difficult to implement either.

> it would be good to describe the parts of CSS Selectors Level 4 that will not be a part of your project or stretch goals, and why

Added a "Not a part of this project" section to the proposal.

@Gallaecio
Member Author

Well, I have no further feedback; the proposal looks good to me, and you seem to understand the problem pretty well. You have also submitted the required PR (please mention it in the proposal).

So now just make sure you meet the Google deadlines to submit the proposal on their platform, and let me know if there is anything else I can help you with.

@annbgn
Contributor

annbgn commented Apr 8, 2021

Thank you a lot :)
