Support for custom selector engines (Querying nested shadow roots) #5405

Closed
Georgegriff opened this issue Feb 10, 2020 · 13 comments

@Georgegriff

What is this?
Other tools let users register a custom selector engine for selecting elements in the DOM.

(example from playwright)

  await playwright.selectors.register(selectorEngine, { name: 'shadow' })
  await page.waitForSelector('shadow=#no-downloads span', {timeout: 3000})

I've seen #382, but I'm not sure it offers a mechanism that is this simple for end users. I'm happy to be wrong on this; I couldn't find any examples.

Playwright offers this. https://github.com/microsoft/playwright/blob/master/docs/api.md#selectorsregisterenginefunction-args

In the Selenium world, the equivalent is custom locators like these: https://chercher.tech/java/custom-locators-selenium-webdriver

Why is this useful?
Traditionally, these custom locators have been used to select elements via XPath or jQuery selectors.

Why do I want this?
I maintain https://github.com/Georgegriff/query-selector-shadow-dom, which lets users write CSS selectors that automatically pierce web component shadow roots. It was trivial to add support in Playwright for using my library as a selector engine, like so:

const { selectorEngine } = require("query-selector-shadow-dom/plugins/playwright");
const playwright = require('playwright');

// Top-level await is not available in CommonJS, so wrap the script in an async function.
(async () => {
  // Register the library as a selector engine named "shadow".
  await playwright.selectors.register(selectorEngine, { name: 'shadow' });

  const browser = await playwright.chromium.launch({ headless: false });
  const context = await browser.newContext({ viewport: null });
  const page = await context.newPage();

  await page.goto('chrome://downloads');

  // The "shadow=" prefix routes the selector through the registered engine.
  await page.waitForSelector('shadow=#no-downloads span', { timeout: 3000 });
  await new Promise(resolve => setTimeout(resolve, 3000));

  await page.close();
  await context.close();
  await browser.close();
})();

Registering this engine lets users call click, waitForSelector, and anything else that accepts a selector, with my library automatically piercing shadow roots.

How is my engine implemented in playwright?

Playwright defines this interface: https://github.com/microsoft/playwright/blob/master/docs/api.md#selectorsregisterenginefunction-args which accepts a Function/String.
It takes your function, passes it into the browser context, and handles the rest so you can use the engine for click and similar calls.

My library implements this interface: https://github.com/Georgegriff/query-selector-shadow-dom/blob/master/plugins/playwright/index.js (it does this a little strangely, using a string, because I need to inject my library into the function scope).
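As a rough illustration of the string-injection idea described above, the sketch below builds an engine as a string so that the library code lives in the same function scope as the engine. Everything here is an assumption made for illustration: `librarySource`, `tinyLib`, and `buildEngineSource` are hypothetical names, not part of Playwright or query-selector-shadow-dom, and the real plugin inlines the actual library build.

```javascript
// Hypothetical stand-in for the bundled library text; the real plugin inlines
// the query-selector-shadow-dom build instead.
const librarySource = `
  var tinyLib = {
    first: function (selector, root) { return root.querySelector(selector); },
    all: function (selector, root) { return Array.from(root.querySelectorAll(selector)); }
  };
`;

// Build a self-contained expression: the library is declared inside the same
// function scope as the engine, so nothing needs to exist on the page already.
function buildEngineSource(libSource) {
  return `(function () {
    ${libSource}
    return {
      query: function (root, selector) { return tinyLib.first(selector, root); },
      queryAll: function (root, selector) { return tinyLib.all(selector, root); }
    };
  })()`;
}

const engineSource = buildEngineSource(librarySource);

// Evaluated locally only to show that the string needs no outer scope; in
// practice the automation tool would evaluate it inside the page.
const engine = new Function(`return ${engineSource};`)();
```

Because the engine is plain text, it survives being serialized and sent to the browser, which a closure over a Node-side module would not.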

@paullewis
Contributor

paullewis commented Apr 30, 2020

Hey, so we now have an experimental API that lets you do this (on master). Roughly it looks like this:

// Custom query handler.
const doesNotHaveClass = 
    (element, className) => element.querySelectorAll(`:not(.${className})`);

// Register it.
puppeteer.__experimental_registerCustomQueryHandler('doesNotHaveClass', 
    doesNotHaveClass);

// Prepend queries with the name of the handler.
const elements = await page.$$('doesNotHaveClass/foo');

We have the following APIs:

__experimental_registerCustomQueryHandler(name: string, queryHandler: QueryHandler): void;
__experimental_unregisterCustomQueryHandler(name: string): void;
__experimental_customQueryHandlers(): Map<string, QueryHandler>;
__experimental_clearQueryHandlers(): void;

Where QueryHandler is a relatively generic term for a function of the form:

(element, selector) => Element | Element[] | NodeListOf<Element>

Other points of note:

  • It's experimental (so it might change!)
  • You can only register one function for a given name, and names can only contain [a-zA-Z]
  • You can register or invoke a function that doesn't follow the expectations of $, $$, $$eval, or waitFor{Selector}, and you will either get unexpected outcomes or an error. In short we don't check that the query handler you invoke is going to do what you expect :)
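To make the constraints listed above concrete, here is a minimal sketch of a registry that enforces them. It is not Puppeteer's source; the function names simply mirror the experimental API without the `__experimental_` prefix, and the error messages are invented for illustration.

```javascript
// Illustrative registry only; not Puppeteer's implementation.
const handlers = new Map();

function registerCustomQueryHandler(name, queryHandler) {
  // Only one function may be registered for a given name.
  if (handlers.has(name)) {
    throw new Error(`A custom query handler named "${name}" already exists`);
  }
  // Names may only contain [a-zA-Z].
  if (!/^[a-zA-Z]+$/.test(name)) {
    throw new Error('Custom query handler names may only contain [a-zA-Z]');
  }
  // Note: nothing checks what the handler returns, matching the caveat above.
  handlers.set(name, queryHandler);
}

function unregisterCustomQueryHandler(name) {
  handlers.delete(name);
}

function customQueryHandlers() {
  return handlers;
}

function clearQueryHandlers() {
  handlers.clear();
}
```

If the naming rule is applied literally, as in this sketch, a name containing digits or spaces would be rejected.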

@zewa666

zewa666 commented May 6, 2020

This sounds super interesting. The official Aurelia i18n plugin uses custom attributes, named t by default, that contain a string pointing to a resource; by default the plugin translates the textContent of the target element.

An example would be something like this:

<span t="title">Title</span>

In addition to the default textContent target, the user can override the target with this syntax:

<span t="[alt]title">Title</span>

So ideally we could forward multiple params to the custom query handler (something along these lines):

// Custom query handler.
const i18n = 
    (element, key, target) => element.querySelectorAll(`[t^='${target ? '[' + target + ']' : ''}${key}']`);

// Register it.
puppeteer.__experimental_registerCustomQueryHandler('i18n', i18n);

const elementsWithoutTarget = await page.$$('i18n/title');
const elementsWithTarget = await page.$$('i18n/title/alt');

There are many more opportunities, but essentially, having multiple params available would open up many more use cases.

@mathiasbynens
Member

mathiasbynens commented May 7, 2020

@zewa666 These use cases can already be addressed by splitting the selector string into parts in your custom query handler:

const myQueryHandler = (element, selector) => {
  const params = splitIntoParameters(selector);
  return doStuff(element, params);
};

The reason we want to avoid handling this for you (in this case by splitting on /) is because every use case might have different requirements, and we don't want to limit the possibilities. In XPath for example / already has special meaning, so if your custom handler uses XPath you likely wouldn't want to use / to mean anything special in your selector aside from the custom myQueryHandler/ prefix.
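For the i18n case discussed above, one possible shape for such a handler is sketched below. The parsing rule (an optional `[target]` followed by the key) and the helper names `splitIntoParameters` and `toAttributeSelector` are assumptions for illustration; a real handler could use any format it likes.

```javascript
// Hypothetical parser for selectors such as "title" or "[alt]title".
function splitIntoParameters(selector) {
  const match = /^(?:\[([^\]]+)\])?(.+)$/.exec(selector);
  if (!match) {
    throw new Error(`Unrecognised selector: ${selector}`);
  }
  return { target: match[1] || null, key: match[2] };
}

// Rebuild the attribute selector used by the i18n example in this thread.
function toAttributeSelector({ target, key }) {
  const prefix = target ? `[${target}]` : '';
  return `[t^='${prefix}${key}']`;
}

// The handler still has the (element, selector) shape; the parameters are
// recovered from the single selector string.
const i18nQueryHandler = (element, selector) =>
  element.querySelectorAll(toAttributeSelector(splitIntoParameters(selector)));
```

With this approach the handler owns its own mini-syntax, so `/` keeps no special meaning after the handler prefix.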

@zewa666

zewa666 commented May 7, 2020

Oh OK, yeah, that makes sense. I thought the / was a kind of convention (like XPath) and that you had to separate params with it. In this case my call can simply be:

await page.$$('i18n/[alt]title');

Thanks for the clarification

@mathiasbynens
Member

@zewa666 Exactly! The i18n/ is the prefix that tells Puppeteer which custom query handler to use. The string '[alt]title', i.e. the rest of the selector, is then passed to your custom handler, where you can process it however you like.
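The prefix handling described above can be sketched roughly as follows. This is an assumption about the general technique, not Puppeteer's actual code, and `parseCustomSelector` is a hypothetical name. It follows the naming rule stated earlier in the thread, so only `[a-zA-Z]` characters are treated as a handler name.

```javascript
// Sketch of prefix handling; an assumption, not Puppeteer's implementation.
function parseCustomSelector(selector) {
  // Everything up to the first "/" is the handler name; the remainder is
  // passed to the handler untouched, even if it contains further slashes.
  const match = /^([a-zA-Z]+)\/([\s\S]*)$/.exec(selector);
  if (!match) {
    return { handlerName: null, rest: selector };
  }
  return { handlerName: match[1], rest: match[2] };
}
```

A selector without a recognised prefix, such as `div > span`, falls through with `handlerName` set to `null` and can be treated as ordinary CSS.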

@mathiasbynens
Member

@paullewis How would you register a custom query handler that supports both $ and $$?

@Georgegriff
Author

Georgegriff commented May 8, 2020

I've also just run into this problem. I've got $ working fine with a shadow-DOM-based query handler, but it falls over when I return an array of elements (my handler is attached at the bottom of this comment). Ideally I could support both, and Puppeteer could choose which one to use based on whether the user called $ or $$.

Playwright handles this by registering two functions, query and queryAll. What if query handlers supported returning something like this:

// "handler" is just an illustrative name for the proposed factory.
const handler = () => {
  return {
    query(element, selector) {
      // stuff you do for returning a single element
    },
    queryAll(element, selector) {
      // stuff you do for returning multiple elements
    },
  };
};

Then puppeteer could call the appropriate function, or alternatively allow something like this:

        puppeteer.__experimental_registerCustomQueryHandler('shadow', queryHandler, queryAllHandler);

where the second function is intended to return arrays or NodeLists.

The handler below (based on query-selector-shadow-dom) works for $ but not $$.

My library has two functions:

return querySelectorShadowDom.querySelectorDeep(selector, element);
and
return querySelectorShadowDom.querySelectorAllDeep(selector, element);
They are mirrors of querySelector/querySelectorAll, but they automatically pierce nested shadow roots.

The first function works, but the second doesn't, because it returns an array.

const queryHandler = (element, selector) => {

    // minified library guff to inject my code into the handler, scroll past
    var querySelectorShadowDom=function(e){"use strict";function o(e,a,c){var t=c.querySelector(e);return document.head.createShadowRoot||document.head.attachShadow?!a&&t?t:h(e,",").reduce(function(e,t){if(!a&&e)return e;var l,d,i,o=h(t.replace(/^\s+/g,"").replace(/\s*([>+~]+)\s*/g,"$1")," ").filter(function(e){return!!e}),r=o.length-1,n=function(t,e){void 0===t&&(t=null);var n=[],o=function e(t){for(var o,r=0;o=t[r];++r)n.push(o),o.shadowRoot&&e(o.shadowRoot.querySelectorAll("*"))};e.shadowRoot&&o(e.shadowRoot.querySelectorAll("*"));return o(e.querySelectorAll("*")),t?n.filter(function(e){return e.matches(t)}):n}(o[r],c),u=(l=o,d=r,i=c,function(e){for(var t,o,r,n=d,u=e,a=!1;u&&(r=u).nodeType!==Node.DOCUMENT_FRAGMENT_NODE&&r.nodeType!==Node.DOCUMENT_NODE;){var c=u.matches(l[n]);if(c&&0===n){a=!0;break}c&&n--,t=i,o=u.parentNode,u=o&&o.host&&11===o.nodeType?o.host:o===t?null:o}return a});return a?e=e.concat(n.filter(u)):(e=n.find(u))||null},a?[]:null):a?c.querySelectorAll(e):t}function h(e,o){return e.match(/\\?.|^$/g).reduce(function(e,t){return'"'!==t||e.sQuote?"'"!==t||e.quote?e.quote||e.sQuote||t!==o?e.a[e.a.length-1]+=t:e.a.push(""):(e.sQuote^=1,e.a[e.a.length-1]+=t):(e.quote^=1,e.a[e.a.length-1]+=t),e},{a:[""]}).a}return e.querySelectorAllDeep=function(e,t){return void 0===t&&(t=document),o(e,!0,t)},e.querySelectorDeep=function(e,t){return void 0===t&&(t=document),o(e,!1,t)},e}({});
    
    // my lib communicating with the new puppeteer api
    return querySelectorShadowDom.querySelectorDeep(selector, element);
}
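For readers who want the gist without the minified bundle, here is a heavily simplified sketch of shadow-piercing lookup paired with separate single and multiple handlers. It is not the library's algorithm: it matches the selector inside each root independently, so a compound selector that spans a shadow boundary will not match, and results are not guaranteed to be in document order. `queryAllDeep` and `shadowHandler` are hypothetical names; the `queryOne`/`queryAll` pair mirrors the handler shape used later in this thread.

```javascript
// Simplified illustration of piercing nested shadow roots; the real
// query-selector-shadow-dom library handles many more selector cases.
function queryAllDeep(selector, root) {
  // Matches in the current root (document, element, or shadow root).
  const matches = Array.from(root.querySelectorAll(selector));
  // Recurse into every open shadow root found beneath this root.
  for (const el of root.querySelectorAll('*')) {
    if (el.shadowRoot) {
      matches.push(...queryAllDeep(selector, el.shadowRoot));
    }
  }
  return matches;
}

// One function per use: a single element for $, a collection for $$.
const shadowHandler = {
  queryOne: (element, selector) => queryAllDeep(selector, element)[0] || null,
  queryAll: (element, selector) => queryAllDeep(selector, element),
};
```

Splitting the handler this way sidesteps the problem described above, because each function returns exactly the shape its caller expects.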

Incidentally, Playwright recently made its built-in CSS selector engine work with shadow DOM automatically: https://github.com/microsoft/playwright/releases/tag/v0.14.0

@paullewis
Contributor

> @paullewis How would you register a custom query handler that supports both $ and $$?

You would just register it and use it wherever you like. We don't make any distinction in the code about which function the handler is for. That said, the implementation of the handler will be doing something that expects either a single element or a collection of elements, which will naturally lend itself to either $ or $$. That's not enforced, though; it comes down to what the function returns.

@jschfflr
Contributor

jschfflr commented Jul 1, 2020

Just a note from a debugging session with @mathiasbynens: element will be the #document itself instead of an HTMLElement if the selector is not scoped.
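A handler that relies on element-only APIs may therefore want to normalise its root first. The sketch below shows one way to do that; `toElementRoot` and `selfOrDescendant` are hypothetical helpers, and falling back to `documentElement` is an assumption about what a handler would want rather than anything Puppeteer does for you.

```javascript
const DOCUMENT_NODE = 9; // numeric value of Node.DOCUMENT_NODE

// Hypothetical helper: return an Element even when handed the #document.
function toElementRoot(node) {
  return node.nodeType === DOCUMENT_NODE ? node.documentElement : node;
}

// Example handler that relies on an element-only API (`matches`), which
// does not exist on the document itself.
const selfOrDescendant = (element, selector) => {
  const root = toElementRoot(element);
  return root.matches(selector) ? root : root.querySelector(selector);
};
```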

@Georgegriff
Author

Georgegriff commented Jul 19, 2020

I've been experimenting with the updated API in 5.2.0 and it's working great! I was able to implement the following with ease:
https://github.com/Georgegriff/query-selector-shadow-dom/pull/36/files

QueryHandler implementation: https://github.com/Georgegriff/query-selector-shadow-dom/pull/36/files#diff-1297c36120ceed6b61d83df8075cc959


@namukang

namukang commented Nov 29, 2020

I've been trying to develop a custom query handler, but I'm having trouble accessing custom properties set on window via page.evaluate inside the query handler.

For example:

// Handler tries to access custom window property
puppeteer.registerCustomQueryHandler("meow", {
    queryOne: (element, selector) => {
      return element.querySelector((window as any).meow);
    },
    queryAll: (element, selector) => {
      return element.querySelectorAll((window as any).meow);
    },
  });

// After page is created and navigated, set a global property
await page.evaluate(() => (window as any).meow = "body");

I would expect the custom query handler to always return the body, but meow is undefined on window. Is this by design or a bug? Thanks!

@stale

stale bot commented Jun 25, 2022

We're marking this issue as unconfirmed because it has not had recent activity and we weren't able to confirm it yet. It will be closed if no further activity occurs within the next 30 days.

@stale stale bot added the unconfirmed label Jun 25, 2022
@stale

stale bot commented Jul 25, 2022

We are closing this issue. If the issue still persists in the latest version of Puppeteer, please reopen the issue and update the description. We will try our best to accommodate it!

@stale stale bot closed this as completed Jul 25, 2022