
Prototype: Server-side block attributes sourcing #18414

Closed (proposed merging 5 commits)

@aduth (Member) commented Nov 9, 2019

This pull request seeks to explore an approach to block attributes sourcing on the server. In other words, it seeks to account for attributes whose values are derived from HTML (or post meta). It should be considered a prototype, but it currently implements almost all of the currently supported attribute sources.

Example:

curl 'http://localhost:8889/wp-json/wp/v2/posts/95?context=edit&_fields=content'  -H 'X-WP-Nonce: [...]' -H 'Cookie: wordpress_logged_in_[...]=[...]' | jq .
{
  "content": {
    "raw": "<!-- wp:paragraph {\"align\":\"center\",\"className\":\"my-custom-class\"} -->\n<p class=\"has-text-align-center my-custom-class\">Hello world</p>\n<!-- /wp:paragraph -->\n\n<!-- wp:image {\"id\":20,\"sizeSlug\":\"large\"} -->\n<figure class=\"wp-block-image size-large\"><img src=\"http://localhost:8889/wp-content/uploads/2019/11/stars-1024x681.jpeg\" alt=\"\" class=\"wp-image-20\"/><figcaption>Caption!</figcaption></figure>\n<!-- /wp:image -->",
    "rendered": "\n<p class=\"has-text-align-center my-custom-class\">Hello world</p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img src=\"http://localhost:8889/wp-content/uploads/2019/11/stars-1024x681.jpeg\" alt=\"\" class=\"wp-image-20\" srcset=\"http://localhost:8889/wp-content/uploads/2019/11/stars-1024x681.jpeg 1024w, http://localhost:8889/wp-content/uploads/2019/11/stars-300x199.jpeg 300w, http://localhost:8889/wp-content/uploads/2019/11/stars-768x510.jpeg 768w, http://localhost:8889/wp-content/uploads/2019/11/stars-1536x1021.jpeg 1536w, http://localhost:8889/wp-content/uploads/2019/11/stars-2048x1361.jpeg 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" /><figcaption>Caption!</figcaption></figure>\n",
    "protected": false,
    "block_version": 1,
    "blocks": [
      {
        "name": "core/paragraph",
        "attributes": {
          "align": "center",
          "className": "my-custom-class",
          "content": "Hello world"
        },
        "inner_blocks": []
      },
      {
        "name": "core/image",
        "attributes": {
          "id": 20,
          "sizeSlug": "large",
          "url": "http://localhost:8889/wp-content/uploads/2019/11/stars-1024x681.jpeg",
          "alt": "",
          "caption": "Caption!",
          "title": null,
          "href": null,
          "rel": null,
          "linkClass": null,
          "linkTarget": null
        },
        "inner_blocks": []
      }
    ]
  }
}

Implementation Notes:

Parsing relies on DOMDocument and queries it with DOMXPath, first converting each block type attribute's CSS selector to an equivalent XPath expression using a bundled, modified version of the third-party PHP Selector library.
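A minimal sketch of the DOMDocument + DOMXPath approach, assuming hand-written XPath expressions in place of converted CSS selectors and covering only the attribute, text, and html sources. The function name sketch_source_attribute and the xpath schema key are hypothetical, not the prototype's API.

```php
<?php
/**
 * Hypothetical sketch: source one attribute value from a block's saved HTML.
 * $schema mirrors a block.json attribute definition, except that it carries a
 * hand-written 'xpath' expression instead of a CSS 'selector'.
 */
function sketch_source_attribute( string $html, array $schema ) {
	$document = new DOMDocument();

	// Collect libxml errors (e.g. for HTML5 tags like figure) instead of warning.
	$previous = libxml_use_internal_errors( true );
	// The meta tag declares UTF-8 so libxml does not assume ISO-8859-1.
	$document->loadHTML(
		'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
		. $html . '</body></html>'
	);
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	// Restrict the query to the block's own markup inside <body>.
	$xpath = new DOMXPath( $document );
	$nodes = $xpath->query( '//body' . $schema['xpath'] );
	if ( false === $nodes || 0 === $nodes->length ) {
		return null;
	}
	$node = $nodes->item( 0 );

	switch ( $schema['source'] ) {
		case 'attribute':
			return $node->hasAttribute( $schema['attribute'] )
				? $node->getAttribute( $schema['attribute'] )
				: null;
		case 'text':
			return $node->textContent;
		case 'html':
			// Serialize the matched element's children (its "inner HTML").
			$inner_html = '';
			foreach ( $node->childNodes as $child ) {
				$inner_html .= $document->saveHTML( $child );
			}
			return $inner_html;
		default:
			return null;
	}
}
```

A missing attribute yields null, which matches how unmatched sources such as title and href appear in the example response.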

In an effort to best demonstrate its usage, additional changes include:

  • A mechanism for server-registering all Gutenberg block.json manifests
  • Include a new content.blocks field on the REST API posts responses
  • Defining default-supported attributes for blocks registered on the server (className, align, anchor)
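The blocks array in the example response could be produced by reshaping parse_blocks() output, whose entries carry blockName, attrs, and innerBlocks (plus innerHTML and innerContent). A sketch under that assumption, with a hypothetical function name; it also assumes sourced attributes were already merged into attrs.

```php
<?php
/**
 * Hypothetical sketch: reshape parse_blocks()-style arrays into the
 * { name, attributes, inner_blocks } objects of the content.blocks field.
 */
function sketch_prepare_blocks_for_response( array $parsed_blocks ): array {
	$prepared = array();
	foreach ( $parsed_blocks as $block ) {
		// parse_blocks() returns null-named entries for content outside any
		// block delimiters, such as the whitespace between blocks. This sketch
		// drops them, which would also drop any classic (freeform) content.
		if ( null === $block['blockName'] ) {
			continue;
		}
		$prepared[] = array(
			'name'         => $block['blockName'],
			'attributes'   => $block['attrs'],
			'inner_blocks' => sketch_prepare_blocks_for_response( $block['innerBlocks'] ),
		);
	}
	return $prepared;
}
```

An empty PHP array encodes as [] in JSON, so a real response would probably cast attributes to an object to keep its shape consistent.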

Open Questions:

For me, the main questions and concerns surrounding this approach include:

  • Is it performant enough?
  • Are DOMDocument and the PHP Selector utilities resilient enough, and do they account for modern HTML and CSS?
  • What permissions would we require for access to this raw data?

@aduth added the [Feature] Block API, [Type] Technical Prototype, [Feature] Parsing, and REST API Interaction labels Nov 9, 2019
@mcsf (Contributor) left a comment

Impressive!

Is it performant enough?
Are DOMDocument and the PHP Selector utilities resilient enough, and do they account for modern HTML and CSS?

Performance is the first thing one wonders about. To answer both of those questions, there's probably nothing like getting this into the hands of testers. To that effect: we've been relatively liberal at pushing experimental client-side interfaces and know how to approach things there; how could we do the same here?

@mcsf (Contributor) commented Nov 11, 2019

Resurfacing #7342, as it's in the same domain and something we could easily tackle now.

@aduth (Member Author) commented Nov 11, 2019

To that effect: we've been relatively liberal at pushing experimental client-side interfaces and know how to approach things there; how could we do the same here?

I think a safe route (albeit perhaps more labor-intensive) would be to first consider extracting some of the "additional" prerequisite work that I had to bundle into this pull request:

In an effort to best demonstrate its usage, additional changes include:

  • A mechanism for server-registering all Gutenberg block.json manifests
  • Include a new content.blocks field on the REST API posts responses
  • Defining default-supported attributes for blocks registered on the server (className, align, anchor)

The first and last of these would complement ongoing work in Trac#47620 (cc @gziolo, @spacedmonkey): although the endpoint proposed there would expose registered blocks, very few core blocks are currently registered on the server.

It would be good to have some feedback from REST API folks as well, at least on how we expose this data through that interface. That might be a task worth considering separately from how the attributes actually get populated during the parse. I plan to bring it up at their weekly meeting this upcoming Thursday.

Lastly, while the plugin was a nice venue for quick prototyping, and we may be able to merge this in some experimental form, moving forward we might want to develop in Trac, or at least develop there the additional hooks needed for a more solid solution. Having this implementation replace the default parser may not be how we ultimately want to go about this (e.g. perhaps we want a more "pure" HTML parser result from parse_blocks, and a separate parse_blocks_with_sourced_attributes that considers external factors like block registries, the "current post", etc.). Maybe @dmsnell has some thoughts here.

One missing piece that otherwise limits the effectiveness of this in the plugin is a way to filter block registration settings on the server. This is not currently possible, so this implementation applies the changes manually by re-registering the core set of blocks.

	@$document->loadHTML( '<html><body>' . $block['innerHTML'] . '</body></html>' );
} catch ( Exception $e ) {
	return null;
}
Contributor

when I have worked with DOMDocument I found it extremely helpful to extract the setup code into a separate function that abstracts away the noisy quirks.

$document = parseHTML( $block['innerHTML'] );
if ( null === $document ) {
	return null;
}

It's a small thing to abstract, but in my experience it's worth it, especially as we learn which whitespace and parse-handling settings we need to activate.
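A sketch of the suggested helper, assuming the quirks worth hiding are libxml warnings, the input encoding, and the default DOCTYPE. It also returns null when the DOM extension is unavailable, one possible shape for a graceful fallback in environments without libxml. The option choices are assumptions, not the prototype's code.

```php
<?php
/**
 * Sketch of the suggested helper: load a fragment of block HTML into a
 * DOMDocument, or return null when that is not possible.
 */
function parseHTML( string $html ): ?DOMDocument {
	// Graceful fallback for hosts where the DOM extension is disabled.
	if ( ! class_exists( 'DOMDocument' ) ) {
		return null;
	}

	$document = new DOMDocument();
	// Collect parse errors (e.g. unknown HTML5 tags) instead of emitting warnings.
	$previous = libxml_use_internal_errors( true );
	$loaded   = $document->loadHTML(
		// Declaring UTF-8 keeps libxml from assuming ISO-8859-1.
		'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
		. $html . '</body></html>',
		// Skip adding a default DOCTYPE node.
		LIBXML_HTML_NODEFDTD
	);
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	return $loaded ? $document : null;
}
```

Keeping these settings in one place also means the sourcing function's tests no longer need to cover parsing failures.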

Member Author

when I have worked with DOMDocument I found it extremely helpful to extract the setup code into a separate function that abstracts away the noisy quirks.

That seems like a reasonable revision, for sure! I expect it would also make writing the tests a little nicer, since I wouldn't need to lump all the parsing error cases into the tests for a function otherwise intended specifically for sourcing values.

function gutenberg_replace_block_parser_class() {
	return 'WP_Sourced_Attributes_Block_Parser';
}
add_filter( 'block_parser_class', 'gutenberg_replace_block_parser_class' );
Contributor

this is how the system was designed to work, though I believe that you should be able to skip creating the helper function.

add_filter( 'block_parser_class', 'WP_Sourced_Attributes_Block_Parser' );

Member

This helper is useful, as it allows for this filter to be unhooked.

Member Author

this is how the system was designed to work, though I believe that you should be able to skip creating the helper function.

I'd tend to agree with @spacedmonkey here, though it certainly piques my interest that this syntax can work 🤔

@dmsnell (Contributor) commented Nov 11, 2019

Thanks for getting back to server-side parsing of attributes @aduth.

This change seems to mitigate the problem of having attributes sourced from HTML, and I think it's good to get it into Core. If we end up thinking this is the way to go, I'd only want to consider eventually merging it into the default parser, though I'd prefer to find a way to do that which doesn't mash all that code into the default parser.

From our discussions, I know you're aware that this addresses only one of several concerns with sourced attributes: the desire to have all attributes available to a parser with no knowledge of the block implementations is still unresolved by this change.

It seems, though, that we have been pushing the limits of what sourced attributes offer, and some people really wish we didn't have as many of them; in some ways, I think the biggest problem we're addressing is one of quantity, not quality. If we can surface the attributes for the Core blocks, then most people will be happy.

This patch gets those attributes available to the PHP on the running server while leaving them inaccessible to any other parser. That's better than leaving them unavailable to every parser. 🙂

function gutenberg_register_block_types() {
	$registry = WP_Block_Type_Registry::get_instance();

	$block_manifests = glob( dirname( dirname( __FILE__ ) ) . '/packages/block-library/src/*/block.json' );
Member

I don't think we have block.json for all core blocks; the dynamic ones don't provide this metadata file because we still didn't resolve the following issues:

Those aren't blockers for this proposal if we only use attributes, though. So it might be a good idea to move attributes to the block.json files to better promote this format.

Member Author

we still didn't resolve the following issues:

You'll have to forgive me, since it's been a while since I've revisited some of those specific details of the JSON manifest. Working through this prototype forced me to consider how we would implement at least some of those supports on the server (className, align, anchor are implemented in this pull request).

For translations, I seem to recall something about how we considered wrapping the translatable fields with __ et al. automatically? I'm not sure exactly how we would determine the text domain in that case.

As a prototype, I'm also fine to start splitting those off into their own individual tasks. It was at least interesting to explore the feasibility of pulling them in and highlighting some of these shortcomings (notably supports).

Member

You'll have to forgive me, since it's been a while since I've revisited some of those specific details of the JSON manifest. Working through this prototype forced me to consider how we would implement at least some of those supports on the server (className, align, anchor are implemented in this pull request).

Nice, I missed that, sorry about it :(

For translations, I seem to recall something about how we considered wrapping the translatable fields with __ et al. automatically? I'm not sure exactly how we would determine the text domain in that case.

The textDomain field needs to be declared in the block.json file. You can check my prototype for the JS side as a reference: #16088.

I don't think it's a concern in the context of attributes, though; I just wanted to raise awareness of it. The general agreement was that attributes shouldn't be translatable.

As a prototype, I'm also fine to start splitting those off into their own individual tasks. It was at least interesting to explore the feasibility of pulling them in and highlighting some of these shortcomings (notably supports).

Yes, supports seems like the only place which can cause issues for the proposed code.

gutenberg.php (resolved)
@spacedmonkey (Member)

Please consider this ticket in core. I believe this has to land before we can continue this work.

I also believe editor_script, script, editor_style, and style are handled differently in Gutenberg and in PHP: PHP uses handles, whereas JavaScript uses URLs. PHP will need to change to handle URLs before this can be merged.

	$registry = WP_Block_Type_Registry::get_instance();

	$block_manifests = glob( dirname( dirname( __FILE__ ) ) . '/packages/block-library/src/*/block.json' );
	foreach ( $block_manifests as $block_manifest ) {
Member

Should some level of validation be happening here?

Member Author

Should some level of validation be happening here?

Do you mean that it has necessary properties to consider it a valid block manifest?

There are some simple checks below, both to account that the file could be parsed as JSON, and that it has a name. We could expand on this.

if ( is_null( $block_settings ) || ! isset( $block_settings['name'] ) ) {
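A sketch of somewhat stricter manifest validation, assuming three checks beyond the existing one: the name is a namespaced string, attributes (if present) is an object, and each attribute's type, if declared, is a single JSON Schema primitive type. The function name is hypothetical.

```php
<?php
/**
 * Hypothetical sketch: read one block.json manifest and return its settings,
 * or null if it fails basic structural checks.
 */
function sketch_read_block_manifest( string $path ): ?array {
	if ( ! is_readable( $path ) ) {
		return null;
	}
	$settings = json_decode( file_get_contents( $path ), true );

	// Same spirit as the existing check: a JSON object that has a name.
	if ( ! is_array( $settings ) || ! isset( $settings['name'] ) ) {
		return null;
	}
	// Block names are namespaced, e.g. "core/paragraph".
	if ( ! is_string( $settings['name'] ) || ! preg_match( '#^[a-z0-9-]+/[a-z0-9-]+$#', $settings['name'] ) ) {
		return null;
	}

	$attributes = $settings['attributes'] ?? array();
	if ( ! is_array( $attributes ) ) {
		return null;
	}
	// This sketch only accepts a single primitive type per attribute.
	$allowed_types = array( 'string', 'number', 'integer', 'boolean', 'object', 'array', 'null' );
	foreach ( $attributes as $definition ) {
		if ( ! is_array( $definition ) ) {
			return null;
		}
		if ( isset( $definition['type'] ) && ! in_array( $definition['type'], $allowed_types, true ) ) {
			return null;
		}
	}
	return $settings;
}
```

In the registration loop, a null return would take the place of the current is_null/isset check.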

Member

Validation of types, e.g. whether it's a string, int, array, etc.?

@aduth (Member Author) Nov 14, 2019

Validation of types, e.g. whether it's a string, int, array, etc.?

If I understand you correctly, we probably want something like what exists in WP_Block_Type#prepare_attributes_for_render, though I expect it would be applied at the time $attributes are being sourced.
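A simplified sketch of that idea, loosely modeled on what WP_Block_Type::prepare_attributes_for_render does: values that fail their declared type are dropped, then declared defaults fill any gaps. Core delegates the type check to rest_validate_value_from_schema, which is more lenient in places (numeric strings, for example); the small type check here is a stricter stand-in, and both function names are hypothetical.

```php
<?php
/**
 * Hypothetical stand-in for rest_validate_value_from_schema(), covering only
 * single JSON Schema primitive types.
 */
function sketch_value_matches_type( $value, string $type ): bool {
	switch ( $type ) {
		case 'string':
			return is_string( $value );
		case 'integer':
			return is_int( $value );
		case 'number':
			return is_int( $value ) || is_float( $value );
		case 'boolean':
			return is_bool( $value );
		case 'array':
			// A list: an array with sequential 0..n-1 keys.
			return is_array( $value ) && array_values( $value ) === $value;
		case 'object':
			return is_array( $value ) || is_object( $value );
		case 'null':
			return null === $value;
		default:
			return false;
	}
}

/**
 * Hypothetical sketch: drop sourced values that fail their declared type,
 * then fill declared defaults for anything missing.
 */
function sketch_prepare_sourced_attributes( array $attributes, array $schema ): array {
	foreach ( $attributes as $name => $value ) {
		// Skip attributes that are unregistered or declare no type.
		if ( ! isset( $schema[ $name ]['type'] ) ) {
			continue;
		}
		if ( ! sketch_value_matches_type( $value, $schema[ $name ]['type'] ) ) {
			unset( $attributes[ $name ] );
		}
	}
	foreach ( array_diff_key( $schema, $attributes ) as $name => $definition ) {
		if ( isset( $definition['default'] ) ) {
			$attributes[ $name ] = $definition['default'];
		}
	}
	return $attributes;
}
```

Because sourcing always produces strings for the html, text, and attribute sources, a stricter or more lenient check here directly decides whether, say, a sourced "20" survives for an integer attribute.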

 * be parsed.
 * @return mixed Sourced attribute value.
 */
function get_html_sourced_attribute( $block, $attribute_schema ) {
Member

This call is going to be expensive at the compute level. Is there any way we can cache the result?

Member Author

This call is going to be expensive at the compute level. Is there any way we can cache the result?

I'd thought a bit about what might make sense to cache. Since the HTML of each block will likely be unique, I don't know that we would want to cache either the loaded HTML or the queried results, as the cache hit ratio would be very low.

What might make sense, depending on whether it makes a measurable difference:

  • Caching the $document itself.
    • If constructing DOMDocument is expensive (I don't know that it is)
  • Caching the converted XPath selectors
    • The conversion may or may not be expensive (it might also depend on which implementation we choose), but there's a higher likelihood we would reuse those on a per-block-type basis (e.g. every paragraph will be running the p selector).
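A sketch of the second option, assuming a per-request static cache is enough: each distinct selector is converted once and reused across every block of that type. The converter is injected as a callable so the sketch does not depend on which conversion implementation is chosen; the function name is hypothetical, and the cache assumes a single converter per request.

```php
<?php
/**
 * Hypothetical sketch: memoize CSS-to-XPath conversion for the current request.
 * The cache is keyed by selector only, so it assumes one converter per request.
 */
function sketch_cached_xpath( string $selector, callable $convert ): string {
	static $cache = array();
	if ( ! isset( $cache[ $selector ] ) ) {
		// Cache miss: convert once, then reuse for every later block.
		$cache[ $selector ] = $convert( $selector );
	}
	return $cache[ $selector ];
}
```

Whether this helps measurably depends on how expensive the chosen converter is, which is the open question above.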

Member

How about caching the attributes, say in post meta?

Member Author

How about caching the attributes, say in post meta?

Hm, I'd have to think about it more, but that does seem like a good idea. In fact, it might then make sense to run this sourcing logic when a post is saved, rather than at parse-time (the parse would just read the cached result).
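A sketch of the source-on-save idea, assuming a plain array stands in for post meta and that each entry is keyed by a hash of the block's HTML, so an edited block never reuses a stale entry. In WordPress the store would presumably be read and written with get_post_meta and update_post_meta from a save hook; that wiring is omitted, and the function name is hypothetical.

```php
<?php
/**
 * Hypothetical sketch: return sourced attributes for a block's HTML, running
 * the expensive DOM-based sourcing only when no entry exists for that HTML.
 *
 * @param array    $store      Stand-in for a post meta value, passed by reference.
 * @param string   $block_html The block's saved innerHTML.
 * @param callable $source     Function that performs the actual sourcing.
 */
function sketch_get_sourced_attributes( array &$store, string $block_html, callable $source ): array {
	// Keying by a hash of the HTML means edited blocks miss the cache.
	$key = md5( $block_html );
	if ( ! isset( $store[ $key ] ) ) {
		$store[ $key ] = $source( $block_html );
	}
	return $store[ $key ];
}
```

Calling this for every block at save time warms the store, so parse-time calls become lookups. Rebuilding the map from scratch on each save, rather than appending to it, would keep entries for deleted or edited blocks from accumulating.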

 * @return mixed Sourced attribute value.
 */
function get_html_sourced_attribute( $block, $attribute_schema ) {
	$document = new DOMDocument();
Member

It seems like this class has some requirements. Can we confirm that the libxml extension is currently required by WP Core?

Member Author

It seems like this class has some requirements. Can we confirm that the libxml extension is currently required by WP Core?

The page you linked says "libxml is enabled by default".

We could still have some graceful fallback here for environments where it's explicitly disabled, although it would be unable to populate $attributes, yes.

Member

We may need to change the requires for WP core.

 * @return string Equivalent XPath selector.
 */
function _wp_css_selector_to_xpath( $selector ) {
	/*
Member

I would add a filter here, to allow hijacking this behaviour.

Member Author

I would add a filter here, to allow hijacking this behaviour.

Seems reasonable, sure 👍

 * @param string $selector CSS selector.
 * @return string Equivalent XPath selector.
 */
function _wp_css_selector_to_xpath( $selector ) {
Member

This function needs PHP unit tests.

Member Author

This function needs PHP unit tests.

Yep, I have no plans to merge anything without sufficient tests.
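As a starting point for such tests, here is a much narrower CSS-to-XPath conversion than the bundled PHP Selector library, assuming block selectors only need type selectors, .class, #id, [attr], [attr=value] (double-quoted or unquoted, without spaces), and the descendant and child combinators. Pseudo-classes, sibling combinators, and selector lists are not supported and produce incorrect XPath. Both function names are hypothetical.

```php
<?php
/**
 * Hypothetical sketch: convert one compound selector such as "img.wp-image-20"
 * or "a[rel]" into an XPath step.
 */
function sketch_compound_to_xpath( string $compound ): string {
	// Leading type selector (or "*"); defaults to any element.
	$element = '*';
	if ( preg_match( '/^(\*|[a-zA-Z][a-zA-Z0-9-]*)/', $compound, $match ) ) {
		$element  = strtolower( $match[1] );
		$compound = substr( $compound, strlen( $match[1] ) );
	}

	// Remaining .class, #id, [attr] and [attr=value] parts become predicates.
	preg_match_all(
		'/\.([\w-]+)|#([\w-]+)|\[([\w-]+)(?:="?([^"\]]*)"?)?\]/',
		$compound,
		$matches,
		PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL
	);
	$predicates = '';
	foreach ( $matches as $m ) {
		if ( isset( $m[1] ) ) {
			// Whole-word match within the space-separated class list.
			$predicates .= '[contains(concat(" ", normalize-space(@class), " "), " ' . $m[1] . ' ")]';
		} elseif ( isset( $m[2] ) ) {
			$predicates .= '[@id="' . $m[2] . '"]';
		} elseif ( isset( $m[4] ) ) {
			$predicates .= '[@' . $m[3] . '="' . $m[4] . '"]';
		} else {
			$predicates .= '[@' . $m[3] . ']';
		}
	}
	return $element . $predicates;
}

/**
 * Hypothetical sketch: convert compound selectors joined by descendant (" ")
 * or child (">") combinators into an XPath expression.
 */
function sketch_css_to_xpath( string $selector ): string {
	preg_match_all( '/\s*>\s*|\s+|[^\s>]+/', trim( $selector ), $tokens );

	$xpath = '';
	$axis  = '//'; // The first step may match at any depth.
	foreach ( $tokens[0] as $token ) {
		$trimmed = trim( $token );
		if ( '>' === $trimmed ) {
			$axis = '/';  // Child combinator.
		} elseif ( '' === $trimmed ) {
			$axis = '//'; // Descendant combinator.
		} else {
			$xpath .= $axis . sketch_compound_to_xpath( $trimmed );
		}
	}
	return $xpath;
}
```

Comparing output strings pins down the conversion, while running the result through DOMXPath against real block markup checks that it matches what the CSS selector would.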

 *
 * @since 6.9.0
 */
class WP_Sourced_Attributes_Block_Parser extends WP_Block_Parser {
Member

Unit tests?

@aduth (Member Author) commented Nov 14, 2019

Please consider this ticket in core. I believe this has to land before we can continue this work.

Thanks for the pointer! Those changes seem much needed, I agree. As for how it impacts this pull request, I think support for $supports is likely the most pressing, since it might affect what's currently implemented as gutenberg_add_default_attributes.

I also believe editor_script, script, editor_style, and style are handled differently in Gutenberg and in PHP: PHP uses handles, whereas JavaScript uses URLs. PHP will need to change to handle URLs before this can be merged.

Before this pull request is merged? Or the Trac ticket? I'm not really sure how those fields relate to the effort here.

@spacedmonkey (Member)

In the RFC, editor_script, script, editor_style, and style are defined as URLs. However, blocks registered in PHP use script/style handles. See wp_enqueue_registered_block_scripts_and_styles.
wp_enqueue_style and wp_enqueue_script cannot currently handle being passed a URL in place of a handle.

TL;DR: If you pass a URL in the editor_script, script, editor_style, or style fields, it is not going to work and may break things.

Before this pull request is merged? Or the Trac ticket? I'm not really sure how those fields relate to the effort here.

I think that #48529 has to land and we need to decide how the fields are handled in #47620 before we merge anything here.

@spacedmonkey (Member)

The following blocks are missing block.json:
  • tag-cloud
  • shortcode
  • search
  • rss
  • navigation-menu
  • legacy-widget
  • latest-posts
  • latest-comments
  • embed
  • categories
  • calendar
  • block
  • archives

See this ticket and this patch that resolves the issue.

@gziolo (Member) commented Nov 15, 2019

The following blocks are missing block.json

See this ticket and this patch that resolves the issue.

The files updated in that patch are overwritten by files from packages installed from npm, so your changes would be erased the next time npm run build or npm run build:dev runs. They need to be modified in Gutenberg. In addition, we first need to land the part that @aduth described in his comment #18414 (comment):

Working through this prototype forced me to consider how we would implement at least some of those supports on the server (className, align, anchor are implemented in this pull request).

We also didn't move translatable fields to block.json files because of:

For translations, I seem to recall something about how we considered wrapping the translatable fields with __ et al. automatically?

We don't have code in place that would do this in PHP when registering blocks from block.json.

@kadamwhite (Contributor) commented Jan 2, 2020

I'd like to propose exposing this block structure as a subresource, post/:id/blocks, instead of necessarily listing it in content; while it does represent post content, it's both more structural and in most cases not needed in the same contexts as the raw or rendered content. Making it a subresource would require an additional request or an _embed when this data is needed, but it would avoid somewhat duplicating the content we return in one response.

Edit: this was discussed a while back in Slack, too.

@aduth (Member Author) commented Mar 10, 2020

This was always meant to be a prototype exploration, so I'm going to close this as it's not in a mergeable state. It can be useful for future reference implementation.

As far as its current status, there were pending architectural revisions that ought to be explored in any future implementation:

This pull request also included additional changes required by—but not directly related to—the addition of server-side attribute sourcing.

From the original comment:

  • A mechanism for server-registering all Gutenberg block.json manifests
  • Include a new content.blocks field on the REST API posts responses
  • Defining default-supported attributes for blocks registered on the server (className, align, anchor)

Respective status of each:

  • I believe @gziolo has his eyes on server-registering of block.json
  • The effort here enables content.blocks, so that field would depend on this implementation. In any case, given #18414 (comment), implementing it as part of the existing posts endpoint might not be encouraged.
  • Depending on whether it is implemented first in Gutenberg, an initial step toward server-side block supports is filterable block registration, proposed at Trac#49615.

@aduth closed this Mar 10, 2020
@aduth deleted the try/server-source-parsing branch March 10, 2020 16:12
@gziolo (Member) commented Mar 11, 2020

  • I believe @gziolo has his eyes on server-registering of block.json

Yes, the plan is to introduce a new helper function that allows registering a block from its block.json file, as discussed in #19786 (comment). In addition to that, we have WP-CLI changes tracked in wp-cli/scaffold-command#141 to make it possible to include translatable strings in block.json.


/**
 * Given a registered block type settings array, assigns default attributes.
 * This must be called manually, as there is currently no way to hook to block
Member

Now that the register_block_type_args filter has been introduced with https://core.trac.wordpress.org/ticket/49615, we can add this functionality in WordPress core. I will extract this function and propose it as an enhancement to the registration process on the server.
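A sketch of how such a function might look as a register_block_type_args callback, assuming the attribute definitions mirror the client-side supports handling: className unless customClassName is false, and align and anchor when supported, with anchor sourced from the outer element's id. The function name is hypothetical, and whatever lands in core may differ.

```php
<?php
/**
 * Hypothetical sketch: add className, align, and anchor attribute
 * definitions to block type settings based on its "supports" flags,
 * without overwriting definitions the block already declares.
 */
function sketch_add_default_attributes( array $args ): array {
	$supports   = $args['supports'] ?? array();
	$attributes = $args['attributes'] ?? array();

	// Custom class names are supported unless explicitly disabled.
	if ( ( $supports['customClassName'] ?? true ) && ! isset( $attributes['className'] ) ) {
		$attributes['className'] = array( 'type' => 'string' );
	}
	if ( ! empty( $supports['align'] ) && ! isset( $attributes['align'] ) ) {
		$attributes['align'] = array( 'type' => 'string' );
	}
	// The anchor is sourced from the id attribute of the block's outer element.
	if ( ! empty( $supports['anchor'] ) && ! isset( $attributes['anchor'] ) ) {
		$attributes['anchor'] = array(
			'type'      => 'string',
			'source'    => 'attribute',
			'attribute' => 'id',
			'selector'  => '*',
		);
	}

	$args['attributes'] = $attributes;
	return $args;
}
```

In WordPress 5.5 and later it could be hooked with add_filter( 'register_block_type_args', 'sketch_add_default_attributes' ). The filter also passes the block name as a second argument, which this sketch does not need.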

Member

The proposal is ready at WordPress/wordpress-develop#383.
