-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In scripts expected to have JSON, ensure the JSON is valid #4340
Conversation
For scripts with type="application/json" or type="application/ld+json", ensure that its content is valid JSON.
@@ -10,6 +10,7 @@ | |||
"ext-date": "*", | |||
"ext-dom": "*", | |||
"ext-iconv": "*", | |||
"ext-json": "*", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PhpStorm reported the lack of this in composer.json
.
It looks like json_decode(), which AMP_HTML_Utils::is_valid_json()
calls, is from the json
extension.
And Jetpack also uses this in composer.json
.
Change the way of detecting invalid JSON. It looks like the previous way didn't account for empty JSON. Still, I might revert this.
This looks to be related to a failed Travis build: https://travis-ci.org/ampproject/amp-wp/jobs/656496431#L952
Hi @westonruter, Thanks, Weston! |
@@ -94,7 +90,7 @@ jobs: | |||
|
|||
- name: E2E tests | |||
php: "7.3" | |||
env: WP_VERSION=latest DEV_LIB_SKIP=phpcs,eslint,xmllint,phpsyntax,phpunit PUPPETEER_SKIP_CHROMIUM_DOWNLOAD= | |||
env: WP_VERSION=latest DEV_LIB_SKIP=phpcs,eslint,xmllint,phpsyntax,phpunit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is like WordPress/gutenberg#20547. It's because of a failed E2E
Travis job.
// When the CDATA is expected to be JSON, ensure it's valid JSON. | ||
if ( 'script' === $element->tagName && | ||
in_array( $element->getAttribute( 'type' ), [ 'application/json', 'application/ld+json' ], true ) && | ||
null === json_decode( $element->textContent ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite right. What if the JSON being decoded is itself null
? This needs to make use of json_last_error()
to detect the error scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ea26570 uses json_last_error()
instead.
in_array( $element->getAttribute( 'type' ), [ 'application/json', 'application/ld+json' ], true ) && | ||
null === json_decode( $element->textContent ) | ||
) { | ||
return [ 'code' => self::INVALID_JSON_CDATA ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could include the JSON error code as part of the error data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the new error message added in ea26570. How does that look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most common message would probably be probably 'Syntax error'.
So far, these look to mainly be 'Invalid syntax'
It looks like json_last_error() will not return an error if passed a falsy value. In the Changelog: "7.0.0 An empty PHP string or value that after casting to string is an empty string (NULL, FALSE) results in JSON syntax error." https://www.php.net/manual/en/function.json-decode.php
// When the CDATA is expected to be JSON, ensure it's valid JSON. | ||
if ( 'script' === $element->tagName && in_array( $element->getAttribute( 'type' ), [ 'application/json', 'application/ld+json' ], true ) ) { | ||
if ( empty( $element->textContent ) ) { | ||
return [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like in PHP 5.6 and earlier, json_last_error( )
returns JSON_ERROR_NONE
after json_decode( '' )
.
In the Changelog:
7.0.0 | An empty PHP string or value that after casting to string is an empty string (NULL, FALSE) results in JSON syntax error.
These failed unit tests are from that different behavior, I think.
if ( empty( $element->textContent ) ) { | ||
return [ | ||
'code' => self::INVALID_JSON_CDATA, | ||
'message' => esc_html__( 'Expected JSON, got an empty value', 'amp' ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
// When the CDATA is expected to be JSON, ensure it's valid JSON. | ||
if ( 'script' === $element->tagName && in_array( $element->getAttribute( 'type' ), [ 'application/json', 'application/ld+json' ], true ) ) { | ||
if ( empty( $element->textContent ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As opposed to empty()
should this check if '' === trim( $element->textContent )
? Does that also result in JSON_ERROR_NONE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In PHP 7.0+, json_decode( trim( ' ' ) )
results in json_last_error()
returning JSON_ERROR_SYNTAX
.
But your suggestion would probably give a more useful message to the user.
If $element->textContent
is ' '
, having a message that it's empty is probably more useful than a message about invalid syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied in 81ea67d
if ( JSON_ERROR_NONE !== json_last_error() ) { | ||
return [ | ||
'code' => self::INVALID_JSON_CDATA, | ||
'message' => esc_html( json_last_error_msg() ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using esc_html() isn't right here, as this is not being printed here. No escaping/sanitization should be necessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, applied in 81ea67d
[ AMP_Tag_And_Attribute_Sanitizer::INVALID_JSON_CDATA ], | ||
], | ||
'cdata_malformed_json_with_emojis' => [ | ||
'<html><head><meta charset="utf-8"><script type="application/ld+json">{"wrong": 🚧 🚧 }</script></head><body></body></html>', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test isn't quite capturing the original issue observed. This should do it:
'<html><head><meta charset="utf-8"><script type="application/ld+json">{"wrong": 🚧 🚧 }</script></head><body></body></html>', | |
'<html><head><meta charset="utf-8"><script type="application/ld+json">{"wrong": "<?php echo wp_staticize_emoji( "🚧 🚧" ); ?>"}</script></head><body></body></html>', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, committed in 5b187d3
if ( empty( $element->textContent ) ) { | ||
return [ | ||
'code' => self::INVALID_JSON_CDATA, | ||
'message' => esc_html__( 'Expected JSON, got an empty value', 'amp' ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'message' => esc_html__( 'Expected JSON, got an empty value', 'amp' ), | |
'message' => __( 'Expected JSON, got an empty value', 'amp' ), |
Escaping should be done at printing not storage. The validation error may be sent back via JSON where HTML entities should not be present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See note below about how having a more specific error code for this case could be preferable, as the error message could be translated for the user at display time.
json_decode( $element->textContent ); | ||
if ( JSON_ERROR_NONE !== json_last_error() ) { | ||
return [ | ||
'code' => self::INVALID_JSON_CDATA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also include the JSON error code (perhaps json_error
). Ultimately, this may be preferable than using the message
as it would mean the strings could be translated, e.g. using https://www.php.net/manual/en/function.json-last-error-msg.php#117393
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or rather, each of the error codes could have their own dedicated validation error codes. So instead of just INVALID_JSON_DATA
there could be one for each of the constants in https://www.php.net/manual/en/function.json-last-error.php
Then the specific error message could be displayed at runtime rather than being stored in the actual error object, allowing for translations tailored to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, these error codes could be from the constants that json_last_error()
returns. I'll apply that later tonight if that's alright.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the specific error message could be displayed at runtime rather than being stored in the actual error object, allowing for translations tailored to the user.
Sure, it'd be good to have actual translations of these error messages.
But do you have in mind passing each of the 10 error messages through __()
, like:
private static function get_error_message() {
...
case AMP_Tag_And_Attribute_Sanitizer::JSON_ERROR_DEPTH:
return __( 'The maximum stack depth has been exceeded', 'amp' );
case AMP_Tag_And_Attribute_Sanitizer::JSON_ERROR_STATE_MISMATCH:
return __( 'Invalid or malformed JSON', 'amp' );
etc...
This could get pretty verbose, but maybe I'm not understanding this.
Thanks, Weston!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, something like that. Maybe let the error title be normalized to just “Invalid JSON” for each error code, but then have the message have more elaboration, for example:
diff --git a/includes/validation/class-amp-validation-error-taxonomy.php b/includes/validation/class-amp-validation-error-taxonomy.php
index 6a6158c61..ed6c49e40 100644
--- a/includes/validation/class-amp-validation-error-taxonomy.php
+++ b/includes/validation/class-amp-validation-error-taxonomy.php
@@ -1833,8 +1833,22 @@ class AMP_Validation_Error_Taxonomy {
$content .= '</p>';
}
- if ( isset( $validation_error['message'] ) ) {
- $content .= sprintf( '<p>%s</p>', esc_html( $validation_error['message'] ) );
+ $message = null;
+ switch ( $validation_error['code'] ) {
+ case AMP_Tag_And_Attribute_Sanitizer::JSON_ERROR_DEPTH:
+ $message = __( 'The maximum stack depth has been exceeded', 'amp' );
+ break;
+ case AMP_Tag_And_Attribute_Sanitizer::JSON_ERROR_STATE_MISMATCH:
+ $message = __( 'Invalid or malformed JSON', 'amp' );
+ break;
+ // ...
+ default:
+ if ( isset( $validation_error['message'] ) ) {
+ $message = $validation_error['message'];
+ }
+ }
+ if ( $message ) {
+ $content .= sprintf( '<p>%s</p>', esc_html( $message ) );
}
break;
@@ -3099,6 +3113,11 @@ class AMP_Validation_Error_Taxonomy {
}
return $title;
+ case AMP_Tag_And_Attribute_Sanitizer::JSON_ERROR_DEPTH:
+ case AMP_Tag_And_Attribute_Sanitizer::JSON_ERROR_STATE_MISMATCH:
+ // ...
+ return __( 'Invalid JSON', 'amp' );
+
default:
/* translators: %s error code */
return sprintf( __( 'Unknown error (%s)', 'amp' ), $validation_error['code'] );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, how does 47565f1 look?
JSON_ERROR_EMPTY
isn't one of the json_last_error() values, but I made it up for <script>
textContent of ''
or ' '
.
If it's alright, later today I'll add a test for AMP_Validation_Error_Taxonomy::filter_manage_custom_columns()
that covers that new switch
statement for the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
47b8e87 adds a test for the error message.
Commit Weston's suggestion for the unit test Co-Authored-By: Weston Ruter <westonruter@google.com>
This should give a more useful message than an invalid syntax error.
Pass these to __(), and only get the message when they're about to be displayed.
It should appear in the custom column, based on whether it's present in the validation error, or whether it can be found from the error code, like with JSON errors.
This can have null, so this should be string|null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test failure for Test_AMP_Validation_Manager::test_has_parameters_passed_by_reference
is not caused by the code here, but by a change in core: WordPress/wordpress-develop@f40c6c9#diff-1f96a4b7aed0d895bbbcfc265fcb4fc6
This can be fixed in a separate PR.
$filtered_content = AMP_Validation_Error_Taxonomy::filter_manage_custom_columns( $initial_content, 'error_code', $term_id ); | ||
|
||
$this->assertStringStartsWith( $initial_content . '<button type="button" aria-label="Toggle error details"', $filtered_content ); | ||
$this->assertStringContainsString( $expected_error_message, $filtered_content ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is causing test failures:
PHP Fatal error: Call to undefined method Test_AMP_Validation_Error_Taxonomy::assertStringContainsString() in /tmp/wordpress/src/wp-content/plugins/amp/tests/php/validation/test-class-amp-validation-error-taxonomy.php on line 1199
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for missing that. ae5df63 fixes it.
Requested changes have been implemented.
This is being fixed in #4357. |
It looks like assertStringContainsString() caused a failure, as Weston mentioned.
…validation-error * 'develop' of github.com:ampproject/amp-wp: (151 commits) Remove need to iterate over nodesToRemove a second time Fix static analysis of expiry param Improve static analysis of Document::normalizeHtmlAttributes() Improve phpdoc, formatting, and return value Add curl_getinfo() and curl_errno() to list of required cURL functions Remove obsolete Amp\AmpWP\Filter missed in 179b49b Expose AMP optimizer errors in HTML comment at end of head Omit AmpRuntimeCss transformer if SSR is not enabled Add angle brackets to ElementDump output Remove dead boilerplate code from AMP_Meta_Sanitizer Improve robustness of getting error code from class after 938bbf4 Fix removal of namespace when deriving error code from class name Revert deprecation expectation in helper tests for stylesheet functions Capitalize AMP Fix tests for meta sanitizer and bug in Dom/Document Undeprecate boilerplate stylesheet functions Remove node deduplication in meta sanitizer Fix validation errors when optimizer is disabled Add cache-control support Introduce Response object to pass headers together with body for remote requests ...
…validation-error * 'develop' of github.com:ampproject/amp-wp: (58 commits) Remove vendor dirs for library if they exist Remove dotenv config Update package-lock.json Update dependency babel-eslint to v10.1.0 (#4344) Update dependency @wordpress/e2e-test-utils to v4.3.1 (#4360) Update dependency react-dom to v16.13.0 (#4348) Update dependency uuid to v7 (#4325) Update dependency eslint-plugin-jest to v23.8.2 (#4346) Update dependency cross-env to v7.0.2 (#4359) Remove now unneeded dotenv dependency Remove extraneous closing brace from add_nav_menu_styles Used fixed-height layout for iframe Add missing allow-popups sandbox for iframe Re-add image assets for tests Improve phpdoc, method names, and remove unnecessary global namespace qualifier Throw Exception instead of Error Wait for server to start Improve static analysis for passing ownerDocument Do capital_K_dangit for TikTok Update namespace reference after #4364 ...
Summary
<script>
tags withtype="application/json"
ortype="application/ld+json"
, this ensures that thetextContent
is valid JSON, and produces aINVALID_JSON_CDATA
error if it's not.wp_staticize_emoji()
from making the emojis invalid. But it now reports emojis that have become invalid due to that issue.INVALID_JSON_CDATA
error in the plugin:Testing
One should be from the first
<script>
in theadd_action( 'wp_head' ...
call above, and one should be from the Custom HTML block.The bottom two scripts in the
add_action( 'wp_head' ...
call should not cause a validation error:Fixes #4320
Checklist