RichText state structure for value manipulation #7890

ellatrix · 2018-07-11T14:07:40Z

Description

Read about the functions for manipulation.

Why a different `RichText` internal state structure?

Currently RichText values is internally structured as tree of HTML nodes that can map to the displayed live DOM. When the value needs to be manipulated, we are either looking at the live DOM or the internal array structure.

Current parsing and serialisation

Raw parsed blocks have a blob of HTML as their content, from which attributes are sourced according to the block’s attribute schema. This is done by parsing the HTML with the browser. For RichText values, the in memory DOM tree is then converted to an array structure of the same shape. In the RichText editor, the values are applied by setting the HTML (element.innerHTML). On Input, the live DOM is converted as well to an array structure. Any manipulation that is done by us happens by manipulation the live DOM directly, or by manipulating the array structure, which is then set as HTML in the editor.

Proposed parsing and serialisation: controlled component

Ideally, we’d like to avoid manipulating the live DOM directly, be it though TinyMCE APIs or not. It is both very fragile (prone to errors) and not so easy to do. Say you’d like to automatically format some text that matches @user, and complete it based on suggestion, you’d have to merge all text children of element nodes in the tree before the selection to be able to search, split elements and text nodes in order to format the matching text, and move selection using the Selection API. Formatting etc. is similarly complex within TinyMCE. See https://github.com/tinymce/tinymce/tree/master/src/core/main/ts/fmt. A tree shape is not so easy to manipulate, wether it is the live DOM or an array structure. A flat structure would be easier, where text and formats are separated. I’m not proposing an alternative format for saving or displaying the data. HTML remains perfect to store the value, and the DOM needs to be used, in the browser, to display the data in a contentEditable container.

The best structure I’ve found was one where the text and formats are separated. The text would be just a string, which becomes easy to search. The formats are stored in a sparse array of the same length as the text. Each index of the array maps to a character in the string, and formats are reference to a unique format variation. RichText values become very easy to manipulate like this, imagine for example splitting, merging, slicing, splicing, applying and removing formats. Empty checking is simply a matter of checking the string. Things like an empty HTML element and empty text nodes are not possible.

Example:

<p>one <em><a href="https://w.org">two</a></em></p>

// All formats are referenced, this is to illustrate.
const format1 = { name: 'italic' };
const format2 = { name: 'link', url: 'https://w.org' };

{
	formats: [
		,
		,
		,
		,
		[ format1, format2 ],
		[ format1, format2 ],
		[ format1, format2 ],
	],
	text: 'one two',
}

New formats could be registered with their own view:

<p>one <span class="mention">two</span></em></p>

const format1 = { name: 'mention' };
{
	formats: [
		,
		,
		,
		,
		[ format1 ],
		[ format1 ],
		[ format1 ],
	],
	text: 'one two',
}

We could also support some meta date here if we need to save extra information e.g. attributes or a user ID.

Because there is a textual representation, RichText instances could be searched and formatted easily, e.g. think about auto-formatting @mentions and marking pieces of text for spell checking or comments.

Another benefit is that there is only one right way to map this to HTML. Merging

const format1 = { name: 'italic' };

{
	formats: [
		[ format1 ],
		[ format1 ],
		[ format1 ],
	],
	text: 'one',
}

const format1 = { name: 'italic' };

{
	formats: [
		[ format1 ],
		[ format1 ],
		[ format1 ],
	],
	text: 'two',
}

will result in onetwo. Not onetwo.

How has this been tested?

Gutenberg should load without errors. Test splitting and merging, formatting, inline tokens, text patterns, blocks transformations.

Screenshots

Types of changes

Checklist:

My code is tested.
My code follows the WordPress code style.
My code follows the accessibility standards.
My code has proper inline documentation.

brandonpayton · 2018-07-15T18:44:18Z

Instead I'd like to propose an alternative structure where the text content and formatting are separated.

Thanks so much for working on this. We really need it. It would allow us to style autocompletions like @-mentions based on a pattern match rather than modifying the actual content like #6577 does. This would be cleaner, more flexible, and less error-prone.

brandonpayton

This seems like a really good direction if some pattern transforms can be shared across all RichText instances for things like @-mentions. In general, the changes felt like they were simplifying. I like how this is consolidating text-related logic in rich-text-structure.

brandonpayton · 2018-07-15T18:49:04Z

blocks/api/parser.js

@@ -157,6 +159,11 @@ export function parseWithAttributeSchema( innerHTML, attributeSchema ) {
 */
 export function getBlockAttribute( attributeKey, attributeSchema, innerHTML, commentAttributes ) {
 	let value;
+
+	if ( attributeSchema.source === 'rich-text' ) {
+		attributeSchema.default = create();


It might be worth aliasing the create import with a more specific name to clarify what is being created.

We can rename all this sure, atm it's maybe a bit less important while it's in progress.

brandonpayton · 2018-07-15T19:12:21Z

blocks/api/rich-text-structure.js

+	} = settings;
+
+	return Array.from( element.attributes ).reduce( ( acc, { name, value } ) => {
+		if ( ! removeAttributeMatch( name ) ) {


Real, non-rhetorical question:
Would there be any value in providing more context like the element's name to removeAttributeMatch?

Sure, maybe. Atm the goal is just to be able to strip mce specific stuff.

brandonpayton · 2018-07-15T19:23:19Z

blocks/api/rich-text-structure.js

+		formats.splice( start, end - start + 1 );
+		record = { formats, text };
+	} else {
+		// Delete from highest to lowest.


It may be good to say why we're deleting from highest to lowest.

Sorry if there are any poor comments, it's a WIP, this particular one will actually be removed again.

Sorry for the nitpicks. :) The intent was to share thoughts as I had them, but it wasn't super helpful for a WIP.

brandonpayton · 2018-07-15T19:27:27Z

blocks/api/rich-text-structure.js

@@ -0,0 +1,271 @@
+/**


These tests were very helpful in understanding how the rich-text-structure functions work. Thank you.

brandonpayton · 2018-07-15T20:20:12Z

editor/components/rich-text/index.js

-		this.savedContent = this.getContent();
+		const record = this.getContent();
+
+		this.savedContent = this.patterns.reduce( ( accu, transform ) => transform( accu ), record );


Does this mean that transforms due to matched patterns will be persisted in the post content? That seems clear here but is different than I've been mentally modeling this.

Here's my understanding of how we would apply this for @-mentions:

Adding formatting: Text matching a pattern like /@\w+/ would be formatted as an @-mention by wrapping it like .

Removing formatting: A  whose text content does not begin with an @ would have the formatting removed by replacing the span with its text content.

Is that roughly correct?

This one currently is just for the patterns without the mentions in mind. They're a bit different. The patterns expect a piece of text to be transformed from one thing to another. The mentions merely expect some piece of text to be displayed differently (and maybe allow to have some meta info). I'll look at the mentions shortly.

Cool. Thanks for clarifying.

ellatrix · 2018-07-27T15:21:42Z

@brandonpayton I update the autocompleter to use the rich text structure. I think this simplifies it a lot, but it needs some more polishing for sure there. Also wondering if it should maybe be moved in the rich text folder...

brandonpayton

I love these changes, and it's great to see the Autocomplete component simplified. Thank you.

Also wondering if it should maybe be moved in the rich text folder...

That seems like a good move to me. Autocomplete already had things that made it more editor-related than generic. Even before this PR, its implementation assumed wrapping something like a contenteditable and had the onReplace prop which implies use in a block context. IMHO, bringing it even closer to rich-text to work with a more consistent and reliable editor layer is a win.

brandonpayton · 2018-07-30T20:24:15Z

packages/components/src/autocomplete/index.js

-			inputEvent.initEvent( 'input', true /* bubbles */, false /* cancelable */ );
-		}
-		range.commonAncestorContainer.closest( '[contenteditable=true]' ).dispatchEvent( inputEvent );
+		replacement = replacement.slice( 1 + this.state.query.length );


Is this to account for the prefix character that triggers the completer? If so, what do you think about multi-character triggers?

No, it's to replace the prefix and the already typed characters to minimise updates. So a multi character trigger can still work, in fact, there's no longer such a thing, you can complete customise what to match and how to match it.

Ah, OK. That helps.

There are a couple of assumptions here that may not hold for all completers.

This line relies on completions always finishing what is being typed (e.g., +doto being completed with +dotorg), but completions may be anything and aren't constrained to what the user typed. For example, a fruit completer could complete "!apple" with '🍎'. This means we need to replace the entire substring corresponding to the completer query rather than simply adding the part that isn't there.

It's also possible for completers to return wp.element declarations, so eventually this method needs to to account for that.

That's a very good point, thank you!

The apple example seems to be something that closer to a pattern like `test`... I wonder if we can somehow unify all of these little functionalities.

I wonder if we can somehow unify all of these little functionalities.

I like the direction you're going with the token configuration here. I'm not sure if the current completer interface will ultimately have a role here and need to sleep on it to give my mind some time to adjust. :)

It's also possible for completers to return wp.element declarations, so eventually this method needs to to account for that.

Since we're allowing matched tokens to be either selected from a list or typed in full, maybe we should reconsider allowing completions to be wp.element declarations. If we restrict to text completions, completions would be simpler, and there could be a single place to declare token formatting and a single execution path for applying that formatting.

brandonpayton · 2018-08-01T05:43:54Z

packages/components/src/autocomplete/index.js

+		if ( richTextStructure.isCollapsed( record ) ) {
+			const text = richTextStructure.getTextContent( record );
+			const prevText = richTextStructure.getTextContent( prevRecord );
+			const textToSearch = text.slice( 0, record.selection.start );


I took some time to build and play with this PR and noticed typing "Hello, @-exampl" led to completer's test and getQuery functions being called with "Hello, @-exampl" instead of "@-exampl". This gives more power to completers, but I wonder whether it would be simpler for authors if we only passed "@-exampl" which is the text between the cursor and the nearest preceding space. What do you think?

Yeah, for sure, that sounds good. Not sure about keeping the trigger, let's see when it all works properly. Atm the autocomplete also pops up for something like text @so|omeone with an item for e.g. someone. We'll need to avoid this somehow. Unsure how this worked before. This PR touches so many components that I'm wondering if you'd be able to help out a bit with the autocomplete component, mainly making sure all current functionality is present?

Sure, I'm happy to help update the autocomplete component.

My family is moving in a little over a week, and I'll be taking a few days off then. Hopefully, that won't be an issue, but I'll try to keep a close eye on things and do what I can around that time.

brandonpayton · 2018-08-01T05:46:00Z

packages/editor/src/components/autocompleters/block.js

@@ -58,7 +58,17 @@ export function createBlockCompleter( {
 	return {
 		name: 'blocks',
 		className: 'editor-autocompleters__block',
-		triggerPrefix: '/',


The power of the new test and getQuery functions is cool, but I'm wondering if it would be good to continue supporting a trigger property for people getting started or for those who don't need the power.

Ok, let's leave the API alone until we work on the formatting API in the next PR. We should just focus on replacing the content change detection and manipulation logic.

Yes, it would be another breaking change. What we could instead as an enhancement to allow a function to be passed as triggerPrefix - this would be quite similar to what Webpack does with its options.

youknowriad · 2018-10-01T23:06:29Z

Fantastic @iseulde

tofumatt · 2018-10-01T23:13:38Z

So good!

gziolo · 2018-10-02T04:06:31Z

Love the changes added in 94352e7. Ace work on this PR @iseulde 👏 Thanks for being patient with all our feedback 😃

hypest · 2018-10-02T12:46:56Z

🎉 Thanks for the effort here @iseulde and thanks for "minding the mobile" and working closely with the mobile team! 🙇

aduth · 2018-10-03T14:42:04Z

packages/rich-text/package.json

+	"react-native": "src/index",
+	"dependencies": {
+		"@babel/runtime": "^7.0.0",
+		"@wordpress/escape-html": "file:../escape-html",


This introduces uncommitted changes to the local branch after npm install. Please review #10234

Do you mean package lock? See #10304.

aduth · 2018-10-05T20:23:57Z

packages/rich-text/src/index.js

+export { insert } from './insert';
+export { slice } from './slice';
+export { split } from './split';
+export { apply, toDom as unstableToDom } from './to-dom';


toDom does not follow our naming conventions. It should be toDOM.

https://github.com/WordPress/gutenberg/blob/master/docs/reference/coding-guidelines.md#abbreviations-and-acronyms

ktmn · 2018-10-09T16:44:20Z

Quick question about this rich text, previously if I had a template like this:

<wp.editor.InnerBlocks template={[
	['core/paragraph', {
		content: ['Default content'],
	}],
]} />

Now with array it doesn't work, string does:

<wp.editor.InnerBlocks template={[
	['core/paragraph', {
		content: 'Default content',
	}],
]} />

And adding a line break, for example, worked like this:

<InnerBlocks template={[
    ['core/paragraph', {
        content: [
            'Hello',
            {
                type: 'br',
                props: {
                    children: [],
                },
            },
            'world',
        ],
    }],
]} />

How would I do that now? I tried content: wp.richText.create('Hello world') but that was not it. Do I need like an object with formats and text or something?

The docs don't really say much, create( ?input: Element | string, ?range: Range, ?multilineTag: string, ?settings: Object ): Object, whats the range, what are the settings?

youknowriad · 2018-10-09T17:15:22Z

@ktmn We're still in the process of ensuring the best upgrade experience without breakage for existing templates. See #10370

ktmn · 2018-10-10T06:04:09Z

@youknowriad Ok I see, thanks. Another thing that seems broken for me is when a block previously had an attribute

content: {
	type: 'array',
},

that is used in wp.editor.RichText then it throws

Do all the richtext attribute types need to be changed, and if so, to what type? And do they need a default value? I tried rendering wp.editor.RichText without any value prop and it still threw it, so does it need something from me?

Basically I shouldn't worry about current version of Gutenberg at all, it's supposed to be broken? I wanted to develop some blocks that I put aside waiting for the non-paragraph block inserter, now that it's merged I grabbed the latest version but I should have installed something earlier, this is completely WIP?

youknowriad · 2018-10-10T07:56:22Z

Basically I shouldn't worry about current version of Gutenberg at all, it's supposed to be broken?

Short answer, yes, we're working on it. The migration path is not fully decided.

aduth · 2018-11-12T16:10:17Z

packages/escape-html/src/index.js

+ *
+ * @return {string} Escaped HTML element value.
+ */
+export function escapeHTML( value ) {


The module is called escape-html but all it handles is specifically an attribute value? Isn't that a bit misleading?

It also exports escapeHTML?

It also exports escapeHTML?

I think I misinterpreted this to only be handling attribute values.

aduth · 2018-11-27T22:40:41Z

packages/rich-text/src/to-tree.js

+
+	append( tree, '' );
+
+	for ( let i = 0; i < formatsLength; i++ ) {


Since formats is a sparse array, would it be worth considering using forEach to skip over the unformatted segments?

forEach calls callbackfn [...] only for elements of the array which actually exist; it is not called for missing elements of the array.

https://www.ecma-international.org/ecma-262/5.1/#sec-15.4.4.18

When a format is encountered and at the end of the text, we could simply append the unformatted text in a single go. This would also be much more performant since append is a non-trivial operation.

Benchmarks:

forEach is 84x faster iterating an array of 100 empty values (e.g. a paragraph with 100 characters and no formatting)

append is 17x faster when appending in one go, rather than one character at a time

Yes, maybe we can do this. I'd have to explore it, so far didn't have the time. Formats that stay equal could also be appended in one go. Also note that appendData, not append, which probably doesn't make it any faster.

aduth · 2018-11-27T22:52:42Z

packages/rich-text/src/to-dom.js

+}
+
+function createEmpty( type ) {
+	const { body } = document.implementation.createHTMLDocument( '' );


This has frequently turned up in performance profiling as an expensive operation, which on reflection is understandable given that it's creating an entirely new document. I think we should hold on to a constant reference of a document body and reset it each time we call createEmpty.

Benchmark: Creating a "fresh" body using innerHTML to reset is 646x faster than via document.implementation.createHTMLDocument

Yes, I saw the other PR. We can do the same here.

aduth · 2018-11-27T22:57:45Z

packages/rich-text/src/to-tree.js

+	const formatsLength = formats.length + 1;
+	const tree = createEmpty( tag );
+
+	append( tree, '' );


Why do we append an empty string?

phpbits · 2019-05-23T09:08:52Z

@ellatrix I hope it's fine to ask you a question here. Can you point me to the right direction on adding custom markdown or patterns? Inline code markdown is registered on getPatterns in which I'm interested on extending.

I've also figured out that value.text on formats returns the current RichText contents but there's no way to manipulate it directly. If you could help me out, that'll be really great. Thanks!

ellatrix · 2019-05-27T14:53:31Z

@phpbits Currently there's no way to do this as a plugin as we do it in core. We still need an API for this. :) That's why the code is not in the format library yet. Feel free to create a separate issue.

phpbits · 2019-05-28T07:16:01Z

Thanks @ellatrix! I've finally figured it out after couple of days trying :) I've created my own component following what you already have on code formatting. Here's what I've got so far : https://twitter.com/phpbits/status/1132988500196773888 👍

ellatrix added the [Status] In Progress Tracking issues with work in progress label Jul 11, 2018

ellatrix mentioned this pull request Jul 12, 2018

Expose TinyMCE editor content manipulation APIs in Gutenberg #3688

Closed

ellatrix force-pushed the try/rich-text-record branch from 590cbc0 to 4359e51 Compare July 12, 2018 09:09

aduth mentioned this pull request Jul 12, 2018

Rich Text: Formalize RichText children value abstraction #7934

Merged

gziolo added this to In Progress in API freeze via automation Jul 13, 2018

gziolo assigned ellatrix Jul 13, 2018

brandonpayton reviewed Jul 15, 2018

View reviewed changes

brandonpayton mentioned this pull request Jul 17, 2018

Treat autocompletions as tokens with editing boundaries #6577

Closed

4 tasks

ellatrix force-pushed the try/rich-text-record branch 2 times, most recently from f67f1c2 to 7e474eb Compare July 25, 2018 13:31

ellatrix force-pushed the try/rich-text-record branch from ce220f5 to c30eeed Compare July 30, 2018 17:32

brandonpayton reviewed Jul 30, 2018

View reviewed changes

brandonpayton reviewed Aug 1, 2018

View reviewed changes

This comment has been minimized.

Sign in to view

ellatrix mentioned this pull request Aug 2, 2018

Remove oEmbed fixture files #8393

Merged

ellatrix force-pushed the try/rich-text-record branch 2 times, most recently from 754bf06 to 91dbec9 Compare August 7, 2018 11:08

This comment has been minimized.

Sign in to view

ellatrix force-pushed the try/rich-text-record branch from 2064dfe to ad5bae6 Compare August 7, 2018 17:27

This comment has been minimized.

Sign in to view

ellatrix deleted the try/rich-text-record branch October 1, 2018 23:03

ellatrix mentioned this pull request Oct 3, 2018

RichText State Structure #771

Closed

ellatrix added a commit that referenced this pull request Oct 3, 2018

Fix typos after #7890

67cbba0

aduth reviewed Oct 3, 2018

View reviewed changes

ellatrix added a commit that referenced this pull request Oct 3, 2018

Fix package lock and typos after #7890 (#10304)

4691baf

aduth reviewed Oct 5, 2018

View reviewed changes

talldan mentioned this pull request Oct 5, 2018

Editing a link causes the link url to be inserted into the text #10368

Closed

saulirajala mentioned this pull request Oct 7, 2018

Cannot edit links to open in external window in Gutenberg #10088

Closed

daniloercoli mentioned this pull request Oct 8, 2018

Update GB hash to latest master wordpress-mobile/gutenberg-mobile#158

Merged

aduth reviewed Nov 12, 2018

View reviewed changes

aduth mentioned this pull request Nov 27, 2018

Rich Text: Remove componentDidUpdate deep equality conditions #7648

Closed

aduth reviewed Nov 27, 2018

View reviewed changes

aduth mentioned this pull request Nov 28, 2018

RichText: Reuse DOM document across calls to createEmpty #12402

Merged

aduth mentioned this pull request Mar 20, 2019

Add a sanitize package #7824

Closed

aduth mentioned this pull request Jun 17, 2022

Add CSV Download button to reports (LG-5913) GSA-TTS/identity-reporting#57

Merged

ellatrix mentioned this pull request Sep 9, 2022

Blocks: remove deprecated node and children matchers #44035

Closed

ellatrix mentioned this pull request Sep 19, 2022

Blocks: officially deprecated the children and node matchers #44265

Merged


		append( tree, '' );

		for ( let i = 0; i < formatsLength; i++ ) {

RichText state structure for value manipulation #7890

RichText state structure for value manipulation #7890

Conversation

ellatrix commented Jul 11, 2018 • edited by mcsf

Description

Read about the functions for manipulation.

Why a different RichText internal state structure?

Current parsing and serialisation

Proposed parsing and serialisation: controlled component

How has this been tested?

Screenshots

Types of changes

Checklist:

brandonpayton commented Jul 15, 2018

brandonpayton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ellatrix commented Jul 27, 2018

brandonpayton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brandonpayton Aug 1, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

youknowriad commented Oct 1, 2018

tofumatt commented Oct 1, 2018

gziolo commented Oct 2, 2018

hypest commented Oct 2, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ktmn commented Oct 9, 2018

youknowriad commented Oct 9, 2018

ktmn commented Oct 10, 2018

youknowriad commented Oct 10, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aduth Nov 27, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

phpbits commented May 23, 2019

ellatrix commented May 27, 2019

phpbits commented May 28, 2019

ellatrix commented Jul 11, 2018 •

edited by mcsf

Why a different `RichText` internal state structure?

brandonpayton Aug 1, 2018 •

edited

aduth Nov 27, 2018 •

edited