Slate/HTML serializers and deserializers #101

tiberiuichim · 2022-11-28T22:19:24Z

The html2slate and slate2html modules are the main value here.

For the curious, there's an htmlblock.zcml that you can include. It overrides the way slate blocks are stored so that they are always stored as HTML in the ZODB, using a Python-side serializer/deserializer.

Checklist:

Solve resiliparse dependency problem
Documentation
Rewrite the link support to use the link model used in Volto; it currently uses the old EEA-style links

mister-roboto · 2022-11-28T22:19:29Z

@tiberiuichim thanks for creating this Pull Request and helping to improve Plone!

TL;DR: Finish pushing changes, pass all other checks, then paste a comment:

@jenkins-plone-org please run jobs

To ensure that these changes do not break other parts of Plone, the Plone test suite matrix needs to pass, but it takes 30-60 min. Other CI checks are usually much faster and the Plone Jenkins resources are limited, so when done pushing changes and all other checks pass either start all Jenkins PR jobs yourself, or simply add the comment above in this PR to start all the jobs automatically.

Happy hacking!

tiberiuichim · 2022-11-29T21:27:55Z

@jenkins-plone-org please run jobs

tiberiuichim · 2022-11-29T21:40:47Z

Jenkins is complaining that I've added a new dependency and it's not pinned.

What's the procedure for this?

Resiliparse is the only Python HTML parser that represents its data as DOM nodes (Document, Element, Inlines and Text nodes), so it's the only one that I could conveniently use to preserve the perception-based rendering of HTML in the browser. When I initially created eea.volto.slate back in 2021, this parser didn't exist (and I was ignorant back then of all the complexities of this problem space). So, I'm ready to be pointed to another option, but only if it can do browser-style DOM parsing. html5lib won't cut it: it doesn't expose text nodes, and all text between tags is exposed as node.tail. A wrapper could be built on top of it, but it's more work than it's worth.
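To illustrate the difference between the two models, here is a minimal sketch using only the standard library. `xml.etree.ElementTree` stands in for the text/tail convention described above (assumption: the tree html5lib builds follows the same convention), and `xml.dom.minidom` stands in for a browser-style DOM with explicit text nodes. The helper names `etree_view` and `dom_view` are made up for this example and are not part of this PR.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

MARKUP = "<p>Hello <b>bold</b> world</p>"


def etree_view(markup):
    """Show where ElementTree keeps the text that surrounds an inline element."""
    root = ET.fromstring(markup)
    bold = root.find("b")
    # Text after </b> is not a node of its own; it hangs off <b> as .tail.
    return {"p.text": root.text, "b.text": bold.text, "b.tail": bold.tail}


def dom_view(markup):
    """List the children of <p> in a DOM tree: text runs are real nodes."""
    doc = minidom.parseString(markup)
    paragraph = doc.documentElement
    return [
        (child.nodeType == child.TEXT_NODE, child.nodeName)
        for child in paragraph.childNodes
    ]


print(etree_view(MARKUP))
print(dom_view(MARKUP))
```

In the DOM view the two text runs and the `<b>` element are siblings, which is the shape a Slate-style converter can walk directly; in the text/tail view the same information has to be reassembled from attributes of neighbouring elements.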

davisagli · 2022-11-29T22:20:37Z

@tiberiuichim I'm not familiar with the motivation for this work, so can you say a bit more about why you want to store blocks as HTML? My initial gut reaction is that it sounds like it adds unnecessary work at the time of serialization and deserialization, and also adds complexity. Is there a benefit that makes that worthwhile?

Regarding Resiliparse, I would want to avoid adding a new dependency if possible. Did you consider BeautifulSoup (bs4), which is already a dependency of Plone? It looks like it exposes each bit of text as a NavigableString (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigablestring), but maybe there is some other thing it doesn't handle in the way you need?

tiberiuichim · 2022-11-30T08:16:49Z

I do wonder if it should go in a separate library, so that it could be used by a Python client of the REST API without depending on all of Plone.

The html2slate.py and slate2html.py modules do not depend on any Plone code. If anyone needs them in their own projects, they can just copy them.

tiberiuichim · 2022-12-01T07:01:40Z

@jenkins-plone-org please run jobs

tiberiuichim · 2022-12-01T14:40:33Z

I think it's ready.

tiberiuichim · 2022-12-01T17:56:44Z

@jenkins-plone-org please run jobs

davisagli

@tiberiuichim Here's my first pass on reviewing. So far I looked at the tests to make sure that what it's doing makes sense, but I didn't yet look at how it's doing it.

README.rst

DEVELOPING.md

README.rst

davisagli · 2022-12-05T04:58:33Z

src/plone/volto/tests/data/4.html

@@ -0,0 +1 @@
+<p style="text-align:center" class="styled"><b><span data-slate-data="{&quot;type&quot;:&quot;dataentity&quot;,&quot;data&quot;:{&quot;column&quot;:&quot;number_total_sites&quot;,&quot;provider_url&quot;:&quot;/data/countries-protected-areas-statistics&quot;}}"><span class="primary-big-text">1565</span></span></b> Protected areas</p>


There's no json file for tests 3 and 4?

@tiberiuichim are these used in the tests?

src/plone/volto/tests/data/5.json

src/plone/volto/tests/data/6-1.html

src/plone/volto/tests/test_html2slate.py

stevepiercy

Minor grammar fixes to README.rst

README.rst

Co-authored-by: David Glick <david@glicksoftware.com> Co-authored-by: Steve Piercy <web@stevepiercy.com>

tiberiuichim · 2022-12-06T07:21:26Z

@jenkins-plone-org please run jobs

davisagli

@tiberiuichim Sorry for taking so long to get back to this. I've now done a pass reading through the implementation. For something like this it's hard to say when enough testing has been done, since there are always edge cases. It would be possible to go borrow some more test cases from blocks-conversion-tool. But, maybe it makes as much sense to merge it as is and let someone try it with real data.

davisagli · 2022-12-15T04:31:29Z

DEVELOPING.md

+- Run `make build` to build the Plone backend
+- Run `make start` to start the Plone backend
+- Run `make test` to run the tests.
+- Run `bin/zope-testrunner --auto-color --auto-progress --test-path src -t name_of_test` to run a particular test


@tiberiuichim I usually use the coredev buildout for this (as well as working on other packages in Plone core). I think either way is fine but for the sake of comparison:

- Clone `git@github.com:plone/buildout.coredev.git`
- Edit `checkouts.cfg` to specify which packages to check out from GitHub
- Run `make` to build
- Edit in `src/plone.volto`
- Run `bin/instance fg` to start the backend
- Run `bin/test -s plone.volto` to run all tests from plone.volto
- Run `bin/test -t [name]` to run a specific test

davisagli · 2022-12-15T04:33:24Z

README.rst

+These two classes can be inherited and extended for your custom elements and
+plugins. To handle any custom element, you need to provide a method called
+``handle_tag_<elementname>``. For example, if you have a custom element of
+``@type`` "a", you can do::


Thanks, this helps me understand why they are classes instead of functions.
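For readers following along, the `handle_tag_<elementname>` convention quoted from the README can be sketched roughly as below. This is an illustration of name-based method dispatch only: `TagDispatcher`, `LinkAwareDispatcher`, `handle`, and `handle_fallback` are hypothetical names invented here, and the returned dictionaries are simplified stand-ins, not the actual plone.volto classes or output.

```python
class TagDispatcher:
    """Hypothetical base class: routes a tag name to a handle_tag_<name> method."""

    def handle(self, tag, text):
        # Look up a method named after the tag; fall back when none is defined.
        handler = getattr(self, f"handle_tag_{tag}", self.handle_fallback)
        return handler(text)

    def handle_fallback(self, text):
        return {"type": "unknown", "children": [{"text": text}]}


class LinkAwareDispatcher(TagDispatcher):
    """Hypothetical subclass: adds support for one more element by naming a method."""

    def handle_tag_a(self, text):
        return {"type": "link", "children": [{"text": text}]}


print(TagDispatcher().handle("a", "Plone"))
print(LinkAwareDispatcher().handle("a", "Plone"))
```

With plain functions, adding a new element would mean editing the dispatch table; with classes, a subclass only has to define one more method.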

davisagli · 2022-12-15T04:43:55Z

src/plone/volto/tests/data/4.html

@@ -0,0 +1 @@
+<p style="text-align:center" class="styled"><b><span data-slate-data="{&quot;type&quot;:&quot;dataentity&quot;,&quot;data&quot;:{&quot;column&quot;:&quot;number_total_sites&quot;,&quot;provider_url&quot;:&quot;/data/countries-protected-areas-statistics&quot;}}"><span class="primary-big-text">1565</span></span></b> Protected areas</p>


@tiberiuichim are these used in the tests?

davisagli · 2022-12-15T04:48:06Z

src/plone/volto/slate/html2slate.py

+FIRST_ANY_SPACE = re.compile(r"^\s", re.M)
+FIRST_ALL_SPACE = re.compile(r"^\s+", re.M)
+ANY_SPACE_AT_END = re.compile(r"\s$", re.M)
+ANY_WHITESPACE = re.compile(r"\s|\t|\n", re.M)


Be aware, \s includes non-breaking whitespace, which might not be desired. Tab and newline are also included in \s so I don't think including them separately does anything here.
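A quick check of both points, as a sketch with the standard `re` module. The `ORIGINAL` pattern mirrors `ANY_WHITESPACE` from the diff; the other pattern names and the sample string are made up for this example. Whether non-breaking spaces should be excluded is a design decision for the PR; `re.ASCII` is shown only as one possible way to do it.

```python
import re

ORIGINAL = re.compile(r"\s|\t|\n", re.M)   # same pattern as ANY_WHITESPACE in the diff
UNICODE_SPACE = re.compile(r"\s")           # \s alone, default (Unicode) matching for str
ASCII_SPACE = re.compile(r"\s", re.ASCII)   # restricts \s to [ \t\n\r\f\v]

NBSP = "\u00a0"
SAMPLE = "a\tb\nc d" + NBSP + "e"

# The explicit \t and \n alternatives add nothing: \s already matches them.
print(ORIGINAL.findall(SAMPLE) == UNICODE_SPACE.findall(SAMPLE))

# With str patterns, \s also matches the non-breaking space...
print(NBSP in UNICODE_SPACE.findall(SAMPLE))

# ...unless matching is restricted to ASCII whitespace.
print(NBSP in ASCII_SPACE.findall(SAMPLE))
```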

davisagli · 2022-12-15T05:01:06Z

src/plone/volto/slate/html2slate.py

+    def normalize(self, value):
+        """Normalize value to match Slate constraints"""
+
+        assert isinstance(value, list)


If this isn't the case, the program will end with a bare AssertionError, which won't be super helpful. If this is something that should never happen, using type annotations and static analysis in your editor might be a better way to go. If we expect that we might actually get something other than a list here from certain user input, then we should give the user a more helpful error about what they did wrong.
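One possible shape for the second option, as a sketch only. This is a standalone function rather than the method from the diff, the real normalization logic is omitted, and the error wording is an assumption:

```python
from typing import Any, List


def normalize(value: List[Any]) -> List[Any]:
    """Hypothetical stand-in for the normalize() method in the diff."""
    if not isinstance(value, list):
        # Name the expected and actual types so the caller can see what went wrong.
        raise TypeError(
            "normalize() expects a list of Slate nodes, "
            f"got {type(value).__name__}: {value!r}"
        )
    return value  # the actual Slate normalization is left out of this sketch


try:
    normalize("<p>not a list</p>")
except TypeError as exc:
    print(exc)
```

The type annotation covers the static-analysis route, and the explicit check covers input that only shows up at runtime.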

tiberiuichim added 2 commits November 29, 2022 00:16

Add straight copy of code from eea.volto.slate

20612f6

Remove backup files

0098fee

tiberiuichim added 19 commits November 29, 2022 10:54

Plug the package; tests fail

63aa19e

Fix imports

629437c

Correct test layers

09ce130

No more slicing

13ba060

Try to get code formatters in line

ef8003f

Add developer info, myself to contributors

023d29f

Don't need the indexer

616fdea

Down to one failure

e6d1ce3

Covered edge case with next node

770fcab

Code reformat

7826d90

Enable the slate storage as html via transformers

85616c5

Added failing test case

d9ead3f

Better condition for whitespace cleanup

273f9c7

One more test case, this type shows the broken slate normalization

7d9e550

Evolve the test case

897e5b2

Fix padding with spaces

13c56b8

Add changelog

8c7a93d

Use correct JSON in tests

536348f

Remove block.py, it's already in plone.restapi

d9448e0

tiberiuichim changed the title ~~WIP Slate integration~~ Slate/HTML serializers and deserializers Nov 29, 2022

tiberiuichim requested a review from sneridagh November 29, 2022 21:26

tiberiuichim requested review from davisagli, cekk and avoinea November 29, 2022 21:41

tiberiuichim added 3 commits November 30, 2022 10:09

Cleanup tests

b212f52

Remove transformers test

039354d

Remove comment

eb72b1d

Use bs4 instead of resiliparse

2129ff2

tiberiuichim added 2 commits December 1, 2022 16:25

Use the link plugin instead of <a> for links

653b913

Add docs

9207feb

davisagli reviewed Dec 5, 2022

stevepiercy requested changes Dec 5, 2022

Apply suggestions from code review

ac72520

Co-authored-by: David Glick <david@glicksoftware.com> Co-authored-by: Steve Piercy <web@stevepiercy.com>

stevepiercy approved these changes Dec 5, 2022

tiberiuichim added 4 commits December 5, 2022 21:39

Remove empty text padding; cleanup code

eed335e

More developing instructions

921a328

Fill in readme

e0be9f2

Fix readme

47d5e1f

tiberiuichim requested a review from davisagli December 5, 2022 19:56

davisagli reviewed Dec 15, 2022

tiberiuichim and others added 7 commits December 15, 2022 07:17

Merge branch 'main' into slate_integration

f1b9342

Merge branch 'main' into slate_integration

0548f1f

Merge branch 'main' into slate_integration

2ba02bf

Merge branch 'main' into slate_integration

01f6752

Merge branch 'main' into slate_integration

5d278b4

Merge branch 'main' into slate_integration

4265139

Add another test case

84bf85f

tiberiuichim mentioned this pull request Mar 15, 2023

blocks-conversion-tool should be deprecated plone/blocks-conversion-tool#27

Open

Merge branch 'main' into slate_integration

9fb3308
