Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slate/HTML serializers and deserializers #101

Open
wants to merge 41 commits into
base: main
Choose a base branch
from

Conversation

tiberiuichim
Copy link
Contributor

@tiberiuichim tiberiuichim commented Nov 28, 2022

The html2slate and slate2html are the value here.

For the curious, there's an htmlblock.zcml that you can include, which will override the way slate blocks are stored, making them always store as HTML in ZODB, with python-side serializer/deserializer.

Checklist:

  • Solve resiliparse dependency problem
  • Documentation
  • Rewrite the link support to the model is used in Volto, it's the old EEA style links

@mister-roboto
Copy link

@tiberiuichim thanks for creating this Pull Request and helping to improve Plone!

TL;DR: Finish pushing changes, pass all other checks, then paste a comment:

@jenkins-plone-org please run jobs

To ensure that these changes do not break other parts of Plone, the Plone test suite matrix needs to pass, but it takes 30-60 min. Other CI checks are usually much faster and the Plone Jenkins resources are limited, so when done pushing changes and all other checks pass either start all Jenkins PR jobs yourself, or simply add the comment above in this PR to start all the jobs automatically.

Happy hacking!

@tiberiuichim tiberiuichim changed the title WIP Slate integration Slate/HTML serializers and deserializers Nov 29, 2022
@tiberiuichim
Copy link
Contributor Author

@jenkins-plone-org please run jobs

@tiberiuichim
Copy link
Contributor Author

Jenkins is complaining that I've added a new dependency and it's not pinned.

What's the procedure for this?

image

Resiliparser is the only Python HTML parser that represents its data as DOM Nodes (Document, Element, Inlines and Text nodes) and so it's the only one that I could use in a convenient manner to preserve the perception-based rendering of HTML in browser. When I've initially created eea.volto.slate back in 2021, this parser didn't exist (and I was an ignorant back then to all the complexities of this problem space). So, I'm ready to be pointed to another option, but only if it can do browser-style dom parsing. html5lib won't cut it. It doesn't expose TextNodes, all text between tags is exposed as node.tail. So, a wrapper could be done on top of it, but it's more work then it's worth.

@davisagli
Copy link
Sponsor Member

davisagli commented Nov 29, 2022

@tiberiuichim I'm not familiar with the motivation for this work, so can you say a bit more about why you want to store blocks as HTML? My initial gut reaction is that it sounds like it adds unnecessary work at the time of serialization and deserialization, and also adds complexity. Is there a benefit that makes that worthwhile?

Regarding Resiliparser, I would want to avoid adding a new dependency if possible. Did you consider BeautifulSoup (bs4), which is already a dependency of Plone? It looks like it exposes each bit of text as a https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigablestring -- but maybe there is some other thing it doesn't handle in the way you need?

@tiberiuichim
Copy link
Contributor Author

I do wonder if it should go in a separate library, so that it could be used by a Python client of the REST API without depending on all of Plone.

The html2slate.py and slate2html.py are not dependent on any Plone. If anyone needs them in their own projects, they can just copy them.

@tiberiuichim
Copy link
Contributor Author

@jenkins-plone-org please run jobs

@tiberiuichim
Copy link
Contributor Author

I think it's ready.

@tiberiuichim
Copy link
Contributor Author

@jenkins-plone-org please run jobs

Copy link
Sponsor Member

@davisagli davisagli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiberiuichim Here's my first pass on reviewing. So far I looked at the tests to make sure that what it's doing makes sense, but I didn't yet look at how it's doing it.

README.rst Outdated Show resolved Hide resolved
DEVELOPING.md Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
@@ -0,0 +1 @@
<p style="text-align:center" class="styled"><b><span data-slate-data="{&quot;type&quot;:&quot;dataentity&quot;,&quot;data&quot;:{&quot;column&quot;:&quot;number_total_sites&quot;,&quot;provider_url&quot;:&quot;/data/countries-protected-areas-statistics&quot;}}"><span class="primary-big-text">1565</span></span></b> Protected areas</p>
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no json file for tests 3 and 4?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiberiuichim are these used in the tests?

src/plone/volto/tests/data/5.json Outdated Show resolved Hide resolved
src/plone/volto/tests/data/6-1.html Outdated Show resolved Hide resolved
src/plone/volto/tests/test_html2slate.py Show resolved Hide resolved
Copy link
Contributor

@stevepiercy stevepiercy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor grammar fixes to README.rst

README.rst Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
Co-authored-by: David Glick <david@glicksoftware.com>
Co-authored-by: Steve Piercy <web@stevepiercy.com>
@tiberiuichim
Copy link
Contributor Author

@jenkins-plone-org please run jobs

Copy link
Sponsor Member

@davisagli davisagli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiberiuichim Sorry for taking so long to get back to this. I've now done a pass reading through the implementation. For something like this it's hard to say when enough testing has been done, since there are always edge cases. It would be possible to go borrow some more test cases from blocks-conversion-tool. But, maybe it makes as much sense to merge it as is and let someone try it with real data.

- Run `make build` to build the Plone backend
- Run `make start` to start the Plone backend
- Run `make test` to run the tests.
- Run `bin/zope-testrunner --auto-color --auto-progress --test-path src -t name_of_test` to run a particular test
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiberiuichim I usually use the coredev buildout for this (as well as working on other packages in Plone core). I think either way is fine but for the sake of comparison:

  • clone git@github.com:plone/buildout.coredev.git
  • edit checkouts.cfg to specify which packages to check out from github
  • run make to build
  • edit in src/plone.volto
  • run bin/instance fg to start the backend
  • run bin/test -s plone.volto to run all tests from plone.volto
  • run bin/test -t [name] to run a specific test

These two classes can be inherited and extended for your custom elements and
plugins. To handle any custom element, you need to provide a method called
``handle_tag_<elementname>``. For example, if you have a custom element of
``@type`` "a", you can do::
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this helps me understand why they are classes instead of functions.

@@ -0,0 +1 @@
<p style="text-align:center" class="styled"><b><span data-slate-data="{&quot;type&quot;:&quot;dataentity&quot;,&quot;data&quot;:{&quot;column&quot;:&quot;number_total_sites&quot;,&quot;provider_url&quot;:&quot;/data/countries-protected-areas-statistics&quot;}}"><span class="primary-big-text">1565</span></span></b> Protected areas</p>
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiberiuichim are these used in the tests?

FIRST_ANY_SPACE = re.compile(r"^\s", re.M)
FIRST_ALL_SPACE = re.compile(r"^\s+", re.M)
ANY_SPACE_AT_END = re.compile(r"\s$", re.M)
ANY_WHITESPACE = re.compile(r"\s|\t|\n", re.M)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be aware, \s includes non-breaking whitespace, which might not be desired. Tab and newline are also included in \s so I don't think including them separately does anything here.

def normalize(self, value):
"""Normalize value to match Slate constraints"""

assert isinstance(value, list)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't the case, the program will end with AssertionError: False which won't be super helpful. If this is something that should never happen, using type annotations and static analysis in your editor might be a better way to go. If we expect that we might actually get a list here from certain user input, then we should give the user a more helpful error about what they did wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants