FlexmarkHtmlConverter - HtmlConverterExtension #613

Open
c4rth opened this issue May 9, 2024 · 0 comments

c4rth commented May 9, 2024

Is your feature request related to a problem? Please describe.

I'm using flexmark-html2md-converter to convert Confluence HTML pages to Markdown.

The elements in the HTML (e.g. <div>, <img>, <span>, ...) are already handled by the default conversion.
But in some cases, the class attribute describes more precisely what an element is.
For example:

  • <div class='confluence-information-macro'> is an admonition
  • <img class='emoticon'> is an emoji

So I wrote an HtmlNodeRenderer (an inner class of an HtmlConverterExtension) for <div>, but it handles every <div>; I didn't find a way to restrict it to, say, <div class='confluence-information-macro'>.

Describe the solution you'd like

It would be nice if HtmlNodeRendererHandler could be more specific and match on the tag name plus attributes (class or others).
That way I could have one extension per type (tag + attribute(s)) instead of one per tag.

Current:

public class MyHtmlNodeRender implements HtmlNodeRenderer {
    ...
    @Override
    public Set<HtmlNodeRendererHandler<?>> getHtmlNodeRendererHandlers() {
        return new HashSet<>(Collections.singletonList(
                new HtmlNodeRendererHandler<>("div", Element.class, this::processDiv)
        ));
    }
    ...
}

Desired:

public class MyHtmlNodeRender implements HtmlNodeRenderer {
    ...
    @Override
    public Set<HtmlNodeRendererHandler<?>> getHtmlNodeRendererHandlers() {
        // <div class='className1 className2 ...' title='title' ... >
        Map<String, List<String>> attributesMap = Map.of("class", List.of("className1", "className2"), "title", List.of("title"));
        return new HashSet<>(Collections.singletonList(
                new HtmlNodeRendererHandler<>("div", attributesMap, Element.class, this::processDiv)
        ));
    }
    ...
}
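To pin down what the proposed attributesMap could mean, here is a library-free sketch in plain Java (16+, for records). SimpleElement, HandlerKey and the matching rules are assumptions for illustration, not flexmark API: the tag name must match, every listed class token must be present (the class attribute is a whitespace-separated list, so a plain string comparison would miss class='a b'), and any other attribute must equal one of its listed values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types exist in flexmark-html2md-converter.
// It only illustrates how a "tag + attributes" handler key could be matched.
public class AttributeMatcherSketch {

    /** Minimal stand-in for a parsed HTML element (not jsoup's Element). */
    public record SimpleElement(String tagName, Map<String, String> attributes) {}

    /**
     * Proposed handler key: a tag name plus required attribute values.
     * Assumed semantics: for "class", every listed token must be present;
     * for any other attribute, the value must equal one of the listed values.
     */
    public record HandlerKey(String tagName, Map<String, List<String>> required) {

        public boolean matches(SimpleElement element) {
            if (!tagName.equalsIgnoreCase(element.tagName())) {
                return false;
            }
            for (Map.Entry<String, List<String>> entry : required.entrySet()) {
                String value = element.attributes().get(entry.getKey());
                if (value == null) {
                    return false; // required attribute is missing
                }
                if (entry.getKey().equals("class")) {
                    // class is a whitespace-separated token list
                    Set<String> tokens = Arrays.stream(value.trim().split("\\s+"))
                            .collect(Collectors.toSet());
                    if (!tokens.containsAll(entry.getValue())) {
                        return false;
                    }
                } else if (!entry.getValue().contains(value)) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        HandlerKey admonition = new HandlerKey("div",
                Map.of("class", List.of("confluence-information-macro")));
        SimpleElement macro = new SimpleElement("div",
                Map.of("class", "confluence-information-macro another-class"));
        SimpleElement plain = new SimpleElement("div", Map.of());
        System.out.println(admonition.matches(macro)); // true
        System.out.println(admonition.matches(plain)); // false
    }
}
```

A registry built on such keys could try the more specific keys first and fall back to the plain tag handler when none match.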

Ideally, the attributes map should also accept basic expressions (and, or, not, ...):

public class MyHtmlNodeRender implements HtmlNodeRenderer {
    ...
    @Override
    public Set<HtmlNodeRendererHandler<?>> getHtmlNodeRendererHandlers() {
        // <div class='className1 or className2 ...' title='!title' ... >
        Map attributesMap = Map.of("class", or("className1", "className2"), "title", not("title"));
        return new HashSet<>(Collections.singletonList(
                new HtmlNodeRendererHandler<>("div", attributesMap, Element.class, this::processDiv)
        ));
    }
    ...
}
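The and/or/not idea maps naturally onto java.util.function.Predicate, which already provides and(), or() and Predicate.not() (Java 11+). In this sketch an element is represented only by its attribute map, and classTokens, hasClass and attrEquals are hypothetical helpers, not flexmark API. The title='!title' case is assumed to mean "the title attribute is not equal to title".

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: these helpers are illustrative only and are not
// part of flexmark-html2md-converter.
public final class AttributeExpressionSketch {

    /** Splits the class attribute into its whitespace-separated tokens. */
    public static Set<String> classTokens(Map<String, String> attrs) {
        String value = attrs.getOrDefault("class", "").trim();
        return value.isEmpty()
                ? Set.of()
                : Arrays.stream(value.split("\\s+")).collect(Collectors.toSet());
    }

    /** True if the element carries the given class token. */
    public static Predicate<Map<String, String>> hasClass(String name) {
        return attrs -> classTokens(attrs).contains(name);
    }

    /** True if the attribute is present and equal to the given value. */
    public static Predicate<Map<String, String>> attrEquals(String name, String value) {
        return attrs -> value.equals(attrs.get(name));
    }

    public static void main(String[] args) {
        // <div class='className1 or className2' title='!title'>
        Predicate<Map<String, String>> rule =
                hasClass("className1").or(hasClass("className2"))
                        .and(Predicate.not(attrEquals("title", "title")));

        System.out.println(rule.test(Map.of("class", "className2"))); // true
        System.out.println(rule.test(Map.of("class", "className1", "title", "title"))); // false
        System.out.println(rule.test(Map.of("class", "other"))); // false
    }
}
```

Using plain predicates keeps the handler key open-ended: any condition on the element can be composed without the converter needing its own expression language.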

Describe alternatives you've considered

Write one HtmlNodeRenderer per tag.
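With one renderer per tag, telling the classes apart means branching inside that single handler. Below is a minimal, library-free sketch of that dispatch. DivDispatcherSketch, the class names and the renderer functions are hypothetical stand-ins; in flexmark the branching would sit inside the one div handler's processing method, and the blockquote output for admonitions is only illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the per-tag workaround: one renderer for <div>
// that dispatches on class tokens internally. Not flexmark API.
public final class DivDispatcherSketch {

    // Insertion order is kept, so the first registered matching class wins.
    private final Map<String, Function<String, String>> byClass = new LinkedHashMap<>();
    private final Function<String, String> fallback;

    public DivDispatcherSketch(Function<String, String> fallback) {
        this.fallback = fallback;
    }

    /** Registers a renderer for divs carrying the given class token. */
    public DivDispatcherSketch on(String className, Function<String, String> renderer) {
        byClass.put(className, renderer);
        return this;
    }

    /** Renders a div, given its class attribute and its text content. */
    public String render(String classAttr, String text) {
        List<String> tokens = Arrays.asList(classAttr.trim().split("\\s+"));
        for (Map.Entry<String, Function<String, String>> e : byClass.entrySet()) {
            if (tokens.contains(e.getKey())) {
                return e.getValue().apply(text);
            }
        }
        return fallback.apply(text); // no specialized renderer: default handling
    }

    public static void main(String[] args) {
        DivDispatcherSketch div = new DivDispatcherSketch(text -> text)
                .on("confluence-information-macro", text -> "> " + text);
        System.out.println(div.render("confluence-information-macro", "Note")); // > Note
        System.out.println(div.render("", "Plain")); // Plain
    }
}
```

This works, but every new class-specific case has to be added to the same per-tag handler, which is what the requested tag + attribute matching would avoid.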

Additional context
None.
