Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Where do we sanitise the incoming markup? (Hooks implementation) #12

Open
tanmayaBiswalOdiware opened this issue Apr 19, 2023 · 19 comments

Comments

@tanmayaBiswalOdiware
Copy link

Writing DOM.purify inside renderer, right before returning the parsed markup leads to problems like showing [object Object] for things like bold and italics.

    const renderer = {
        paragraph(text) {
            console.log("text =", text)
            // let cleanHTML = DOMPurify.sanitize(text, { USE_PROFILES: { html: true } });
            return <MathJax key={uniqueId()}>{text}</MathJax>
            // return <span>hahah lol</span>
        },
        html(text) {
            let cleanHTML = DOMPurify.sanitize(text, { USE_PROFILES: { html: true } });
            return <span key={"title-span"} dangerouslySetInnerHTML={{ __html: cleanHTML }}></span>;
        }
    }

Where do you suggest we purify the input?

@sibiraj-s
Copy link
Owner

I have never been in a situation to sanitize markdown files. But If you ask me, I would heavily recommend to sanitize the content before storing in DB, processing it in client side is not recommended, as it can be bypassed on way or other.

And to your question. your code looks fine, and should work fine. Can you please verify the output of the sanitize function once (though it shouldn't be an issue).

And please provide a reproducible example in stackblitz/codesandbox maybe. Thanks

@tanmayaBiswalOdiware
Copy link
Author

Well, there is sanitization happening on the server side as well. But the best practice is usually to sanitize them on both ends (I can just no implement it and tell my overlords that it been done though😂 but I digress).

Anyway, as far as the code shown above goes - it works fine for the HTML, but it doesn't work for some other type of input like bold, italics and so on. I am going to post a sandbox soon. Let me see if it can be reproduced.

Thanks for the prompt reply and hard work!

@sibiraj-s
Copy link
Owner

Ah, I get it now. I'll see why you are referring to now. I didn't understand fully earlier.

You must return a valid react node. See here . I'll also look into this once.

@tanmayaBiswalOdiware
Copy link
Author

Yea, that's the problem. Sanitizing the argument passed into renderer messes up the React Node, and hence a weird output is received. That was my diagnosis. For now, I am essentially running the sanitization in useEffect where I am getting my data from API call.

@sibiraj-s
Copy link
Owner

sibiraj-s commented Apr 20, 2023

I have just tried it and it works fine as expect 🤔 . Make sure the implementation is for the renderer.

This is the code i used

strong(text){
  const clean = DOMPurify.sanitize(text as string)
  return <CustomComponent>{clean}</CustomComponent>
}

You can refer to some examples here from storybook stories

To debug further. a reproducible stackblitz will help.

@tanmayaBiswalOdiware
Copy link
Author

Ok, I have got the Sandbox. Maybe I am doing something wrong.

@sibiraj-s
Copy link
Owner

I think I see whats happening whats happening. So the elements like bold, italics are parsed first and converted to react element before passing to paragraph.

This react elements are just objects, which when passed to purifier, is converted to object object.

I tried differently before, hence I couldn't see this.

@tanmayaBiswalOdiware
Copy link
Author

Yep, I figured as much. So even if I give the renderer a separate method for processing bold and italics, it still passes them into paragraph and the result is basically the same.

If I have inline prop as true, everything is sent into text method. And even in that case, the bold and italics are processed first and then sent into text method.

Anyways, I am in no hurry, you can have a little fun figuring this out in the weekend or the next😄

@tanmayaBiswalOdiware
Copy link
Author

I have seen the native marked package handle Sanitize at some point in the rendering cycle. Maybe that will give you the clue. I was thinking of switching to the native package but then I thought that will be hassle, let's try hitting you up instead, maybe you will figure it out.

@sibiraj-s
Copy link
Owner

Screenshot 2023-04-26 at 11 25 59 AM

marked doesn't sanitize. Ref: https://marked.js.org/#usage

@tanmayaBiswalOdiware
Copy link
Author

tanmayaBiswalOdiware commented Apr 26, 2023

Yea, marked doesn't sanitize. But it has a way to handle sanitization at some point in the rendering cycle.
Ref: https://marked.js.org/using_pro#hooks

Image showing the second example in hooks section

See the second example of them using postprocess hook.

Also, since marked just outputs the basic HTML, it can be directly sent through sanitization.
But with React, that can't be the case. So having access to the hook would be good.

@sibiraj-s
Copy link
Owner

Oh didn't know that. Thanks.

Yeah it is simple in their case as it is just a simple string of html. Where are the parser here converts to React components. I don't see an option here, even a window to do something.

@tanmayaBiswalOdiware
Copy link
Author

I wonder if React elements can be sanitized. That didn't pop into my brain until now haha.

But from your point of view, there is no way to introduce hooks into this package?

@sibiraj-s
Copy link
Owner

I did not originally plan to add option to implement sanitization or provide some hooks to pre/post process stuffs, but I am open to suggestions for the implementation.

@sibiraj-s
Copy link
Owner

Why not just pass the content through dom-purify first then send it to marked-react. I believe dom-purify won't remove anything that is related to markdown formatting. 🤔

did you try that and created any problems?

@sibiraj-s
Copy link
Owner

Screenshot 2023-04-26 at 2 07 31 PM

@tanmayaBiswalOdiware
Copy link
Author

tanmayaBiswalOdiware commented Apr 26, 2023

That is in fact what I am doing right now. It looks fine with the content I have tested.

It's just that the processing takes place in two separate places, you know. Once I am sanitizing else where, and then later on in the return I push the data in the Marked.

It works, but they happen separately. So yes, I was just nitpicking but it would have been sweet if all of those could happen altogether. If you get what I mean.

That is why I wrote, it's not urgent, but it definitely will make the package better overall.

@sibiraj-s
Copy link
Owner

Cool. In that case, I'll leave this open for sometime, may be some idea might popup overtime or someone could suggest a better solution to make this seamless.

@tanmayaBiswalOdiware
Copy link
Author

Sure, that would be good. I do not have a lot of experience with developing packages but I will try to think of something if we can somehow implement hooks (and even tokenizer but I just don't understand tokenizer myself haha)

@tanmayaBiswalOdiware tanmayaBiswalOdiware changed the title Where do we sanitise the incoming markup? Where do we sanitise the incoming markup? (Hooks implementation) Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants