Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial mode #10

Open
4 tasks done
NickHeiner opened this issue Jul 21, 2023 · 10 comments
Open
4 tasks done

Partial mode #10

NickHeiner opened this issue Jul 21, 2023 · 10 comments
Labels
🤞 phase/open Post is being triaged manually 💬 type/discussion This is a request for comments

Comments

@NickHeiner
Copy link

NickHeiner commented Jul 21, 2023

Initial checklist

Problem

I'm working with large language models, which are relatively slow, so streaming/incremental output is important to a good user experience. (The project I'm working in is https://github.com/fixie-ai/ai-jsx.)

My LLM calls stream MDX. I'd like to be able to show the partially-rendered MDX to the user. However, if the MDX is incomplete, this parser crashes via the crash function (e.g. https://github.com/micromark/micromark-extension-mdx-jsx/blob/main/dev/lib/factory-tag.js#L311).

Solution

Ideally, there would be a "partial" mode (similar to untruncate-json) which auto-closes all open AST nodes via best-guess.

{/* input */}
<A b="a

{/* output */}
<A b="a" />
{/* input */}
<A b="a">foo

{/* output */}
<A b="a">foo</A>
{/* input */}
<

{/* output */}
<

Obviously, this might produce semantically strange results; it would be up to the caller to opt-in and handle these gracefully.

Alternatives

I considered sidestepping the compiler entirely and implementing my own function that maintains a stack, crawls the MDX as a string, and when it reaches the conclusion, tries to close everything on the stack.

My suspicion is that this would be simpler to implement for the easy cases, but using a modified version of the compiler would be more robust for the long tail.

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Jul 21, 2023
@remcohaszing
Copy link
Member

This would also be useful for MDX IntelliSense, as content tends to be invalid while typing.

Related: mdx-js/mdx-analyzer#267

@wooorm
Copy link
Member

wooorm commented Jul 22, 2023

The first and third have little to do with a stack, how would you want to solve that?

This micromark extension I think doesn't use a stack, the stack is handled in the corresponding mdast utility

@NickHeiner
Copy link
Author

NickHeiner commented Jul 22, 2023

Can you tell me more what you mean by the first not being related to stacks?

Agreed for the third – that case was just clarifying that when the input was already valid, the output is unchanged.

NickHeiner added a commit to fixie-ai/ai-jsx that referenced this issue Jul 25, 2023
This PR adds `MdxChatCompletion`, which produces MDX output as a string.
It does not handle:
* Auto-healing partial MDX (related:
micromark/micromark-extension-mdx-jsx#10)
* Rendering to the UI

I agree with @zkoch that this also might not be the final API we want.

I added docs for this, but as a standalone, this feature isn't super
interesting – it gets fun when we're actually able to render to the UI.
That can be a follow-up.

To verify that I didn't break the previous `UICompletion` component, I
manually tested the nextjs demo at http://localhost:3000/recipe.
@ChristianMurphy
Copy link
Member

ChristianMurphy commented Jul 25, 2023

Adding another perspective in.

I do think that some form of partial render could have value for authoring experience, human based or machine based.
Specifically, being able to show a valid document up-to the point where it becomes invalid, and having some for of invalid placeholder for the remainder.
I also think speculative completion/healing could be beneficial in the authoring experience like mdx-analyzer.

That said, I don't think speculative healing is a good idea for the parser.
MDX is Markdown + JSX.
JSX like XML and JS, does not speculatively heal or complete invalid documents.


Giving a hypothetical example of how this distinction could be a problem (grounded in past experience maintaining MDX and remark).
A community member and developer uses the autohealing partials option in their authoring experience.
A document author is able to see an invalid document rendering how they expect, so they leave the document invalid and save/commit it into their content management system.
The developer sees the document failing on their live site, and enables the option in production so the documents render.

Problem 1: We now have an option only designed for development, enabled in production

Down the road, the MDX maintainer team makes changes to improve the autohealing, and the authoring experience.
The community member and developer upgrades to the latest MDX version, but then finds documents now render different, and files an issue with MDX believing this is a bug.

Problem 2: We now have an unstable and unofficial API, being relied on as a SemVer stable API.
Problem 2.1: Any change to the auto healing will either prevent adopters from upgrading, force them change their documents, or need to be put behind a flag. None of those are good options from a maintainer perspective.


I'm against adding auto healing to the core parser.
Partial rendering, by stopping at the point where the document is no longer valid, feels like a better option.
An external more forgiving parser could be created as a parser plugin on top of that, but should not, in my opinion, live as part of the core MDX parser.

@ChristianMurphy ChristianMurphy added the 💬 type/discussion This is a request for comments label Jul 25, 2023
@NickHeiner
Copy link
Author

Partial rendering, by stopping at the point where the document is no longer valid, feels like a better option.

That would work perfectly well for my usecase. (Preferably, there would be some signal from the compiler API to let the caller know that the render was partial.)

@wooorm
Copy link
Member

wooorm commented Jul 29, 2023

Can you tell me more what you mean by the first not being related to stacks?

The tag is still open. The tag needs to be closed before it can be pushed to a stack or popped off a stack.

– that case was just clarifying that when the input was already valid, the output is unchanged.

To me, the 3rd is already some form of “autohealing” / “partial mode”. See #7.


I'm against adding auto healing to the core parser.
Partial rendering, by stopping at the point where the document is no longer valid, feels like a better option.
An external more forgiving parser could be created as a parser plugin on top of that, but > should not, in my opinion, live as part of the core MDX parser.

Pausing could live here, right? Or where do you see it?

Currently, in JS either a successful result is returned or an error is thrown.
In Rust, it’s a Result<str, ErrorMessage>.

I guess the solution here is to expose the up to the error result, on errors? (perhaps optionally?).
This gets complex, errors might occur while making that partial result.
Particularly around MDX, which knows errors when parsing (here, as a micromark extension) and when dealing with the stack (as an mdast-util-from-markdown extension, because <a> </b> parses fine but isn’t, and same for <a>)

@NickHeiner
Copy link
Author

Can you tell me more what you mean by the first not being related to stacks?

The tag is still open. The tag needs to be closed before it can be pushed to a stack or popped off a stack.

Yep, I think we're saying the same thing. I just meant that, if the parser sees something like <A><B><C EOF, the reason it knows the EOF is wrong is because somewhere, it's likely maintained a stack representation of A -> B -> C.

@wooorm
Copy link
Member

wooorm commented Aug 1, 2023

The error is earlier, when parsing, before stacks: <a crashes because EOF is not allowed there.
The stack comes after the parsing (which emits events).
Perhaps pedantic and similar enough!

@ChristianMurphy
Copy link
Member

Pausing could live here, right? Or where do you see it?

Pausing could live here

I guess the solution here is to expose the up to the error result, on errors? (perhaps optionally?).

That could make sense

Particularly around MDX, which knows errors when parsing (here, as a micromark extension) and when dealing with the stack (as an mdast-util-from-markdown extension,

Right, there can still be issues downstream, even if mdast-util-from-markdown parses as expected, any <CustomElement /> could itself fail.

@wooorm
Copy link
Member

wooorm commented Aug 1, 2023

Something like this isn’t really a high priority for me to work on currently. Could be years. But I can always advise if someone else finds this important and wants to work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤞 phase/open Post is being triaged manually 💬 type/discussion This is a request for comments
Development

No branches or pull requests

4 participants