Better support for parsing LLM output #355

Open
verhovsky opened this issue Dec 12, 2023 · 0 comments

cmark-gfm is used by a number of apps that interface with text-generating Large Language Models (LLMs) (this one and this one are the ones I know). These models produce a few characters of Markdown every 200ms (on my machine), and each time cmark-gfm is called to re-render all of the output generated so far. This is inefficient: as far as I can tell, the entire generated Markdown has to be re-parsed from the beginning for every generated token, even though everything except the latest token has already been parsed.
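
To put a rough number on that cost: if every new token triggers a parse of the whole buffer, the total number of bytes handed to the parser grows quadratically with the token count. The sketch below is only an illustration; `reparse_cost` and the fixed `token_len` are assumptions made for the example and are not part of cmark-gfm.

```c
#include <stddef.h>

/* Hypothetical helper (not a cmark-gfm function).
 * Returns the total number of bytes handed to the parser when the whole
 * buffer is parsed again after each of n_tokens tokens, assuming every
 * token is exactly token_len bytes long. */
unsigned long long reparse_cost(unsigned long long n_tokens,
                                unsigned long long token_len) {
    unsigned long long total = 0;    /* bytes parsed across all re-renders */
    unsigned long long buffered = 0; /* size of the output generated so far */
    unsigned long long i;

    for (i = 0; i < n_tokens; i++) {
        buffered += token_len; /* the model emits one more token */
        total += buffered;     /* the whole buffer is parsed again */
    }
    return total;
}
```

Under those assumptions, 1000 tokens of 4 bytes each mean 2,002,000 bytes parsed in total, whereas a parser that could continue from where it stopped would only see the 4,000 bytes once.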

cmark-gfm has a streaming interface, cmark_parser_feed and cmark_parser_finish, but it seems I have to call cmark_parser_finish every time I actually want a parse result, and after that I have to create a new parser: I can't feed more tokens and re-parse. I would have expected to be able to call cmark_parser_feed, then a (currently nonexistent) cmark_parser_parse, then cmark_parser_feed again, or to get a more complicated interface for editing the parse tree like the one tree-sitter has.

Also, while we're at it, the other issue is that the syntax isn't stable while the parser hasn't yet seen the entire input. Namely, a trailing single backtick ` should open a code span that runs until the end of the line/input even if there's no closing backtick. The way it is now leads to jittering in the UI: the UI first prints a literal backtick, and a few seconds later, when the LLM generates the closing backtick, it removes it and re-renders everything after it in monospace. This is also a problem for horizontal rules and bold/italic, but bold/italic definitely isn't doable because many people use single * characters for multiplication.
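
Until the parser does something like this itself, a caller could approximate it by checking whether the text so far ends inside an unclosed backtick run and, if so, appending a matching run to a temporary copy before parsing. The sketch below is one possible workaround, not cmark-gfm behavior: `unclosed_backtick_run` is a hypothetical helper, and it deliberately ignores backslash escapes, fenced code blocks, and the CommonMark rule that an unmatched run falls back to literal text.

```c
#include <stddef.h>

/* Hypothetical helper (not a cmark-gfm function).
 * Scans text and returns the length of the backtick run that is still
 * open at the end of the input, or 0 if every run has been closed.
 * Simplification: escapes, fences, and paragraph breaks are ignored. */
size_t unclosed_backtick_run(const char *text) {
    size_t open = 0; /* length of the run that opened the current span */
    size_t i = 0;

    while (text[i] != '\0') {
        size_t run = 0;

        if (text[i] != '`') {
            i++;
            continue;
        }
        /* Measure the length of this run of consecutive backticks. */
        while (text[i] == '`') {
            run++;
            i++;
        }
        if (open == 0) {
            open = run; /* this run opens a span */
        } else if (run == open) {
            open = 0;   /* a run of equal length closes the span */
        }
        /* A run of a different length inside a span is plain content. */
    }
    return open;
}
```

If the function returns n > 0, the caller would append n backticks to the copy it hands to the parser, so the tail renders in monospace immediately instead of flipping when the model emits the real closing backtick.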

nerocui pushed a commit to nerocui/cmark-gfm that referenced this issue May 29, 2024
Otherwise we can get quadratic increase in size with deeply
nested structures.

See github#355.