feat(parser): reimplement parser in `langium` instead of `jison` #4401

Yokozuna59 · 2023-05-15T11:43:39Z

From this conversion. It would be great to reimplement the parser in langium instead of jison the following reasons:

The current jison package is unmaintained.
Langium supports LSP.
Langium has cleaner and easier syntax.
Langium has typescript integration with a fully typed AST.
Langium has better error reporting.

And much more!

I'm currently working to create a separate package using it here Yokozuna59/mermaid-lint. Anyone is welcome to contribute and help me! Even thoughts would be fine.

Here is an example I've built using Langium for pie charts (toggle the icon next to the link to show the AST). playground

Note: It's not finished; it's just an example.

related issues:

sidharthv96 · 2023-05-16T18:18:57Z

This is exciting @Yokozuna59 !
Once you have a working parser, can you create a PR like #3432, so that we can see how much the size would increase with Langium?

I'd love to see the existing parsers modernised!

Yokozuna59 · 2023-05-17T08:58:36Z

Once you have a working parser, can you create a PR like #3432, so that we can see how much the size would increase with Langium?

Sure! I'm currently planning to support the pie chart since it's the easiest one. But I think I should support the common syntax and configuration first: comments, directives, accessible titles and descriptions, etc.

I've got to improve the current comments and directives regexes since they aren't covering some weird cases; e.g., in #1645, if text has been wrapped inside {} it won't be skipped and shown as text instead.

classDiagram
    class Cat {
        +int age %% not skipped comment 
    }

Yokozuna59 · 2023-05-18T03:11:21Z

@sidharthv96 It doesn't look like Langium or Chevrotain are customizable enough in our use case.

I was trying to match the diagram title (and accessible title and description in the same way), so I wanted to match everything after title: and set the value of it as title, but it doesn't seem that there is a clean way using Langium or Chevrotain from this discussion.

sidharthv96 · 2023-05-19T07:14:52Z

I think I was able to do that in chevrotain.

https://github.com/mermaid-js/mermaid/pull/3432/files#diff-d7ef899ed988dee0393145afe276d4821390b2d40f4bac039bea0f129e4338afR102-R106

Yokozuna59 · 2023-05-19T07:35:17Z

I think I was able to do that in chevrotain.

https://github.com/mermaid-js/mermaid/pull/3432/files#diff-d7ef899ed988dee0393145afe276d4821390b2d40f4bac039bea0f129e4338afR102-R106

Although not in the same way, I was able to do it too.

I've finished accTitle and single and multi-line accDesc with their unit tests. But I haven't committed yet; some minor changes might be needed. Then I'm planning to support directives, which might take a lot of time because of the weird style and number of valid values.

Yokozuna59 · 2023-05-22T17:00:06Z

I think accTitle and accDescr are done with their test cases. (Some weird test cases that are not handled have .todo).

Can some verify I haven't forgotten anything? link

aloisklink · 2023-05-24T20:47:15Z

Then I'm planning to support directives, which might take a lot of time because of the weird style and number of valid values.

Hmmmm, this is kinda a hack, so sticking it langium is probably the ideal way of doing this, but if you're feeling lazy, I wonder if it's easier to do this a pre-proccess step, e.g. something like:

const diagram = `%%{init: { **insert argument here**}}%%
  graph TD
    A-->B`;
const directiveRegex = /%%(\{[\w\W]*\})%%/gmU; // use U-flag to make `*` lazy
const directives = diagram.matchAll(directiveRegex);
const diagramWithoutDirectives = diagram.replace(directiveRegex, "");

const parsedDiagram = yourLangiumParser.parse(diagramWithoutDirectives);

It seems like directives are more like a metadata thing that applies to all diagrams, rather than something part of each diagram, so it might make sense to split it away from the diagram parser 🤷

Edit:

Just thought of something else, all mermaid diagram definitions support using a YAML front-matter before the diagram, e.g. like:

```mermaid
---
# some YAML here
title: "Hi"
---
pie
    "Rats" : 15
```

The way this is currently implemented is by doing a pre-processing step on all diagrams, so doing the same thing with directives wouldn't be too weird.

Can some verify I haven't forgotten anything? link

From a quick look, it looks great! Great work 👍! The it.todo tests are probably the most important unfortunately, otherwise it would break existing diagrams.

sidharthv96 · 2023-05-25T06:21:31Z

I agree with @aloisklink, yaml and init directives can be handled externally for now as they aren't super important for our core usecase which is formatting and parsing the actual diagrams.

Yokozuna59 · 2023-05-25T14:48:48Z

Hmmmm, this is kinda a hack, so sticking it langium is probably the ideal way of doing this...

I can create a general directive parser (still not easy), but it won't be strictly typed. So it won't suggest anything if the user tries to write some directives, unlike the strictly which would suggest according to what the user has typed using LSP. For example:

...but if you're feeling lazy, I wonder if it's easier to do this a pre-proccess step, e.g. something like:

I tried tbh, but it would hard to preform with my experience with langium. It can be removed using hidden terminal rule:

hidden terminal DIRECTIVE: "%%{" -> "}%%\r?\n";

It seems like directives are more like a metadata thing that applies to all diagrams, rather than something part of each diagram, so it might make sense to split it away from the diagram parser shrug

Not really sure tbh, but I won't think about it for now and will focus on other stuff.

Just thought of something else, all mermaid diagram definitions support using a YAML front-matter before the diagram, e.g. like:
---
# some YAML here
title: "Hi"
---
pie
    "Rats" : 15
The way this is currently implemented is by doing a pre-processing step on all diagrams, so doing the same thing with directives wouldn't be too weird.

I think it won't be that hard to implement this since it's wrapped differently from comments (%%) and only appears at the top of the file, but I'm not really sure what possible values can be there since I couldn't find them in the docs.

From a quick look, it looks great! Great work +1! The it.todo tests are probably the most important unfortunately, otherwise it would break existing diagrams.

I was trying to create a general regex that make sure that there is no accTitle or accDescr before title but appears that it won't be easy. You can try it here and please inform if you could come up with one. https://regex101.com/r/rGuOBM/1

Yokozuna59 · 2023-05-30T18:46:53Z

I figured out the solution for those it.todo test cases; it appears that I was using Langium in the wrong way. However, there is a limitation on the Langium side that I couldn't solve completely. If this PR merged eclipse-langium/langium#1067, then I would be able to continue working on the parser.

sidharthv96 · 2023-06-01T07:30:19Z

@Yokozuna59 will be developing this package in their repo, and we'll move it into mermaid once it's stable for production use. This will replace mermaid's current JISON parser once the team decides.

I wonder if there's any way to verify that the new parser is backwards compatible with the syntax supported by the old parser?

Yokozuna59 · 2023-06-01T07:40:59Z

I wonder if there's any way to verify that the new parser is backwards compatible with the syntax supported by the old parser?

I'm currently creating unit tests for common stuff, for example, title, accTitle, accDescr, etc. And will add unit tests for each diagram. Currently, only pie chart and the above-listed common stuff have test cases. And I add more test cases for each edge case and valid syntax while developing. I could take the test cases for the current parser and add them to mine to validate that they're working fine.

What do you suggest?

Yokozuna59 · 2023-06-01T07:45:57Z

Should I support multiline comments #1281 with it? I came across it while a ago. Please mention me for similar stuff if you could since there isn't a parser label in mermaid issues. I would suggest adding one if you can! Maybe adding other label like renderer and lsp would be helpful for other contributors.

sidharthv96 · 2023-06-02T07:46:56Z

@Yokozuna59, let's keep your implementation as close to the current parser as possible. We can add new features and fix bugs once we make the switch. Otherwise it'll be difficult to test.

Regarding testing & verification, unit tests and visual tests will definitely be required and helpful, but they won't cover all the usecases.

Two ideas I had (which seems a bit over engineered) were,

Fuzzing the current parsers to find many valid edge case unit tests, and then running that suite on the new parser.
Verifying syntax diagrams. But this requires syntax diagrams to be generated from the current jison files, which might be difficult.

Anything else?

sidharthv96 · 2023-06-02T07:49:37Z

If your pie implementation is ready, can you raise a PR like #3432, so we can test the size increase?

Yokozuna59 · 2023-06-02T09:27:52Z

@Yokozuna59, let's keep your implementation as close to the current parser as possible. We can add new features and fix bugs once we make the switch. Otherwise it'll be difficult to test.

Okay, I'll remove the multi line comment for now.

Regarding testing & verification, unit tests and visual tests will definitely be required and helpful, but they won't cover all the usecases.

Two ideas I had (which seems a bit over engineered) were,

Fuzzing the current parsers to find many valid edge case unit tests, and then running that suite on the new parser.

Verifying syntax diagrams. But this requires syntax diagrams to be generated from the current jison files, which might be difficult.

Anything else?

I'm not really sure how I can help with that. Before implementing the parser in langium, I tried looking at those jison files, but I wasn't able to understand much or get edge cases. If you point them out I would be more than happy to verify and test them.

Yokozuna59 · 2023-06-02T09:29:33Z

If your pie implementation is ready, can you raise a PR like #3432, so we can test the size increase?

The Langium PR haven't been merged yet, so the title and accessibilities would give wrong result:

- awesome title
+ title awesome title

There is an alternative way to implement it; thanks to @msujew for pointing it out, I'll implement it then create a PR.

sidharthv96 · 2023-06-02T12:47:24Z

@Yokozuna59 you can focus on langium for now. Don't worry about the verification part. I was sharing that to get input from the community.

Yokozuna59 · 2023-06-03T23:44:25Z

Never mind, all of the above issues were solved in mermaid-parser@0.2.0.

I guess Pie Chart is now completely supported (without directives or yaml).

NicolasCwy · 2024-03-22T02:29:47Z

@sidharthv96 I would like to convert the Git Graph parser to use Langium, can I tag along this issue or should I create a new one? Perhaps we can add a checklist of the remaining parsers and make this an epic of sorts

sidharthv96 · 2024-03-22T03:51:03Z

@NicolasCwy you can start with a draft PR directly, no need to create an issue. :)

The flow what we've seen till now is usually

Stage 1

Convert DB to TypeScript
Optimise the DB functions to read from AST (if required)
Tweak any unit tests to fit the new DB.

Stage 2

Add langium parser in the parser package
Add a parser file in mermaid that interfaces with the parser package and populates the DB
Do not modify any visual tests.

Stage 1 and 2 can be separate PRs, so we can get stuff in easily.

Few reference PRs

github-actions bot added the Status: Triage Needs to be verified, categorized, etc label May 15, 2023

Yokozuna59 mentioned this issue May 15, 2023

Split docs into separate package #4382

Open

sidharthv96 added Area: Development Status: In progress enhancement and removed Status: Triage Needs to be verified, categorized, etc labels May 16, 2023

Yokozuna59 mentioned this issue Jun 1, 2023

chore: add more labels to issues #4444

Closed

sidharthv96 added the Internals: Parser label Jun 2, 2023

This comment was marked as resolved.

Sign in to view

Yokozuna59 mentioned this issue Jun 3, 2023

add @mermaid-js/parser separate package #4450

Closed

24 tasks

jgreywolf added Type: Enhancement New feature or request and removed enhancement labels Jun 7, 2023

Yokozuna59 linked a pull request Jul 4, 2023 that will close this issue

add @mermaid-js/parser separate package #4450

Closed

24 tasks

Yokozuna59 removed a link to a pull request Jul 4, 2023

add @mermaid-js/parser separate package #4450

Closed

24 tasks

This was referenced Aug 16, 2023

feat: add @mermaid-js/parser package and info langium parser #4727

Merged

feat: add pie langium parser #4751

Merged

Yokozuna59 mentioned this issue Sep 2, 2023

feat: create sankey parser and integrate sankey parser into mermaid package #4799

Open

2 tasks

Yokozuna59 self-assigned this Sep 8, 2023

weedySeaDragon mentioned this issue Oct 3, 2023

dot character in ERD #1895

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(parser): reimplement parser in `langium` instead of `jison` #4401

feat(parser): reimplement parser in `langium` instead of `jison` #4401

Yokozuna59 commented May 15, 2023 •

edited

sidharthv96 commented May 16, 2023

Yokozuna59 commented May 17, 2023

Yokozuna59 commented May 18, 2023

sidharthv96 commented May 19, 2023

Yokozuna59 commented May 19, 2023

Yokozuna59 commented May 22, 2023

aloisklink commented May 24, 2023 •

edited

sidharthv96 commented May 25, 2023

Yokozuna59 commented May 25, 2023

Yokozuna59 commented May 30, 2023

sidharthv96 commented Jun 1, 2023

Yokozuna59 commented Jun 1, 2023

Yokozuna59 commented Jun 1, 2023 •

edited

sidharthv96 commented Jun 2, 2023 •

edited

sidharthv96 commented Jun 2, 2023

Yokozuna59 commented Jun 2, 2023

Yokozuna59 commented Jun 2, 2023

sidharthv96 commented Jun 2, 2023

This comment was marked as resolved.

Yokozuna59 commented Jun 3, 2023

NicolasCwy commented Mar 22, 2024 •

edited

sidharthv96 commented Mar 22, 2024 •

edited

feat(parser): reimplement parser in langium instead of jison #4401

feat(parser): reimplement parser in langium instead of jison #4401

Comments

Yokozuna59 commented May 15, 2023 • edited

sidharthv96 commented May 16, 2023

Yokozuna59 commented May 17, 2023

Yokozuna59 commented May 18, 2023

sidharthv96 commented May 19, 2023

Yokozuna59 commented May 19, 2023

Yokozuna59 commented May 22, 2023

aloisklink commented May 24, 2023 • edited

sidharthv96 commented May 25, 2023

Yokozuna59 commented May 25, 2023

Yokozuna59 commented May 30, 2023

sidharthv96 commented Jun 1, 2023

Yokozuna59 commented Jun 1, 2023

Yokozuna59 commented Jun 1, 2023 • edited

sidharthv96 commented Jun 2, 2023 • edited

sidharthv96 commented Jun 2, 2023

Yokozuna59 commented Jun 2, 2023

Yokozuna59 commented Jun 2, 2023

sidharthv96 commented Jun 2, 2023

This comment was marked as resolved.

Yokozuna59 commented Jun 3, 2023

NicolasCwy commented Mar 22, 2024 • edited

sidharthv96 commented Mar 22, 2024 • edited

Stage 1

Stage 2

feat(parser): reimplement parser in `langium` instead of `jison` #4401

feat(parser): reimplement parser in `langium` instead of `jison` #4401

Yokozuna59 commented May 15, 2023 •

edited

aloisklink commented May 24, 2023 •

edited

Yokozuna59 commented Jun 1, 2023 •

edited

sidharthv96 commented Jun 2, 2023 •

edited

NicolasCwy commented Mar 22, 2024 •

edited

sidharthv96 commented Mar 22, 2024 •

edited