Ignore matches #6

Open
kkaefer opened this issue Jul 4, 2011 · 3 comments

@kkaefer

kkaefer commented Jul 4, 2011

The functionality to match something, but not add it as a child node would be useful. E.g. I usually don't care about whitespace, but I have whitespace nodes littered over my AST. The excellent LEPL uses the ~/Drop operator to match, but ignore input tokens.

@tolmasky
Owner

tolmasky commented Jul 4, 2011

Hi kkaefer,

I have considered this syntax addition (and it has been suggested to me by others as well), and I am not yet 100% convinced we should add it (but I will admit I am very tempted). Let me list some of my concerns and we can work our way from there.

From a philosophical point of view, language.js' MO has always been "this is not the language transformation step, this is the tagging step". In other words, it may make more sense to think of language.js as a syntax highlighter: you are going through and annotating the text and giving it structure that way, rather than evaluating it (that is, language.js produces a CST instead of an AST). For an example of a real-life inconsistency that would arise, consider the innerText property of all nodes. Say we have "x y z" being parsed as:

+ parent
+--x
+--y
+--z

As you can see, we've dropped the whitespace here. Calling innerText on the x, y, and z nodes works as expected, returning "x", "y", and "z". However, counterintuitively, calling innerText on the parent returns "x y z", so there is a discrepancy of information. The nodes don't actually store any strings; they are ranges that point into the original source (again, think of this as "tagging" the document), so we can't simply change the parent to "xyz" easily (and this is probably not desired either). The question thus is whether we are comfortable with having this discrepancy (maybe we are, and it is not a big deal). I don't know the answer yet.
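
To make the discrepancy concrete, here is a minimal sketch in JavaScript. It does not use language.js' actual API: the node shape (`name`, `start`, `end`, `children`) and the helpers `makeNode` and `innerText` are assumptions made up for illustration. The only idea taken from the description above is that nodes are ranges into the source rather than stored strings.

```javascript
// Hypothetical node shape (not language.js' real one): every node tags a
// half-open [start, end) range of the original source and stores no text.
const source = "x y z";

function makeNode(name, start, end, children = []) {
  return { name, start, end, children };
}

// innerText is derived from the range each time it is asked for.
function innerText(node) {
  return source.slice(node.start, node.end);
}

const x = makeNode("x", 0, 1);
const y = makeNode("y", 2, 3);
const z = makeNode("z", 4, 5);

// The whitespace at offsets 1 and 3 was matched but not added as a child,
// yet the parent's range still spans it.
const parent = makeNode("parent", 0, 5, [x, y, z]);

const fromChildren = parent.children.map(innerText).join("");

console.log(innerText(parent)); // "x y z"
console.log(fromChildren);      // "xyz"
```

The parent's text and the concatenation of its children's text disagree as soon as any matched input is dropped from the tree, which is the inconsistency in question.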

@kkaefer
Author

kkaefer commented Jul 4, 2011

Maybe this feature could be added by not dropping them at parse time, but instead skipping over "dropped" tokens at traverse time, similar to how traversesTextNodes works.
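
A rough sketch of what skipping at traverse time could look like, in JavaScript. The tree shape, the `traverse` function, and its `skipped` parameter are all hypothetical and are not taken from language.js; the point is only that the parse result keeps every node and the filtering happens while walking.

```javascript
// Hypothetical parse result: whitespace nodes are still present in the tree.
const tree = {
  name: "parent",
  children: [
    { name: "x", children: [] },
    { name: "WhiteSpace", children: [] },
    { name: "y", children: [] },
    { name: "WhiteSpace", children: [] },
    { name: "z", children: [] },
  ],
};

// Depth-first walk that neither visits nor descends into skipped nodes.
// `skipped` is an assumed option: a Set of node names to ignore.
function traverse(node, visit, skipped = new Set()) {
  if (skipped.has(node.name)) return;
  visit(node);
  for (const child of node.children) {
    traverse(child, visit, skipped);
  }
}

const visited = [];
traverse(tree, (node) => visited.push(node.name), new Set(["WhiteSpace"]));

console.log(visited.join(" ")); // parent x y z
```

Because the tree itself is untouched, a different consumer could walk the same result without skipping anything.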

@tolmasky
Owner

tolmasky commented Jul 4, 2011

Yeah, that is certainly an option: either have something like skippedNodeNames:["WhiteSpace", "SomethingElse", etc], or traverse the tree and manually remove them oneself with something like tree.removeNodesNamed(...).
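
For the second option, a post-parse removal pass might look roughly like the sketch below. `removeNodesNamed` is written here as a standalone JavaScript function over an assumed node shape (`name`, `start`, `end`, `children`); it is not an existing language.js method, and its signature is a guess.

```javascript
// Hypothetical helper: returns a pruned copy of the tree without any node
// whose name is listed in `names`. A removed node takes its subtree with it.
function removeNodesNamed(root, names) {
  const drop = new Set(names);
  const prune = (node) => ({
    ...node,
    children: node.children
      .filter((child) => !drop.has(child.name))
      .map(prune),
  });
  return prune(root);
}

// Assumed parse result for "x y z", with ranges into the source.
const tree = {
  name: "parent", start: 0, end: 5,
  children: [
    { name: "x", start: 0, end: 1, children: [] },
    { name: "WhiteSpace", start: 1, end: 2, children: [] },
    { name: "y", start: 2, end: 3, children: [] },
    { name: "WhiteSpace", start: 3, end: 4, children: [] },
    { name: "z", start: 4, end: 5, children: [] },
  ],
};

const pruned = removeNodesNamed(tree, ["WhiteSpace"]);
const names = pruned.children.map((child) => child.name);

console.log(names.join(" ")); // x y z
```

Note that the pruned parent keeps its original range (0 to 5), so the innerText discrepancy described earlier still applies; this approach only makes the removal an explicit, opt-in step.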

In a world where we did add the explicit operator, it might be nice to be able to apply it to rule definitions as well:

~WhiteSpace = ... // now anywhere WhiteSpace is used it is dropped, that way you don't have ~'s all over your grammar.
