Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing lexer rules if tokenVocab defined #157

Closed
wants to merge 4 commits into from

Conversation

robstoll
Copy link
Contributor

Fix is due to a problem I encountered, see http://stackoverflow.com/questions/22285912/antlr-v3-token-file-array-and-null

Problem was: lexer rules for tokens defined in the tokenVocab where not generated which resulted in NullPointerExceptions in the lexer.

This pull request fixes this issue (and I have fixed some broken links in the README.txt). The reasons why I want to use a tokenVocab already in the parser are explained in the StackOverflow thread

grammar.getTokenType(t.getText())==Label.INVALID )
Character.isLowerCase(currentRuleName.charAt(0))) &&
(hasTokenVocabAndIsParserOrCombined ||
grammar.getTokenType(t.getText())==Label.INVALID ))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this is completely correct.

  1. Why do you change the behavior for parser grammars? A parser grammar would use a tokenVocab = LexerGrammarName, and that lexer grammar name could itself have a tokenVocab = CustomTokensFile. It appears that only combined grammars are impacted by the scenario you are describing.
  2. Does the condition assume that the imported tokenVocab defines tokens which are referenced in the parser? What happens if the referenced tokenVocab file is empty, or otherwise does not define a literal which is referenced in a parser rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback.

  1. I will change the condition to combined grammars only, my mistake.
  2. I guess you are right that the relaxation of the condition might be too relaxed. I assumed the use cases you described are covered by existing tests (and they all passed). I have only written a test for the use case were the defined tokens in tokenVocab are referenced in the parser (as a side notice, should I open an issue for this bug?)

@sharwell
Copy link
Member

Have you tried separating your grammar into separate lexer and parser grammars, and then using the tokenVocab option in your lexer grammar to import the customized token assignments?

…d normal parser before)

- fixed token vocab parser, it got into an endless loop when there was an error at the end of the file
- added tests for:
-- empty token vocab file
-- token vocab file with errors
-- token vocab file which includes referenced tokens
-- token vocab file which includes non-referenced tokens
@robstoll
Copy link
Contributor Author

As mentioned in my previous comment I have changed the condition to combined parsers only. Furthermore I have added the desired tests and also fixed a bug in the token vocab parser on the go.
I also tested it with my own grammar and it worked fine :)

Btw. separating the grammar is not an option since I do not want more maintenance than necessary (and it's just matter of adding this feature and we are good to go).

@robstoll robstoll mentioned this pull request Jul 3, 2014
@robstoll
Copy link
Contributor Author

Are there any open questions or issues left?

@parrt
Copy link
Member

parrt commented Sep 4, 2014

we will look at this for the next release.

sharwell added a commit to sharwell/antlr3 that referenced this pull request Sep 29, 2014
sharwell added a commit to sharwell/antlr3 that referenced this pull request Sep 29, 2014
@sharwell
Copy link
Member

I created an alternate implementation of this feature. Note that you don't need to include each token twice in the .tokens file. For example, the following is sufficient to cover the assignment for all of TypeArray, 'array', Null, and 'null':

TypeArray=394
Null=395

@robstoll
Copy link
Contributor Author

Sounds promising. I will try it out as soon as your pull request is merged.

@robstoll
Copy link
Contributor Author

Please consider to cherry pick the commit: robstoll@20d859a

Would be nice to be able to add comments and new lines to token files. I am sorry that I did not put that into a separate pull request. Could create one though if you like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants