Customize tokenizer via the fork API #264
base: master
@scripthunter7 Thank you for the PR! I'm supportive of this extension. However, there are a few points to address:
Regarding documentation for the fork API, I absolutely agree. I would appreciate it if you could propose a PR to lay the groundwork. This would also be a good place to detail the requirements for the custom tokenizer, as you've outlined.
@lahmatiy Thank you for your feedback! I think I have made the tokenizer switchable everywhere (I hope). I also introduced a separate utility that returns the tokenizer function from the config object, if one is present. Finally, I added some simple unit tests to make sure that the custom tokenizer is used in the fork and that it does not affect the behavior of the default library.
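The lookup utility mentioned above could be sketched roughly as follows. This is only an illustration, not the code from the PR: the names `resolveTokenizer` and `defaultTokenize` and the `tokenize` config key are assumptions made for the example.

```javascript
// Hypothetical stand-in for the library's built-in tokenizer.
// It reports the whole input as a single token of a made-up type 0.
function defaultTokenize(source, onToken) {
  onToken(0, 0, source.length);
}

// Return the tokenizer supplied in the fork config, if present,
// otherwise fall back to the default one.
// The `tokenize` key is an assumption for illustration only.
function resolveTokenizer(config) {
  if (config && typeof config.tokenize === 'function') {
    return config.tokenize;
  }
  return defaultTokenize;
}
```

With a fallback like this, a config without a tokenizer behaves exactly as before, which matches the goal of leaving the default library untouched.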
Closes #253
This is a relatively simple PR that allows advanced library users to supply a custom tokenizer via the fork API. It doesn't change how the base library works; it only affects forks, making them more flexible.
A custom tokenizer function can be a completely new tokenizer or a simple wrapper around CSSTree's tokenizer. Either way, it should meet the following requirements:
Example usage:
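A minimal sketch of the "simple wrapper" case, written without importing CSSTree so that it runs on its own. `baseTokenize` is a made-up stand-in for the library's tokenizer, the token type numbers are invented, and the `(source, onToken)` signature with an `onToken(type, start, end)` callback is assumed here rather than taken from this PR.

```javascript
// Made-up token types for the stand-in tokenizer.
const WHITESPACE = 1;
const OTHER = 2;

// Hypothetical stand-in for the base tokenizer: it splits the source
// into runs of whitespace and runs of non-whitespace characters.
function baseTokenize(source, onToken) {
  const re = /\s+|\S+/g;
  let match;
  while ((match = re.exec(source)) !== null) {
    const type = /^\s/.test(match[0]) ? WHITESPACE : OTHER;
    onToken(type, match.index, match.index + match[0].length);
  }
}

// Wrap a tokenizer while keeping the same (source, onToken) signature.
// The wrapper forwards every token unchanged and counts them in `stats`.
function createCountingTokenizer(tokenize, stats) {
  return function countingTokenize(source, onToken) {
    tokenize(source, (type, start, end) => {
      stats.count += 1;
      onToken(type, start, end);
    });
  };
}

const stats = { count: 0 };
const customTokenize = createCountingTokenizer(baseTokenize, stats);

// Collect the tokens the wrapped tokenizer reports.
const tokens = [];
customTokenize('a { color: red }', (type, start, end) => {
  tokens.push([type, start, end]);
});

// In a real fork, `customTokenize` would be handed to the fork config;
// the exact option name is not shown in this text, so it is omitted here.
```

Because the wrapper keeps the callback contract of the tokenizer it wraps, it can be swapped in wherever the original tokenizer is expected.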
@lahmatiy I think it would be worth writing documentation for the fork API. If you agree, I would be happy to help put together a basic version in a separate PR when I have some free time. The requirements for the custom tokenizer should also be described there.