Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(bundler): New Gemfile parser based on "tree-sitter" #28403

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

zharinov
Copy link
Collaborator

Changes

This PR introduces "tree-sitter" for parsing dependencies from complex languages.

Context

The current regex-based implementation is hard to maintain.
It blocks implementation of #4789, which should be straightforward with this new parsing approach.

Documentation (please check one with an [x])

  • I have updated the documentation, or
  • No documentation update is required

How I've tested my work (please select one)

I have verified these changes via:

  • Code inspection only, or
  • Newly added/modified unit tests, or
  • No unit tests but ran on a real repository, or
  • Both unit tests + ran on a real repository

@zharinov zharinov changed the title refactor(bundler): Reimplement parsing using "tree-sitter" refactor(bundler): New Gemfile parser based on "tree-sitter" Apr 14, 2024
@zharinov zharinov added the ci:fulltest Run full test suite on all platforms label Apr 14, 2024
@zharinov
Copy link
Collaborator Author

Okay, now it's obvious that particular tree-sitter-* native packages are the weak point.
It works on my machine, but fails on CI environment.
While the overall approach looks good, it seems like the custom WASM packaging is needed to work around native packages.

This reverts commit 33818ee.
This reverts commit 95f93a9.
This reverts commit ee7fafe.
This reverts commit 89041e1.
@zharinov
Copy link
Collaborator Author

Or, another possibility that it has something to do with our aggressive CI caching

@zharinov zharinov removed the ci:fulltest Run full test suite on all platforms label Apr 15, 2024
@zharinov zharinov marked this pull request as draft April 15, 2024 10:42
Copy link
Member

@viceice viceice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another parser? isn't your self written parser good enough?

it's probably related to node v18 issues?

@zharinov
Copy link
Collaborator Author

Nah, it's not good enough 😂

@zharinov
Copy link
Collaborator Author

it's probably related to node v18 issues?

I don't know, the local setup on my M1 machine works fine for both 18 and 20

@zharinov
Copy link
Collaborator Author

zharinov commented Apr 15, 2024

Starting from 0.21.0, the NAPI-based prebuilds are used, which seems to work:

image

The problem is that some (e.g. ruby and scala) are not updated yet.

Copy link
Member

@viceice viceice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks complicated 🫣

@@ -135,10 +135,10 @@ describe('modules/manager/bundler/extract', () => {
'Gemfile',
);
expect(res?.deps).toMatchObject([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the order changes?

@@ -0,0 +1,327 @@
import Parser, { Query, SyntaxNode } from 'tree-sitter';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems it's something like antlr?

does it make sense to use it for more languages?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is similar to Antlr, now it's gaining popularity among text editor enthusiasts (specifically, Neovim), and it powers GitHub's syntax highlight.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to use it for more languages?

It supports quite a bit of languages: https://tree-sitter.github.io/tree-sitter/

The problem is, like this Ruby implementation, many of them are not compatible with the recent node binding.

lib/modules/manager/bundler/parser.ts Outdated Show resolved Hide resolved
@zharinov zharinov marked this pull request as ready for review April 22, 2024 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants