Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extensions: Select language extension based on longest matching path extension #11697

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

claytonrcarter
Copy link
Contributor

@claytonrcarter claytonrcarter commented May 11, 2024

This is a "let's see if I can get it to work" (aka hack) to try address #8408 and #10997. In short: the current language extension matcher is only considering the "final extension" of a file when seeing which language extensions would apply to the file. (This is because that's how Rust's path.extension() works.) This works fine for most cases, but there are some cases where a language extension may want to match on a "compound file extension", and this is currently not supported at all.

Example:

  • Laravel Blade templates use the blade.php "extension"
  • So a Blade file may be named index.blade.php
  • A hypothetical lang extension for Blade will specify path_suffixes = ["blade.php"]
  • Zed will look at index.blade.php, and then try to match a lang ext based on any of the path_suffixes exactly matching either "php" or "index.blade.php"
  • The Blade ext will not match because "blade.php" != "php" nor "index.blade.php"

Proposed Solution
Basically, instead of doing an exact match on the suffix and either the file extension or filename, this change:

  • finds the longest suffix that matches the end of the filename
  • returns the suffix string length as the matching "score"

Languages that may use this

  • Laravel Blade templates use the blade.php "extension"
  • I'm not familiar with Angular, but apparently it uses component.html
  • others?

Open questions

  • perf: I'm assuming that filename.ends_with(&ext) || filename == suffix is slower than path_suffixes.contains(&Some(suffix.as_str()))? If so, how bad is it?
    • alternate option: separate path_suffixes in the ext API into path_suffixes and filenames, and then we could preprocess these when the language ext is first loaded to reduce overhead
  • unintended matches: I realized while writing this that it may introduce behavior where (eg) a lang ext for D (suffix d) may be matched to Markdown (md). In that case, Markdown would still win, though (because of length) ... so may that's not an issue? edit I resolved this when I found the tests that cover this change and go this to pass.
  • alternate approach: perhaps declare within an extension that it may "compete" with other extensions and provide some way to disambiguate which should apply?
  • prior art: Add glob to match filetype #10960 is related to this, but I believe that it is addressing a different problem

Release Notes:

@cla-bot cla-bot bot added the cla-signed The user has signed the Contributor License Agreement label May 11, 2024
@wassson
Copy link

wassson commented May 11, 2024

Languages that may use this

I like where your head is at! To add to this, Rails has erb files with the extension *.html.erb.

@@ -1,6 +1,6 @@
name = "TypeScript"
grammar = "typescript"
path_suffixes = ["ts", "cts", "d.cts", "d.mts", "mts"]
path_suffixes = ["ts", "cts", "mts"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not important to this PR – and to be honest, I didn't actually test this – but I noticed while working on this that these suffixes a. would never work (because of the issue this PR is trying to fix) and b. are redundant (because cts and mts are already handled).

@@ -533,8 +542,10 @@ impl LanguageRegistry {
},
);
if path_matches_custom_suffix {
2
} else if path_matches_default_suffix || content_matches {
usize::MAX
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used MAX to ensure that user overrides always win.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the user override needs to always win. I think we could use the longest match, but in the event of a tie, favor the user-defined path suffix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to implement that is to multiple the longest matching suffix's length by two and then add one if it's a user-defined suffix. Curious what you think of that.

let path_matches_custom_suffix = user_file_types
.and_then(|types| types.get(language_name))
.unwrap_or(&empty)
.iter()
.any(|suffix| path_suffixes.contains(&Some(suffix.as_str())));
.any(|suffix| filename.ends_with(suffix));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I forgot to update this like I did above to support the full file name and prevent (eg) rs also matching .cars. I'll leave this as is for now, pending input from someone else on how to proceed. 😄

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we should use the same logic for finding the suffix length, for both the default path suffixes and the user-defined path suffixes.

I'd write it something like this:

let matching_suffix_len = config
    .path_suffixes
    .iter()
    .filter(|suffix| {
        filename == suffix ||
            filename.strip_suffix(suffix).map_or(false, |prefix| prefix.ends_with('.'))
    })
    .map(|suffix| suffix.len())
    .max();

@claytonrcarter
Copy link
Contributor Author

I like where your head is at! To add to this, Rails has erb files with the extension *.html.erb.

@wassson Thank you! I think you should already be OK with *.html.erb because the ruby and erb lang extensions aren't competing for .erb. It would be an issue if you wanted to make a (eg) Markdown ERB extension that tried to use md.erb. In that case, your extension would be competing w/ the ERB lang ext over the .erb, and then this would definitely come into play. Or if, eg, the ERB files were instead named *.erb.html, thus competing with the HTML lang extension.

@wassson
Copy link

wassson commented May 11, 2024

@claytonrcarter haha duh yes that sounds right to me!

@claytonrcarter claytonrcarter changed the title Extensions: match language extensions based on longest matching path extension extensions: Select language extensions based on longest matching path extension May 11, 2024
@claytonrcarter claytonrcarter changed the title extensions: Select language extensions based on longest matching path extension extensions: Select language extension based on longest matching path extension May 11, 2024
@claytonrcarter
Copy link
Contributor Author

This isn't ready for merge, but it is ready for input from staff.

@claytonrcarter claytonrcarter marked this pull request as ready for review May 16, 2024 09:17
Copy link
Collaborator

@maxbrunsfeld maxbrunsfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a good change! I left some comments.

@@ -533,8 +542,10 @@ impl LanguageRegistry {
},
);
if path_matches_custom_suffix {
2
} else if path_matches_default_suffix || content_matches {
usize::MAX
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the user override needs to always win. I think we could use the longest match, but in the event of a tie, favor the user-defined path suffix.

@@ -533,8 +542,10 @@ impl LanguageRegistry {
},
);
if path_matches_custom_suffix {
2
} else if path_matches_default_suffix || content_matches {
usize::MAX
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to implement that is to multiple the longest matching suffix's length by two and then add one if it's a user-defined suffix. Curious what you think of that.

let path_matches_custom_suffix = user_file_types
.and_then(|types| types.get(language_name))
.unwrap_or(&empty)
.iter()
.any(|suffix| path_suffixes.contains(&Some(suffix.as_str())));
.any(|suffix| filename.ends_with(suffix));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we should use the same logic for finding the suffix length, for both the default path suffixes and the user-defined path suffixes.

I'd write it something like this:

let matching_suffix_len = config
    .path_suffixes
    .iter()
    .filter(|suffix| {
        filename == suffix ||
            filename.strip_suffix(suffix).map_or(false, |prefix| prefix.ends_with('.'))
    })
    .map(|suffix| suffix.len())
    .max();

@claytonrcarter claytonrcarter marked this pull request as draft May 25, 2024 01:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed The user has signed the Contributor License Agreement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reconcile multiple language match by suffix length
4 participants