Improve did-you-mean suggestions by detecting swapped words #4997

Open: wants to merge 2 commits into master
clap_builder/src/parser/features/suggestions.rs: 44 changes (40 additions, 4 deletions)
```diff
@@ -8,23 +8,43 @@ use crate::builder::Command;
 /// Returns a Vec of all possible values that exceed a similarity threshold
 /// sorted by ascending similarity, most similar comes last
 #[cfg(feature = "suggestions")]
-pub(crate) fn did_you_mean<T, I>(v: &str, possible_values: I) -> Vec<String>
+pub(crate) fn did_you_mean<T, I>(input: &str, possible_values: I) -> Vec<String>
 where
     T: AsRef<str>,
     I: IntoIterator<Item = T>,
 {
     let mut candidates: Vec<(f64, String)> = possible_values
         .into_iter()
-        // GH #4660: using `jaro` because `jaro_winkler` implementation in `strsim-rs` is wrong
-        // causing strings with common prefix >=10 to be considered perfectly similar
-        .map(|pv| (strsim::jaro(v, pv.as_ref()), pv.as_ref().to_owned()))
+        .map(|pv| {
+            let confidence = if words_are_swapped(input, pv.as_ref()) {
+                0.9
+            } else {
+                // GH #4660: using `jaro` because `jaro_winkler` implementation in `strsim-rs` is wrong
+                // causing strings with common prefix >=10 to be considered perfectly similar
+                strsim::jaro(input, pv.as_ref())
+            };
+
+            (confidence, pv.as_ref().to_owned())
+        })
         // Confidence of 0.7 so that bar -> baz is suggested
         .filter(|(confidence, _)| *confidence > 0.7)
         .collect();
     candidates.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap_or(Ordering::Equal));
     candidates.into_iter().map(|(_, pv)| pv).collect()
 }

+#[cfg(feature = "suggestions")]
+fn words_are_swapped(input: &str, candidate: &str) -> bool {
+    let input_words = input.split_once('-');
+    let candidate_words = candidate.split_once('-');
+    match (input_words, candidate_words) {
+        (Some((input1, input2)), Some((candidate1, candidate2))) => {
```
Comment on lines +40 to +41
Member

Couldn't this just be an `if let`?

Contributor Author

Yes, it can be. I personally prefer a `match` block, but I can change it to `if let` if you prefer.
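For comparison, the `if let` form the reviewer suggests might look like the sketch below. It behaves the same as the `match` in the diff: each name is split once, at its first `-`, and the halves are compared crosswise. This is an illustration only, not code from the PR.

```rust
/// `if let` variant of `words_are_swapped` (a sketch; the PR itself uses `match`).
/// Splits each name once at its first '-' and checks whether the halves are exchanged.
fn words_are_swapped(input: &str, candidate: &str) -> bool {
    if let (Some((input1, input2)), Some((candidate1, candidate2))) =
        (input.split_once('-'), candidate.split_once('-'))
    {
        input1 == candidate2 && input2 == candidate1
    } else {
        false
    }
}

fn main() {
    // "lock-write" is "write-lock" with its two words exchanged.
    println!("{}", words_are_swapped("write-lock", "lock-write")); // true
    // Shares the word "lock", but the halves are not exchanged.
    println!("{}", words_are_swapped("write-lock", "no-lock")); // false
}
```

Either form compiles to the same check; the choice is purely stylistic.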

```diff
+            input1 == candidate2 && input2 == candidate1
+        }
+        _ => false,
+    }
```
Comment on lines +38 to +45
Member

So this will only check for one-"word" swaps, only if the word separator is `-` (which it really should be), and it won't handle typos within the swapped words.

The thing I'm trying to decide is whether this is too narrow a niche to include (and this is the type of discussion I prefer to have in issues before PRs are created).
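The three limitations above can be made concrete with a few checks against a copy of the PR's `words_are_swapped`. Because `split_once('-')` splits only at the first hyphen, a name with two or more hyphens can never match: the second half of the input keeps a `-`, while the first half of the candidate never contains one. Other separators are ignored, and a typo in either word defeats the exact comparison. The flag names used here are made up for illustration.

```rust
// Copy of the PR's `words_are_swapped`, unchanged apart from indentation.
fn words_are_swapped(input: &str, candidate: &str) -> bool {
    let input_words = input.split_once('-');
    let candidate_words = candidate.split_once('-');
    match (input_words, candidate_words) {
        (Some((input1, input2)), Some((candidate1, candidate2))) => {
            input1 == candidate2 && input2 == candidate1
        }
        _ => false,
    }
}

fn main() {
    // Detected: exactly one '-' on each side, halves exchanged.
    assert!(words_are_swapped("type-not", "not-type"));
    // Not detected: three words. The input's second half ("features-no") keeps
    // a '-', but the candidate's first half ("no") never contains one.
    assert!(!words_are_swapped("default-features-no", "no-default-features"));
    // Not detected: separators other than '-'.
    assert!(!words_are_swapped("type_not", "not_type"));
    // Not detected: a typo inside one of the swapped words.
    assert!(!words_are_swapped("tpye-not", "not-type"));
    println!("all limitation checks passed");
}
```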

Contributor Author

Right. If you want, I can convert the PR to a draft and we can discuss this in an issue.

Personally, I mix up the order of words in CLI arguments much more often than I misspell them. For example, ripgrep has `--type-not`; trying `--not-type` instead is a very easy mistake to make, but clap currently suggests `--no-pre`. I think a small quality-of-life improvement like this across all clap CLIs would be nice. Sure, the logic could be made more elaborate to also handle swaps in arguments of, e.g., three words, but I think two-word flags are much more common.
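One way to generalize the check to names of three or more words, as mentioned above, is to compare the sorted lists of `-`-separated words. The `words_are_reordered` helper below is hypothetical; it is not part of the PR or of clap. Like the original, it still does not tolerate typos inside the words.

```rust
/// Hypothetical generalization (not in the PR or in clap): `candidate` counts as a
/// reordering of `input` when both split on '-' into the same words, in a different order.
fn words_are_reordered(input: &str, candidate: &str) -> bool {
    // An identical name is an exact match, not a reordering.
    if input == candidate {
        return false;
    }
    let mut input_words: Vec<&str> = input.split('-').collect();
    let mut candidate_words: Vec<&str> = candidate.split('-').collect();
    // Single-word names have nothing to reorder; differing word counts cannot match.
    if input_words.len() < 2 || input_words.len() != candidate_words.len() {
        return false;
    }
    // Comparing the sorted word lists compares the multisets of words.
    input_words.sort_unstable();
    candidate_words.sort_unstable();
    input_words == candidate_words
}

fn main() {
    for (input, candidate) in [
        ("type-not", "not-type"),
        ("default-features-no", "no-default-features"),
        ("tpye-not", "not-type"),
    ] {
        println!("{input} -> {candidate}: {}", words_are_reordered(input, candidate));
    }
}
```

Sorting a handful of words per candidate is cheap for flag names. Whether a heuristic like this belongs in clap at all is the open question in this thread.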

Member

I want to sit on this and give it more thought. I'm trying to weigh the couple of specific cases this helps with against where we draw the line on supporting one-off heuristics like this.

At this point, I would really like to move this discussion to an issue, so that the conversation about whether to do this, and what we should support, is decoupled from this specific PR (PRs can come and go).

Contributor Author

Opened #5029 :)

```diff
+}
+
 #[cfg(not(feature = "suggestions"))]
 pub(crate) fn did_you_mean<T, I>(_: &str, _: I) -> Vec<String>
 where
```
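Because the diff depends on the `strsim` crate and on clap internals, the sketch below reproduces the same ranking logic in a self-contained program for experimentation. The `jaro` function is a hand-rolled stand-in for `strsim::jaro`. It is an assumption that it follows the textbook Jaro definition closely enough; it has not been checked against strsim's edge cases. `words_are_swapped` and the 0.9 / 0.7 constants are copied from the diff.

```rust
use std::cmp::Ordering;

/// Textbook Jaro similarity; a stand-in for `strsim::jaro` (assumption, see above).
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Two equal characters only "match" if they are at most `window` positions apart.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let (mut a_matched, mut b_matched) = (vec![false; a.len()], vec![false; b.len()]);
    let mut matches = 0usize;
    for (i, &ca) in a.iter().enumerate() {
        let hi = (i + window + 1).min(b.len());
        for j in i.saturating_sub(window)..hi {
            if !b_matched[j] && b[j] == ca {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Walk both sets of matched characters in order; each mismatch is half a transposition.
    let (mut half_transpositions, mut k) = (0usize, 0usize);
    for (i, &ca) in a.iter().enumerate() {
        if a_matched[i] {
            while !b_matched[k] {
                k += 1;
            }
            if ca != b[k] {
                half_transpositions += 1;
            }
            k += 1;
        }
    }
    let m = matches as f64;
    let t = (half_transpositions / 2) as f64;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Copied from the diff: one split at the first '-', halves compared crosswise.
fn words_are_swapped(input: &str, candidate: &str) -> bool {
    match (input.split_once('-'), candidate.split_once('-')) {
        (Some((input1, input2)), Some((candidate1, candidate2))) => {
            input1 == candidate2 && input2 == candidate1
        }
        _ => false,
    }
}

/// Same shape as the PR's `did_you_mean`, with `jaro` above in place of `strsim::jaro`.
fn did_you_mean<T, I>(input: &str, possible_values: I) -> Vec<String>
where
    T: AsRef<str>,
    I: IntoIterator<Item = T>,
{
    let mut candidates: Vec<(f64, String)> = possible_values
        .into_iter()
        .map(|pv| {
            let pv = pv.as_ref();
            // A detected swap gets a fixed confidence of 0.9 instead of its Jaro score.
            let confidence = if words_are_swapped(input, pv) { 0.9 } else { jaro(input, pv) };
            (confidence, pv.to_owned())
        })
        .filter(|(confidence, _)| *confidence > 0.7)
        .collect();
    // Ascending similarity: the most similar candidate comes last.
    candidates.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap_or(Ordering::Equal));
    candidates.into_iter().map(|(_, pv)| pv).collect()
}

fn main() {
    // Same inputs as the PR's `swapped_words_1` test.
    let suggestions = did_you_mean("write-lock", ["lock-write", "no-lock"].iter());
    println!("{suggestions:?}"); // ["no-lock", "lock-write"]
}
```

With this stand-in, `no-lock` scores (5/10 + 5/7 + 5/5) / 3 ≈ 0.738 against `write-lock`. That is above the 0.7 cutoff but below the fixed 0.9 given to the swapped `lock-write`, which produces the order the `swapped_words_1` test expects.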
```diff
@@ -123,6 +143,22 @@ mod test {
         );
     }

+    #[test]
+    fn swapped_words_1() {
+        assert_eq!(
+            did_you_mean("write-lock", ["lock-write", "no-lock"].iter()),
+            vec!["no-lock", "lock-write"]
+        );
+    }
+
+    #[test]
+    fn swapped_words_2() {
+        assert_eq!(
+            did_you_mean("features-all", ["all-features", "features"].iter()),
+            vec!["features", "all-features"]
+        );
+    }
+
     #[test]
     fn flag_missing_letter() {
         let p_vals = ["test", "possible", "values"];
```