feat: suggest close matches using Levenshtein distance [POC] #836

Closed
wants to merge 7 commits into from

dougbacelar

What:
Inspired by #582

This provides a way to suggest close matches to users when the query cannot find any elements.

render(<div data-testid="cat" />)
screen.getByTestId('kat');

// output
`Unable to find an element by: [data-testid="kat"]. Did you mean one of the following?
cat`

This is a POC; I'm looking to gather feedback on whether it is worth pursuing.

Why:
This might make debugging easier, especially when there are typos in the queries or when an element's name has changed slightly.

How:
query by attribute:

  1. iterate through all elements and calculate the distance of each one from the search string
  2. keep only the matches closest to the search string; if several are tied at the same distance, keep all of them
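
A minimal sketch of the two steps above, not the PR's actual code. The name `keepClosestMatches`, its parameters, and the choice to skip duplicate values are all assumptions made for illustration; `distance` can be any function that returns an edit distance between two strings.

```javascript
// Hypothetical helper (not taken from the PR).
// `candidates`: attribute values collected from the DOM, e.g. every data-testid.
// `distance`: any function (a, b) => number, such as a Levenshtein implementation.
function keepClosestMatches(candidates, searchText, distance) {
  let bestDistance = Infinity
  let closest = []
  for (const candidate of candidates) {
    const d = distance(candidate, searchText)
    if (d < bestDistance) {
      // Strictly closer: discard everything collected so far.
      bestDistance = d
      closest = [candidate]
    } else if (d === bestDistance && !closest.includes(candidate)) {
      // Tie: keep it alongside the others (duplicates skipped, an assumption).
      closest.push(candidate)
    }
  }
  return closest
}
```

Passing the distance function in keeps the selection logic independent of how the distance is computed, so the implementation could be swapped without touching this step.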

calculate close matches:

  1. initialise a dynamic programming table of size MxN where M = element text length and N = search string length
  2. use the dp table above to calculate the Levenshtein distance between the element text and the search string
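
For readers unfamiliar with the technique, here is a sketch of the classic dynamic programming formulation of Levenshtein distance. It is not the PR's implementation: the function name is hypothetical, and this version allocates an (M+1) x (N+1) table so that the extra row and column can represent empty prefixes, which may differ from the sizing described above.

```javascript
// Hypothetical sketch of the textbook algorithm (not taken from the PR).
function levenshteinDistance(elementText, searchText) {
  const m = elementText.length
  const n = searchText.length
  // table[i][j] = number of edits needed to turn the first i characters of
  // elementText into the first j characters of searchText.
  const table = Array.from({length: m + 1}, () => new Array(n + 1).fill(0))
  // Turning a prefix into the empty string (or the reverse) costs its length.
  for (let i = 0; i <= m; i++) table[i][0] = i
  for (let j = 0; j <= n; j++) table[0][j] = j
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const substitutionCost = elementText[i - 1] === searchText[j - 1] ? 0 : 1
      table[i][j] = Math.min(
        table[i - 1][j] + 1, // deletion
        table[i][j - 1] + 1, // insertion
        table[i - 1][j - 1] + substitutionCost, // substitution or match
      )
    }
  }
  return table[m][n]
}
```

With the example from the description, `levenshteinDistance('cat', 'kat')` is 1, because a single substitution turns one string into the other.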

Note: this was implemented only on the byTestId query for now and behind a computeCloseMatches flag

Checklist:

  • Documentation added to the docs site
  • Tests
  • Typescript definitions updated
  • Ready to be merged

src/queries/test-id.js (outdated)

codesandbox-ci bot commented Nov 21, 2020

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 8ee261f:

Sandbox                          Source
react-testing-library-examples   Configuration

@dougbacelar changed the title from "feat: suggest query close matches using Levenshtein distance [POC]" to "feat: suggest close matches using Levenshtein distance [POC]" on Nov 21, 2020

codecov bot commented Nov 21, 2020

Codecov Report

Merging #836 (8ee261f) into master (c6e7a83) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #836   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           26        27    +1     
  Lines          934       965   +31     
  Branches       286       298   +12     
=========================================
+ Hits           934       965   +31     
Impacted Files            Coverage Δ
src/config.js             100.00% <ø> (ø)
src/close-matches.js      100.00% <100.00%> (ø)
src/queries/test-id.js    100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@kentcdodds left a comment

Thanks for this! I'm thinking this would be pretty useful. Good progress so far.

src/close-matches.js (outdated)
src/queries/test-id.js (outdated)
Comment on lines +22 to +25
const closeMatches =
  !computeCloseMatches || typeof id !== 'string'
    ? []
    : getCloseMatchesByAttribute(getTestIdAttribute(), c, id, options)
Member

I'm concerned about this increasing the performance issues we already have with find* queries which are expected to fail at least once. Any chance we could lazily calculate this value so it's only run when the error is actually displayed? I don't know whether this is possible.

But perhaps my concern is unwarranted? Maybe this is faster than I think?

Author

> I'm concerned about this increasing the performance issues we already have with find* queries which are expected to fail at least once.

Good point. I tested a few times locally, running getByTestId('search', {computeCloseMatches: true}) vs getByTestId('search', {computeCloseMatches: false}).

Not a reliable benchmark, but with the flag set to false it consistently finished in around 20ms, and with it set to true in around 35ms. The time increases with the number of elements found, but not by much. It might become slightly quicker with the recommended lib.
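
A rough way to repeat this kind of comparison is to average many runs rather than time a single call. The helper below is a hypothetical sketch, not part of the PR; it only assumes a runtime where `performance.now()` is available globally (modern browsers and recent Node.js versions).

```javascript
// Hypothetical timing helper (not taken from the PR).
// Runs `fn` repeatedly and returns the average duration of one call in ms.
function averageRunTimeMs(fn, iterations = 100) {
  const start = performance.now()
  for (let i = 0; i < iterations; i++) {
    fn()
  }
  return (performance.now() - start) / iterations
}

// Possible usage, assuming the computeCloseMatches option proposed in this PR:
// averageRunTimeMs(() => {
//   try {
//     screen.getByTestId('search', {computeCloseMatches: true})
//   } catch (error) {
//     // The query is expected to throw; only the elapsed time matters here.
//   }
// })
```

Averaging smooths out some noise, but the numbers still depend on the machine and on the size of the rendered DOM, so they are only useful for relative comparisons.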

> Any chance we could lazily calculate this value so it's only run when the error is actually displayed?

An alternative would be to throw functions for find* queries and then check whether typeof lastError === 'function' when waitFor times out.

That might mean decoupling find* and get* queries a bit, or perhaps making get* queries depend on find* queries instead of the other way around.

i.e., a get* query could be a find* query with {timeout: 0, interval: Infinity}? (Not sure if that would work.)

Since that seems a bit involved, we could start with a config computeCloseMatches that defaults to false and give that some testing?
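
The lazy alternative mentioned in this comment (throwing a function and checking typeof lastError === 'function') could look roughly like the sketch below. Everything here is hypothetical: `getByTestIdSketch` stands in for a get* query over a plain list of test ids, and `retrySketch` stands in for waitFor, using a fixed number of synchronous attempts instead of timers to keep the example small.

```javascript
// Hypothetical stand-in for a get* query (not the library's implementation).
// `suggest` is whatever function computes the close matches.
function getByTestIdSketch(testIds, id, suggest) {
  if (testIds.includes(id)) return id
  // Throw a function rather than an Error: the suggestion work is deferred
  // until someone actually calls it.
  throw () => {
    const suggestions = suggest(testIds, id)
    return new Error(
      `Unable to find an element by: [data-testid="${id}"]. ` +
        `Did you mean one of the following?\n${suggestions.join('\n')}`,
    )
  }
}

// Hypothetical, simplified stand-in for waitFor.
function retrySketch(callback, attempts) {
  let lastError
  for (let i = 0; i < attempts; i++) {
    try {
      return callback()
    } catch (error) {
      lastError = error
    }
  }
  // Only now is the expensive message built, and only once.
  throw typeof lastError === 'function' ? lastError() : lastError
}
```

In this sketch the suggestions are computed a single time after the final failed attempt, rather than on every retry, which is the saving the lazy approach is after.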

Member

I'm ok with giving it a trial run and seeing what real-world experience with it will be like, so defaulting to disabled makes sense to me. Let's try it out using leven.

Author

@kentcdodds do you think we should start with the testId query or add this feature to all 10 queries? Wondering if I can break down the work somehow...

Re the performance concern: if we find there is a huge performance impact, we can skip the computation of close matches on find* queries (similar to what was done for the role query in #590).

Member

I'm not sure. What does everyone else think?

Member

IMO, if the default is false, we can give it a try in more queries. That said, this seems to me like a configuration that shouldn't be set to true by default at all, especially since it has a performance impact. I see this as something that will not be helpful in CI, for example, and will only take more time, so a developer who wants it can opt in.
Putting aside what I wrote above, I really like this PR and do think it can have a valuable impact so thanks for this :)

Member

Can you please try a benchmark again using the new leven implementation?

@dougbacelar closed this by deleting the head repository on Apr 11, 2024