Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add embedding search #3774

Closed
wants to merge 15 commits into from
Closed

Add embedding search #3774

wants to merge 15 commits into from

Conversation

KTibow
Copy link
Contributor

@KTibow KTibow commented Apr 21, 2024

Description:

  • Shows 3 recommended pages when you start typing in the search box
    image

  • Fast (also doesn't require hitting "go"), better at finding some pages (light -> light component is first result, internet -> shows internet-related components), although doesn't replace the current stuff

  • Uses glove-based embeddings, shipping directly to the browser, no 3rd party. The current search index is ~250kb compressed; this index is ~300kb compressed, including a number of words that might be typed.

    • Too big? Let me know and I'll try to ship less
    • Only should load data upon using the search bar? Let me know and I'll implement that

Merge first: #3773
Related issue (if applicable): N/A
Pull request in esphome with YAML changes (if applicable): N/A

Checklist:

  • N/A I am merging into next because this is new documentation that has a matching pull-request in esphome as linked above.
    or
  • I am merging into current because this is a fix, change and/or adjustment in the current documentation and is not for a new component or feature.
  • N/A Link added in /index.rst when creating new documents for new components or cookbook.

Copy link

netlify bot commented Apr 21, 2024

Deploy Preview for esphome ready!

Name Link
🔨 Latest commit 07dd59f
🔍 Latest deploy log https://app.netlify.com/sites/esphome/deploys/663af3c1cd036b00083000a4
😎 Deploy Preview https://deploy-preview-3774--esphome.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@nagyrobi
Copy link
Member

nagyrobi commented Apr 21, 2024

A few quick thoughts:

  • remove changelogs from the search index
  • some kind of mapping between I²C - I2C, as people won't be filling the search box with superscripts, but would like to get the results
  • maybe prioritize title results to appear first, and text body result appear lower

@KTibow
Copy link
Contributor Author

KTibow commented Apr 21, 2024

I had some of the same thoughts, but found out I couldn't implement them.

Right now, I'm only searching on the titles. This isn't ideal, but I'm using a simple weighted average to turn word embeddings into phrase embeddings, so including the body might drown out the title. Plus, the fact that we're using embeddings means that related terms will still work.

It turns out Glove doesn't have a word embedding for i2c. I should see if the original tokenizer was tokenizing it differently.

@nagyrobi
Copy link
Member

Try "graph". Changelogs in the results are really annoying. Graph Component by itself doesn't appear in the list.

Maybe it would worth to take ref instances into account separately.

@KTibow
Copy link
Contributor Author

KTibow commented Apr 21, 2024

Because the graph component isn't its own page, it doesn't show up as a separate result in either this search or the default Sphinx search. (It would be nice if the subheaders like "light effects", "graph component", "pin" showed up, but that would probably require hardcoding or something)
You are using the embedding search (the one that shows while you're typing) though, right? I can't get changelogs to show up by searching for "graph" with it.

Maybe it would worth to take ref instances into account separately.

That sounds interesting, but I'm not sure what that would look like in practice. Could you elaborate?

@nagyrobi
Copy link
Member

I just type in what I search for, and if there's nothing relevant showing up while typing, I press the button.

I would expect to have a more detailed set of relevant results, without changelog entries, etc., based on similar criteria as the popups I got while typing.

In my mind the two are one.

@KTibow
Copy link
Contributor Author

KTibow commented Apr 21, 2024

I understand where you're coming from, you would expect them to act similarly. However, this PR is just an incremental change, only adding a search while typing; the more detailed search is exactly the same to the one currently on https://esphome.io/. I could hook into Sphinx's search logic, and add some custom filtering and sorting, but I would prefer to just edit the search while typing.

@nagyrobi
Copy link
Member

I understand that. But users will see is holistically, it would really help to have a consistent experience.

@RubyBailey
Copy link
Contributor

RubyBailey commented Apr 22, 2024

I think anything to improve searching esphome would be beneficial.
With this PR the search doesn’t appear to produce the results I’d expect. With a search that is based on the title then I’d expect “abp” (either lowercase or caps) to show the Honeywell ABP and ABP2 components. Yet it shows neither one. “Honeywell” finds both. “Pressure” finds neither and only lists one component with the word “pressure” in the title.

@KTibow
Copy link
Contributor Author

KTibow commented Apr 22, 2024

Looks like if I increase the embedding dimension to 50 the performance gets better. I'll have to see how I can increase the dimensions while not shipping too many word embeddings.

@RubyBailey
Copy link
Contributor

It’s somewhat better, yet “Pressure” still doesn’t show 3 components whose titles include the word pressure.

@KTibow
Copy link
Contributor Author

KTibow commented Apr 22, 2024

@RubyBailey for some reason one component is using "co_" (which we have no embeddings for) instead of "co", making it embed badly. optimally it would just use "co", but i just turned the "co_" token into the "co" token, which fixes the problem

@KTibow
Copy link
Contributor Author

KTibow commented May 23, 2024

Today I was going to update this to the newest embeddings when I saw that 4 days ago another PR was merged to replace the search system with something else. Oh well.

@KTibow KTibow closed this May 23, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants