Drop anemone and use Spidr for Repo discovery #10947

sjha4 · 2024-03-22T03:26:33Z

To Do:

Support basic auth
Support proxy

What are the changes introduced in this pull request?

Considerations taken when implementing this change?

What are the testing steps for this pull request?

bundle install
Go to Content > Product > Repo discovery
Run repo discovery.

sjha4 · 2024-03-22T03:40:52Z

@evgeni: Thoughts on the spidr gem as a replacement for anemone?

evgeni · 2024-03-22T06:09:21Z

Doesn't look crazy? Only dep is nokogiri, which we have anyway, tested on modern rubies. Why not.

app/lib/katello/repo_discovery.rb

sjha4 · 2024-05-21T15:40:45Z

I am seeing a significant performance difference between what's on the PR and the existing workflow. Looking at ways to speed this up; will push updates when I get the performance sorted.

Update: Should be good to go with the latest commit.
I'm able to see chunked output on the repo discovery page, and the task finishes in about the same time as before.

app/lib/katello/repo_discovery.rb

katello.gemspec

ekohl

Implementation-wise, I think you should split the crawler and the Docker search into separate classes. Perhaps even the file crawl as well. Right now it's confusing.
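To illustrate the kind of split being suggested, here is a rough sketch. All module, class, and method names below are hypothetical and are not taken from the PR; it only assumes three content types (yum, docker, file) that each get their own class, with a small lookup picking the right one.

```ruby
# Hypothetical sketch: names are illustrative, not Katello's actual classes.
module DiscoverySketch
  # Shared constructor so each content type only describes its own behavior.
  class Base
    def initialize(url)
      @url = url
    end
  end

  class Yum < Base
    def describe
      "crawl #{@url} for yum repositories"
    end
  end

  class Container < Base
    def describe
      "search the registry at #{@url}"
    end
  end

  class FileListing < Base
    def describe
      "crawl #{@url} for file repositories"
    end
  end

  BY_CONTENT_TYPE = {
    'yum' => Yum,
    'docker' => Container,
    'file' => FileListing
  }.freeze

  # Pick the discovery class for a content type; fail loudly on unknown types.
  def self.build(content_type, url)
    klass = BY_CONTENT_TYPE.fetch(content_type) do
      raise ArgumentError, "unsupported content type: #{content_type}"
    end
    klass.new(url)
  end
end

discovery = DiscoverySketch.build('yum', 'https://example.com/pub/')
puts discovery.describe
```

With this shape, the crawler-specific code (and its gem dependency) would only need to live in the yum class, and the container search would not share any crawling logic.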

app/lib/katello/repo_discovery.rb

katello.gemspec

ekohl

I think this heads in the right direction.

Review wise I wonder if it makes sense to split it up: 1 to create the 3 classes and a follow up that replaces anemone with spidr.

app/lib/katello/resources/discovery/file_discovery.rb

sjha4 · 2024-06-03T17:28:22Z

> Review wise I wonder if it makes sense to split it up: 1 to create the 3 classes and a follow up that replaces anemone with spidr.

The anemone -> spidr change is localized to yum_discovery only right now as far as changes go. My 2 cents is that it's a small enough change to be in one PR.

ekohl

Review wise I always prefer to first have a refactor and then an actual code change. Right now if I read the commits then it's the other way around. That makes this whole change harder to follow, but I don't have commit permissions here so I'll let other reviewers weigh in on that aspect.

ekohl · 2024-06-06T13:33:49Z

app/lib/katello/resources/discovery/container_discovery.rb

+                   upstream_credentials_and_search = {
+                     upstream_username: nil,
+                     upstream_password: nil,
+                     search: '*'


The search value will be nil if a caller overrides upstream_credentials_and_search with a hash that has no :search key, because the default hash is then replaced as a whole. If you want it to default to * you can use @search = upstream_credentials_and_search.fetch(:search, '*').
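A minimal sketch of the difference, assuming a constructor shaped like the one in the diff. The class name, the url argument, and the two reader names are hypothetical and exist only to put both lookups side by side:

```ruby
# Hypothetical stand-in for the class under review; only the default-argument
# behavior is modeled here.
class ContainerDiscoverySketch
  attr_reader :search_by_index, :search_by_fetch

  def initialize(url, upstream_credentials_and_search = {
    upstream_username: nil,
    upstream_password: nil,
    search: '*'
  })
    @url = url
    # Plain lookup: nil whenever the caller's hash lacks :search.
    @search_by_index = upstream_credentials_and_search[:search]
    # fetch with a default: falls back to '*' when the key is missing.
    @search_by_fetch = upstream_credentials_and_search.fetch(:search, '*')
  end
end

default_call = ContainerDiscoverySketch.new('https://registry.example.com')
override_call = ContainerDiscoverySketch.new(
  'https://registry.example.com',
  { upstream_username: 'admin', upstream_password: 'secret' }
)

p default_call.search_by_index   # the default hash is used, so :search is present
p override_call.search_by_index  # the caller's hash replaced the default entirely
p override_call.search_by_fetch  # fetch supplies the fallback
```

Note that fetch only falls back when the key is absent. A caller passing search: nil explicitly would still get nil.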

ekohl · 2024-06-06T13:38:53Z

app/lib/katello/resources/discovery/yum_discovery.rb

+      @upstream_username = upstream_credentials_and_search[:upstream_username].empty? ? nil : upstream_credentials_and_search[:upstream_username]
+      @upstream_password = upstream_credentials_and_search[:upstream_password].empty? ? nil : upstream_credentials_and_search[:upstream_password]


I know it was already this way, but isn't this equivalent?

Suggested change

-      @upstream_username = upstream_credentials_and_search[:upstream_username].empty? ? nil : upstream_credentials_and_search[:upstream_username]
-      @upstream_password = upstream_credentials_and_search[:upstream_password].empty? ? nil : upstream_credentials_and_search[:upstream_password]
+      @upstream_username = upstream_credentials_and_search[:upstream_username].presence
+      @upstream_password = upstream_credentials_and_search[:upstream_password].presence
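On the equivalence question, a small sketch may help. In Rails, presence comes from ActiveSupport; to keep the example runnable without that gem, the helper below is a plain-Ruby approximation (presence_sketch is a hypothetical name, and it only approximates ActiveSupport's blank? rules for strings and nil):

```ruby
# Plain-Ruby approximation of ActiveSupport's Object#presence (assumption:
# covers nil, empty values, and whitespace-only strings).
def presence_sketch(value)
  return nil if value.nil?
  return nil if value.is_a?(String) && value.strip.empty?
  return nil if value.respond_to?(:empty?) && value.empty?

  value
end

# The original form from the diff, extracted into a method for comparison.
def ternary_version(value)
  value.empty? ? nil : value
end

['admin', '', '   '].each do |input|
  puts "#{input.inspect}: ternary=#{ternary_version(input).inspect} presence=#{presence_sketch(input).inspect}"
end

# nil is where the two forms diverge most: the ternary calls empty? on nil.
begin
  ternary_version(nil)
rescue NoMethodError => e
  puts "nil: ternary raises #{e.class}, presence=#{presence_sketch(nil).inspect}"
end
```

For non-blank and empty strings the two forms agree. They differ for nil (the ternary raises, presence returns nil) and for whitespace-only strings (the ternary keeps them, presence returns nil), so the suggestion is slightly more forgiving rather than strictly identical.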

ekohl · 2024-06-06T13:40:10Z

app/lib/katello/repo_discovery.rb

@@ -1,206 +1,33 @@
 require 'uri'
+require 'spidr'


I think this should live in the yum class

pr-processor bot added Not yet reviewed Waiting on contributor labels Mar 22, 2024

sjha4 force-pushed the anemone branch 2 times, most recently from cc6dee6 to 934292d Compare March 22, 2024 16:48

pr-processor bot removed the Waiting on contributor label Mar 22, 2024

jeremylenz reviewed May 14, 2024

ekohl reviewed May 16, 2024

sjha4 force-pushed the anemone branch from 934292d to 759e66d Compare May 21, 2024 16:57

github-actions bot added the Packaging Change label May 21, 2024

sjha4 changed the title ~~Early Draft - Drop anemone and use Spidr~~ Drop anemone and use Spidr for Repo discovery May 21, 2024

sjha4 force-pushed the anemone branch from 759e66d to 5834448 Compare May 21, 2024 16:58

sjha4 marked this pull request as ready for review May 21, 2024 16:59

evgeni reviewed May 21, 2024

sjha4 force-pushed the anemone branch 2 times, most recently from d96f3f0 to 1d7766a Compare May 21, 2024 17:56

ekohl reviewed May 21, 2024

sjha4 force-pushed the anemone branch from 2c86dc5 to 7abf885 Compare May 30, 2024 19:24

ekohl reviewed Jun 3, 2024

sjha4 force-pushed the anemone branch from 7abf885 to 4c7c721 Compare June 3, 2024 17:24

sjha4 added 3 commits June 5, 2024 18:05

Fixes #37159 - Drop anemone and use Spidr for repo discovery

de85205

Refs #37159 - Improve discovery performance and add gem dependency

a222859

Refs #37159 - Refactor content specific discoveries

ac7f250

sjha4 force-pushed the anemone branch from 4c7c721 to ac7f250 Compare June 5, 2024 18:05

ekohl reviewed Jun 6, 2024
