Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improvements to known CPE index construction #2801

Merged
merged 2 commits into from Apr 23, 2024
Merged

Conversation

westonsteimel
Copy link
Contributor

@westonsteimel westonsteimel commented Apr 23, 2024

Previously when building the known CPE index, there was logic to de-duplicate processing based on the normalized CPE name; however, this means a significant number of known CPE's don't get indexed because the first instance of that name didn't have a supported collection url but a later one did. This isn't code that executes at runtime in syft so de-duplicating the processing for performance isn't really necessary here and it doesn't add much to the total runtime anyways

There was also a bug with the struct definition that caused only the final reference url in the list to be unmarshaled and considered when constructing the index

@westonsteimel westonsteimel added the bug Something isn't working label Apr 23, 2024
@westonsteimel westonsteimel enabled auto-merge (squash) April 23, 2024 10:11
Previously when building the known CPE index, there was logic to
de-duplicate processing based on the normalized CPE name; however, this
means a significant number of known CPE's don't get indexed because the
first instance of that name didn't have a supported collection url but a
later one did.  This isn't code that executes at runtime in syft so
de-duplicating the processing for performance isn't really necessary
here and it doesn't add much to the total runtime anyways

Signed-off-by: Weston Steimel <commits@weston.slmail.me>
@westonsteimel westonsteimel changed the title fix: stop pre-filtering potential known CPE URLs fix: improvements to known CPE index construction Apr 23, 2024
Previously the struct definition for CpeItem caused only the last URL
reference in the list to be kept and processed for inclusion in the
index

Signed-off-by: Weston Steimel <commits@weston.slmail.me>
@@ -3,8 +3,8 @@ package main
type CpeItem struct {
Name string `xml:"name,attr"`
Title string `xml:"title"`
References []struct {
Reference struct {
References struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonsteimel can you help me understand this change?

Why is it more correct to model references as a struct holding a slice of structs than just as a slice of structs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the previous struct definition we only got a single url (the final one from the list) after unmarshalling. If there is a better way to do this, I'm happy to update this, but the go xml unmarshalling examples I found all seemed to show this was the way to make it work

@westonsteimel westonsteimel merged commit 891e61a into main Apr 23, 2024
11 checks passed
@westonsteimel westonsteimel deleted the fix-cpe-indexing branch April 23, 2024 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants