WIP: Repology Updater #54

captn3m0
Member

What it will do:

Repository Level (This is entirely TODO)

  1. Maintain a list of repology repository prefixes that are relevant
  2. Generate a list of source package directory URLs from (1)
  3. Fetch the list of packages and keep it locally.

Package Level (Some of this is done)

  1. Fetch the list of repology identifiers for a product
  2. These are used to fetch the relevant repology project
  3. For each product, filter the list of packages by a list of repository prefixes
  4. For each such repository, use the package list generated at the repository level to produce a comprehensive list of PURLs (TODO)
  5. Finally, save the list of PURLs to disk
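
Step 3 of the package-level list could be sketched roughly as below. This is a minimal illustration, not the code on this branch: the prefix list and the function name are hypothetical, and the sample entries are made up, only shaped like the items returned by https://repology.org/api/v1/project/<name> (which carry a `repo` field).

```python
from typing import Iterable

# Hypothetical prefix list; the real list of relevant repositories is still TODO.
RELEVANT_REPO_PREFIXES = ("debian_", "ubuntu_", "alpine_")


def filter_by_repo_prefix(packages: Iterable[dict], prefixes: tuple[str, ...]) -> list[dict]:
    """Keep only package entries whose "repo" field starts with one of the prefixes."""
    return [p for p in packages if p.get("repo", "").startswith(prefixes)]


# Made-up sample entries, shaped like items in a Repology project response.
sample = [
    {"repo": "debian_12", "srcname": "zookeeper", "version": "1.0"},
    {"repo": "homebrew", "visiblename": "zookeeper", "version": "1.0"},
    {"repo": "ubuntu_22_04", "srcname": "zookeeper", "version": "1.0"},
]

kept = filter_by_repo_prefix(sample, RELEVANT_REPO_PREFIXES)
print([p["repo"] for p in kept])
```

Matching on prefixes rather than full repository names keeps the list short, since one prefix covers every release of a distro.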

The final version should deliver a clear and comprehensive list of PURLs for a given product, where each PURL represents the latest version of a package available on a specific distribution channel (not necessarily a Linux distro).
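
To make the target output concrete, here is a simplified sketch of assembling one PURL string. The helper name `build_purl` is hypothetical and the version is a placeholder; the encoding is deliberately minimal and does not cover every rule in the PURL specification, so a real implementation would probably lean on an existing PURL library instead.

```python
from urllib.parse import quote


def build_purl(pkg_type, namespace, name, version, qualifiers=None):
    """Assemble pkg:type/namespace/name@version?qualifiers (simplified encoding)."""
    purl = (
        f"pkg:{pkg_type}/{quote(namespace, safe='')}/"
        f"{quote(name, safe='')}@{quote(version, safe='')}"
    )
    if qualifiers:
        # Qualifiers are emitted sorted by key so the output is stable.
        pairs = "&".join(
            f"{key}={quote(str(value), safe='')}"
            for key, value in sorted(qualifiers.items())
        )
        purl += f"?{pairs}"
    return purl


# Placeholder version; the distro qualifier marks the channel.
example = build_purl("deb", "debian", "zookeeper", "1.0-1", {"distro": "bookworm"})
print(example)
```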

These PURLs can then be used to augment scan results, by generating feeds for scanning products. The use case could be:

  1. Use type/namespace/name to check whether the product is in our database.
  2. Compare the version against our list from above to see whether it is the latest version available on that channel. Give a warning if not.
  3. If it is the latest version, check whether it is considered supported. Additionally, use the channel's support status (such as Debian support dates or repository information) to provide a clear guarantee of support.

Depending on the results from 1, 2, and 3, return a vulnerability rating. Most of the scanning part can perhaps be done by existing scanners, so we are looking to bootstrap this by generating a "feed" instead.
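
The three checks could look roughly like this. It is only a sketch: the function name, the rating labels, and the two lookup tables are hypothetical stand-ins for whatever the database ends up looking like, and versions are compared for equality only.

```python
def rate_component(key, version, latest_by_key, supported_by_key):
    """Rate one scanned component; key is a (type, namespace, name) tuple."""
    if key not in latest_by_key:
        return "unknown"      # check 1: product is not in our database
    if version != latest_by_key[key]:
        return "outdated"     # check 2: not the latest version on that channel
    if version not in supported_by_key.get(key, set()):
        return "unsupported"  # check 3: latest on the channel, but not supported
    return "ok"


# Made-up lookup tables for illustration.
key = ("deb", "debian", "zookeeper")
latest_by_key = {key: "2.0"}
supported_by_key = {key: {"2.0"}}

print(rate_component(key, "1.0", latest_by_key, supported_by_key))
print(rate_component(key, "2.0", latest_by_key, supported_by_key))
```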

Feed Details:

  1. A vulnerability feed typically contains information about known vulnerabilities in various products, using package name, channel, and version ranges.
  2. We can generate such a feed from our PURLs and EOL API. Each unsupported release cycle can be used to craft a "pseudo"-vulnerability that triggers on unsupported versions being detected.
  3. The feed will need a lot of exceptions for packages that are still supported on various channels, which is why we need to scrape Repology.
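
A rough sketch of point 2, turning unsupported release cycles into "pseudo"-vulnerability entries. The entry layout, the `EOL-` identifier scheme, and the function names are all hypothetical; the sample cycles are made up and only assume that each cycle has a `cycle` name and an `eol` value that is either a boolean or an ISO date string. Channel-specific exceptions from point 3 are not handled here.

```python
from datetime import date


def is_unsupported(cycle, today):
    """True if the cycle's "eol" value says support has ended as of `today`."""
    eol = cycle.get("eol")
    if isinstance(eol, bool):
        return eol
    if isinstance(eol, str):
        return date.fromisoformat(eol) <= today
    return False


def pseudo_vulnerabilities(product, cycles, today):
    """Build one hypothetical feed entry per unsupported release cycle."""
    return [
        {
            "id": f"EOL-{product}-{c['cycle']}",
            "product": product,
            "cycle": c["cycle"],
            "summary": f"{product} {c['cycle']} is no longer supported",
        }
        for c in cycles
        if is_unsupported(c, today)
    ]


# Made-up cycles for illustration.
cycles = [
    {"cycle": "1.0", "eol": "2020-01-01"},
    {"cycle": "2.0", "eol": False},
    {"cycle": "3.0", "eol": "2999-01-01"},
]
feed = pseudo_vulnerabilities("example", cycles, date(2023, 1, 1))
print([entry["id"] for entry in feed])
```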

@captn3m0 captn3m0 marked this pull request as draft December 23, 2022 05:23
@captn3m0
Member Author

Found out that this is a lot more work than I'd expected, due to my flawed understanding of what Repology tracks. Repology tracks source packages where it can, to reduce effort and make tracking easier. This works because Repology is more interested in "what version of a package is available in a repository" than in "all the various ways this package can be installed".

We're interested in the latter (we want an SBOM -> package -> PURL -> product lookup). For that, we need an exhaustive list of all the binary packages built from each source package. One source package producing several binary packages is common, most prominently in Debian- and RPM-based distros.

For example, https://repology.org/api/v1/project/zookeeper has a single entry for Debian bookworm. That entry links to the zookeeper source package, which is listed at https://packages.debian.org/bookworm/source/zookeeper

That source package gets built into 10 separate binary packages, and those are the ones we actually want to track. Generating this mapping is what I'm currently working on. It involves parsing the package files across all distros, and has taken some effort.

Got it working for DEB distros.
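
For the DEB case, the source-to-binary mapping can be read from a Debian-style `Sources` index, where each stanza has a `Package` field and a comma-separated `Binary` field. The sketch below is not the code from this branch: the function name is hypothetical, the sample stanzas are made up, and it handles only the basics of the stanza format (blank-line separators and indented continuation lines).

```python
def parse_sources_index(text):
    """Map each source package name to the list of binary packages it builds."""
    mapping = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        key = None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key:
                # Continuation line: append to the previous field's value.
                fields[key] += " " + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if "Package" in fields:
            binaries = [b.strip() for b in fields.get("Binary", "").split(",") if b.strip()]
            mapping[fields["Package"]] = binaries
    return mapping


# Made-up stanzas in the Sources index layout.
SAMPLE = """\
Package: examplesrc
Binary: example-bin, libexample1,
 libexample-dev
Version: 1.0-1

Package: othersrc
Binary: other
Version: 2.0-1
"""

source_to_binaries = parse_sources_index(SAMPLE)
print(source_to_binaries)
```

Each binary name in the result would then get its own PURL for the channel the index came from.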

@noqcks
Sponsor Contributor

noqcks commented Dec 31, 2022

Doing some investigation into MongoDB as an example.

https://repology.org/api/v1/project/mongodb

For Ubuntu, the packages installed are from repo.mongodb.org

Get:1 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-database-tools amd64 100.6.1 [48.0 MB]
Get:2 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-mongosh amd64 1.6.1 [37.7 MB]
Get:3 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-shell amd64 6.0.3 [3,080 B]
Get:4 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-server amd64 6.0.3 [28.9 MB]
Get:5 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-mongos amd64 6.0.3 [20.3 MB]
Get:6 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-database-tools-extra amd64 6.0.3 [7,752 B]
Get:7 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-database amd64 6.0.3 [3,540 B]
Get:8 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-tools amd64 6.0.3 [2,892 B]
Get:9 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org amd64 6.0.3 [2,932 B]

But repology has no knowledge of this package existing in this repo. Would resolving this be as simple as adding a new repository to repology and then finding the binaries installed from the repo package?

@captn3m0
Member Author

captn3m0 commented Jan 1, 2023

Since this is a small list, we could easily add static PURLs for all of these. We could scan the repo as well, but that only makes sense for larger, more significant repositories.
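
As an illustration of the static approach, the names and versions from the apt output above could be written out directly. The `ubuntu` namespace and the `distro=focal` qualifier are assumptions on my part; how packages from a third-party apt repository should be namespaced is still an open question.

```python
# (name, version) pairs copied from the apt output above.
MONGODB_PACKAGES = [
    ("mongodb-database-tools", "100.6.1"),
    ("mongodb-mongosh", "1.6.1"),
    ("mongodb-org-shell", "6.0.3"),
    ("mongodb-org-server", "6.0.3"),
    ("mongodb-org-mongos", "6.0.3"),
    ("mongodb-org-database-tools-extra", "6.0.3"),
    ("mongodb-org-database", "6.0.3"),
    ("mongodb-org-tools", "6.0.3"),
    ("mongodb-org", "6.0.3"),
]

# Assumed namespace and qualifier; adjust once the convention is decided.
static_purls = [
    f"pkg:deb/ubuntu/{name}@{version}?distro=focal"
    for name, version in MONGODB_PACKAGES
]

for purl in static_purls:
    print(purl)
```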

@noqcks
Sponsor Contributor

noqcks commented Jan 2, 2023

@captn3m0 do you have WIP commits on this branch you could push?

Might be able to work in parallel here. I can tackle searching packages in other distros.
