
Getting graph from requirements.txt #247

Open
abitrolly opened this issue Mar 9, 2021 · 11 comments

@abitrolly

With #237 in place, how do I use deps to get a graph from my requirements/base.txt?

I don't see any deps options to feed that in.

Usage:
  deps get dependencies [flags]
  deps get dependencies [command]

Aliases:
  dependencies, dependency

Examples:
deps get dependencies -l go -o github.com -m depscloud/api
deps get dependencies -l go -n github.com/depscloud/api

Available Commands:
  topology    Get the associated topology

Flags:
  -h, --help                  help for dependencies
  -l, --language string       The language of the module
  -m, --module string         The name of the module
  -n, --name string           The name of the module
  -o, --organization string   The organization of the module
@mjpitz
Member

mjpitz commented Mar 9, 2021

This has been merged, but has not been released. Once 0.3.0 is released (should be later this week) you will be able to query for python dependencies. Here are a few examples:

deps get dependencies -l python -n my-python-app
deps get dependents -l python -n my-python-library

There are some limitations with the way requirements.txt currently works. For example, requirements.txt has no way of signaling what the name of the library is, so we assume that it's the name of the repository. This approach breaks down for python monorepos and could definitely use some improvements. We also support Pipfiles, which do allow the library name to be specified. You'd be able to query for them with the same commands as above.
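As a rough illustration of that fallback, here is a minimal TypeScript sketch (not the project's actual code; the helper name and URL handling are assumptions):

// Hypothetical illustration only, not depscloud's real implementation.
// requirements.txt carries no package name, so the repository name is the
// fallback; manifests that do declare a name (e.g. a Pipfile) take priority.
function inferModuleName(repositoryUrl: string, declaredName?: string): string {
  if (declaredName) {
    return declaredName;
  }
  // Fall back to the last path segment of the repository URL,
  // e.g. "https://github.com/depscloud/api" -> "api".
  const segments = new URL(repositoryUrl).pathname.split("/").filter(Boolean);
  return segments[segments.length - 1];
}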

@abitrolly
Author

abitrolly commented Mar 9, 2021

Why is the library name needed? This project is a plain Django website.

I'd prefer to specify the path to the dependencies list explicitly. Otherwise I don't see how deps can discover that the correct file is requirements/base.txt and not requirements.txt.

@mjpitz
Member

mjpitz commented Mar 9, 2021

> Why is the library name needed? This project is a plain Django website.

You only need the library name if you're trying to find all consumers of a library. For example, show me all projects that are using django (the common case for this project). Querying for dependencies (i.e. the libraries that an application or other library may consume) isn't as useful because much of that is defined in your repo using files like pom.xml or package.json.

> I'd prefer to specify the path to the dependencies list explicitly.

This breaks from common patterns seen in every other language. Keep in mind, this project supports languages other than python. I do think there is room to improve the support for python (specifically around requirements.txt), but it's such a small edge case compared to every other language that provides canonical library/application names in their manifest files (e.g. package.json has name, pom.xml has groupId and artifactId, and so on). Another factor to keep in mind here is the case of monorepos, where a single repository might contain numerous manifest files (something the current requirements.txt approach does not handle well).

> Otherwise I don't see how deps can discover that the correct file is requirements/base.txt and not requirements.txt.

You bring up a good point here. Right now, depscloud makes assumptions about the way dependencies are defined (e.g. using common file names). We don't currently support non-standard approaches (such as requirements/base.txt), or even cases where one requirements.txt imports other requirements.txt files. A future improvement could be to add support for a depscloud.yaml file that helps direct the tooling to the appropriate files and fills in the gaps.
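To make the idea concrete, here is one hypothetical shape for such a file, sketched as a TypeScript type (purely speculative: no depscloud.yaml schema exists today, and every field name below is an assumption):

// Speculative sketch only: depscloud.yaml does not exist yet, and these
// field names are illustrative assumptions rather than a proposed schema.
interface ManifestOverride {
  // Which extractor should handle the file, e.g. "requirements.txt".
  kind: string;
  // Explicit paths to non-standard manifest locations,
  // e.g. "requirements/base.txt".
  paths: string[];
  // Optional explicit module name, filling the gap requirements.txt leaves.
  name?: string;
}

interface DepsCloudConfig {
  manifests: ManifestOverride[];
}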

@abitrolly
Author

> Querying for dependencies (i.e. the libraries that an application or other library may consume) isn't as useful because much of that is defined in your repo using files like pom.xml or package.json.

Trying to find out which of the packages in requirements.txt pulls in the outdated dependency is exactly the problem I was trying to solve. It even came up twice this week - once for a Rails project and once for a Django one. So I wouldn't say it is not as useful. )

> Right now, depscloud makes assumptions about the way dependencies are defined (e.g. using common file names). We don't currently support non-standard approaches (such as requirements/base.txt)

Is it possible to separate the manifest discovery strategy from manifest processing? Then it would be possible to write custom discovery scripts that invoke the latter stage directly.

@mjpitz
Member

mjpitz commented Mar 9, 2021

> Is it possible to separate the manifest discovery strategy from manifest processing?

We already kinda do this. When a repository is cloned, we call a match endpoint on the extractor process, which tells us which of the files it supports processing. This endpoint is largely driven by globs that say "process files matching this pattern" (here's the match config for requirements.txt: https://github.com/depscloud/depscloud/blob/main/extractor/src/extractors/RequirementsTxtExtractor.ts#L60-L65).
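Roughly speaking, that match configuration is a set of include/exclude globs. The sketch below is only illustrative - the field names are assumptions, and the real definition lives in the linked RequirementsTxtExtractor.ts:

// Illustrative only: see the linked RequirementsTxtExtractor.ts for the real
// structure. The idea is a glob-driven match configuration per manifest type.
const requirementsTxtMatchConfig = {
  // Process any requirements.txt found anywhere in the cloned repository.
  includes: ["**/requirements.txt"],
  // Skip copies that only exist inside vendored or installed dependencies.
  excludes: ["**/node_modules/**", "**/site-packages/**"],
};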

My thought was that a depscloud.yaml file would be an easy way to override the behavior of the discovery cases. Another thing I had considered early on (and what ultimately led me to TypeScript for the extractor service) was that it would be relatively easy to support a plugin-based approach, so people could customize the extractor by installing their own plugins / overriding the defaults. That thought was more to handle the case where companies might have their own internal dependency management tool, but it would also work for overriding / customizing behavior based on a company's use of a technology. For example, I know some folks who use gradle in a very atypical way, and right now the gradle inference really doesn't help them.
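A plugin contract along those lines could look something like the sketch below (purely speculative: the extractor exposes no plugin API today, and all names here are made up for illustration):

// Purely speculative: the extractor has no plugin API today; this only
// sketches what a pluggable extractor contract could look like.
interface MatchConfig {
  includes: string[];
  excludes: string[];
}

interface Dependency {
  name: string;
  versionConstraint: string;
}

interface ExtractorPlugin {
  // Globs describing which files this plugin wants to process.
  matchConfig(): MatchConfig;
  // Turn the contents of one matched manifest into a list of dependencies.
  extract(path: string, contents: string): Dependency[];
}

A company-internal plugin could then be registered alongside the defaults, e.g. to handle a custom requirements/base.txt layout or an in-house dependency manager.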

@abitrolly
Author

Are those extractors integrated into deps, or are they server-side only?

I am thinking about a way to invoke them from the command line and pass includes as params.

@mjpitz
Member

mjpitz commented Mar 10, 2021

Those are server-side in the extractor service. There is a little bit of complexity in the way the indexer process sends data to the server that could probably be simplified (i.e. needing to account for both unix- and windows-based file paths). The extractor is responsible for normalizing those paths to unix to simplify some of the matching logic in the extractor. Some of those internal interfaces could definitely be improved and made configurable to handle more company-specific variants. Since the expectation is that companies.
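As a small illustration of that normalization step (a sketch of the idea, not the extractor's actual code):

// Sketch only, not the extractor's real implementation: rewrite Windows-style
// separators so one set of unix-style globs can match paths reported by the
// indexer on any platform.
function toUnixPath(filePath: string): string {
  return filePath.replace(/\\/g, "/");
}

// e.g. toUnixPath("src\\requirements.txt") === "src/requirements.txt"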

@abitrolly
Author

Perhaps I should try to run the indexer standalone and see if it can be patched to dump the message it sends to the extractor to the console as JSON.

@abitrolly
Author

I found I could run the extractor image. Now, how do I make it analyze https://github.com/jonmatthis/freemocap for dependencies?

$ podman run -it depscloud/extractor -h

> @depscloud/extractor@0.3.5 start
> node lib/main.js "-h"


   extractor undefined 

   USAGE

     main.js 

   OPTIONS

     --bind-address <bindAddress>        the ip address to bind to                               optional      
     --http-port <port>                  the port to run http on                                 optional      
     --grpc-port <port>                  the port to bind to                                     optional      
     --port <port>                       the port to bind to                                     optional      
     --tls-key <key>                     the path to the private key used for TLS                optional      
     --tls-cert <cert>                   the path to the certificate used for TLS                optional      
     --tls-ca <ca>                       the path to the certificate authority used for TLS      optional      
     --disable-manifests <manifest>      the manifests to disable support for                    optional      
     --log-level <level>                 configures the level at with logs are written           optional      
     --log-format <format>               configures the format of the logs (console / json)      optional      

   GLOBAL OPTIONS

     -h, --help         Display help                                      
     -V, --version      Display version                                   
     --no-color         Disable colors                                    
     --quiet            Quiet mode - only displays warn and error messages
     -v, --verbose      Verbose mode - will also output debug messages    

@mjpitz
Member

mjpitz commented Aug 25, 2021

I could definitely use some better guides for folks, but one of the easier things to do might be to stand this up in something like Docker or Kubernetes. These guides outline how to deploy the subsystem, update the configuration, and inspect the API. You should be able to follow them and use your username jonmatthis when it comes to updating the configuration.

Docker Guide - https://deps.cloud/docs/deploy/docker/
Kubernetes Guide - https://deps.cloud/docs/deploy/k8s/

@abitrolly
Author

I've already tried to use the extractor from the container. The guides could help with patching the binaries to be self-sufficient, so that people can discover what they can do without referring to the docs. That means adding a server-independent CLI that produces JSON or CSV with the results and asks the user for any missing params.

The stumbling block for me personally is understanding how to add subcommands with params in Go CLI programs. Each time I tried, it was quite non-trivial and barely readable unless you've spent a day or two with whichever CLI framework was used.
