deps doesn't report errors #341

Open

abitrolly opened this issue Aug 20, 2021 · 5 comments

@abitrolly

Is your feature request related to a problem? Please describe.

I'm always frustrated when the deps command fails to return a result and I don't know why.

$ deps get dependencies -l python -n https://github.com/jonmatthis/freemocap
$ deps get dependencies -l python -n jonmatthis/freemocap 
$ deps get dependencies -l python -n @jonmatthis/freemocap

Describe the solution you'd like

Show a message on stderr when the project cannot be found and explain the reason why, or show that the parameters are invalid.
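
A minimal sketch of the behavior I'd like, in hypothetical Go (the lookup function and error message are made up, not the actual CLI internals):

```go
package main

import (
	"fmt"
	"os"
)

// lookupDependencies stands in for the real API call; the name and
// signature are hypothetical.
func lookupDependencies(language, name string) ([]string, error) {
	return nil, fmt.Errorf("project %q not found for language %q", name, language)
}

func main() {
	deps, err := lookupDependencies("python", "jonmatthis/freemocap")
	if err != nil {
		// Human-readable errors go to stderr, so they stay visible
		// even when stdout is redirected to a file or a pipe.
		fmt.Fprintln(os.Stderr, "deps:", err)
		os.Exit(1)
	}
	for _, d := range deps {
		fmt.Println(d)
	}
}
```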

Describe alternatives you've considered

Add a --debug flag that outputs the communication with the server: what the server does and when it fails.
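
For example, a --debug flag could wrap the client transport and dump each exchange. A rough sketch in Go, assuming the client talks HTTP (the real transport may differ):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
)

// debugTransport wraps the normal transport and dumps every request
// and response so failures in the client/server exchange are visible.
type debugTransport struct{ next http.RoundTripper }

func (t debugTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if dump, err := httputil.DumpRequestOut(req, true); err == nil {
		log.Printf("--> %s", dump)
	}
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		log.Printf("<-- transport error: %v", err)
		return nil, err
	}
	if dump, err := httputil.DumpResponse(resp, true); err == nil {
		log.Printf("<-- %s", dump)
	}
	return resp, nil
}

func main() {
	// With --debug set, the CLI would swap in the wrapped transport.
	client := &http.Client{Transport: debugTransport{next: http.DefaultTransport}}
	if _, err := client.Get("https://api.deps.cloud"); err != nil {
		log.Fatal(err)
	}
}
```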

Additional context

System: {baseURL: https://api.deps.cloud, os: linux, arch: amd64}
Client Version: {"version":"0.3.5","commit":"4d94a021e15d60f20ede00df7de2fd9149159928","date":"2021-06-07T13:46:53Z"}
Server Version: {"version":"0.3.5","commit":"4d94a021e15d60f20ede00df7de2fd9149159928","date":"2021-06-07T13:46:53Z"}
Server Health: {"state":"ok","currentHP":1,"timestamp":"2021-08-18T19:55:03.501555817Z"}
@mjpitz
Copy link
Member

mjpitz commented Aug 25, 2021

Just a heads up, the public deps.cloud API is by no means complete (currently). I had plans to actually have a public API, but the cost to run that is a bit outside of my budget right now.

Python is also an interesting case. For example, you use requirements.txt, but there isn't a great way to infer the library or application name from that file, so we just use the repo name:

$ deps get dependencies -l python -n freemocap

If you use a Pipfile, we can more reliably extract a library name instead of inferring it based on repo metadata. I've thought about trying to parse out little bits of metadata from setup.py, but that felt like a lot more work when most dependency management systems tag their library / application.

@abitrolly
Author

I imagine the cost would be high if it is a full repo download for parsing. From the FinOps point of view it would still be interesting to know the cost of parsing specific repos. For personal use it is more interesting to execute parts of the pipeline independently.

$ deps get dependencies -l python -n freemocap
$ echo $?
0

This doesn't work either.

setup.py parsing is hard, because it is quite often pure Python logic. There is an intention instead to extract dependency info from packages uploaded to PyPI in pypi/warehouse#8254 (comment).

@mjpitz
Member

mjpitz commented Aug 26, 2021

I imagine the cost would be high if it is a full repo download for parsing.

Right now, we shallow-clone the default branch, but we have talked about having it run against more than that. I'm currently digging into adding support for common artifactories (like JFrog and Sonatype Nexus 3). To support them properly, I'd want to support indexing all versions that are in the artifactory. Some of this has me wondering if I need to move away from having versions on edges and instead make them part of the node structure (and, more specifically, part of the key). This would keep the traversal logic straightforward and could be a relatively seamless migration. (I might have just talked myself into doing this.)
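
Roughly, the two shapes I'm weighing (illustrative Go types, not the actual schema):

```go
package graph

// Module is a name-only node; in this shape the version rides on the edge.
type Module struct {
	Language string
	Name     string
}

// DependsOn is an edge carrying the version information.
type DependsOn struct {
	From    Module
	To      Module
	Version string
}

// ModuleVersion folds the version into the node's key instead, so each
// (language, name, version) tuple is its own node and edges stay plain.
type ModuleVersion struct {
	Language string
	Name     string
	Version  string
}

// Edge connects two fully keyed nodes.
type Edge struct {
	From ModuleVersion
	To   ModuleVersion
}
```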

From the FinOps point of view it would still be interesting to know the cost of parsing specific repos. For personal use it is more interesting to execute parts of the pipeline independently.

100%. I wanted to execute specific parts of the pipeline for a few reasons. One, it allows people to better understand how the ecosystem extracts information from their systems. Two, it allows for non-git integrations. For example, someone could hook it into their CI/CD system, write a custom SVN/Perforce crawler, etc.

This doesn't work either.

I figured it wouldn't. We don't error when we don't find a record. The current json-stream output was intended to be temporary (I kinda threw the CLI together overnight and wasn't worried about data presentation 😅). There's a ticket out there for adding better output for data. Seems like we could use a specific error code to indicate no data.

Edit: For reference, grep returns a non-zero code when no results are found so I don't see any harm in following suit.
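
In Go terms, that convention might look like this (a hypothetical sketch; query is a stand-in for the real lookup):

```go
package main

import (
	"fmt"
	"os"
)

// Exit codes following grep's convention: 0 = results found,
// 1 = nothing found, 2 = the query itself failed.
const (
	exitFound    = 0
	exitNotFound = 1
	exitError    = 2
)

// query stands in for the real lookup; the signature is made up.
func query(name string) ([]string, error) {
	return nil, nil
}

func main() {
	results, err := query("freemocap")
	if err != nil {
		fmt.Fprintln(os.Stderr, "deps:", err)
		os.Exit(exitError)
	}
	if len(results) == 0 {
		os.Exit(exitNotFound)
	}
	for _, r := range results {
		fmt.Println(r)
	}
	os.Exit(exitFound)
}
```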

@abitrolly
Author

abitrolly commented Aug 26, 2021

Some of this has me wondering if I need to move away from having versions on edges and instead make them part of the node structure (and, more specifically, part of the key). This would keep the traversal logic straightforward and could be a relatively seamless migration.

I think about the graph of package names and the graph of package names with versions as two layers. Meaning, if I had to write just the list of nodes down to a denormalized table, it would be one column for the first layer and two columns for the second layer. The first layer, where there is no version information and just packages, could easily be derived from the list of dependencies without versions. And then we have the problem that dependency links change over time, so we need versions to track time across one package's evolution in a consecutive, discrete way. Do you propose that instead of two columns (name, version) there is just one big column (name:version)?

I imagine version constraints as a kind of compression scheme, because storing all possible connections between all existing versions would probably bring a Cambrian explosion to the database. But with a two-layered graph, solving version constraints on top of already connected package names seems more manageable. On the other hand, deep learning is all about Cambrian explosions, so maybe it will be effective to denormalize everything.
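
For instance, deriving the first layer from the second is just a projection (a hypothetical Go sketch; all names are mine):

```go
package graph

// VersionedEdge is a second-layer edge: (name, version) -> (name, version).
type VersionedEdge struct {
	FromName, FromVersion string
	ToName, ToVersion     string
}

// NameEdges projects the versioned layer down to the first layer,
// keeping one edge per distinct (from, to) name pair.
func NameEdges(edges []VersionedEdge) map[[2]string]bool {
	out := make(map[[2]string]bool)
	for _, e := range edges {
		out[[2]string{e.FromName, e.ToName}] = true
	}
	return out
}
```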

Can't say how it should work until I can see this visualized.

Edit: For reference, grep returns a non-zero code when no results are found so I don't see any harm in following suit.

I think it is also safe to print human-readable errors to stderr, as they won't be captured by stdout redirection. Not sure about the best practices.

@mjpitz
Member

mjpitz commented Sep 13, 2021

Denormalizing everything has a lot of tradeoffs. Before starting on this approach, I spent a few months pulling some denormalized options together, but wound up not being satisfied with the number of tables I needed to join through. Definitely a useful exercise though.
