
Most obvious CSV data is two years out of date #343

Open
nedbat opened this issue Mar 3, 2023 · 4 comments · May be fixed by #344

Comments


nedbat commented Mar 3, 2023

The home page says:

CSV data
The data is available on Google Cloud Storage and can be downloaded via:

web browser: commondatastorage.googleapis.com/ossf-criticality-score/index.html

That page has handy per-language files, but they are dated 2020-12-30. Newer data should be made easier to find, or at least stale data should be removed as an attractive nuisance.

nathannaveen added a commit to nathannaveen/criticality_score that referenced this issue Mar 3, 2023
- Fixes ossf#343

Signed-off-by: nathannaveen <42319948+nathannaveen@users.noreply.github.com>
@nathannaveen nathannaveen linked a pull request Mar 3, 2023 that will close this issue
@calebbrown (Contributor)

I've rearranged the objects in the bucket - does this help?


nedbat commented Mar 5, 2023

Are the files in "archive" the same 2020 files? It helps in that the old files are now in "archive", but now their dates are 2023, which is itself misleading. Is there a reason to keep the old files at all? Why not produce "top 200" files for current data?

@calebbrown (Contributor)

I've put them in a folder roughly correlating to the date they were originally created.

As for producing "top 200" files for current data - I'm interested in how you might be using these.

I had been leaning towards not producing top-200 sets for each language group, and just supplying a script for producing them locally.

However, if the top-200 sets are providing value, I'm more than happy to work on getting them produced automatically.
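A local script of the kind described above might look like the sketch below. It groups rows of a criticality score CSV by language and keeps the highest-scoring entries per language. The column names `repo.url`, `language`, and `default_score` are assumptions here, as is the inline sample data standing in for the real download; the actual CSV schema should be checked against the published files.

```python
import csv
import io
from collections import defaultdict
from heapq import nlargest

def top_n_per_language(rows, n=200, lang_col="language", score_col="default_score"):
    """Group CSV rows by language and keep the n highest-scoring repos per language."""
    by_lang = defaultdict(list)
    for row in rows:
        by_lang[row[lang_col]].append(row)
    return {
        lang: nlargest(n, items, key=lambda r: float(r[score_col]))
        for lang, items in by_lang.items()
    }

# Tiny synthetic sample standing in for the real CSV download (hypothetical columns).
sample = io.StringIO(
    "repo.url,language,default_score\n"
    "https://github.com/a/a,Python,0.9\n"
    "https://github.com/b/b,Python,0.7\n"
    "https://github.com/c/c,Go,0.8\n"
)
tops = top_n_per_language(csv.DictReader(sample), n=1)
for lang, items in tops.items():
    print(lang, items[0]["repo.url"])
```

For the real data set, replacing the synthetic sample with a `csv.DictReader` over the downloaded `all.csv` (or whichever file the bucket provides) and setting `n=200` would reproduce the per-language top-200 files locally.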

@nedbat
Copy link
Author

nedbat commented Mar 5, 2023

TBH, I'm new to this data set and am not sure how I would use the data. I wrote this issue as feedback from a new user trying to understand the data set. The link from the README sounds enticing, but it leads to a raw web server page full of old files. My suggestion is simply to present the data you value in a way that makes it easy for people to find and understand.
