Minor updates #4

Open: wants to merge 6 commits into base: main

tasdomas

Import scikit-learn directly.
Use the updated CML CLI.

@tasdomas changed the title from "Minor updates." to "Minor updates" on Oct 4, 2022
@tasdomas force-pushed the update branch 4 times, most recently from ffd0b5b to 34e5a2c on October 10, 2022 12:59
@tasdomas force-pushed the update branch 4 times, most recently from 4d0829a to b973927 on October 10, 2022 14:42
.github/workflows/cml.yaml (outdated, resolved)
@tasdomas force-pushed the update branch 3 times, most recently from e997687 to 6201c51 on October 11, 2022 09:08
@tasdomas (Author)

@yathomasi PTAL - doing some housekeeping on iterative's example repos. Though it looks like there's a problem with the dvc data sources for this one.

@yathomasi (Contributor)

Though it looks like there's a problem with the dvc data sources for this one.

Yeah, I also got the following error on dvc pull. @tasdomas I am not the expert here 😃.

gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket., 401
ERROR: unexpected error - Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket., 401  

cc: @iterative/dvc


dvc get https://github.com/iterative/dataset-registry/ dvc-course/hymenoptera_data

What is this line supposed to do? The downloaded folder isn't used by the pipeline in dvc.yaml.


dvc get https://github.com/iterative/dataset-registry/ dvc-course/hymenoptera_data
dvc pull

If you want to run dvc pull, you need to provide credentials for the DVC remote defined in https://github.com/iterative/stale-model-example/blob/main/.dvc/config.

I have no idea who has access to those buckets.


I don't believe this workflow has ever run successfully, as DVC cannot access the data with either the current or the previous setup.

@pmrowla commented Oct 11, 2022

gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket., 401
ERROR: unexpected error - Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket., 401  

There are no credentials configured for the default GCS remote.

['remote "gcpbikes"']
    url = gs://updatedbikedata

Can you double-check that this bucket is actually publicly accessible and that it's configured to grant public (anonymous) users the required permissions?
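If making the bucket public isn't an option, the other route raised in this thread is to give the workflow credentials for the `gcpbikes` remote. A minimal sketch of such a step, assuming a hypothetical repository secret named `GCP_CREDENTIALS` that holds a service-account JSON key with read access to `gs://updatedbikedata` (neither the secret nor the service account exists in the repo today):

```yaml
# Sketch only: GCP_CREDENTIALS is a hypothetical secret holding a
# service-account JSON key that can read gs://updatedbikedata.
- name: Configure DVC remote credentials and pull data
  env:
    GCP_CREDENTIALS: ${{ secrets.GCP_CREDENTIALS }}
  run: |
    # Write the key to the runner's temp directory, outside the workspace
    printf '%s' "$GCP_CREDENTIALS" > "$RUNNER_TEMP/gcp-key.json"
    # --local stores the setting in .dvc/config.local, which is not committed
    dvc remote modify --local gcpbikes credentialpath "$RUNNER_TEMP/gcp-key.json"
    dvc pull
```

Using `--local` keeps the key path out of the committed `.dvc/config`, so public clones still see only the bucket URL.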

.github/workflows/cml.yaml (outdated, resolved)
    fetch-depth: 0
- name: Fix git safe.directory in container
  run: |
    git config --global --add safe.directory ${GITHUB_WORKSPACE}


Is this required because of dvc get?


It's required for dvc exp run.


Is it needed for dvc repro too? I can't find any mention of this requirement in https://dvc.org/doc. I'm asking because we may need to update/fix https://github.com/iterative/cml_dvc_case/blob/master/.github/workflows/cml.yaml

@pmrowla commented Oct 31, 2022

For dvc repro, the answer is normally no, but if the user's pipeline does things with the local git repo (for example, pipeline stages that run dvc get), then it may be needed.
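Some background that may help with the question above: since Git 2.35.2, git refuses to operate in a repository owned by a different user unless that path is listed in `safe.directory`. In container jobs, the checked-out workspace is typically owned by the runner's user rather than the container's user, so a step that runs git in the workspace (as noted above for `dvc exp run`) can fail without this setting. A sketch of the step ordering, assuming a checkout step like the one in the diff context above; the experiment step name is illustrative:

```yaml
steps:
  - uses: actions/checkout@v3
    with:
      fetch-depth: 0
  # Mark the workspace as safe before any step that invokes git in it
  # (e.g. dvc exp run, or dvc repro stages that call dvc get).
  - name: Fix git safe.directory in container
    run: |
      git config --global --add safe.directory ${GITHUB_WORKSPACE}
  # Illustrative step; substitute the repo's actual pipeline command.
  - name: Run DVC experiment
    run: |
      dvc exp run
```

Because `--global` writes to the container user's git config, the setting applies to every later step in the same job.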

Co-authored-by: Casper da Costa-Luis <casper.dcl@physics.org>
@tasdomas closed this on Dec 6, 2022
@casperdcl

@tasdomas closed by accident?

@casperdcl reopened this on Dec 12, 2022
@tasdomas (Author)

@casperdcl I think this repo needs to be archived altogether.

@casperdcl

/CC @jendefig ^

@jendefig

Thanks @casperdcl @tasdomas! @RCdeWit had added an admonition to the blog post, but this seems too broken to maintain, especially since the blog post starts from the repo. It looks like we'll need to pull this down and add it to the Content Creation Dashboard for an eventual redo.

Labels: None yet
Projects: None yet
Development: no linked issues yet
7 participants