
feat(frontend): Add UI support for object store customization and prefixes #10787

Open · wants to merge 8 commits into master
Conversation

@HumairAK (Contributor) commented May 3, 2024

Description of your changes:
Frontend follow-up to: #10625
Resolves the object storage portion of: #9689

Changes to backend:

  • For each artifact, add an artifact property in MLMD that records where the artifact is located (i.e. which provider, where the credentials are, etc.). This is the same information that is stored as a context property, introduced in Recursion support issue #1065, but because each artifact is not tied to a specific context (imported or cached artifacts, for example), we propagate this information to the artifact properties (when artifacts are newly created within a context).

  • We assume imported artifacts will leverage the environment for their credentials (e.g. via IRSA), which means both the user executor pod and the KFP UI will need these credentials in their environment (this is true for any situation where we use fromEnv: true; a minimal sketch follows this list). A future change could also add support for matching imported artifacts with prefixes defined in the kfp-launcher config.
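
For reference, a minimal sketch of a provider entry that relies on environment credentials rather than a secret; the endpoint and region values here are placeholders, and only the credentials block is the point of the example:

providers: |-
  s3:
    default:
      endpoint: s3.amazonaws.com
      region: us-east-1
      credentials:
        # no secretRef: the executor pod and the KFP UI must get
        # credentials from their environment (e.g. via IRSA)
        fromEnv: true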

Additionally, this PR adds the UI support for the changes introduced in #10625. Users will now be able to add to their kfp-launcher ConfigMaps the locations of their S3 buckets based on the provided pipeline root, either via the default pipeline root or one specified during pipeline execution. When using fromEnv: true, the onus is on the user, as well as the admin that manages KFP, to ensure that the user pod and the KFP UI have these credentials available. The following updates to the UI went in here:

Changes to client code:

  • Fetch the artifact property store_session_info from MLMD (now accessible via LinkedArtifact) and include it as part of the artifact retrieval request against the server (a hypothetical example follows this list)
  • In some cases namespaces were not being added to the artifact retrieval request; this has been amended
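
For illustration, a store_session_info payload could plausibly carry something like the following, mirroring the provider fields in the kfp-launcher config shown later in this description. The field names here are hypothetical, not the exact schema used by this PR:

# hypothetical shape of store_session_info, for illustration only
provider: s3
params:
  region: us-east-2
  endpoint: s3.amazonaws.com
  disableSSL: "false"
  fromEnv: "false"
  secretName: aws-s3-creds
  accessKeyKey: AWS_ACCESS_KEY_ID
  secretKeyKey: AWS_SECRET_ACCESS_KEY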

Changes to front end server:

  • If store session info is detected, along with a namespace, and we detect that fromEnv is false, then we require the session info to include the location of the credentials (a Kubernetes secret) and utilize those. If a namespace is not provided, we reject the request.
  • I also removed some outdated legacy code related to how we used to retrieve AWS credentials from the environment.
  • Added tests, and fixed some broken tests (in RunLists).

Notes:

  • The credential secrets are required to be in the user profile namespace, though this seems to have already been the case before [1], [2], [3]

  • Because we don't do namespace verification server side, there continues to be a security concern: users can spoof the namespace client side and access object stores from another profile namespace. Because we store the provider's credentials location in MLMD and query it from the client side, a client can technically spoof this and attempt to read object stores whose secrets are stored elsewhere on the cluster. However, once we resolve [backend] Security exploit in mlpipeline-UI #9889 this should no longer be an issue, because any attempt to access resources outside the user's profile namespace will be rejected.

  • With some additional changes, we can allow users to set another default object store and resolve [feature] Allow manifests to set external Object Store to deploy Kubeflow Pipelines #10651. That change would require KFP to read a default kfp-launcher config from the kubeflow namespace (or wherever KFP is deployed).

Once accepted, I will follow up with documentation on this.

Example kfp-launcher ConfigMap:
apiVersion: v1
data:
  defaultPipelineRoot: s3://mlpipeline
  providers: |-
    s3:
      default:
        endpoint: http://your-other-minio-endpoint.com
        disableSSL: true
        region: minio
        credentials:
          fromEnv: false
          secretRef:
            secretName: ds-pipeline-s3-sample
            accessKeyKey: accesskey
            secretKeyKey: secretkey
      overrides:
        # use pipeline root: s3://your-bucket/some/s3/path/a/c
        - bucketName: your-bucket
          keyPrefix: some/s3/path/a/c
          endpoint: s3.amazonaws.com
          region: us-east-2
          credentials:
            fromEnv: false
            secretRef:
              secretName: aws-s3-creds
              accessKeyKey: AWS_ACCESS_KEY_ID
              secretKeyKey: AWS_SECRET_ACCESS_KEY
        # use pipeline root: s3://test-scheme/minioAsS3/test/route
        - endpoint: http://your-other-minio-endpoint.com
          region: minio
          disableSSL: true
          bucketName: test-scheme
          keyPrefix: minioAsS3/test/route
          credentials:
            fromEnv: false
            secretRef:
              secretName: minio-creds
              accessKeyKey: accesskey
              secretKeyKey: secretkey
kind: ConfigMap
metadata:
  name: kfp-launcher
To exercise this end to end:

  • Run a pipeline and select a pipeline root that matches one from the kfp-launcher config (ensure this config is in the user's namespace, along with the required credentials)
  • Confirm artifacts are stored in the matching bucket and path
  • Verify that the artifacts are previewed correctly in the KFP UI
    • You want to verify artifacts in Artifact Info, metrics visualizations, and run comparisons (try different visualizations)

Testing Instructions

To test this PR you will need to:

  1. Build the images
  • Build the KFP front end image
  • Build the driver image
  • Build the launcher image

You can also use these images here:

  • quay.io/hukhan/kfp-frontend:issue-10787
  • quay.io/hukhan/kfp-launcher:issue-10787
  • quay.io/hukhan/kfp-driver:issue-10787
  2. Configure the KFP deployment to use these images

Frontend:
Substitute quay.io/hukhan/kfp-frontend:issue-10787 for the image in the frontend deployment.
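
For example, assuming the standard deployment name ml-pipeline-ui and namespace kubeflow from the stock manifests (adjust to your installation), a strategic merge patch along these lines swaps the image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-pipeline-ui   # assumed standard frontend deployment name
  namespace: kubeflow
spec:
  template:
    spec:
      containers:
        - name: ml-pipeline-ui
          image: quay.io/hukhan/kfp-frontend:issue-10787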

Driver/launcher:
In the KFP API Server deployment, add the environment variables:
V2_LAUNCHER_IMAGE=quay.io/hukhan/kfp-launcher:issue-10787
V2_DRIVER_IMAGE=quay.io/hukhan/kfp-driver:issue-10787
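
Concretely, assuming the API server deployment and container names from the stock manifests (ml-pipeline / ml-pipeline-api-server; adjust to your installation), the container spec would gain entries like these:

spec:
  template:
    spec:
      containers:
        - name: ml-pipeline-api-server   # assumed standard container name
          env:
            - name: V2_LAUNCHER_IMAGE
              value: quay.io/hukhan/kfp-launcher:issue-10787
            - name: V2_DRIVER_IMAGE
              value: quay.io/hukhan/kfp-driver:issue-10787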

  3. Deploy KFP. I recommend using a full KFP deployment so you can test with a multi-user setup.

  4. Create a kfp-launcher config in the user namespace that lists your S3/GCS/MinIO credentials; for the formatting, see the ConfigMap example earlier in this PR.

  • Test with GCS token credentials (a sketch follows this list)
  • Test with AWS S3 credentials provided via a secret
  • Test with AWS IRSA (fromEnv: true in the kfp-launcher config)
  • Test with MinIO
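
For the GCS case, a provider entry might look like the following. This is a sketch based on my reading of the provider config format from #10625; the gs key, the tokenKey field, and the secret name gcs-creds are assumptions here, so verify against the backend PR before using:

providers: |-
  gs:
    default:
      credentials:
        fromEnv: false
        secretRef:
          # assumed field names; the secret holds a GCP service account key
          secretName: gcs-creds
          tokenKey: service_account.json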
  5. Any secret you reference in the kfp-launcher config also needs to be added to the user's profile namespace.

  6. Upload a pipeline that outputs artifacts

Some pipelines to test with:

  7. Run a pipeline, selecting a pipeline root that matches one from the kfp-launcher config (once again, the kfp-launcher config and the secret to utilize should be in the user's profile namespace)

  8. Test the visuals in the UI; key areas to test:

  • Run details: Artifact Info tab
  • Run details: Visualization tab
  • Compare Runs: test multiple visualizations
  • Confirm S3 artifacts are stored in the expected locations within the respective object stores

Checklist:


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chensun for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@HumairAK (Contributor, Author) commented May 3, 2024

I'll take a look at the failing tests, and follow up with a demo and some images folks can test this PR with (currently blocked on building the UI image due to this issue).

@HumairAK commented May 9, 2024

/test kubeflow-pipeline-frontend-test

1 similar comment
@HumairAK commented May 9, 2024

/test kubeflow-pipeline-frontend-test

@HumairAK commented May 9, 2024

/test kubeflow-pipeline-frontend-test

Flaky tests seem to be the cause, I think; retrying a couple of times to confirm (I'm unable to reproduce this in my environment).

Edit: yes, that seemed to be the case; they are passing now.

@juliusvonkohout (Member) commented:
@HumairAK feel free to ping me on Slack or join the security meeting if you are interested in fixing the exploit mentioned above.

@juliusvonkohout (Member) commented:

I am also investigating Apache Ozone as a MinIO replacement.

Commit messages (each commit is signed off by Humair Khan <HumairAK@users.noreply.github.com>):

  • Note that for runlist, the mock error response only returns one valid run list with the error set; the other is undefined, so to support multiple runIds the mock error response will need to be adjusted.
  • ..and clean up imports.
@HumairAK (Contributor, Author) commented:

@juliusvonkohout yes, this is something we 100% need to fix.

I'm currently limited on bandwidth, but if this is not picked up by someone else in a couple of weeks, I can take it on.


@HumairAK: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
kubeflow-pipeline-upgrade-test 8a59da1 link false /test kubeflow-pipeline-upgrade-test
kubeflow-pipeline-e2e-test 8a59da1 link false /test kubeflow-pipeline-e2e-test
kubeflow-pipelines-samples-v2 8a59da1 link false /test kubeflow-pipelines-samples-v2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@HumairAK (Contributor, Author) commented:
Rebased. I've updated the description with detailed testing instructions and provided pre-built images for this PR, to help speed up review.
