
[feature] Store Pipeline IR in database, not object storage #10509

Open
HumairAK opened this issue Feb 21, 2024 · 5 comments · May be fixed by #10790

@HumairAK
Contributor

HumairAK commented Feb 21, 2024

Feature Area

What feature would you like to see?

Currently the Object Store in KFP is largely used for artifacts, except for one outlier, which is the Pipeline IR.

I agree with the inline comments that this should be stored in the DB just like everything else that's not an artifact.

What is the use case or pain point?

Storing this in the DB removes the API server's dependency on the object store, and will make it easier for future solutions to support different artifact store implementations without having to worry about the API server.

Is there a workaround currently?

No

Anything else?

There's also archive logging, but this seems to be delegated to the backend engine (currently Argo, but soon Tekton as well). I'm not sure what to do about this one.


Love this idea? Give it a 👍.

@HumairAK
Contributor Author

Related: #10510

@HumairAK
Contributor Author

HumairAK commented Feb 28, 2024

Follow-up from the Feb 02, 2024 call:

@chensun suggests we might actually be storing the pipeline IR in both the DB and object storage.

It is not clear whether the object store is still being used for the pipeline IR. We should confirm this; if the object-store copy is indeed unused, we should remove it from the apiserver and rely on the DB alone.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale label Apr 29, 2024
@gmfrasca
Member

gmfrasca commented May 3, 2024

Bumping to unstale.

I've looked into this. After a decent look, it does appear that the Pipeline IR stored in object storage goes unused*, and I believe we can remove that copy of the definition, since it creates duplicate sources of truth, and just rely on the definition stored in the DB.

A couple of other findings:

  1. I did find one area of code that checks the object store for a PipelineVersion if it can't find it in the DB. Since it's a failsafe, we can likely leave it, at least temporarily, even though no new data would be written to those 'backup' destinations.
  2. It does appear that PipelineURI (which points to the pipeline definition location in the object store) needs to remain, since it appears to be used by the upload-from-web mechanism.
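
To make finding 1 concrete, here is a minimal Python sketch of a DB-first lookup with an object-store failsafe. It is only an illustration under stated assumptions: `InMemoryStore`, `load_pipeline_spec`, and the version IDs are hypothetical names and are not taken from the KFP codebase.

```python
from typing import Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for either the DB or the object store."""

    def __init__(self, items: Optional[Dict[str, bytes]] = None) -> None:
        self._items = dict(items or {})

    def get(self, version_id: str) -> Optional[bytes]:
        # Return the stored pipeline definition, or None if it is absent.
        return self._items.get(version_id)


def load_pipeline_spec(
    version_id: str,
    db: InMemoryStore,
    object_store: Optional[InMemoryStore] = None,
) -> bytes:
    """Read a pipeline definition, treating the DB as the source of truth."""
    spec = db.get(version_id)
    if spec is not None:
        return spec

    # Failsafe path (assumption): only consulted when the DB has no entry,
    # e.g. for definitions written before the object-store copy was dropped.
    if object_store is not None:
        spec = object_store.get(version_id)
        if spec is not None:
            return spec

    raise KeyError(f"pipeline version {version_id!r} not found")
```

With this shape, dropping the failsafe later means no longer passing `object_store`; the DB path is unchanged.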

@gmfrasca
Member

gmfrasca commented May 3, 2024

/assign @gmfrasca

@github-actions github-actions bot removed the lifecycle/stale label May 4, 2024
gmfrasca added a commit to gmfrasca/data-science-pipelines that referenced this issue May 6, 2024
…ponsibilies. Fixes kubeflow#10509

Signed-off-by: Giulio Frasca <gfrasca@redhat.com>