Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(vertexai): multimodal with all modalities #4110

Merged
merged 9 commits into from May 14, 2024

Conversation

Deleplace
Copy link
Contributor

Go version of the Python sample at https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/gemini_all_modalities.py

Region tag generativeaionvertexai_gemini_all_modalities

@Deleplace Deleplace requested a review from a team as a code owner April 25, 2024 10:02
Copy link

snippet-bot bot commented Apr 25, 2024

Here is the summary of changes.

You are about to add 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label Apr 25, 2024
@Deleplace Deleplace enabled auto-merge (squash) April 25, 2024 13:02
// generateMultimodalContent generates a response into w, based upon the prompt
// and video provided.
// video and image are a Google Cloud Storage paths starting with "gs://"
func generateMultimodalContent(w io.Writer, prompt, video, image, projectID, location, modelName string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With all these string arguments, it's easy for callers to get confused. I'd define and pass a struct in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it possible to have this take a model and just call GenerateModel, and to the client and model creation elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try with a struct.

All the options here are compromises between sometimes conflicting goals...

@Deleplace Deleplace requested a review from jba April 26, 2024 09:16
Copy link
Contributor

@grayside grayside left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some questions and suggestions. Questions feel like the answers could be blockers, but I'm open to LGTM with creation of follow-up bugs.

vertexai/multimodal-all/multimodalall.go Outdated Show resolved Hide resolved
vertexai/multimodal-all/multimodalall.go Outdated Show resolved Hide resolved

// generateMultimodalContent generates a response into w, based upon the multimodal prompt
// provided.
func generateMultimodalContent(w io.Writer, prompt multimodalPrompt, projectID, location, modelName string) error {
Copy link
Contributor

@grayside grayside May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the function name generateMultimodalContent is the same as generativeaionvertexai_gemini_video_with_audio from #4110.

question: Why do both samples exist? Would it make more sense to associate the same sample code with both region tags?
suggestion: If two samples are needed, rename one for clarity in the event someone is comparing them side-by-side.

This may benefit from feedback from @gericdong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need language parity for the doc pages, usually Python comes first and then other languages are added in tabs at the same place: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/send-multimodal-prompts#all_modalities

Yes I will rename the func, for clarity/disambiguation

Comment on lines +51 to +52
// The generated text may look like "The moment in the image happens at
// approximately 00:49 in the video. The context of the moment is..."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Was this meant to be captured as a test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this is intended as a hint for the reader, about what kind of answer the model should provide.

Because of the very wide range of possible answers, this is not a test case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error coverage here of "not an error" is light, but from the Python samples it looks like the standard is "non-error, non-empty response" which this is doing.

Deleplace and others added 3 commits May 14, 2024 19:31
@Deleplace Deleplace requested a review from grayside May 14, 2024 17:39
@Deleplace Deleplace merged commit f4bb2dc into GoogleCloudPlatform:main May 14, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
samples Issues that are directly related to samples.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants