New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(vertexai): multimodal with all modalities #4110
feat(vertexai): multimodal with all modalities #4110
Conversation
Here is the summary of changes. You are about to add 1 region tag.
This comment is generated by snippet-bot.
|
// generateMultimodalContent generates a response into w, based upon the prompt | ||
// and video provided. | ||
// video and image are a Google Cloud Storage paths starting with "gs://" | ||
func generateMultimodalContent(w io.Writer, prompt, video, image, projectID, location, modelName string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all these string arguments, it's easy for callers to get confused. I'd define and pass a struct in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is it possible to have this take a model and just call GenerateModel, and to the client and model creation elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try with a struct.
All the options here are compromises between sometimes conflicting goals...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some questions and suggestions. Questions feel like the answers could be blockers, but I'm open to LGTM with creation of follow-up bugs.
|
||
// generateMultimodalContent generates a response into w, based upon the multimodal prompt | ||
// provided. | ||
func generateMultimodalContent(w io.Writer, prompt multimodalPrompt, projectID, location, modelName string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the function name generateMultimodalContent
is the same as generativeaionvertexai_gemini_video_with_audio
from #4110.
question: Why do both samples exist? Would it make more sense to associate the same sample code with both region tags?
suggestion: If two samples are needed, rename one for clarity in the event someone is comparing them side-by-side.
This may benefit from feedback from @gericdong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need language parity for the doc pages, usually Python comes first and then other languages are added in tabs at the same place: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/send-multimodal-prompts#all_modalities
Yes I will rename the func, for clarity/disambiguation
// The generated text may look like "The moment in the image happens at | ||
// approximately 00:49 in the video. The context of the moment is..." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Was this meant to be captured as a test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this is intended as a hint for the reader, about what kind of answer the model should provide.
Because of the very wide range of possible answers, this is not a test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error coverage here of "not an error" is light, but from the Python samples it looks like the standard is "non-error, non-empty response" which this is doing.
Co-authored-by: Adam Ross <grayside@gmail.com>
Co-authored-by: Adam Ross <grayside@gmail.com>
Go version of the Python sample at https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/gemini_all_modalities.py
Region tag
generativeaionvertexai_gemini_all_modalities