I am interested in this project; I tried it and it works very well. But it seems to use a lot of GPT tokens because of the screenshot processing. I tried to replace GPT with a local vision model, but I couldn't find where to make the change. Where is GPT vision used in the source code?
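The screenshot token cost comes from how the API bills images rather than from anything repo-specific. A rough estimator, assuming OpenAI's published tile-based scheme for GPT-4 Vision (the constants below come from their pricing docs, not from this project's code):

```python
import math

def estimate_image_tokens(width, height, detail="high"):
    """Rough token estimate for a GPT-4 Vision image input, following
    OpenAI's documented tile-based scheme (assumption: 85 base tokens
    plus 170 per 512x512 tile; 'low' detail is a flat 85 tokens)."""
    if detail == "low":
        return 85
    # The image is first scaled down to fit within 2048x2048...
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # ...then scaled down again so the shortest side is at most 768.
    scale2 = min(1.0, 768 / min(w, h))
    w, h = w * scale2, h * scale2
    # Count the 512px tiles the resized image covers.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(estimate_image_tokens(1920, 1080))  # -> 1105 for a full-HD screenshot
```

So every full-resolution screenshot costs on the order of a thousand tokens, which is why an agent loop that screenshots every step gets expensive fast.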
I was perusing the codebase looking for the same answer. As far as I can tell, the call to GPT-4 Vision (or whichever other model you specify) happens here:

Notice how the screenshot data is sent along with, possibly, a text prompt. That said, I don't know how simple it will be to swap in your own local vision model.
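For reference, a vision call in the OpenAI chat-completions dialect pairs the text prompt with a base64 data URL of the screenshot in the same user message. A minimal sketch of that payload shape (this is the generic API format, not this project's exact code; the model name and prompt are placeholders):

```python
import base64

def build_vision_request(screenshot_bytes, prompt, model="gpt-4-vision-preview"):
    """Build an OpenAI-style chat payload that pairs a text prompt with a
    base64-encoded screenshot, sent as a data URL in an image_url part."""
    b64 = base64.b64encode(screenshot_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_vision_request(b"\x89PNG...", "Did anything go wrong on this page?")
print(payload["messages"][0]["content"][1]["type"])  # -> image_url
```

Any replacement model therefore needs to accept this two-part message (text plus image) or you need to adapt the payload at this call site.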
I perused the source code and found that I had misunderstood the role of the vision model. In this project the vision model does not segment or locate all the elements (that is achieved by a JS script); it just checks whether anything bad has happened.
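Since that check is a plain chat-completions call, swapping in a local model is often just a matter of pointing the same request at an OpenAI-compatible local server (llama.cpp server, Ollama, and LM Studio all expose a `/v1/chat/completions` route). A sketch under those assumptions; the URL, port, and `llava` model name are hypothetical examples, and actually sending the request is left out:

```python
import json
import urllib.request

# Hypothetical local endpoint (Ollama's default port); assumes your local
# server speaks the OpenAI dialect and serves a vision-capable model.
LOCAL_URL = "http://localhost:11434/v1/chat/completions"

def local_vision_request(payload):
    """Wrap the same OpenAI-style payload in a POST to the local server.
    Only the URL and the model name change; the message format stays."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        LOCAL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = local_vision_request({"model": "llava", "messages": []})
print(req.full_url)  # -> http://localhost:11434/v1/chat/completions
```

Whether the local model's answers are good enough for the "did anything bad happen" check is a separate question from the plumbing.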