I am interested in this project; I tried it and it works very well. But it seems to use a lot of GPT tokens because of the screenshot processing. I tried to replace GPT with a local vision model, but I couldn't find where to make the change. Where is GPT vision used in the source code?
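The screenshot token cost comes from how the API bills images rather than from anything repo-specific. A rough estimator, assuming OpenAI's published tile-based scheme for GPT-4 Vision (the constants below come from their pricing docs, not from this project's code):

```python
import math

def estimate_image_tokens(width, height, detail="high"):
    """Rough token estimate for a GPT-4 Vision image input, following
    OpenAI's documented tile-based scheme (assumption: 85 base tokens
    plus 170 per 512x512 tile; 'low' detail is a flat 85 tokens)."""
    if detail == "low":
        return 85
    # The image is first scaled down to fit within 2048x2048...
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # ...then scaled down again so the shortest side is at most 768.
    scale2 = min(1.0, 768 / min(w, h))
    w, h = w * scale2, h * scale2
    # Count the 512px tiles the resized image covers.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(estimate_image_tokens(1920, 1080))  # -> 1105 for a full-HD screenshot
```

So every full-resolution screenshot costs on the order of a thousand tokens, which is why an agent loop that screenshots every step gets expensive fast.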
I was perusing the codebase looking for the same answer. As far as I can tell, the call to GPT-4 Vision (or whichever other model you specify) happens here:

Notice how the screenshot data is sent along with, possibly, a text prompt. That said, I don't know how simple it will be to swap in your own local vision model.
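For reference, a vision call in the OpenAI chat-completions dialect pairs the text prompt with a base64 data URL of the screenshot in the same user message. A minimal sketch of that payload shape (this is the generic API format, not this project's exact code; the model name and prompt are placeholders):

```python
import base64

def build_vision_request(screenshot_bytes, prompt, model="gpt-4-vision-preview"):
    """Build an OpenAI-style chat payload that pairs a text prompt with a
    base64-encoded screenshot, sent as a data URL in an image_url part."""
    b64 = base64.b64encode(screenshot_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_vision_request(b"\x89PNG...", "Did anything go wrong on this page?")
print(payload["messages"][0]["content"][1]["type"])  # -> image_url
```

Any replacement model therefore needs to accept this two-part message (text plus image) or you need to adapt the payload at this call site.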
I perused the source code and found that I had misunderstood the role of the vision model. In this project the vision model does not segment or locate all the elements (that is achieved by a JS script); it just checks whether anything bad has happened.
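Since that check is a plain chat-completions call, swapping in a local model is often just a matter of pointing the same request at an OpenAI-compatible local server (llama.cpp server, Ollama, and LM Studio all expose a `/v1/chat/completions` route). A sketch under those assumptions; the URL, port, and `llava` model name are hypothetical examples, and actually sending the request is left out:

```python
import json
import urllib.request

# Hypothetical local endpoint (Ollama's default port); assumes your local
# server speaks the OpenAI dialect and serves a vision-capable model.
LOCAL_URL = "http://localhost:11434/v1/chat/completions"

def local_vision_request(payload):
    """Wrap the same OpenAI-style payload in a POST to the local server.
    Only the URL and the model name change; the message format stays."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        LOCAL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = local_vision_request({"model": "llava", "messages": []})
print(req.full_url)  # -> http://localhost:11434/v1/chat/completions
```

Whether the local model's answers are good enough for the "did anything bad happen" check is a separate question from the plumbing.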