Interactive automated browsing? (GPT-4V autopilot, simple scripting, etc) #131

walking-octopus · 2023-12-14T19:48:11Z

I wonder whether a mechanism for keeping a session alive, with the CLI sending commands to it, could help automate simple actions or even hack together entire agents.

Say I want to write a script that auto-orders N of item X. Or maybe open a session, start a chat with GPT-4, ask it a question, and have it drive the headless browser: take a screenshot, annotate the clickable parts with IDs, send the screenshot back, let the model choose the interaction, and so on, until it has the information it needs to continue the chat with the user.

I don't know if it's a little beyond the scope of the project, but I think it could definitely be neat.

nmstoker · 2024-05-18T10:42:59Z

Wouldn't this be better done directly with Playwright, which underlies shot-scraper?
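
For illustration, here is a minimal sketch of how the screenshot/choose/click loop from the original post might look when written against Playwright's Python sync API directly. The names `list_clickables`, `run_agent`, `demo`, and the `choose` callback are hypothetical and are not part of shot-scraper or Playwright. The sketch also simplifies the idea: instead of drawing IDs onto the screenshot, it passes a list of element texts alongside it, and the list index serves as the ID. A real agent would put a model call inside `choose` and would need to handle hidden elements, navigation timing, and errors.

```python
from typing import Callable, List, Optional

# Assumption: links and buttons are a good-enough notion of "clickable".
CLICKABLE = "a, button"


def list_clickables(page) -> List[str]:
    """Return the text of each clickable element; the list index is its ID."""
    loc = page.locator(CLICKABLE)
    return [loc.nth(i).inner_text() for i in range(loc.count())]


def run_agent(
    page,
    choose: Callable[[bytes, List[str]], Optional[int]],
    max_steps: int = 10,
) -> int:
    """Repeat: screenshot, list clickables, let `choose` pick an ID, click it.

    `choose` receives the PNG bytes and the labels and returns an index,
    or None to stop. Returns the number of clicks performed.
    """
    for step in range(max_steps):
        labels = list_clickables(page)
        shot = page.screenshot()  # PNG bytes of the current viewport
        choice = choose(shot, labels)
        if choice is None:
            return step
        page.locator(CLICKABLE).nth(choice).click()
    return max_steps


def demo(url: str) -> int:
    """Run one step against a real page (requires Playwright and a browser)."""
    # Imported here so the helpers above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        # Placeholder policy: click the first clickable element, if any.
        clicks = run_agent(page, lambda shot, labels: 0 if labels else None, max_steps=1)
        browser.close()
        return clicks
```

Because the page stays open for the whole `with` block, the session is kept alive between steps, which is the part the CLI would otherwise need a new mechanism for.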

walking-octopus · 2024-05-18T10:50:22Z

Yes, revisiting this issue, I do see it is most definitely not in the spirit of doing one thing well... I'll close it now.

walking-octopus closed this as completed May 18, 2024
