[HN Gopher] TaxyAI: Open-source browser automation with GPT-4
___________________________________________________________________

TaxyAI: Open-source browser automation with GPT-4

Author : kcorbitt
Score  : 70 points
Date   : 2023-03-28 17:07 UTC (5 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| seydor wrote:
| Wow this kind of thing makes plugins obsolete. I thought it would
| take more than a week

| krembanan wrote:
| This is very cool, impressive work in 2 weeks! Each action seems
| to have some delay after it, is there any reason for that? Is it
| because you are streaming the OpenAI response and performing the
| actions as they come? If not, I imagine streaming the query
| response and executing each action as they emit would speed it up
| quite a bit?

| serjester wrote:
| Why use GPT-4? The latency is significantly worse than 3.5 and
| this seems simple enough that the performance delta is marginal.
| If I was going for robustness, I probably wouldn't be using AI in
| the first place.
|
| Edit: I noticed they support both but I'm assuming by the speed
| all the demos are using 3.5?

| Karrot_Kream wrote:
| I find that GPT-4 works much better with ReAct than GPT-3 for
| more complex tasks.

| dopeboy wrote:
| Anxiously been waiting for something like this - very cool.
|
| One use case I've had is that I hate spending time on my
| LinkedIn, Twitter, etc. newsfeeds. But there are a handful of
| people I care about and want to keep tabs on.
|
| Is there a way I could use TaxyAI to set up a role to monitor my
| LinkedIn newsfeed and keep tabs on certain people + topics and
| then email me a digest of that?

| [deleted]

| ashcorbitt22 wrote:
| It was such an amazing, surreal experience using Taxy to complete
| a task! It made the task enjoyable and exciting!

| WonderBuilder wrote:
| This is amazing already! Very exciting. I'll make sure I follow
| this project's progress. It also reminds me of Adept and their
| goal with ACT-1. I still haven't seen their product launch,
| though...

| snihalani wrote:
| TAKE. MY. MONEY. NOW.

| Imnimo wrote:
| It will be interesting to see whether this sort of approach works
| better than something using GPT-4's vision capabilities.
| Obviously websites are built to be easy to use visually rather
| than easy to use via the DOM. On the other hand, it's much less
| clear how to ground action proposals in the visual domain - how
| do you ask GPT where on an image of the screen it wants to click?

| dpflan wrote:
| Curious: Can someone explain what they are excited to use this
| for? Can someone provide a large scale use-case/scenario?

| koch wrote:
| Filling out job applications using my resume

| kcorbitt wrote:
| Hey HN! My brother Arctic_fly and I spent the last two weeks
| since the GPT-4 launch building Taxy, an open source Chrome
| extension that lets you automate arbitrary tasks in your browser
| using GPT-4. You can see a few demos in the GitHub README, but
| basically it works like this:
|
| 1. You open the extension and write the task you'd like done (e.g.
| "schedule a meeting with David tomorrow at 2").
|
| 2. Taxy pulls the DOM of the current page, puts it through a
| pipeline to remove all non-semantic information, hidden elements,
| etc., and sends it to GPT-4 along with your text instructions.
|
| 3. GPT-4 tries to figure out what action to take. In our prompt
| we give it the option to either click an element or set an
| input's value. We use the ReAct paradigm
| (https://arxiv.org/abs/2210.03629) so it explains what it's
| trying to do before taking an action, which both makes it more
| accurate and helps with debugging.
|
| 4. Taxy parses GPT-4's response and performs the action requested
| on the page. It then goes back to step (2) and asks GPT-4 for the
| next action to take with the updated page DOM. It also sends the
| list of actions already taken as part of the current task so
| GPT-4 can detect if it's getting stuck in a loop and abort. :)
|
| 5. Once GPT-4 has decided the task is done or it can't make any
| more progress, it responds with a special action indicating it's
| done.
|
| Right now there are a lot of limitations, and this is more a
| "research preview" than a finished product. That said, I've found
| it surprisingly capable for a number of tasks, and I think it's
| in a stable enough place we can share. Happy to answer any
| questions!

| koolba wrote:
| Very cool. The "sending everything of relevance on the page to
| OpenAI" is of course creepy. But that's table stakes for anything
| like this until people can run them externally.
|
| This would make a cool, "magic box", at the top of a web page.
| Type in what you want to do, it sends it to the server along with
| the DOM extract (same site server). Server asks magical LLM how
| to do it, and then spits it back to the client. So no plug-in
| needed and data flow would pass through the source server.

| Arctic_fly wrote:
| Already useful across a variety of domains, and it's in early
| days yet!
|
| Just yesterday I used it to create a GitHub issue with minimal
| effort.
___________________________________________________________________
(page generated 2023-03-28 23:00 UTC)
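
Editor's sketch: the loop kcorbitt describes in steps 1-5 could look roughly like the Python below. This is not Taxy's code. The reply format ("Thought: ..." followed by "Action: ..."), the action names, the helper names (parse_reply, run_task, describe) and the max_steps budget are all assumptions made for illustration; the thread only says the model can click an element, set an input's value, or signal that it is done, and that it explains itself first (ReAct). The DOM fetch, the model call and the page interaction are passed in as plain callables so the sketch stays self-contained.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    name: str                          # "click", "set_value", "finish" or "fail"
    element_id: Optional[int] = None   # hypothetical numeric id of a DOM element
    value: Optional[str] = None        # text to put into an input, if any
    thought: str = ""                  # the model's stated reasoning (ReAct)


def parse_reply(reply: str) -> Action:
    """Parse an assumed 'Thought: ... / Action: name(args)' reply."""
    thought_m = re.search(r"^Thought:[ \t]*(.*)$", reply, re.M)
    action_m = re.search(r"^Action:[ \t]*(\w+)\((.*)\)[ \t]*$", reply, re.M)
    if action_m is None:
        return Action("fail", thought="unparseable reply")
    name, raw_args = action_m.group(1), action_m.group(2).strip()
    thought = thought_m.group(1).strip() if thought_m else ""
    try:
        if name == "click":
            return Action(name, int(raw_args), None, thought)
        if name == "set_value":
            element_id, _, value = raw_args.partition(",")
            return Action(name, int(element_id), value.strip().strip('"'), thought)
    except ValueError:
        return Action("fail", thought="bad element id")
    if name in ("finish", "fail"):
        return Action(name, thought=thought)
    return Action("fail", thought=f"unknown action {name!r}")


def describe(action: Action) -> str:
    """One-line record of an action, sent back to the model as history."""
    if action.name == "set_value":
        return f"set_value({action.element_id}, {action.value!r})"
    return f"{action.name}({action.element_id})"


def run_task(
    task: str,
    get_dom: Callable[[], str],                        # step 2: simplified page DOM
    ask_model: Callable[[str, str, List[str]], str],   # step 3: model call
    perform: Callable[[Action], None],                 # step 4: act on the page
    max_steps: int = 10,                               # assumed safety budget
) -> Tuple[str, List[str]]:
    history: List[str] = []
    for _ in range(max_steps):
        reply = ask_model(task, get_dom(), history)
        action = parse_reply(reply)
        if action.name in ("finish", "fail"):          # step 5: special "done" action
            return action.name, history
        perform(action)
        history.append(describe(action))               # lets the model spot loops
    return "fail", history                             # budget exhausted
```

In this sketch the history list is what gives the model a chance to notice it is repeating itself, as step 4 describes; the step budget is an extra guard that the thread does not mention.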