[HN Gopher] TaxyAI: Open-source browser automation with GPT-4
       ___________________________________________________________________
        
       TaxyAI: Open-source browser automation with GPT-4
        
       Author : kcorbitt
       Score  : 70 points
       Date   : 2023-03-28 17:07 UTC (5 hours ago)
        
 (HTM) web link (github.com)
        
       | seydor wrote:
       | Wow this kind of thing makes plugins obsolete. I thought it would
       | take more than a week
        
       | krembanan wrote:
       | This is very cool, impressive work in 2 weeks! Each action seems
       | to have some delay after it, is there any reason for that? Is it
       | because you are streaming the OpenAI response and performing the
       | actions as they come? If not, I imagine streaming the query
       | response and executing each action as it is emitted would speed
       | it up quite a bit?
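
The streaming idea raised above can be illustrated with a small sketch. It assumes a hypothetical response format in which the model emits one action per line; the thread does not say whether Taxy streams responses or what its action syntax looks like, so the action strings and the `actions_from_stream` helper are made up for illustration.

```python
from typing import Iterable, Iterator


def actions_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each complete line as soon as its trailing newline arrives."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A chunk may complete zero, one, or several lines.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield line.strip()
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Hypothetical chunks, cut at arbitrary points the way a streaming API might.
fake_stream = ['click(12', ')\nsetVal', 'ue(7, "David")\n', 'finish()']

executed = []
for action in actions_from_stream(fake_stream):
    executed.append(action)  # a real client would perform the action here
```

Note that this only helps if a single response contains several actions. If each action changes the page and the model needs the updated DOM before choosing the next one, most of the delay comes from the round trip rather than from waiting for the full response.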
        
       | serjester wrote:
       | Why use GPT-4? The latency is significantly worse than 3.5 and
       | this seems simple enough that the performance delta is marginal.
       | If I were going for robustness, I probably wouldn't be using AI in
       | the first place.
       | 
       | Edit: I noticed they support both but I'm assuming by the speed
       | all the demos are using 3.5?
        
         | Karrot_Kream wrote:
         | I find that GPT-4 works much better with ReAct than GPT-3 for
         | more complex tasks.
        
       | dopeboy wrote:
       | Anxiously been waiting for something like this - very cool.
       | 
       | One use case I've had is that I hate spending time on my
       | LinkedIn, Twitter, etc. newsfeeds. But there are a handful of
       | people I care about and want to keep tabs on.
       | 
       | Is there a way I could use TaxyAI to set up a role to monitor my
       | LinkedIn newsfeed and keep tabs on certain people + topics and
       | then email me a digest of that?
        
       | ashcorbitt22 wrote:
       | It was such an amazing, surreal experience using Taxy to complete
       | a task! It made the task enjoyable and exciting!
        
       | WonderBuilder wrote:
       | This is amazing already! Very exciting. I'll make sure I follow
       | this project's progress. It also reminds me of Adept and their
       | goal with ACT-1. I still haven't seen their product launch,
       | though...
        
       | snihalani wrote:
       | TAKE. MY. MONEY. NOW.
        
       | Imnimo wrote:
       | It will be interesting to see whether this sort of approach works
       | better than something using GPT-4's vision capabilities.
       | Obviously websites are built to be easy to use visually rather
       | than easy to use via the DOM. On the other hand, it's much less
       | clear how to ground action proposals in the visual domain - how
       | do you ask GPT where on an image of the screen it wants to click?
        
       | dpflan wrote:
       | Curious: Can someone explain what they are excited to use this
       | for? Can someone provide a large scale use-case/scenario?
        
         | koch wrote:
         | Filling out job applications using my resume
        
       | kcorbitt wrote:
       | Hey HN! My brother Arctic_fly and I spent the last two weeks
       | since the GPT-4 launch building Taxy, an open source Chrome
       | extension that lets you automate arbitrary tasks in your browser
       | using GPT-4. You can see a few demos in the GitHub README, but
       | basically it works like this:
       | 
       | 1. You open the extension and write the task you'd like done (e.g.
       | "schedule a meeting with David tomorrow at 2").
       | 
       | 2. Taxy pulls the DOM of the current page, puts it through a
       | pipeline to remove all non-semantic information, hidden elements,
       | etc., and sends it to GPT-4 along with your text instructions.
       | 
       | 3. GPT-4 tries to figure out what action to take. In our prompt
       | we give it the option to either click an element or set an
       | input's value. We use the ReAct paradigm
       | (https://arxiv.org/abs/2210.03629) so it explains what it's
       | trying to do before taking an action, which both makes it more
       | accurate and helps with debugging.
       | 
       | 4. Taxy parses GPT-4's response and performs the action requested
       | on the page. It then goes back to step (2) and asks GPT-4 for the
       | next action to take with the updated page DOM. It also sends the
       | list of actions already taken as part of the current task so
       | GPT-4 can detect if it's getting stuck in a loop and abort. :)
       | 
       | 5. Once GPT-4 has decided the task is done or it can't make any
       | more progress, it responds with a special action indicating it's
       | done.
       | 
       | Right now there are a lot of limitations, and this is more a
       | "research preview" than a finished product. That said, I've found
       | it surprisingly capable for a number of tasks, and I think it's
       | in a stable enough place that we can share it. Happy to answer any
       | questions!
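
Steps 2 through 5 in the comment above describe a loop: read the page, ask the model for one action, perform it, repeat until the model says it is done. A minimal sketch of that loop follows. The `Thought:` / `Action:` reply format, the action names, the `max_steps` cap, and every function name here are assumptions made for illustration; the thread does not show Taxy's actual prompt or parser.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed reply format: a "Thought:" line, then one "Action: name(args)" line.
ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.M)


@dataclass
class Action:
    name: str        # e.g. "click", "setValue", "finish", "fail"
    args: List[str]


def parse_reply(reply: str) -> Action:
    match = ACTION_RE.search(reply)
    if match is None:
        raise ValueError("no action found in reply")
    name, raw_args = match.groups()
    # Naive split: good enough for a sketch, breaks on values containing commas.
    args = [a.strip().strip('"') for a in raw_args.split(",") if a.strip()]
    return Action(name, args)


def run_task(
    task: str,
    get_dom: Callable[[], str],
    ask_model: Callable[[str, str, List[str]], str],
    perform: Callable[[Action], None],
    max_steps: int = 10,  # hard cap, an assumption on top of the description
) -> Tuple[str, List[str]]:
    history: List[str] = []
    for _ in range(max_steps):
        # The model sees the task, the current DOM, and the actions so far,
        # so it can notice when it is going in circles and give up.
        action = parse_reply(ask_model(task, get_dom(), history))
        if action.name in ("finish", "fail"):
            return action.name, history
        perform(action)
        history.append(f"{action.name}({', '.join(action.args)})")
    return "max_steps", history


# Scripted stand-in for the model so the sketch runs without network access.
script = iter([
    "Thought: I should open the form.\nAction: click(1)",
    'Thought: Fill in the title.\nAction: setValue(2, "Bug report")',
    "Thought: The task is complete.\nAction: finish()",
])
performed: List[Action] = []
status, history = run_task(
    "create an issue",
    get_dom=lambda: "<button id=1>New issue</button>",
    ask_model=lambda task, dom, hist: next(script),
    perform=performed.append,
)
```

Asking for the thought before the action follows the ReAct pattern mentioned in step 3. The sketch simply ignores the thought text, but logging it is what would make a misbehaving run easier to debug.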
        
       | koolba wrote:
       | Very cool. The "sending everything of relevance on the page to
       | OpenAI" is of course creepy. But that's table stakes for anything
       | like this until people can run them externally.
       | 
       | This would make a cool "magic box" at the top of a web page.
       | Type in what you want to do, it sends it to the server along with
       | the DOM extract (same site server). Server asks magical LLM how
       | to do it, and then spits it back to the client. So no plug-in
       | needed and data flow would pass through the source server.
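
The "magic box" flow described above could look roughly like the following server-side handler. This is only a sketch: the JSON field names (`instruction`, `dom`, `actions`), the action shape, and the `ask_llm` callback are all hypothetical, and a real deployment would sit behind the site's own HTTP framework and authentication.

```python
import json
from typing import Callable, Dict, List


def handle_magic_box(
    body: bytes,
    ask_llm: Callable[[str, str], List[Dict]],
) -> bytes:
    """Turn a client request into a list of actions for the page to run."""
    payload = json.loads(body)
    instruction = payload["instruction"]  # what the user typed into the box
    dom_extract = payload["dom"]          # simplified DOM sent by the page
    # The site's own server talks to the model, so the page data only
    # travels through the origin the user is already visiting.
    actions = ask_llm(instruction, dom_extract)
    return json.dumps({"actions": actions}).encode("utf-8")


def stub_llm(instruction: str, dom_extract: str) -> List[Dict]:
    # Stand-in for a real model call; always proposes the same click.
    return [{"type": "click", "elementId": 3}]


request_body = json.dumps(
    {"instruction": "open my invoices", "dom": "<a id=3>Invoices</a>"}
).encode("utf-8")
response = json.loads(handle_magic_box(request_body, stub_llm))
```

The client-side script would then apply each returned action to the live page, which is what removes the need for a browser plug-in.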
        
       | Arctic_fly wrote:
       | Already useful across a variety of domains, and it's in early
       | days yet!
       | 
       | Just yesterday I used it to create a GitHub issue with minimal
       | effort.
        
       ___________________________________________________________________
       (page generated 2023-03-28 23:00 UTC)