[HN Gopher] Show HN: Open-source macOS AI copilot (using vision ...
       ___________________________________________________________________
        
       Show HN: Open-source macOS AI copilot (using vision and voice)
        
        Heeey! I built a macOS copilot that has been useful to me, so I
        open sourced it in case others would find it useful too. It's
        pretty simple:

        - Use a keyboard shortcut to take a screenshot of your active
          macOS window and start recording the microphone.
        - Speak your question, then press the keyboard shortcut again to
          send your question + screenshot off to OpenAI Vision.
        - The Vision response is presented in-context/overlaid on the
          active window, and spoken to you as audio.
        - The app keeps running in the background, only taking a
          screenshot/listening when activated by keyboard shortcut.

        It's built with NodeJS/Electron, and uses the OpenAI Whisper,
        Vision and TTS APIs under the hood (BYO API key). There's a
        simple demo and a longer walk-through in the GitHub readme
        https://github.com/elfvingralf/macOSpilot-ai-assistant, and I also
        posted a different demo on Twitter:
        https://twitter.com/ralfelfving/status/1732044723630805212
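
        The Vision call described above can be sketched roughly like
        this. This is an illustrative helper, not the repo's actual
        code; the model name and message format follow OpenAI's public
        chat-completions API, and `buildVisionRequest` is a name I made
        up for the sketch.

```javascript
// Minimal sketch (not the repo's code): combine the transcribed
// question and the window screenshot into one OpenAI Vision request.
function buildVisionRequest(base64Png, question) {
  return {
    model: "gpt-4-vision-preview", // vision-capable model at time of writing
    max_tokens: 500,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          {
            // Screenshots are sent inline as a base64 data URL.
            type: "image_url",
            image_url: { url: `data:image/png;base64,${base64Png}` },
          },
        ],
      },
    ],
  };
}
```

        The resulting object would then be POSTed to the chat
        completions endpoint with the user's own API key.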
        
       Author : ralfelfving
       Score  : 333 points
       Date   : 2023-12-12 13:17 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | pyryt wrote:
       | Do you have use case demo videos somewhere? Would be great to see
       | this in action
        
         | ralfelfving wrote:
          | There's one at 00:30 in this YouTube video (I timestamped
          | the link): https://www.youtube.com/watch?v=1IdCWqTZLyA&t=32s
        
       | faceless3 wrote:
       | Wrote some similar scripts for my Linux setup, that I bind with
       | XFCE keyboard shortcuts:
       | 
       | https://github.com/samoylenkodmitry/Linux-AI-Assistant-scrip...
       | 
        | F1 - ask ChatGPT API about current clipboard content
        | F5 - same, but opens editor before asking
        | num+ - starts/stops recording microphone, then passes to
        | Whisper (locally installed), copies to clipboard
       | 
       | I find myself rarely using them however.
        
         | ralfelfving wrote:
         | Nice!
        
       | ProfessorZoom wrote:
       | e-e-e-electron... for this..
        
         | ralfelfving wrote:
         | I don't know man. I'm new to development, it's what I chose,
         | probably don't know any better. Tell me what you would have
         | chosen instead?
        
           | xNeil wrote:
            | electron's a really nice option, especially for people who
            | aren't interested in porting their apps or spending too
            | much time on development
            | 
            | this is a macOS-specific app it seems - if you want better
            | performance and more integration with the OS, i'd recommend
            | using swift
        
             | ralfelfving wrote:
              | Time to learn Swift in the next project then! Thank you
              | for the deets.
        
               | Filligree wrote:
               | The good news is you already have a tool to help you with
               | inevitable XCode issues. _grin_
        
           | lolinder wrote:
           | Don't mind them--there's a certain subset of HN that is upset
           | that web tech has taken over the world. There are some
           | legitimate gripes about the performance of some electron
           | apps, but with some people those have turned into compulsive
           | shallow dismissals of any web app that they believe could
           | have been native.
           | 
           | There's nothing wrong with using web tech to build things!
           | It's often easier, the documentation is more comprehensive,
            | and if you ever wanted to make it cross-platform, Electron
            | makes it trivial.
           | 
           | If you were working for a company it might be worth
           | considering the trade-offs--do you need to support Macs with
           | less RAM?--but for a side project that's for yourself and
           | maybe some friends, just do what works for you!
        
             | ralfelfving wrote:
             | Thank you for the explanation! At the end of the day, I'm a
             | newbie and I'm in it to learn something new with each
             | project. Next time I'll probably try my hand at a different
             | framework.
        
               | millzlane wrote:
               | I just watched a video about building a startup. One of
               | the key points was to use what you know to get an MVP.
               | Don't fret over which language or library to use (unless
               | the goal is to learn a new framework). Just get building.
               | I may not be a pro dev, but there is one thing I have
               | learned over the years from hanging out amongst all of
                | you. And that is, it doesn't matter if you are using
                | emacs or vim, tabs vs spaces, or Java vs Python: the
                | end product is what matters at the end of the day.
               | Code can always be refactored.
               | 
               | Good luck in your development journey.
        
           | jdamon96 wrote:
           | ignore the naysayers; nice job building out your idea
        
             | ralfelfving wrote:
             | Thank you! I got pretty thick skin, but always a bit of
             | insecurity involved in doing something the first time --
             | first public GH repo and Show HN :D
        
           | airstrike wrote:
           | I think the parent comment is a shallow dismissal, but since
           | you're asking, I would have built in SwiftUI
        
           | guytv wrote:
            | What's important is to get a product out there. Nobody
            | cares what stack you use, just us geeks. Don't get
            | discouraged, you did well :)
        
           | programmarchy wrote:
           | My two cents: I think you made a good, practical choice. If
           | you're happy with Electron, I'd say stick with it, especially
           | if you have cross-platform plans in the future.
           | 
           | If you want to niche down into a more macOS specific app, you
           | could learn AppKit and SwiftUI and build a fully native macOS
           | app.
           | 
           | If you want to stay cross-platform, but you're not happy with
           | Electron, then it might be worth checking out Tauri. It
           | provides a JavaScript-based API to display native UI
           | components, but without packaging a V8 runtime with your app
            | bundle. Instead, it uses the system's native webview
            | (WebKit on macOS), so it significantly reduces the
            | download size of your app.
           | 
           | In terms of developing this into a product, on one hand it
           | seems like deep integration with the host OS is the best way
           | to build a "moat", but then again, Apple could release their
           | own version and quickly blow a product like that out of the
           | water.
        
         | atraac wrote:
         | Ah yes, cause what's better than building a real, working MVP?
         | Learning Rust for half a year just so you can 'optimize' the f
         | out of an app that does two REST calls.
        
           | wtallis wrote:
           | To be fair, this _does_ sound like the kind of app that would
           | benefit from being able to launch instantly, and potentially
           | registering with the OS as a service in a way that cross-
           | platform frameworks like Electron cannot easily accommodate.
           | But Rust would not be the easiest choice to avoid those
           | limitations.
        
       | havkom wrote:
       | A lot of negative comments here. However, I liked it!
       | 
       | Perfect Show HN and a great start of a product if the author
       | wants to.
        
         | ralfelfving wrote:
         | Thank you, it's my first GH project & Show HN.. and.. yeah..
         | learning here :D
        
           | jonplackett wrote:
           | Also think this is fun.
           | 
           | In general I'm pretty excited about LLM as interface and what
           | that is going to mean going forward.
           | 
           | I think our kids are going to think mice and keyboards are
           | hilariously primitive.
        
             | ralfelfving wrote:
              | Before we know it, even voice might be obsolete when we
              | can just think :) But maybe at that point, even thinking
              | becomes obsolete because the AIs are doing all the
              | thinking for us?!
        
       | swiftcoder wrote:
       | Worth mentioning that if you are in a corporate environment,
       | running a service that sends arbitrary desktop screenshots to a
       | 3rd party cloud service is going to run afoul of pretty much
       | every security and regulatory control in existence
        
         | thelittleone wrote:
          | The control for that is that endpoints should be locked down
          | to prevent installation of non-approved apps. Any org under
          | regulatory controls would have some variation of that. It's
          | safe to assume an org's users are stupid or nefarious and
          | build defences accordingly.
        
         | ralfelfving wrote:
          | I assume that anyone capable of cloning the app, starting it
          | on their machine and obtaining + adding an OpenAI API key
         | understands that some data is being sent offsite -- and will be
         | aware of their corporate policies. I think that's a fair
         | assumption.
        
           | greenie_beans wrote:
           | that's a fair assumption. feels like swiftcoder is just
           | trying to gotcha
        
         | brookst wrote:
         | True, but also true of other screen capture utilities that send
         | data to the cloud. Your PSA is true, but hardly unique to this
         | little utility. And probably not surprising to the intended
         | audience.
        
         | isoprophlex wrote:
         | You're telling me... the cloud... is other people's computers?!
        
         | abrichr wrote:
         | This is exactly why in https://github.com/OpenAdaptAI/OpenAdapt
         | we have implemented three separate PII scrubbing providers.
         | 
         | Congrats to the op on shipping!
        
       | jondwillis wrote:
       | You should add an option for streaming text as the response
       | instead of TTS. And also maybe text in place of the voice command
       | as well. I have been tire-kicking a similar kind of copilot for
       | awhile, hit me up on discord @jonwilldoit
        
         | ralfelfving wrote:
          | There are definitely some improvements to be made to
          | shuttling the data between interface<->API; all that was
          | done in a few hours on day 1 and there are a few things I
          | decided to fix later.
         | 
         | I prefer speaking over typing, and I sit alone, so probably
         | won't add a text input anytime soon. But I'll hit you up on
         | Discord in a bit and share notes.
        
           | jondwillis wrote:
           | Yeah, just some features I could see adding value and not
           | being too hard to implement :)
        
         | tomComb wrote:
         | > text in place of the voice command as well
         | 
         | That would be great for people with Mac mini who don't have a
         | mic.
        
           | ralfelfving wrote:
           | Hmmm... what if I added functionality that uses the webcam to
           | read your lips?
           | 
            | Just kidding. Text seems to be the most requested addition,
           | and it wasn't on my own list :) Will see if I add it, should
           | be fairly easy to make it configurable and render a text
           | input window with a button instead of triggering the
           | microphone.
           | 
           | Won't make any promises, but might do it.
        
       | amelius wrote:
       | Please include "OpenAI-based" in the title. (Now many people here
       | are disappointed).
        
         | ralfelfving wrote:
         | Fair point, didn't think it would matter so much. Can't edit it
         | any more, otherwise I'd change it to add OpenAI to the title!
        
       | ukuina wrote:
       | This is very cool! Thank you for working on it and sharing it
       | with us.
        
         | ralfelfving wrote:
         | Thank you for checking it out! <3
        
       | netika wrote:
        | Such a shame it uses the Vision API, i.e. it cannot be
        | replaced by some random self-hosted LLM.
        
         | ralfelfving wrote:
         | It can be replaced with a self-hosted LLM, simply change the
         | code where the Vision API is being called. That's true for all
         | of the API calls in the app.
        
         | freedomben wrote:
         | Actually it's open source, so it _can_ be replaced by some
         | random self-hosted LLM
        
           | iandanforth wrote:
           | For example, one of these:
           | 
           | https://opencompass.org.cn/leaderboard-multimodal
        
       | jackculpan wrote:
       | This is awesome
        
         | ralfelfving wrote:
         | Thanks, glad you liked it!
        
       | knowsuchagency wrote:
       | This is brilliant!
        
         | ralfelfving wrote:
         | Glad you liked it!
        
       | satchlj wrote:
       | It's not working for me, I get a "Too many requests" http error
        
         | ralfelfving wrote:
          | Hmm.. OpenAI bunches a few different causes into this error.
          | IIRC this could be because you're out of credits / don't
          | have a valid payment method on file, but it could also be
          | that you're hitting rate limits. The Vision API could be the
          | culprit; while in beta you can only call it X times per day
          | (X varies by account).
         | 
          | Make the console.log calls for the three API calls a bit
          | more verbose to find out which call is causing this, and
          | whether there's more info in the error body.
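
          | Roughly, a more verbose log helper could look like this
          | (illustrative, not in the repo; the error-body shape follows
          | OpenAI's usual { error: { message } } convention):

```javascript
// Illustrative helper: turn an API error into one readable log line so
// you can tell which of the three calls (whisper/vision/tts) failed.
function describeApiError(callName, status, errorBody) {
  const detail =
    (errorBody && errorBody.error && errorBody.error.message) ||
    "no detail in error body";
  return `[${callName}] HTTP ${status}: ${detail}`;
}

// e.g. console.log(describeApiError("vision", 429, parsedErrorBody));
```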
        
       | I_am_tiberius wrote:
       | I would love to have something like this but using an open source
       | model and without any network requests.
        
         | trenchgun wrote:
         | Probably in three months, approximately.
        
         | dave1010uk wrote:
         | LLaVA, Whisper and a few bash scripts should be able to do it.
         | I don't know how helpful the model is with screenshots though.
         | 
         | 1. Download LLaVA from https://github.com/Mozilla-
         | Ocho/llamafile
         | 
         | 2. Run Whisper locally for speech to text
         | 
         | 3. Save screenshots and send to the model, with a script like
         | https://til.dave.engineer/openai/gpt-4-vision/
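
          | Step 3 against a local llamafile server might look roughly
          | like this. The port, path and message shape are assumptions
          | based on llamafile's OpenAI-compatible server; check its
          | README for the exact endpoint on your build.

```javascript
// Rough sketch: build a request for a locally served LLaVA model
// (llamafile exposes an OpenAI-style endpoint; defaults assumed here).
function localVisionCall(base64Png, question, baseUrl = "http://localhost:8080/v1") {
  return {
    url: `${baseUrl}/chat/completions`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: question },
              { type: "image_url", image_url: { url: `data:image/png;base64,${base64Png}` } },
            ],
          },
        ],
      }),
    },
  };
}
// On Node 18+: const res = await fetch(url, options);
```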
        
       | behat wrote:
       | Nice! Built something similar earlier to get fixes from chatgpt
       | for error messages on screen. No voice input because I don't like
       | speaking. My approach then was Apple Computer Vision Kit for OCR
       | + chatgpt. This reminds me to test out OpenAI's Vision API as a
       | replacement.
       | 
       | Thanks for sharing!
        
         | ralfelfving wrote:
          | Thanks! You could probably grab what I have, and tweak it a
          | bit. Try screenshotting just the error message and checking
          | what the value of window.owner is. It should be
         | the name of the application, so you could just append `Can you
         | help me with this error I get in ${window.owner}?` to the
         | Vision API call.
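
          | That suggestion amounts to a one-line prompt builder;
          | `windowOwner` here stands in for whatever field your window
          | metadata actually exposes:

```javascript
// Illustrative sketch: fold the active application's name into the prompt.
function errorPrompt(windowOwner) {
  return `Can you help me with this error I get in ${windowOwner}?`;
}
```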
        
       | thomashop wrote:
       | Just used it with the digital audio workstation Ableton Live. It
       | is amazing! Its tips were spot-on.
       | 
        | I can see how much time it will save me when I'm working with
        | software or a domain I don't know very well.
       | 
       | Here is the video of my interaction:
       | https://www.youtube.com/watch?v=ikVdjom5t0E&feature=youtu.be
       | 
        | Weird, these negative comments. Did people actually try it?
        
         | pelorat wrote:
         | I mean it does send a screenshot of your screen off to a 3rd
         | party, and that screenshot will most likely be used in future
         | AI training sets.
         | 
         | So... beware when you use it.
        
           | thomashop wrote:
           | Beware of it seeing a screenshot of my music set? OpenAI will
           | start copying my song structure?
           | 
           | You can turn it on and off. Not necessary to turn it on when
           | editing confidential documents.
           | 
           | You never enable screen-sharing in videoconferencing
           | software?
        
             | aaronscott wrote:
             | I completely agree. A huge business with a singular focus
             | isn't going to pivot into the music business (or any of the
             | myriad use cases the general public throws at it). And if
             | they did use someone's info, it's more likely an unethical
             | employee than a genuine business tactic.
             | 
             | Besides, the parent program uses the API, which allows
             | opting out of training or retaining that data.
        
               | mecsred wrote:
               | Yes this makes perfect sense. As we know, businesses
               | definitely do not treat data as a commodity and engage in
               | selling/buying data sets on the open market as a "genuine
               | business tactic". Therefore, since the company in
               | question doesn't have a clear business case for data
               | collection _currently_ , we can be sure this data will
               | never be used against our interests by any company.
        
           | zwily wrote:
           | OpenAI claims that data sent via the API (as opposed to
           | chatGPT) will not be used in training. Whether or not you
           | believe them is a separate question, but that's the claim.
        
         | ralfelfving wrote:
         | So glad when I saw this, thanks for sharing this! It was
         | exactly music production in Ableton was the spark that lit this
         | idea in my head the other week. I tried to explain to a friend
         | that don't use GPT much that with Vision, you can speed up your
         | music production and learn how to use advanced tools like
         | Ableton more quickly. He didn't believe me. So I grabbed a
         | Ableton screenshot off Google and used ChatGPT -- then I felt
         | there had to be a better way, I realized that I have my own
         | use-cases, and it all evolved into this.
         | 
         | I sent him your video, hopefully he'll believe me now :)
        
           | thomashop wrote:
           | You may be interested in two proof of concepts I've been
           | working on. I work with generative AI and music at a company.
           | 
           | MidiJourney: ChatGPT integrated into Ableton Live to create
           | MIDI clips from prompts. https://github.com/korus-
           | labs/MIDIjourney
           | 
           | I have some work on a branch that makes ChatGPT a lot better
           | at generating symbolic music (a better prompt and music
           | notation).
           | 
            | LayerMosaic lets you layer MusicGen text-to-music loops
            | with the music library of our company.
            | https://layermosaic.pixelynx-ai.com/
        
             | ralfelfving wrote:
              | Oooh. Yes, very interested in MusicGen. I played with
              | MusicGen for the first time the other week and created a
              | little script that uses GPT to create the prompt and
              | params, which are stored in a text file along with the
              | output. I let it loop for a few hours to get a few
              | hundred output files, which let me learn a bit more
              | about what kinds of prompts gave reasonable output (it
              | was all bad, lol!)
        
             | ralfelfving wrote:
             | My brain read midjourney until I clicked on the GH link.
             | What a great name, MIDIjourney!
        
             | ralfelfving wrote:
              | Oh, LayerMosaic is dope. I'm not entirely sure how it
              | works, but the sounds coming out of it are good -- so
              | you have me intrigued! Can I read more about it
              | somewhere? I might have a crazy idea I'd like to use
              | this for.
        
         | mikey_p wrote:
         | Is it just me or is it incredibly useless?
         | 
         | "Here's a list of effects. Here's a list of things that make a
         | song. Is it good? Yes. What about my drum effects? Yes here's
         | the name of the two effects you are using on your drum channel"
         | 
         | None of this is really helpful and I can't get over how much it
         | sounds like Eliza.
        
           | thomashop wrote:
           | I made that video right at the start but since then I've
           | asked it for example what kind of compression parameters
           | would fit with a certain track and it could explain to me how
           | to find an expert function which I would have had to consult
           | a manual for otherwise.
        
       | e28eta wrote:
       | Did you find that calling it "OSX" in the prompt worked better
       | than macOS? Or was that just an early choice that you didn't
       | spend much time on?
       | 
       | I was skimming through the video you posted, and was curious.
       | 
       | https://www.youtube.com/watch?v=1IdCWqTZLyA&t=32s
       | 
       | code link: https://github.com/elfvingralf/macOSpilot-ai-
       | assistant/blob/...
        
         | ralfelfving wrote:
         | No, this is an oversight by me. To be completely honest, up
         | until the other day I thought it was still called OSX. So the
         | project was literally called cOSXpilot, but at some point I
          | double checked and realized it's been called macOS for many
         | years. Updated the project, but apparently not the code :)
         | 
         | I suspect OSX vs macOS has marginal impact on the outcome :)
        
           | e28eta wrote:
           | Haha, makes perfect sense, thanks for the reply!
        
         | hot_gril wrote:
         | Heh. I remember calling it Mac OS back in the day and getting
         | corrected that it's actually OS X, as in "OS ten," and hasn't
         | been called Mac OS since Mac OS 9. Glad Apple finally saw it my
         | way (except it's cased macOS).
        
       | qainsights wrote:
       | Great. I created `kel` for terminal users. Please check it out at
       | https://github.com/qainsights/kel
        
         | causal wrote:
         | Chatblade is another good one:
         | https://github.com/npiv/chatblade
        
         | dave1010uk wrote:
         | Very cool! Have you had much luck with Llama models?
         | 
         | I made Clipea, which is similar but has special integration
         | with zsh.
         | 
         | https://github.com/dave1010/clipea
        
       | Jayakumark wrote:
        | I was following these two projects by a user on GitHub which
        | make similar things possible with local models. Sending
        | screenshots to OpenAI is expensive if done every few seconds
        | or minutes.
       | 
       | https://github.com/KoljaB/LocalAIVoiceChat
       | 
        | While the below one uses OpenAI - I don't see why it can't be
        | replaced with the above project and a local model.
       | 
       | https://github.com/KoljaB/Linguflex
        
         | ralfelfving wrote:
         | Nice! Although the productivity increase from being able to
         | resolve blockers more quickly adds up to a lot (at least for
         | me), local models would be more cost effective -- and probably
         | feel less iffy for many people.
         | 
         | I went for OpenAI because I wanted to build something quickly,
         | but you should be able to replace the external API calls with
         | calls to your internal models.
        
       | stephenblum wrote:
       | You made real-life Clippy! for the Mac. This would be great to be
       | for other mac apps too. Add context of current running apps.
        
         | ralfelfving wrote:
         | It should work for any macOS app. It just takes a screenshot of
         | the currently active window, you can even append the
         | application name if you'd like.
        
       | lordswork wrote:
       | This looks very cool. Does anyone know of something similar for
       | Windows? (or does OP intend to extend support to Windows?)
        
         | ralfelfving wrote:
         | Hey, OP here. I don't have a Windows machine so have not been
         | able to confirm if it works, and probably won't be able to
         | develop/test for it either -- sorry! :/
         | 
          | I suspect that you should be able to take my code and make
          | it work with only a few tweaks tho; there shouldn't be much
          | in it that is macOS-only.
        
           | coolspot wrote:
           | For testing/development, you can download a free Windows VM
           | here: https://developer.microsoft.com/en-
           | us/windows/downloads/virt...
        
       | poorman wrote:
       | Currently imagining my productivity while waiting 10 seconds for
       | the results of the `ls` command.
        
         | ralfelfving wrote:
         | It's a basic demo to show people how it works. I think you can
         | imagine many other examples where it'll save you a lot of time.
        
           | hot_gril wrote:
           | The demo on Twitter is a lot cooler, partially because you
           | scroll to show the AI what the page has. Maybe there's a more
           | impressive demo to put on the GH too?
        
       | jamesmurdza wrote:
       | Have you thought about integrating the macOS accessibility API
       | for either reading text or performing actions?
        
         | ralfelfving wrote:
         | No, my thought process never really stretched outside of what I
         | built. I had this particular idea, then sat down to build it. I
         | had some idea of getting OpenAI to respond with keyboard
         | shortcuts that the application could execute.
         | 
         | E.g. in Photoshop: "How do I merge all layers" --> "To merge
         | all layers you can use the keyboard shortcut Shift + command +
         | E"
         | 
         | If you can get that response in JSON, you could prompt the user
         | if they want to take the suggested action. I don't see myself
         | using it very often, so didn't think much further about it.
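
          | One way to sketch that idea: ask the model to answer as
          | JSON (the schema below is invented) and parse it
          | defensively before offering to execute anything.

```javascript
// Hypothetical schema: {"answer": "...", "shortcut": "Shift+Cmd+E"}.
// Parse defensively; models don't always return valid JSON.
function parseAssistantReply(raw) {
  try {
    const data = JSON.parse(raw);
    return {
      answer: typeof data.answer === "string" ? data.answer : "",
      shortcut: typeof data.shortcut === "string" ? data.shortcut : null,
    };
  } catch (e) {
    // Not JSON: treat the whole reply as plain text, suggest nothing.
    return { answer: raw, shortcut: null };
  }
}
```

          | If `shortcut` is non-null, the app could prompt the user
          | before simulating the keypress.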
        
       | quinncom wrote:
       | I'd love to see a version of this that uses text input/output
       | instead of voice. I often have someone sleeping in the room with
       | me and don't want to speak.
        
         | ralfelfving wrote:
          | You're not the first to request it. Might add it, can't
          | promise tho.
        
       | hackncheese wrote:
       | Love it! Will definitely use this when a quick screenshot will
       | help specify what I am confused about. Is there a way to hide the
       | window when I am not using it? i.e. I hit cmd+shift+' and it
       | shows the window, then when the response finishes reading, it
       | hides again?
        
         | ralfelfving wrote:
         | There's a way for sure, it's just not implemented. Allowing for
         | more configurability of the window(s) is on my list, because it
         | annoys me too! :)
        
           | hackncheese wrote:
           | Annoyance Driven Development(tm)
        
       | qup wrote:
       | I have a tangential question: my dad is old. I would love to be
       | able to have this feature, or any voice access to an LLM,
       | available to him via an easy-to-press external button. Kind of
       | like the big "easy button" from staples. Is there anything like
       | that, that can be made to trigger a keypress perhaps?
        
         | ralfelfving wrote:
         | I personally have no experience with configuring or triggering
         | keyboard shortcuts beyond what I learned and implemented in
         | this project. But with that said, I'm very confident that what
         | you're describing is not only possible but fairly easy.
        
       | Art9681 wrote:
       | Make sure to set OpenAI API spend limits when using this or
       | you'll quickly find yourself learning the difference between the
       | cost of the text models and vision models.
       | 
       | EDIT: I checked again and it seems the pricing is comparable.
       | Good stuff.
        
         | ralfelfving wrote:
         | I think a prompt cost estimator might be a nifty thing to add
         | to the UI.
         | 
          | Right now there's also a daily limit on the Vision API that
          | kicks in before it gets really bad: 100+ requests, depending
          | on what your max spend limit is.
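
          | A back-of-envelope version of that estimator could look
          | like this. The per-unit prices are placeholders I made up
          | for the sketch, not OpenAI's actual rates -- look up
          | current pricing before trusting the numbers.

```javascript
// Placeholder prices (USD) -- assumptions for illustration only.
const PRICES = {
  whisperPerMinute: 0.006, // speech-to-text, per audio minute
  visionPer1kTokens: 0.01, // vision input, per 1K tokens
  ttsPer1kChars: 0.015,    // text-to-speech, per 1K characters
};

// Estimate the cost of one question/answer round trip.
function estimateCostUSD({ audioMinutes, visionTokens, ttsChars }) {
  const cost =
    audioMinutes * PRICES.whisperPerMinute +
    (visionTokens / 1000) * PRICES.visionPer1kTokens +
    (ttsChars / 1000) * PRICES.ttsPer1kChars;
  return Math.round(cost * 10000) / 10000; // round for display
}
```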
        
       | qirpi wrote:
       | Awesome! I love it! I was just about to sign up for ChatGPT Plus,
       | but maybe I will pay for the API instead. So much good stuff
       | coming out daily.
       | 
       | How does the pricing per message + reply end up in practice? (If
       | my calculations are right, it shouldn't be too bad, but sounds a
       | bit too good to be true)
        
         | ralfelfving wrote:
         | I have a hard time saying how much this particular application
          | costs to run, because I use the Voice+Vision APIs for so many
         | different projects on a near daily basis and haven't
         | implemented a prompt cost estimator.
         | 
         | But I also pay for ChatGPT Plus, and it's sooo worth it to me.
         | 
         | If you'd like to skip Plus and use something else, I don't
         | think my project is the right one. I'd STRONGLY suggest you
         | check out TypingMind, the best wrapper I've found:
         | https://www.typingmind.com/
        
           | qirpi wrote:
           | Wow, thanks for sharing that link, I've been looking for
           | something like this :)
        
       | spullara wrote:
       | Did you not find the built-in voice-to-text and text-to-speech
       | APIs to be sufficient?
        
         | ralfelfving wrote:
         | Didn't even think of them to be honest.
        
       | zmmmmm wrote:
        | I've been wanting to build something like this by integrating
        | into the terminal itself. Seems very straightforward and
        | avoids the screenshotting. So you would just type a comment
        | in the right format and it would recognise it:
        | 
        |     $ ls
        |     a.txt b.txt c.txt
        |     $ # AI: concatenate these files and sort the result on
        |     $ # the third column
        |     $ # ....
        |     $ # cat a.txt b.txt c.txt | sort -k 3
       | 
       | This already works brilliantly by just pasting into CodeLLaMa so
       | it's purely terminal integration to make it work. All i need is
       | the rest of life to stop being so annoyingly busy.
        
         | paulmedwards wrote:
          | I wrote a simple command line app to let me quickly ask a
          | question in the terminal - https://github.com/edwardsp/qq.
          | It outputs the command I need and puts it in the paste
          | buffer. I use it all the time now, e.g.
          | 
          |     $ qq concatenate all files in the current directory and sort the result on the third column
          |     cat * | sort -k3
        
           | zmmmmm wrote:
           | yep absolutely - have seen a few of those. And how well they
           | work is what inspires me to want the next parts, which are
           | (a) send the surrounding lines and output as context - notice
           | above I can ask it about "these files" (b) automatically add
           | the result to terminal history so I can avoid copy/paste if I
           | want to run it. I think this could make these things
           | absolutely fluid, almost like autocomplete (another crazy
           | idea is to _actually_ tie it into bash-completion so when you
           | press tab it does the above).
           | 
            | CodeLLama with GPU acceleration on a Mac M1 responds
            | almost instantly; it's really compelling.
        
       | smcleod wrote:
       | Nice project, any plans to make it work with local LLMs rather
       | than "open"AI?
        
         | ralfelfving wrote:
         | Thanks. Had no plans, but might give it a try at some point.
         | For me, personally, using OpenAI for this isn't an issue.
        
         | hmottestad wrote:
         | I think that LM Studio has an OpenAI "compliant" API, so if
         | there is something similar that supports vision+text then it
         | would be easy enough to make the base URL configurable and then
         | point it to localhost.
         | 
         | Do you know of a simple setup that I can run locally with
         | support for both images and text?
        
       ___________________________________________________________________
       (page generated 2023-12-12 23:00 UTC)