       ___________________________________________________________________
        
       Launch HN: Vocode (YC W23) - Library for voice conversation with
       LLMs
        
        Hey everyone! Kian and Ajay here from Vocode, an open source
        library for building LLM applications you can talk to. Vocode
        makes it easy to take any text-based LLM and make it voice-based.
        Our repo is at https://github.com/vocodedev/vocode-python and our
        docs are at https://docs.vocode.dev.
         
        Building realtime voice apps with LLMs is powerful but hard. You
        have to orchestrate the speech recognition, LLM, and speech
        synthesis in real time (all async), while handling the complexity
        of conversation (like understanding when someone is finished
        speaking, or handling interruptions). Our library is easy to get
        up and running: you can set up a conversation in <15 lines of
        code. Check out our Gen Z GPT hotline demo at
        https://replit.com/@vocode/Gen-Z-Phone (try it out at
        +1-650-729-9536).
         
        It all started with our PrankGPT project that we
       built for fun (quick demo at
        https://www.loom.com/share/0d0d68f1a62f409eb5ae24521293d2dc). We
        realized how powerful voice + LLMs are, but also how hard they are
        to build. Once we got everything working, it was really cool and
        useful. Talking to LLMs is better than any voice AI experience
        we'd had before, and we imagined a host of cool applications that
        people could build on top of it.
         
        So, we decided to build a developer tool to make it easy. Our
        library is open source and gives you everything you need in a
        single place. We give you a bunch of out-of-the-box integrations
        with speech recognition/synthesis providers and let you swap them
        out easily. We have platform support across web and telephony (via
        Twilio), with mobile coming soon. We also provide abstractions for
        streaming conversation (good for realtime apps like phone calls)
        and for command-based/turn-based applications (like voice-based
        chess). And we provide customizability around how the conversation
        is run: things like knowing when someone is finished speaking,
        changing emotion, and sending filler audio if there are delays.
        In terms of "how do you make money": we have a hosted version that
        we're going to charge for (though right now you can get it for
        free! https://app.vocode.dev), and we're also going to build
        enterprise products in the future.
         
        We'd love for you to try it out and give us some feedback! And if
        you have any demos you'd like to see, let us know and we'll take a
        crack at building them. We're curious about your experiences using
        or building voice AI, what features or use cases you'd love to
        see, and any other ideas you have to share!
        
       Author : KianHooshmand
       Score  : 168 points
       Date   : 2023-03-29 15:43 UTC (7 hours ago)
        
       | airstrike wrote:
       | EDIT: never mind, I must be dreaming
        
         | kritr wrote:
         | I can't actually seem to find this with the search term
         | "Vocode".
        
       | ksarw wrote:
       | Congrats on the launch! One step closer to Jarvis.. ;)
        
       | air7 wrote:
       | This is really cool! I've been waiting for such a library to show
       | up. Thank you. One thing: The documentation is currently a bit
       | scarce as to how to tweak the assistant in terms of voice/prompt
       | manipulation etc.
       | 
       | For example, it would be very instructional if you could show how
       | you implemented the Gen-Z demo (great idea btw).
        
         | KianHooshmand wrote:
         | thank you for the kind words! absolutely agree - we're gonna
         | beef up our tutorials and documentation... just have had so
         | much to do but it's definitely one of our focuses now. stay
         | tuned! :)
        
           | ajaynraj wrote:
           | also! the code for the demo is available (and running!) at
           | https://replit.com/@vocode/Gen-Z-Phone
        
         | [deleted]
        
       | whitemary wrote:
        | Sounds great. FYI, the site does not work well on Firefox for
        | iOS.
        
         | KianHooshmand wrote:
         | Ah! Have not tried this but will look into it - thank you :)
         | 
         | Our docs are hosted on Mintlify
        
       | peteforde wrote:
       | I just called your voice demo, and immediately started sending
       | the number to my friends. What an incredibly impressive and
       | convincing demo. I'm going to update my standard mentoring
       | wisdom: the only thing more compelling than a great product video
       | is a phone number that you can call to have your first voice
       | conversation with an AI.
       | 
       | If HN allowed memes - and thank goodness that it does not - there
       | would be a room full of sombre gentlemen slow-clapping for you
       | right here.
       | 
        | I hope that number survives the inevitable deluge. How many
       | callers can your system handle simultaneously?
        
         | KianHooshmand wrote:
         | Thank you!! Really glad you enjoyed it
         | 
         | We actually have no clue... but it seems to be holding up well.
         | We can scale up the CPU as necessary but not sure about Twilio.
         | I guess we will find out!
        
           | joshspankit wrote:
           | I'm getting "We're sorry: an application error has occurred".
           | I'm guessing you've hit some scaling friction.
        
             | KianHooshmand wrote:
             | Yep we're definitely getting a large volume right now -
             | working on it!
        
       | mkagenius wrote:
        | How is this achieving real-time response times? My ChatGPT API
        | calls are so slow.
        
         | ajaynraj wrote:
         | The short answer is that everything is streaming -- as tokens
         | come back from ChatGPT we send them as soon as possible to the
         | synthesizer. The long answer is found in our code[0] :).
         | 
         | [0] https://github.com/vocodedev/vocode-
         | python/blob/main/vocode/...
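        One minimal way to sketch that streaming hand-off (purely
        illustrative, not the library's actual code) is to buffer tokens
        as they arrive and flush a chunk to the synthesizer as soon as a
        sentence boundary appears:

```python
import re


def chunk_tokens_into_sentences(token_stream):
    """Accumulate streamed LLM tokens and yield a chunk as soon as a
    sentence boundary shows up, so synthesis can start long before the
    full response is complete. Illustrative sketch only."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush every complete sentence: terminal punctuation, optional
        # closing quotes/brackets, then whitespace.
        while True:
            match = re.search(r'[.!?]["\')\]]*\s', buffer)
            if not match:
                break
            end = match.end()
            yield buffer[:end].strip()
            buffer = buffer[end:]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

        Chunking on sentence boundaries like this is also a plausible
        answer to the question below about TTS quality: each chunk handed
        to the synthesizer is a complete sentence, so it has enough local
        context to sound natural.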
        
           | og_kalu wrote:
            | how is it sounding good, though? usually text-to-speech
            | models need the full context to sound reasonable.
        
       | wantsanagent wrote:
       | The phone number is a really fun demo! The pronunciation is off
       | on a number of things: "LLM", dates ending w/ "AD", but the
       | response delays are surprisingly short and the conversation is
       | very natural. The 'bored and slightly annoyed' vocals make the
       | generally helpful tone of the agent seem _very_ sarcastic. Very
       | funny and interesting!
        
         | ajaynraj wrote:
         | Thanks! It's a collab with rime.ai TTS. Unlike a lot of other
         | TTS providers, they train on conversation, not
         | podcasts/audiobooks so you get those disfluencies in speech
         | that make it seem natural!
        
       | jdcampolargo wrote:
       | Congrats. Do you have the repo for PrankGPT?
        
         | KianHooshmand wrote:
          | thank you! it's not live right now... but stay tuned for
          | April 1 :)
        
       | user- wrote:
       | It has some issues. It would only respond when I said "Hello??"
       | after long silences, and would ignore anything else I said. Or
       | maybe my voice sucks
        
         | ajaynraj wrote:
         | sorry you had that experience! Would love to help you get the
         | bot running locally so we can figure out what's going on --
         | here's our Discord: https://discord.gg/NaU4mMgcnC
        
       | teabee89 wrote:
        | The phone demo is incredible, but given the sound quality it
        | speaks too fast: when it told me company names I literally had
        | to ask it to repeat them or spell them out in the NATO
        | alphabet. Also not a fan of the "what'up?" opener; I'd prefer
        | something like "Yes, how may I help you?", just like an
        | information hotline. Other than that it's quite impressive!
        
         | ajaynraj wrote:
         | thanks! we have a more "informational" phone number at
         | +19105862633 that speaks a little slower (but sounds more
         | robotic).
        
       | dalexeenko wrote:
       | Very cool, congrats Ajay and Kian!
        
         | KianHooshmand wrote:
         | thank you!
        
       | Jeff_Brown wrote:
       | For those of us who can't call it for reasons like national
       | borders, could someone post a demo video? I'm not finding it on
        | YouTube.
        
         | ajaynraj wrote:
         | here's one from Twitter!
         | https://twitter.com/altryne/status/1640880190401257473?s=20
        
       | endisneigh wrote:
       | it feels like every single company in the current YC batch has
       | decided to pivot to LLMs
        
         | [deleted]
        
         | jjallen wrote:
         | It feels like LLMs can help me more and more each day with the
         | stuff I want to build.
        
         | joshspankit wrote:
         | To me that speaks of the possibilities for LLMs to solve a lot
         | of big problems
        
         | SkyPuncher wrote:
          | Generally, the hardest part of startups is the "fuzzy"
          | product capabilities. LLMs make it practical to codify much
          | of what has previously been either (1) brute-force tedium or
          | (2) too labor-intensive.
         | 
         | Like all startup waves, we'll see a bunch of them fail.
         | However, I think we're going to see a lot of neat stuff come
         | out of this as well.
        
         | Kiro wrote:
         | I'm genuinely curious about this. I also get the feeling that
         | many are pivots. ChatGPT hadn't even been released when the
         | deadline for YC W23 was. Sure, GPT-3 was released earlier but
         | it still feels like most companies are reactions to recent
         | trends. If most are pivots, what did they pivot from?
        
           | robopsychology wrote:
           | Crypto tax reporting tools for enterprise?
        
       | davidxc wrote:
       | This is really amazing, thanks for building and sharing this!
        
         | KianHooshmand wrote:
         | thank you! love your feedback and please feel free to drop any
         | questions in discord/on github
        
       | 19h wrote:
       | Not sure if I understood that right -- is that something like
       | Whisper + an LLM? Like [0]?
       | 
       | If OpenAI adds speech input to ChatGPT -- and considering the
       | upcoming plugins -- isn't a possible enterprise specialisation of
       | VoCode the only viable long term investment?
       | 
       | [0] https://twitter.com/ggerganov/status/1640022482307502085
        
         | KianHooshmand wrote:
         | And yes! It's STT/LLM/TTS where you can choose between
         | different providers and run it across different platforms. It
         | can be turn based (like the demo you linked from twitter) or
         | streaming (this allows for conversation with interruptions!)
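        The turn-based variant of that STT/LLM/TTS stack can be sketched
        as three swappable callables. The interfaces below are
        illustrative stand-ins, not Vocode's real provider classes:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnBasedPipeline:
    """Sketch of a swappable STT -> LLM -> TTS pipeline in the spirit
    described above. Each stage is just a callable, so providers can be
    exchanged freely; this is a hypothetical interface, not Vocode's."""
    transcribe: Callable[[bytes], str]   # speech-to-text provider
    respond: Callable[[str], str]        # LLM agent
    synthesize: Callable[[str], bytes]   # text-to-speech provider

    def take_turn(self, audio_in: bytes) -> bytes:
        # One full turn: user audio in, agent audio out. Unlike the
        # streaming case, each stage waits for the previous one to finish.
        text = self.transcribe(audio_in)
        reply = self.respond(text)
        return self.synthesize(reply)
```

        Because the stages are plain callables, swapping an STT or TTS
        provider is just a matter of passing a different function; the
        streaming case needs the same separation but with async
        generators instead of blocking calls.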
        
           | pbronez wrote:
           | Another big win here would be multi-lingual support.
        
         | KianHooshmand wrote:
         | Our belief is that at some point OpenAI will add a speech-to-
         | speech model. This will improve the library functionality
         | (since now the whole stack is controlled by a single entity, so
         | the product will naturally be better latency/quality wise).
         | 
         | Our library is open source so that we can all build a
         | development/utility layer on top of whatever foundational
         | models are created. Plugins of course also improve what the
         | agents can do. And right, we will be building enterprise
         | focused products in the future!
        
           | ttul wrote:
           | OpenAI will absolutely add voice and my guess is that their
           | voice support will rival anything on the market because they
           | will train the voice model alongside the text and image
           | models. This is likely months away if not weeks away.
           | 
           | Obviously just my $0.02:
           | 
           | I'd start building for the enterprise right now. Visualize a
           | future where there are several multimodal AGIs that work with
           | voice, images, and text. Be the enterprise voice layer for
           | all of them. Build your moat there.
        
             | KianHooshmand wrote:
             | We totally agree - thank you for the feedback! :)
        
       | jdiez17 wrote:
       | Would be cool to support multi-language conversations. Just tried
       | the Gen Z hotline and I got her to switch to Spanish (read back
       | with a hilarious accent), but the voice recognition doesn't
       | handle me speaking Spanish.
        
         | KianHooshmand wrote:
         | We haven't added the ability to switch languages mid
         | conversation... but that's a very cool feature!
         | 
         | You can configure the initial language with the library though!
         | So it works across several languages that are supported by the
         | STT/TTS providers you choose
        
       | altryne1 wrote:
        | I called the Gen-Z phone line and it pretty much blew me away
        | with its response speed. It often replied faster than my family
        | on the other side of the world would!
        
         | ajaynraj wrote:
         | thank you!! websockets have been around forever but they're
         | still so fast.
        
       | wanderingmind wrote:
        | This looks awesome. My only nitpick: I'd suggest adding a
        | transcription integration with whisper.cpp[1], which in my
        | simple CPU-based tests (likely most of your user base) runs
        | much, much faster than OpenAI's Whisper.
       | 
       | [1] https://github.com/ggerganov/whisper.cpp
        
         | KianHooshmand wrote:
          | We definitely want to do this! We've been talking about it
          | (it's much better, like you said, for realtime); it's been
          | hard to juggle everything we've wanted to add... which is why
          | we think this makes so much more sense open source!
         | 
         | We want the repo to be community built and a public good...
         | would love contributors to start adding integrations we can't
         | get to ourselves
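        For anyone who wants to try the whisper.cpp route in the
        meantime, a minimal sketch of shelling out to a locally built
        whisper.cpp binary might look like this. The binary and model
        paths are assumptions about your local build, and the flags
        (-m for the ggml model, -f for the input WAV, -nt for no
        timestamps) reflect whisper.cpp's main example and may vary by
        version:

```python
import subprocess


def build_whisper_cpp_cmd(binary: str, model: str, wav: str) -> list:
    """Argument list for a whisper.cpp run: -m picks the ggml model,
    -f the 16 kHz mono WAV input, -nt suppresses timestamps so stdout
    is just the transcript."""
    return [binary, "-m", model, "-f", wav, "-nt"]


def transcribe_with_whisper_cpp(binary="./main",
                                model="models/ggml-base.en.bin",
                                wav="audio.wav") -> str:
    """Shell out to a locally built whisper.cpp binary and return the
    transcript. The default paths are assumptions about where you built
    the binary and downloaded the model."""
    result = subprocess.run(
        build_whisper_cpp_cmd(binary, model, wav),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

        The main caveat for realtime use is that the main example
        transcribes whole files; a streaming integration would need to
        feed audio incrementally (for example via whisper.cpp's stream
        example) rather than one subprocess call per utterance.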
        
       | yodon wrote:
       | Thanks for going ahead and building this so the rest of us can
       | focus on using it!
        
         | KianHooshmand wrote:
         | Of course! We loved working on this and chose to open source
         | precisely for this reason. Heavily inspired by the work people
         | are doing on Langchain and providing a usability/developer
         | layer on top of foundational models.
         | 
         | Nothing like this existed for voice so we started cranking on
         | it!
        
       | moritonal wrote:
        | When I had time, I was looking for an option to replace the
        | Alexa in my house with an LLM + Whisper. When I have time
        | again, I'll try to set up an extension to Home Assistant that's
        | capable of interpreting voice and translating it into HA
        | actions.
        
         | ajaynraj wrote:
         | Home Assistant is such a cool project :) great idea!
        
           | altryne1 wrote:
            | Look at Home Assistant; if it comes from anyone this year,
            | it will be them.
        
         | joshspankit wrote:
         | I feel like GPT4 would be happy to help.
         | 
         | Though the winning version will likely be something like a
         | local ChatGPT plugin ( _please let's make this plugin style a
         | standard that we can use for local AIs_ )
        
       | alasdair_ wrote:
       | This makes all of Amazon's many billions of investment in Alexa
       | almost worthless. If there is some kind of "command" plugin to
       | this, I'd love to hook it up to Home Assistant and completely
       | replace the Alexa ecosystem.
        
       | adept_js wrote:
       | [flagged]
        
       | mdolon wrote:
       | This was one of the coolest demos I've seen in a while. You
       | should share that number around more prominently (and get more
       | bandwidth, starting to get errors!), it does a fantastic job of
       | explaining what you do.
        
         | ajaynraj wrote:
         | thank you!! we also have another number which is prompted to
         | act as a spokesperson for the product: (650) 835-7163
        
       | all2 wrote:
       | I asked GenZGPT "what's your name?" and she said something like
       | "I'm a lim, but you can call me whatever you like." So I said
       | "pick a name", and she said "how about you call me Zephyr,
       | queen".
       | 
       | My immediate reaction was to figure out what to name this thing.
       | 
       | I also love that it can run locally. I need to get some hardware
       | so I can have it run locally, and screen out spam calls. And
       | maybe have it schedule appointments for me.
       | 
       | An AI butler needs a number of interface points:
       | 
       | - browser
       | 
       | - shell (cuz I might want it to SSH into a box and do stuff)
       | 
       | - email (browser could take care of this)
       | 
       | - phone
       | 
       | - text
       | 
       | And also IOT access, so she can call my cellphone and tell me
       | when someone breaks in.
        
         | asdfzalsd wrote:
         | How were you able to get it running?
         | 
         | I tried to get it running my local and with the hosted web-app
         | but it doesn't work :(
         | 
         | mind if I shoot you discord dm?
        
           | ajaynraj wrote:
           | would love to help you get it running as well!
           | https://discord.gg/NaU4mMgcnC
        
             | asdfzalsd wrote:
             | discord link is broken :(
        
               | all2 wrote:
               | It worked for me.
        
            | all2 wrote:
            | I used the web demo available at
            | https://replit.com/@vocode/Gen-Z-Phone: punch the run
            | button and then spam the phone number +1 650 729 9536
        
       | marcodiego wrote:
       | Can it be run fully locally?
        
         | KianHooshmand wrote:
          | yes! You can run the self-hosted version locally from your
          | shell; see https://docs.vocode.dev/python-
          | quickstart#self-hosted
        
           | Vespasian wrote:
           | I think this used to mean can it be run offline and right now
           | (usually) whenever there is an LLM involved the answer is
           | soundly no
        
             | KianHooshmand wrote:
             | Ah! Right now our default is set to use OpenAI... but you
             | can actually use local LLMs by creating a custom agent.
             | We're going to add a full stack of local STT/TTS/LLM...
             | just haven't had time for it yet!
             | 
             | If anyone wants to help with it we're totally open for
             | contributions :)
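        A "custom agent" in the spirit described above can be as small as
        an object with a respond method. The interface here is a
        hypothetical sketch, not Vocode's actual agent base class:

```python
class LocalAgentSketch:
    """Hypothetical custom agent: anything with a respond(text) -> text
    method could stand in for the OpenAI-backed default. Here the
    'model' is a trivial local function, but the generate callable could
    wrap a llama.cpp binding or any other on-device LLM."""

    def __init__(self, generate=None):
        # generate: prompt text -> reply text; defaults to a canned
        # local rule so the sketch runs without any model installed.
        self.generate = generate or (lambda prompt: f"You said: {prompt}")

    def respond(self, transcript: str) -> str:
        # The orchestration layer would call this once per user turn.
        return self.generate(transcript)
```

        Swapping in a real local model is then just a matter of passing a
        different generate callable, leaving the STT and TTS stages
        untouched.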
        
       | npilk wrote:
       | The Gen-Z GPT phone demo is really something. It's fascinating
       | how differently I speak to this model compared to how I interact
       | with more "formal" and text-first models.
        
         | ajaynraj wrote:
         | thank you!! The difference between a conversation with a
         | command-based assistant and a conversational assistant backed
         | by a LLM is subtly significant -- you don't expect to have real
         | conversations with the former and you actually engage with the
         | latter.
        
       | ilovepuppies wrote:
       | Congrats on the launch! Just got the demo React app up and
       | running, very cool. I've wanted to interact with an LLM via real
       | time speech for a while now, this will be perfect.
       | 
       | Important feedback on the live demo page: Make the default output
       | sampling rate a normal talking speed. Right now it defaults to
       | the highest rate if you don't set it / know which rate is best.
       | First thing I did on the page was click the mic. The voice was
       | too fast, and since the active mic disables the settings, I
       | thought I couldn't change them so it might be broken. Also you
       | want to make it clear that you can change the settings by turning
       | off the mic. That took me a while to figure out.
       | 
       | Again, well done!
        
         | ajaynraj wrote:
         | thanks!! Sampling rate actually shouldn't affect talking speed
         | - you can adjust the voice speed with this parameter[0] :)
         | 
         | [0] https://github.com/vocodedev/vocode-
         | python/blob/main/vocode/...
        
           | ilovepuppies wrote:
           | To clarify, here's the demo URL I'm referring to:
           | https://demo.vocode.dev/
           | 
           | You're right sampling rate doesn't change speed, whoops. But
           | on that page you have to change / set the "Set Output
           | Sampling Rate" to slow down the default voice speed.
        
             | ajaynraj wrote:
             | Ah, got it -- that demo is a bit old and definitely has
             | some bugs, my bad!
        
       ___________________________________________________________________
       (page generated 2023-03-29 23:00 UTC)