[HN Gopher] Show HN: PodText.ai - Search anything said on a podc...
       ___________________________________________________________________
        
       Show HN: PodText.ai - Search anything said on a podcast, highlight
       text to play
        
       Hi HN, wanted to share a project that I've been working on
       recently.  PodText allows users to find anything said on a podcast.
       You can also listen and share clips to a specific part of the
       podcast audio, simply by highlighting the text of that part.
       Currently there are just over 25k podcast episodes and I'm adding a
       lot more in the coming weeks (yes my GPU bill is painful).  In
       order to monetize it, I'm building a sponsorship database to help
       sponsors find podcasts and vice versa. This will be sold in the
       form of a $99/month "PodText Business" subscription. I bet I could
       charge a lot more to large sponsors but I'll tweak that as I talk
       to potential customers.  Right now the UI is very bare bones
       (doesn't even have pagination) but I'll polish it once the data
       pipeline is working well. Please let me know if you run into any
       bugs or have any questions about the site or business model.  PS:
       I'm a regular on HN under my real name but can't post from that
       account since my employer would fire me if they found out about
       this project :-)
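
       A minimal sketch of how the highlight-to-play feature described
       above could work. It assumes the transcript is stored as timed
       segments (a start time, an end time, and text per segment, which
       is the shape Whisper's output takes) and that the page knows the
       character offsets of the highlighted text. Segment, build_offsets,
       and clip_for_selection are hypothetical names, not PodText's
       actual code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    start: float  # seconds into the audio
    end: float    # seconds into the audio
    text: str


def build_offsets(segments: List[Segment]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join segment texts with single spaces and record each segment's
    (begin, end) character span inside the joined transcript."""
    parts, spans, pos = [], [], 0
    for seg in segments:
        text = seg.text.strip()
        spans.append((pos, pos + len(text)))
        parts.append(text)
        pos += len(text) + 1  # +1 for the joining space
    return " ".join(parts), spans


def clip_for_selection(
    segments: List[Segment], sel_begin: int, sel_end: int
) -> Optional[Tuple[float, float]]:
    """Map a highlighted character range [sel_begin, sel_end) to the
    (start, end) audio times of the segments it overlaps."""
    _, spans = build_offsets(segments)
    hit = [
        seg
        for seg, (begin, end) in zip(segments, spans)
        if begin < sel_end and end > sel_begin
    ]
    if not hit:
        return None
    return hit[0].start, hit[-1].end


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Hello and welcome"),
        Segment(2.5, 6.0, "to the show"),
        Segment(6.0, 9.0, "today we talk about search"),
    ]
    transcript, _ = build_offsets(demo)
    begin = transcript.index("the show today")
    print(clip_for_selection(demo, begin, begin + len("the show today")))
```

       The clip snaps to segment boundaries, so it can start slightly
       before the highlighted words; word-level timestamps would tighten
       that at the cost of a larger index.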
        
       Author : anonbuilder
       Score  : 62 points
       Date   : 2023-02-09 17:33 UTC (5 hours ago)
        
 (HTM) web link (podtext.ai)
 (TXT) w3m dump (podtext.ai)
        
       | nshm wrote:
       | https://podscript.ai/ too!
        
       | mosselman wrote:
       | How can I search for a full name or phrase? Something like "Steve
       | Jobs"? Right now I get separate results for Steve and for Jobs.
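
       A sketch of what exact-phrase matching involves, independent of
       any particular search service: a positional inverted index
       records where each word occurs, and a phrase matches only when
       its words appear at consecutive positions. The function names
       and sample transcripts are made up for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    tokens = []
    for word in text.lower().split():
        word = word.strip(".,!?\"'")
        if word:
            tokens.append(word)
    return tokens


def build_index(docs):
    """Map each token to {doc_id: [positions where it occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(tokenize(text)):
            index[tok][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return ids of docs where the phrase's tokens appear consecutively."""
    tokens = tokenize(phrase)
    if not tokens or any(tok not in index for tok in tokens):
        return set()
    # Only docs containing every token can contain the phrase.
    candidates = set(index[tokens[0]])
    for tok in tokens[1:]:
        candidates &= set(index[tok])
    results = set()
    for doc_id in candidates:
        for start in index[tokens[0]][doc_id]:
            if all(start + i in index[tok][doc_id] for i, tok in enumerate(tokens)):
                results.add(doc_id)
                break
    return results


if __name__ == "__main__":
    docs = {
        "ep1": "Steve Jobs introduced the iPhone.",
        "ep2": "Steve quit two jobs last year.",
    }
    print(phrase_search(build_index(docs), "Steve Jobs"))
```

       Both sample transcripts contain "steve" and "jobs", but only ep1
       has them side by side, so only ep1 is returned.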
        
       | 71a54xd wrote:
       | Haha I built this for fun a few months ago to full text search my
       | own favorite podcasts. What are you guys using to do full text
       | search / indexing?
        
         | anonbuilder wrote:
         | Currently I'm using Algolia but others have pointed out some
         | alternatives I'll have to check out. Would love to hear any
         | feedback/ideas you have from your project!
        
       | who-shot-jr wrote:
       | Hello, this looks fantastic, I was looking for something like
       | this a while ago.
        
       | lappa wrote:
       | This is a really interesting project with a lot of potential.
       | 
       | If I were a sponsor looking for a podcast I would want my search
       | process to look something like this:
       | 
       | - Search for a term relevant to my line of business
       | 
       | - See a list of podcasts ordered by % of utterances which contain
       | my key phrase throughout their last N episodes
       | 
       | - Annotation of how many listeners each podcast had in last N
       | episodes
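
       The ranking lappa outlines can be sketched as follows. This is an
       illustration under assumptions, not PodText's implementation: it
       assumes each episode is already split into utterances and that a
       per-episode listener count is available from somewhere. The
       Episode fields and rank_podcasts are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Episode:
    podcast: str
    published: str         # ISO date, so string order is chronological
    utterances: List[str]
    listeners: int          # hypothetical field; the thread names no data source


def rank_podcasts(
    episodes: List[Episode], phrase: str, last_n: int
) -> List[Tuple[str, float, int]]:
    """Return (podcast, % of utterances containing phrase, total listeners)
    over each podcast's last_n episodes, highest percentage first."""
    by_podcast = {}
    for ep in episodes:
        by_podcast.setdefault(ep.podcast, []).append(ep)

    needle = phrase.lower()
    rows = []
    for name, eps in by_podcast.items():
        recent = sorted(eps, key=lambda e: e.published, reverse=True)[:last_n]
        total = sum(len(e.utterances) for e in recent)
        hits = sum(needle in u.lower() for e in recent for u in e.utterances)
        share = 100.0 * hits / total if total else 0.0
        rows.append((name, share, sum(e.listeners for e in recent)))

    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    sample = [
        Episode("A", "2023-01-01", ["we love vpn deals", "cooking tips"], 100),
        Episode("A", "2023-01-08", ["vpn again", "more vpn"], 150),
        Episode("B", "2023-01-05", ["gardening", "a vpn ad", "plants", "soil"], 900),
    ]
    for row in rank_podcasts(sample, "VPN", last_n=2):
        print(row)
```

       Sorting by share alone favors small, focused shows; a sponsor
       would probably want to weigh the listener column as well.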
        
         | podtext wrote:
         | Appreciate the feedback! I'll keep these use cases in mind as I
         | build out PodText Business
        
       | sva_ wrote:
       | I thought about making something like this, but one important
       | part - which seems to be missing here - is speaker diarization
       | (identify who says what.)
       | 
       | In a world of increasing automated content generation, the "who"
       | might become just as important as the "what" of information.
        
       | BonoboIO wrote:
       | What software do you use to transcribe the speech? Whisper?
        
         | anonbuilder wrote:
         | Yes, using Whisper running on banana.dev :)
        
           | nshm wrote:
           | Do I understand banana's pricing correctly: it costs $1.87 per
           | hour, so an hour of audio with the large model costs you about
           | $1? That's probably a bit too expensive compared to cloud
           | providers.
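
       The arithmetic behind that estimate, taking the comment's two
       figures at face value ($1.87 per GPU hour, about $1 per hour of
       audio). Neither number is verified here, and the average episode
       length used in the demo is a placeholder, not a figure from the
       thread.

```python
def implied_gpu_hours(gpu_rate_per_hour, cost_per_audio_hour):
    """GPU hours needed per hour of audio, implied by the two prices."""
    return cost_per_audio_hour / gpu_rate_per_hour


def catalog_cost(episode_count, avg_episode_hours, cost_per_audio_hour):
    """Total transcription cost for a catalog of episodes."""
    return episode_count * avg_episode_hours * cost_per_audio_hour


if __name__ == "__main__":
    # Figures quoted in the comment above.
    ratio = implied_gpu_hours(gpu_rate_per_hour=1.87, cost_per_audio_hour=1.00)
    print(f"GPU hours per audio hour: {ratio:.2f}")

    # 25k episodes is from the original post; 1.0 hour per episode is a
    # placeholder assumption chosen only to make the example concrete.
    print(f"Catalog cost: ${catalog_cost(25_000, 1.0, 1.00):,.0f}")
```

       In other words, $1 per audio hour at $1.87 per GPU hour implies
       the model needs a little over half an hour of GPU time per hour
       of audio.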
        
       | jcowdy wrote:
       | Nice work! It would be great if after clicking on the text within
       | the podcast that matches my term, I was brought to that section
       | of the transcript rather than to the beginning.
        
         | anonbuilder wrote:
         | That's a great piece of feedback! I'll get this fixed soon,
         | super useful suggestion
        
       | Oras wrote:
       | This is great, I was working on something similar in the last few
       | days, but since it is hard to cover every podcast, I stopped to
       | think of a way to niche down. I feel your pain with GPUs and
       | scaling podcast transcription.
       | 
       | I was thinking of adding something like this for the UI
       | https://github.com/johan-akerman/SpotifyTranscripts in case you
       | find it useful.
       | 
       | Good luck! It is a really nice project.
        
         | anonbuilder wrote:
         | Thanks! I'll check out their UI, mine definitely needs a lot of
         | work :)
        
       | pablomendes wrote:
       | Happy to exchange notes with you on our learnings building
       | https://podsearch.page
        
       | skykooler wrote:
       | This is neat, but would it be possible to search for a multi-word
       | phrase? If I search for a sequence of words I just get results
       | that match one or more of the words but not the phrase itself.
        
       | SCUSKU wrote:
       | Here's my crack at a podcast transcription website:
       | https://podscription.app
       | 
       | I made this while unemployed and the skills I learned from making
       | it helped me land my new job!
        
       | kevmo314 wrote:
       | Very neat! Is it possible to browse by topic instead of by
       | podcast? You mentioned 25k podcast episodes but right now I can
       | only browse ~100 of them or so unless I come up with some
       | keywords.
        
       | mritchie712 wrote:
       | This is amazing. How do you search for two terms at once? e.g.
       | "aboriginal" and "origin". It doesn't seem possible to require
       | that both terms are present.
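
       For illustration, requiring every term (AND) versus accepting any
       term (OR) comes down to intersecting or unioning the sets of
       documents that contain each word. This is a generic sketch with
       made-up names and sample data; it says nothing about how the
       site's search backend behaves.

```python
from collections import defaultdict


def build_postings(docs):
    """Map each lowercase word to the set of doc ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in text.lower().split():
            postings[tok].add(doc_id)
    return postings


def search_all(postings, terms):
    """Docs containing every term (boolean AND)."""
    sets = [postings.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_any(postings, terms):
    """Docs containing at least one term (boolean OR)."""
    return set().union(*(postings.get(term.lower(), set()) for term in terms))


if __name__ == "__main__":
    docs = {
        "d1": "the aboriginal origin story",
        "d2": "aboriginal art",
        "d3": "origin of species",
    }
    postings = build_postings(docs)
    print(search_all(postings, ["aboriginal", "origin"]))
    print(sorted(search_any(postings, ["aboriginal", "origin"])))
```

       Matching is on whole words here, so "origin" does not match
       inside "aboriginal"; only d1 contains both terms.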
        
         | anonbuilder wrote:
         | Thanks! The search function is built with Algolia, I'm sure
         | they support boolean ops like "AND" but I'll need to dig into
         | their API. I think if you search both terms, transcripts
         | containing both should be ranked higher.
        
           | kwerk wrote:
           | I'm doing a similar personal product. Highly recommend
           | switching to Typesense before your Algolia trial is up. I've
           | heard good things about Meilisearch but Typesense has been
           | rock solid for me.
        
             | Oras wrote:
             | I second that. Typesense is fantastic. I used it for a job
             | board with 3 million and it did great.
        
             | anonbuilder wrote:
             | Haven't heard about Typesense - thanks for the pointer! Btw
             | if you want to trade notes on our projects, feel free to
             | email me: team@podtext.ai
        
           | pablomendes wrote:
           | You might want to try semantic search instead of fiddling
           | with keywords. Disclaimer: I'm building a plug-and-play
           | semantic search API at https://kailualabs.com
        
             | anonbuilder wrote:
             | I've been thinking about how to improve search, would love
             | to try a demo of your product!
        
       | mbesto wrote:
       | Super interesting - I did a search on a common business name like
       | "Dropbox" and it's clear some ads show up. Wonder if there is any
       | way to parse these out so they don't show in the results?
        
         | anonbuilder wrote:
         | That's a great point, I'm parsing out ads to determine
         | sponsorships anyway so filtering these from search should be
         | straightforward. Thanks for the feedback :)
        
       ___________________________________________________________________
       (page generated 2023-02-09 23:00 UTC)