[HN Gopher] Show HN: Podcast API
       ___________________________________________________________________
        
       Show HN: Podcast API
        
       Author : wenbin
       Score  : 165 points
       Date   : 2020-11-18 17:48 UTC (5 hours ago)
        
 (HTM) web link (www.listennotes.com)
 (TXT) w3m dump (www.listennotes.com)
        
       | mikece wrote:
       | Is this service cooperating with, or in competition with, the
       | Podcast Index?
       | 
       | https://podcastindex.org/
        
         | a_brawling_boo wrote:
         | I was wondering the same thing. This one is a paid $$$ API.
        
       | stevoski wrote:
       | As an avid podcast creator and consumer, I'd love to take a full
       | look at this, but I kept getting the full captcha experience. You
       | know what I mean, "select the squares with sidewalks" :(
       | 
       | OP, are you aware of this?
        
       | petercooper wrote:
       | It's not a replacement for this by any means, but in case anyone
       | would find a reasonably up to date list of around 600,000
       | podcasts useful, here you go: https://gofile.io/d/MjYVy7 - No
       | episodes, just the name, creator, and feed URLs for further
       | crawling on your own.
        
       | darrenwestall wrote:
       | Happy customer here. I've seen a few people suggest a free
       | service but from our testing this is far more comprehensive and
       | with better search quality.
       | 
       | We use the service in conjunction with iframely to load podcast
       | episodes that can be listened to with ease.
       | 
       | Great product, customer service and documentation.
       | 
       | Thanks from team Paiger <3
        
       | [deleted]
        
       | mongol wrote:
       | Suggestion for pivot: add a podcast playing web application
       | (basically podcast subscriptions; you already have most of the
       | rest in place), and charge a more reasonable amount for that plus
       | unlimited regular search. The pro subscription is way too
       | expensive for me.
       | 
       | Edit: I didn't notice that this is about a new API service
        
       | OJFord wrote:
       | > API Terms of Use
       | 
       | > Applications using the Listen API must not pre-fetch, cache,
       | index, or store any content on the server side. Note that the id
       | and the pub_date (e.g., latest_pub_date_ms, pub_date_ms...) of a
       | podcast or an episode are exempt from the caching restriction.
       | 
       | Is that.. common? I've never knowingly come across anything like
       | that before, seems weird to me. Sort of makes sense, in a 'you
       | must not try to avoid needing to pay us more because we want more
       | money' sort of way, but.. really? Also, almost entirely
       | (basically, except OSS) undetectable, surely.
       | 
       | [Edit: failure to read my own quote correctly, thanks xd1936] ---
       | And if you really take it seriously - 'must not [...] store any
       | content' - it really limits what you could even use it for, not
       | being able to store the `id` even for a later reference. I don't
       | think that's what's intended, but it seems to be what's written.
       | ---
       | 
       | (Just so I don't sound like a grumpy old git (I'm not old, at
       | least!) - I really _really_ really like the docs page
       | https://www.listennotes.com/api/docs/ only thing I'd suggest
       | perhaps is embedding the OpenAPI 'HTML' contents below the other
       | options, rather than it being a link to follow. Awesome though.)
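
A minimal sketch of what the exemption in the quoted terms would allow, assuming a response is a flat dictionary. Only `id`, `pub_date_ms` and `latest_pub_date_ms` come from the quoted text, which lists them as examples, so the exempt set may be larger. The `episode` record and the `storable_subset` helper are hypothetical and not part of the Listen API.

```python
# Fields the quoted terms name as exempt from the caching restriction.
# The terms give these as examples ("e.g."), so this set is an assumption.
EXEMPT_FIELDS = {"id", "pub_date_ms", "latest_pub_date_ms"}


def storable_subset(record: dict) -> dict:
    """Keep only the fields that the quoted terms allow storing server-side."""
    return {key: value for key, value in record.items() if key in EXEMPT_FIELDS}


# Hypothetical episode record; the shape is illustrative, not a real API response.
episode = {
    "id": "abc123",
    "title": "Example episode",
    "pub_date_ms": 1605721680000,
    "audio": "https://example.com/ep1.mp3",
}

print(storable_subset(episode))
```

Everything else (title, audio URL and so on) would have to be requested again when it is needed.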
        
         | loosetypes wrote:
         | I've noticed similar recently with many paid book search apis
         | out there and was also grossed out.
         | 
         | You're not paying for a data source at all, you're paying for
         | an expensive embedded application.
         | 
         | I don't see how it's remotely reasonable. The person managing
         | this api has stricter protections on this data (though they're
         | not even his podcasts) than we have on our personal data.
        
           | PragmaticPulp wrote:
           | You're not paying for the data, you're paying for the
           | service.
           | 
           | This is common. Companies that provide the data for offline
           | use tend to have a separate licensing and subscription fee
           | structure. Companies that provide the API tend to forbid
           | offline caching/storage of the data.
        
             | loosetypes wrote:
             | I commend the service provider for aggregating the data and
             | making a business - hope that person is able to make a
             | living from it.
             | 
             | It's an interesting service that I would be very interested
             | in using in providing a service of my own. And I'd be more
             | than happy to pay for it, but those terms are a
             | non-starter, at least for me.
             | 
             | The year is 2040. There's no running water. Grocery stores
             | mandate that all purchased liquids must be consumed prior
             | to leaving the premises.
        
             | Fnoord wrote:
             | Thing is, you provide an API so that people don't use some
             | kind of data harvester. If your API is terrible, people
             | resort to harvesting.
        
             | OJFord wrote:
             | The service, though, is 'convenient access to the data
             | [which is already out there]'. And once I've used it, I
             | don't need it 100/sec just because that's how frequently
             | people are using my downstream service to do something with
             | some popular 'trending' podcast; I'm perfectly happy (and
             | it would be a good practice to be!) caching it for some
             | period, until I need the service again to conveniently see
             | if the data that's already out there has changed.
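
The "cache it for some period" idea can be sketched as a small time-to-live cache. This is only an illustration of the technique under discussion: whether such caching is permitted depends on the provider's terms, and the terms quoted earlier in this thread restrict it. `TTLCache`, `fetch_podcast` and the 300-second lifetime are hypothetical names and values.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Caches fetched values per key and refetches once an entry reaches the TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no upstream call
        value = fetch(key)  # missing or stale: call upstream once and remember it
        self._entries[key] = (now, value)
        return value


# Hypothetical upstream call standing in for a real API request.
calls = []


def fetch_podcast(podcast_id: str) -> dict:
    calls.append(podcast_id)
    return {"id": podcast_id}


cache = TTLCache(ttl_seconds=300)
cache.get_or_fetch("trending-show", fetch_podcast)
cache.get_or_fetch("trending-show", fetch_podcast)
print(len(calls))  # prints 1: two lookups, one upstream call
```

With this shape, a popular item requested 100 times a second still costs one upstream call per TTL period.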
        
               | PragmaticPulp wrote:
               | > The service, though, is 'convenient access to the data
               | [which is already out there]'.
               | 
               | The service is whatever is described in the contract you
               | agree to when you purchase it.
               | 
               | If you don't like the terms of the contract, you can
               | always try to negotiate an alternate agreement. Or you
               | can choose not to purchase the service.
               | 
               | The seller isn't obligated to provide their services on
               | your terms, just as you're not obligated to purchase the
               | seller's services on their terms if you don't agree to
               | them.
        
         | hombre_fatal wrote:
         | Map tiling APIs do this, like Mapbox and Google. Else you could
         | circumvent all but their lowest-tier subscription plans with a
         | brain-dead caching proxy and a large disk which is what they
         | want to avoid.
         | 
         | Amazon's API famously does this as well (or used to, it's been
         | a while) by requiring any prices you show to be no more out of
         | date than N minutes forcing you to basically request on demand
         | every time you need to show it. They'd rather you just send the
         | traffic their way for people to see the price.
        
           | klysm wrote:
           | Depending on the use case, possibly a whole lot of disks.
        
             | ehsankia wrote:
             | Right, I would assume that even just the tiles for the
             | biggest cities alone would still be way more than most
             | would want to store. On the other hand, let's assume on the
             | client-side, can you not even cache a tile a user just saw
             | 10s ago but went off screen? Or is it assumed the browser
             | will cache that tile?
        
           | OJFord wrote:
           | Heh, yeah. I think my reaction's still similar though - why
           | _shouldn't_ I be allowed to do that?
           | 
           | The alternative of course is to charge more per tile, or have
           | a base 'access fee' + small incremental charge. Pay per usage
           | doesn't work best for everything, IMO.
           | 
           | (And I'd likely still want to come back occasionally to check
           | it hasn't changed, even if I cached every tile forever.
           | (Which I probably wouldn't, if the hit rate was really low,
           | like it was a one-off, and I'm being cheap about my API usage
           | why wouldn't I also be cheap about my disk usage.))
        
             | PragmaticPulp wrote:
             | > why shouldn't I be allowed to do that?
             | 
             | Short answer: Because that's the contract.
             | 
             | Companies that provide data for offline use will have a
             | separate licensing model, usually with subscriptions for
             | updates or perhaps a finite license term. MaxMind's GeoIP
             | database is a popular example.
        
               | OJFord wrote:
               | That's not really an answer though, that was the starting
               | point.
               | 
               | And this isn't a one-off dataset, we're discussing an API
               | pricing model - there _will_ be new podcasts, existing
               | podcasts' metadata _will_ change; people using this API
               | _will_ want to make repeated calls, they just might also
               | reasonably want to cache results.
               | 
               | If this were my service, I just wouldn't do
               | pay-per-API-call, or at least not _only_. Of course, the
               | free tier presents more of a problem then, but I'd probably
               | just restrict it more, making it less attractive, and have a
               | lower entry point than the $100pcm that's a flat-fee for
               | some but not all extra features, showing images at all
               | (and not in free), for example.
               | 
               | As it is, I reckon loads of users cache results - not
               | maliciously, just because they haven't read that they're
               | not supposed to - and that OP has no idea (because how
               | would they).
        
               | hombre_fatal wrote:
               | Pay-per-use is just the simplest and most straightforward
               | and possibly fairest way to couple the value your API
               | gives someone with the amount they pay in return.
               | 
               | Or, from the eyes of the user, they get full access to
               | the API yet don't have to pay much if their project gets
               | no traction.
               | 
               | The downside is that users can lie, but it's mainly just
               | low-end users who would lie. Pay-per-user licenses are
               | similar: a startup or a hackathon is most likely to share
               | the license between a few people while larger companies
               | are going to be honest because (1) they can afford it and
               | (2) they don't want trouble at scale.
               | 
               | So you can ignore most abuse.
               | 
               | The problem with other payment structures for ListenNotes
               | is that it's a relatively small database. You can clone
               | the whole thing trivially. It doesn't even mirror/host
               | the audio feeds. Its only value is that it put in the
               | work of structuring and normalizing the metadata.
               | 
               | If you built a business on top of ListenNotes, you'd save
               | more and more money as you grow bigger and bigger if you
               | were simply cloning the whole thing with your own
               | crawler. So the more value you would get from
               | ListenNotes, the less you're actually paying them. Or
               | ListenNotes would have to price their per-call fee so
               | high that they could somehow capture a fair price for
               | that value yet shut out smaller users.
               | 
               | Turns out "courtesy agreements" generally do work at
               | scale as larger companies become less and less likely to
               | lie just like they become less and less likely to pirate
               | Photoshop.
               | 
               | > have a lower entry point than the $100pcm that's a
               | flat-fee for some but not all extra features, showing
               | images at all (and not in free), for example.
               | 
               | The downside of this is that now you limit what people
               | can build on cheaper tiers. In fact maybe they can't even
               | build their compelling product without whatever content
               | you're paywalling behind tiers they can't afford on day
               | 1, while the goal is to let someone build anything they
               | want on day 1 so that they are a large end-user on day
               | 1000.
               | 
               | After all, the ideal isn't that you scale value with your
               | customer's income but rather you scale in price as they
               | convert value into income. It, of course, is all just
               | trade-offs.
        
         | nefitty wrote:
         | I worked on a food tracking PWA, and getting it to be useful
         | offline was horrible. We'd have to hit the API at least once a
         | day to grab commonly used foods and refresh our temporary
         | cache. The data did not change at all... eggs don't suddenly
         | have a different calorie count the next time you eat them lol
        
           | OJFord wrote:
           | A database of all of the world's foods though could easily be
           | larger than I'd like a calorie counting app on my phone to be
           | though, for example. So it's not necessarily silly - network
           | can be cheaper than disk.
        
         | xd1936 wrote:
         | > Note that the id and the pub_date of a podcast or an episode
         | are exempt from the caching restriction
         | 
         | > it really limits what you could even use it for, not being
         | able to store the `id` even for a later reference
        
           | OJFord wrote:
           | :facepalm: - thanks, I'll (keep it but) edit my comment to
           | reflect that correction.
        
         | sjs382 wrote:
         | IIRC, Mapbox has similar terms for both their map tiles and
         | their geo lookup results.
        
         | Gaelan wrote:
         | At least for the actual audio, I understand that podcasters get
         | grumpy when people cache that server-side, because they depend
         | on server logs to get viewership numbers for advertisers, so if
         | a popular client downloads the audio once and distributes it to
         | all their customers, they can't make money off any of those
         | customers.
        
       | dtran wrote:
       | Great work Wenbin! What's the hardest part about maintaining this
       | API?
        
         | wenbin wrote:
         | Thanks for asking!
         | 
         | The hardest part is to make small incremental improvements over
         | a long period of time :)
         | 
         | Like most software projects, this API is never a finished
         | product. It's always work-in-progress.
         | 
         | Small incremental improvements are not glamorous, typically not
         | newsworthy to share to the public.
         | 
         | Some examples of small incremental improvements:
         | 
         | 1. Improve API docs. I heard that many API-focused startups
         | have a dedicated team to maintain their API doc page.
         | 
         | 2. Dealing with edge cases. As more apps/websites use our API,
         | we'll see edge cases we would never have anticipated. A fix
         | could be as simple as adding a data field to the response with
         | a 2-line code change, or as involved as changing the search
         | index, which requires re-indexing the whole thing over a few
         | days. There could also be some strange edge cases with billing,
         | e.g. what if a user subscribes to the paid plan, then
         | unsubscribes, then subscribes again, then does something
         | strange, then unsubscribes...
         | 
         | 3. Customer support. This involves adding FAQ (tweaking the
         | texts) and preparing email templates to answer frequently asked
         | questions from users.
         | 
         | 4. Doing things to keep the service robust & performant, e.g.,
         | adding new alerts via Datadog/Pagerduty so we know in time when
         | something goes wrong. We also need a mechanism to detect when a
         | particular app sends tons of requests (e.g., sends requests in
         | an infinite loop) in a short amount of time, and we should be
         | able to do something about it (e.g., suspend the account).
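
One way to notice an app sending a burst of requests, as point 4 describes, is a sliding-window counter per app. This is a sketch of that general technique, not a description of how Listen Notes does it. `BurstDetector`, the app ids and the limits are all hypothetical.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flags an app once it makes more than max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # app_id -> timestamps inside the window

    def record(self, app_id: str, timestamp: float) -> bool:
        """Record one request; return True if the app is over the limit."""
        hits = self._hits[app_id]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


detector = BurstDetector(max_requests=3, window_seconds=1.0)
# Four requests within 0.3 seconds: the fourth one trips the detector.
flags = [detector.record("app-42", t) for t in (0.0, 0.1, 0.2, 0.3)]
print(flags)
```

What happens after a flag (an alert, throttling, suspending the account) is a separate policy decision.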
        
           | hashamali wrote:
           | Are the docs custom or are you using a third party product?
           | Doesn't look like Swagger UI or Slate.
        
             | wenbin wrote:
             | It's built from scratch, which was easier than customizing
             | from some open source projects back then (early 2019).
             | 
             | But the doc is codified in openapi format:
             | https://www.listennotes.com/api/docs/#openapi
             | 
             | So you can feed the openapi spec into other doc viewers,
             | e.g., Postman, or redoc
             | https://listen-api.listennotes.com/api/v2/openapi.html
        
       | colinprince wrote:
       | When trying the Postman Web View I get:
       | 
       | Profile cannot be found This public profile may have been
       | disabled or deleted
        
         | wenbin wrote:
         | Already contacted Postman customer support :)
        
       | jcims wrote:
       | I was tinkering a bit recently in an effort to build a simple
       | system that finds 'related' podcasts and see if I can see the
       | network effect play out over time. I did this by building a graph
       | of people (hosts/guests) and episodes and started folding in
       | tags/topics. None of this is in my wheelhouse, and I found:
       | 
       | - It takes a _lot_ of work to curate a substantial collection of
       | podcasts. There are lists all over the place but it's hard to
       | know what's really in there.
       | 
       | - I attempted to use SpaCy and/or NLTK to do some 'Named Entity
       | Recognition' in order to extract topics/people/organizations from
       | episode titles and descriptions. This was surprisingly brittle.
       | The string 'Sean Carroll', for example, wasn't detected as a
       | person by either framework (IIRC). It also seems quite brittle to
       | punctuation and other context (e.g. beginning or end of a
       | sentence). This was using the default models shipped with both. I
       | started off with just the English models but expanded as there
       | were _lots_ of names being skipped silently. That helped less
       | than I had hoped.
       | 
       | - I have yet to find a good UI for exploring a graph. I used
       | Neo4j and the built in 'browser' is not intended for that
       | purpose. Gephi has good capability for filtering and analytics,
       | but it takes quite a bit of getting used to and the graph itself
       | isn't amenable to dynamic navigation.
       | 
       | That's all. Bookmarking this as it would really help.
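
The people/episode graph described above can be approximated without a graph database by counting how many people two podcasts share. This is a minimal sketch assuming the host/guest names have already been extracted, which is the hard part according to the comment. The show and person names are made up, and `related_podcasts` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import combinations


def related_podcasts(appearances):
    """Map each (podcast_a, podcast_b) pair to the number of people they share.

    appearances is an iterable of (podcast, person) pairs.
    """
    podcasts_by_person = defaultdict(set)
    for podcast, person in appearances:
        podcasts_by_person[person].add(podcast)

    shared = defaultdict(int)
    for podcasts in podcasts_by_person.values():
        # Every pair of podcasts this person appeared on gains one shared person.
        for a, b in combinations(sorted(podcasts), 2):
            shared[(a, b)] += 1
    return dict(shared)


# Made-up data: (podcast, host or guest).
appearances = [
    ("Show A", "Alice"),
    ("Show B", "Alice"),
    ("Show A", "Bob"),
    ("Show B", "Bob"),
    ("Show C", "Bob"),
]

for pair, count in sorted(related_podcasts(appearances).items()):
    print(pair, count)
```

Pairs with higher counts are the stronger candidates for 'related'; tags or topics could be folded in the same way by treating them as additional shared nodes.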
        
         | wenbin wrote:
         | Many people use our Listen Later playlists to curate podcasts /
         | episodes by topics.
         | 
         | Here are some examples:
         | https://www.listennotes.com/podcast-playlists/
         | 
         | Each playlist has an RSS feed, so you can subscribe to the
         | playlist in any podcast app (except for Spotify or the like).
        
           | jcims wrote:
           | Yes this looks great, thanks!!!
        
           | eahlberg wrote:
           | Big fan of Listennotes in its entirety, but this feature is a
           | real gem!
        
         | cjlm wrote:
         | I've been thinking of doing the same for my graph visualization
         | newsletter source/target [0].
         | 
         | I'd love to connect if you're interested in collaborating!
         | sourcetarget@cjlm.ca
         | 
         | [0] https://sourcetarget.email
        
       | pedro1976 wrote:
       | Cool project! I wonder if your price points are randomly picked
       | or if 100 is the sweet spot, which would be odd.
       | 
       | Do you plan to add some speech-to-text magic, so one can search
       | for the actual podcast content? That would be a killer feature
       | for me :)
        
         | nshm wrote:
         | Yeah, modern open source speech recognition like Vosk can cost
         | around 2c per hour (70 times less than Google STT at
         | $1.4/hour) and should be just enough for search.
         | 
         | @wenbin do you need any help with it?
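
The 70x figure follows directly from the two per-hour prices quoted in the comment above. A quick check, with prices kept in whole cents to avoid floating-point rounding; the 1,000-hour workload is a made-up example, and the prices are the commenter's figures rather than verified rates.

```python
# Per-hour prices as quoted in the comment, in cents.
VOSK_CENTS_PER_HOUR = 2
GOOGLE_STT_CENTS_PER_HOUR = 140


def transcription_cost_dollars(hours: int, cents_per_hour: int) -> float:
    """Cost in dollars of transcribing the given number of audio hours."""
    return hours * cents_per_hour / 100


ratio = GOOGLE_STT_CENTS_PER_HOUR / VOSK_CENTS_PER_HOUR
print(ratio)  # 70.0

# Hypothetical workload: 1,000 hours of audio.
print(transcription_cost_dollars(1000, VOSK_CENTS_PER_HOUR))        # 20.0
print(transcription_cost_dollars(1000, GOOGLE_STT_CENTS_PER_HOUR))  # 1400.0
```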
        
       | bredren wrote:
       | Great work.
       | 
       | I first heard about Listen Notes when you were interviewed on the
       | Django chat podcast.
       | 
       | Here's that episode if people want to learn more about the tech
       | behind the site. https://lnns.co/Td9vzk47qQ3
        
       | ccvannorman wrote:
       | Super happy to see a BoostVC "cockroach" still at it 2.5 years
       | later! Keep up the good work. I use ListenNotes myself to
       | discover and play podcasts.
        
       | droopybuns wrote:
       | More interesting alternative:
       | https://podcastindex.org/?utm_source=podnews.net&utm_medium=...
        
       | dsco wrote:
       | We used ListenNotes a while back in a web based podcast player
       | and have only good things to say about the API. It's reasonably
       | priced, much easier to deal with than Apple's API and email
       | support is speedy!
        
         | [deleted]
        
       | pkamb wrote:
       | I'd like to see a crowdsourced "Genius for Podcasts".
       | 
       | Most podcast producers are terrible about correctly adding
       | metadata: Chapters, images, episode notes, descriptions, etc.
       | 
       | Let the superfans upload custom metadata to be displayed
       | alongside the episode as it's playing in your podcast player.
        
         | willcodeforfoo wrote:
         | I'm building something very similar to this!
        
       | praveenperera wrote:
       | I wish there was a question in the FAQ asking:
       | 
       | "Why would I use this over the iTunes API?"
        
         | wenbin wrote:
         | Great question! Will add.
         | 
         | For the iTunes API:
         | 
         | 1. You can't search episodes.
         | 2. You can't get a lot of podcast search results.
         | 3. Their terms of use may not allow you to do what you want to
         | do.
        
         | bredren wrote:
         | If that is what the official Podcasts app returns, it's bad
         | search results.
        
       | parondea wrote:
       | Love seeing development in the podcast space. One specific
       | problem I've been wanting solved for a long while is difficulty
       | with sharing podcasts with friends across podcast apps. If you're
       | not using the same podcast app as your friend, it's always a pain
       | to manually search and find the podcast in your own app. I'd love
       | a universal podcast url, something like `podcast://<podcast_url>`
       | that individual podcast apps can understand, which links you to
       | the podcast within your desired app, similar to the "default
       | browser" behaviour on mobile and desktop. Has anyone come across
       | something like this?
        
         | DoctorOW wrote:
         | I mean podcasts are an extension of RSS if I recall
         | correctly... I don't see why this wouldn't be possible.
        
         | adolph wrote:
         | This is a big problem for iOS. My spouse uses the default
         | Podcast app. I use Overcast. Anytime she sends me something to
         | listen to, iOS tries to open it in Podcasts. When I send
         | something from Overcast it gets sent as an Overcast URL.
        
         | Macha wrote:
         | Podcasts are just RSS feeds. Nothing stopping an app registering
         | itself as a handler for the RSS mime type, at least on
         | desktop/Android (I don't know how iOS works here). I doubt most
         | users would have an RSS reader installed at this stage, so most
         | users wouldn't even have a risk of getting it revealed as a
         | list of links to audio files by using the wrong app.
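
To illustrate the "just RSS feeds" point: in an RSS 2.0 podcast feed, each episode is an `<item>` whose `<enclosure>` element carries the audio URL. A minimal sketch using Python's standard library; `SAMPLE_FEED` is a made-up feed and `episode_audio_urls` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up feed for illustration; real feeds carry many more elements.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>"""


def episode_audio_urls(feed_xml: str):
    """Return (episode title, audio URL) pairs from an RSS 2.0 feed string."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # Skip items without an enclosure, e.g. text-only posts.
        if enclosure is not None and enclosure.get("url"):
            episodes.append((item.findtext("title", default=""), enclosure.get("url")))
    return episodes


for title, url in episode_audio_urls(SAMPLE_FEED):
    print(title, url)
```

An app handling a shared feed URL would fetch the XML and do roughly this, which is why any podcast client can in principle open any other client's feed link.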
        
       | OJFord wrote:
       | > Trusted by 2,007 companies and developers.
       | 
       | Haven't seen this before, an actual figure rather than 'these big
       | names' (and you have no idea if it's just some small team
       | somewhere for some toy test/demo, or a significant piece of the
       | whole organisation's puzzle).
       | 
       | I'm (just idly) curious what number you waited for (assuming you
       | did) before making that public. Because, and obviously it'll vary
       | a bit for different people, there's going to be some number below
       | which it has negative impact, not just (probably some other, with
       | a 'meh' range between, number) above which it has the positive
       | impact that is its raison d'etre.
        
         | cbowal wrote:
         | When I see usage figures touted in the form "Trusted by X
         | companies" I assume X is the total number of signups they've
         | had.
        
       | mrweasel wrote:
       | So like Podcastindex.org, but not free?
        
       ___________________________________________________________________
       (page generated 2020-11-18 23:00 UTC)