[HN Gopher] Launch HN: Milk Video (YC W21) - Edit online event r...
       ___________________________________________________________________
        
       Launch HN: Milk Video (YC W21) - Edit online event recordings
       quickly
        
       Hello HN gang! Lenny and Ross here, working on Milk Video
       (https://milkvideo.com/), a browser-based tool to turn long videos
       into watchable clips. We speed up the workflow for marketers
       editing long, boring Zoom recordings and webinars into visually
       engaging clips with quality templates and styled captions.
        
       Ross and I met 8 years ago in Shanghai, where we worked at an
       education startup and organized tech and design events. When we
       realized Covid was creating a tsunami of webinars, Ross noticed the
       growing cost of editing all the new content as B2B companies
       replaced their in-person marketing channels with online events.
        
       Most registrants to online events don't end up attending. They may
       be interested in the content, but they won't take time to watch an
       entire webinar recording. Webinar content has a short shelf life
       unless it is reworked into a friendlier format. Doing that with
       traditional video editing software is cumbersome, so it often
       doesn't happen. It's time-intensive to review videos for key
       moments, ask designers to create appropriate graphics and captions,
       and receive final approval from managers.
        
       We started out contacting companies organizing webinars, and
       learned they were stuck in a vicious cycle of constantly having to
       focus on the next upcoming event. We started manually editing
       videos for them to better understand how the most engaging bits
       could be reworked. Doing this manually revealed a glaring problem:
       the technology interfacing with video has changed dramatically, but
       the editing software hasn't. Video editing software is designed for
       film makers or social media, and businesses creating video content
       have very different needs.
        
       Milk Video uses a transcript-to-video based interface to review
       long recordings and minimize the mental effort around editing. We
       transcribe uploaded videos, present you with the content so you can
       quickly clip the best parts, and allow you to use templates to
       compose visually interesting layouts with additional assets, like
       logos or static text.
        
       We made a drag-and-drop interface for creating short video clips
       with styled word-by-word captions. In a world where people often
       don't have their audio on, the timestamp information on a
       machine-generated transcript is perfect for creating interesting
       visual elements, such as captions styled one word at a time. This
       also makes content accessible by default. And because most webinars
       or Zoom recordings are visually similar, we have the ability to
       recommend, in the future, which video templates might be best
       suited for the uploaded content.
        
       The frontend is a React app based on Redux Toolkit and Recoil.js.
       Our performant transcript interface is made possible by Slate.js.
       Our backend is a Ruby on Rails app and depends on a non-trivial
       number of serverless functions hosted on Google Cloud and AWS. Our
       speech-to-text provider is AssemblyAI, which we found to be
       cheaper, faster, and better than Google and Amazon.
        
       We would love your feedback on the tool. We are spending a lot of
       time working directly with our first users, and would appreciate
       all of the input we can get. I'm also happy to go into detail
       around how any specific parts work! We'll be in the comments and
       are eager to hear all your thoughts!
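To make the "captions styled one word at a time" idea concrete, here is a minimal sketch of how word-level timestamps from a machine transcript could drive a highlighted caption. The `Word` record, the function names, and the asterisk "styling" are illustrative assumptions, not Milk Video's implementation or any speech-to-text provider's actual output format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape of one transcript word with timing in seconds.
    text: str
    start: float
    end: float


def active_word_index(words, t):
    """Return the index of the word being spoken at time t, or None in a pause."""
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i >= 0 and t < words[i].end:
        return i
    return None


def render_caption(words, t):
    """Render the caption line, wrapping the active word in asterisks."""
    active = active_word_index(words, t)
    return " ".join(
        f"*{w.text}*" if i == active else w.text
        for i, w in enumerate(words)
    )


words = [
    Word("turn", 0.0, 0.3),
    Word("long", 0.3, 0.6),
    Word("videos", 0.6, 1.1),
    Word("into", 1.3, 1.5),
    Word("clips", 1.5, 2.0),
]
print(render_caption(words, 0.7))  # turn long *videos* into clips
print(render_caption(words, 1.2))  # turn long videos into clips
```

A real renderer would call something like `render_caption` on every video time update and apply visual styling instead of asterisks; the binary search keeps each lookup cheap even for hour-long transcripts.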
        
       Author : rememberlenny
       Score  : 109 points
       Date   : 2021-02-24 16:00 UTC (7 hours ago)
        
       | 1234letshaveatw wrote:
       | So it wasn't by choice, but most of my grad school business
       | classes are now virtual. Most have a team presentation aspect
       | where you collaborate on a powerpoint and charts and spreadsheets
       | then "present" virtually by tacking on audio and video and then
       | upload the whole mess after a lot of stress and panic.
       | 
       | Have you considered creating a free "edu" edition that would
       | generate a mashup of uploaded videos, ppt and media, watermarked
       | or whatever? The students would then convert to a paid model
       | for work? I would use it.
        
         | whoisjuan wrote:
         | "The students would then convert to a paid model for work?"
         | 
         | Honestly that almost never works for startups.
         | 
         | It never happens or at least it never happens fast enough. I'm
         | fairly sure that large companies like Microsoft that
         | popularized student licenses, are benefiting from this, but
         | through very long cycles of adoption.
         | 
         | At the end of the day if you get discounted Office for 4 or 5
         | years, you will very likely keep using it once you lose
         | the discount. The secret sauce there is distribution. Office is
         | so massively popular that you don't even need to advocate for
         | that at work. It's the default choice.
        
           | 1234letshaveatw wrote:
           | I work in IT for engineering.
           | 
           | CAD sw for the engineering schools is crazy competitive. If
           | you are graduating and know a CAD/simulation suite really
           | well it helps you get a job (and therefore it will influence
           | where you apply, which influences the employers, etc)
        
             | whoisjuan wrote:
             | I totally believe this. But it's highly contextual to your
             | job/area. CAD software is very specialized so it definitely
             | makes sense that you can influence or take job decisions
             | based on your knowledge of a particular suite.
             | 
             | What I believe is that if you're a startup building a
             | generic business tool and you target students with the hope
             | they will be your advocates or future buyers, you should
             | adjust your expectations.
             | 
             | I believe that giving student discounts as a long-term
             | adoption strategy makes sense. Unfortunately, startups have
             | to fight for adoption in the short term. If you can afford
             | to give student discounts you should do it. It's a good
             | thing to do. But don't count on this as a way to drive
             | adoption.
        
         | rememberlenny wrote:
         | Yes! Non-profit and academic use case is currently free.
         | 
         | It's worth noting, the use case you are mentioning is
         | interesting, but not exactly the workflow we are building for.
         | 
         | We are focused on improving the workflow around a
         | marketer/demand generation/sales person at a company who needs
         | to use existing content to get attention on social
         | media/blog/email.
         | 
         | One of the ways we are thinking about this is around increasing
         | the "shelf-life" of quality content, which doesn't get
         | discovered because it's long. This is very much a problem that
         | appeared when businesses became content creators, as opposed to
         | individuals.
         | 
         | That being said, if you have any questions, please email either
         | Ross (ross@milkvideo.com) or me (lenny@milkvideo.com) and we
         | will be happy to help!
        
           | 1234letshaveatw wrote:
           | Gotcha, I respect that you have a focused model.
           | 
           | I would also think that local government would be a good
           | target audience. We have sat through some endless school
           | board zooms (due to COVID). I am sure "shrink and share"
           | would work there as well.
        
             | rememberlenny wrote:
             | At a high level, there are two kinds of video processing
             | work: synthesizing/organizing and creation.
             | 
             | There are a number of new companies appearing in the
             | synthesizing/organizing space, because there was an
             | explosion in the quantity of video being produced, and the
             | software to parse it exists (speech-to-text, object
             | recognition, better-search, etc).
             | 
             | At least for companies, people intuitively want to share
             | all the "best parts", but we found that people won't
             | actually watch them. Instead highlighting one specific
             | piece that visually looks engaging is a better way to
             | capture attention, and then engage.
             | 
             | Our thought process here is that there is only so much
             | organizing/synthesizing most people will do, but there is
             | an endless amount of creation someone can do.
             | 
             | Re: government - For what it's worth, the US Chamber of
             | Commerce is one of our paying customers.
        
               | 1234letshaveatw wrote:
               | It would be sexier to synthesize the cyber truck
               | unveiling over the local planning board, but then again
               | Buffett didn't shy away from investing in garbage pickup
        
       | 35mm wrote:
       | I'm interested to try this as my day job involves editing 2-3
       | webinars and 2-5 Zoom interviews per week.
       | 
       | Currently I use Premiere Pro with some templates I've created.
       | 
       | I haven't found any of the transcript based editing tools to be
       | robust enough. Descript is buggy and slow on my MacBook Pro
       | (which has zero issues running Prem Pro & After Effects).
       | Transcriptive has issues where it gets out of sync with the
       | original.
       | 
       | What would be really helpful is detecting speaker changes, long
       | pauses, the start and end of slide presentations (or switching to
       | different decks), and transcription if it actually works smoothly
       | and stays in sync across edits.
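Of the features requested above, long-pause detection is the one that falls most directly out of word-level transcript timestamps. The sketch below assumes a hypothetical list of `(word, start, end)` tuples in seconds and an arbitrary 1.5 second cutoff; it is an illustration, not how any of the tools mentioned actually work.

```python
def find_long_pauses(words, min_gap=1.5):
    """Return (gap_start, gap_end) pairs for silences of at least min_gap seconds.

    words: list of (text, start_seconds, end_seconds), sorted by start time.
    """
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt[1] - prev[2]  # next word's start minus previous word's end
        if gap >= min_gap:
            pauses.append((prev[2], nxt[1]))
    return pauses


words = [
    ("welcome", 0.0, 0.5),
    ("everyone", 0.6, 1.2),
    ("so", 4.0, 4.2),
    ("today", 4.3, 4.8),
]
print(find_long_pauses(words))  # [(1.2, 4.0)]
```

Speaker changes and slide boundaries need more than timestamps (diarization and image analysis respectively), so they are not covered here.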
        
         | rememberlenny wrote:
         | Would you email me, and we can chat? lenny@milkvideo.com
         | 
         | The inspiration behind making this is to replace Premiere Pro,
         | so I'd love to understand your MVP to solve your problem.
         | 
         | I do think we can be very performant for you, given that all
         | the processing is done on the cloud, and you are only ever
         | interfacing with JavaScript/video tags. That being said, there
         | is work to do!
         | 
         | We don't have speaker diarization right now, but it's just a
         | feature flag for us. Also the start/end content is something
         | that we don't have active, but is planned for next week.
        
       | vanpelt wrote:
       | Hey Lenny! Congrats on the launch, so excited to see your product
       | come together.
        
         | rememberlenny wrote:
         | Thank you Chris for your advice and support! The partnership
         | you and Luke have is an inspiration for Ross and me.
        
       | acemarke wrote:
       | > The frontend is a React app based on Redux Toolkit and
       | Recoil.js
       | 
       | Hey, great to see Redux Toolkit being used in the wild! Would
       | love to hear your thoughts on using RTK, and I'm particularly
       | curious about the combination of RTK + Recoil together. What use
       | cases are you using each of those for?
       | 
       | Please let me know if you've got any suggestions for improving
       | RTK! I'm usually in the Reactiflux Discord evenings US time, and
       | always happy to chat.
        
         | rememberlenny wrote:
         | Redux Toolkit is INCREDIBLE. I have the utmost respect for the
         | developers working on it. I've worked on 6 large redux based
         | applications, and they were all implemented incredibly
         | differently. This has been the first time I really love the
         | implementation.
         | 
         | I am using RTK for the overall app state and Recoil for the on-
         | page state. I make API requests and store the results in the
         | redux store, but the hooks/prop passing is too slow for
         | handling video players/transcript manipulation.
         | 
         | I initially had everything in RTK, but noticed the render cycle
         | for dispatching to and listening to the store was creating
         | unusual issues.
         | 
         | With Recoil, I'm able to represent the video player's current
         | time state, and then listen to it in the other parts of the
         | app. Similarly, when I have the transcript updating the time,
         | the React Context based API performs better than the
         | hook/props.
         | 
         | Happy to dig more into this. I'll reach out via Twitter too!
        
       | ahstilde wrote:
       | Pretty cool to see a product I use on HN.
       | 
       | I've been using Milk to promote my podcast (
       | https://www.allschemesconsidered.com ) on LinkedIn. Here are some
       | sample videos:
       | 
       | https://www.linkedin.com/posts/mraakashshah_cloud-aws-gcp-ug...
       | 
       | https://www.linkedin.com/posts/mraakashshah_heres-a-teaser-f...
       | 
       | All the social media companies are pushing video really hard in
       | their algos (and stories). Recording and editing a podcast is
       | super fun for me, but the audience-building part was a drag. Milk
       | lets me make professional quality highlights super easily.
       | Ironically, the viewership on these highlight videos is 100x the
       | listenership on the podcast.
       | 
       | Anyway, I like the software.
       | 
       | Disclaimer: Ross found and pitched me on Milk, but I've been a
       | happy user ever since.
        
         | rosscranwell wrote:
         | It's been great working with you, Aakash! Thanks for the
         | support.
        
       | mfleit wrote:
       | I'm using Type Studio https://typestudio.co What is the
       | difference?
        
         | rememberlenny wrote:
         | We aren't focused on being a transcript editing tool. You can
         | upload a video, get a transcript, and edit it in Milk Video,
         | but that's not our focus.
         | 
         | Our focus is helping make a visual clip that is engaging, based
         | on the transcript information.
         | 
         | Here are some examples I posted above:
         | 
         | -
         | https://twitter.com/rememberlenny/status/1339618249575714816...
         | 
         | - https://twitter.com/rabois/status/1310644068326629376?s=20
         | 
         | -
         | https://twitter.com/m_cieplinski/status/1356331228954292224?...
        
       | annelibby wrote:
       | Go, Lenny!
        
       | derrickli978 wrote:
       | We use both Milk and Descript at Macro for our podcast workflows.
       | 
       | Milk is really good at creating short (for us it's ~1min),
       | impactful Twitter + LinkedIn marketing collaterals based on each
       | podcast guest (we can add our logo, the guest's background, the
       | guest's bio + picture, transcript, etc.).
       | 
       | Descript is amazing at editing the entire podcast and making sure
       | we have the overall content needed to publish.
       | 
       | Can't imagine doing our podcast without Milk + Descript.
        
         | rememberlenny wrote:
         | Thank you! Thank you! Thank you!
         | 
         | Per the podcast point, last night (phew) we launched the
         | ability to upload audio files and work with them.
         | 
         | We are focused on webinars/Zoom recordings, but now you can
         | upload a podcast and create a promo tile.
         | 
         | These are some other links that were made in Milk Video:
         | 
         | - https://twitter.com/m_cieplinski/status/1356331228954292224
         | 
         | - https://twitter.com/rabois/status/1310644068326629376
         | 
         | - https://twitter.com/rememberlenny/status/1339618249575714816
        
       | mchusma wrote:
       | I have some constructive feedback. The main video on your
       | homepage confused me immensely. It was a guy from Brex talking
       | about Retool, but they weren't very eloquent (no offense) and it
       | rambled on forever. I thought it was going to be a before and
       | after thing showing how bad it was before and how amazing it was
       | after...but it was just a before? Or was that after Milk? I also
       | thought for a minute I was on the wrong video, hearing about
       | Retool.
       | 
       | I use Brex and Retool and Descript and Canva so I think I would
       | be a target user but just didn't get it at all from the video.
       | 
       | My feedback would be to make the company "Acme" and show before
       | or after...but definitely for a video product that first video
       | should be a really good video.
       | 
       | My 2 cents.
        
         | rememberlenny wrote:
         | Extremely valuable and noted! Thank you for taking the time to
         | write this out and share.
        
       | elviejo wrote:
       | How does Milk compare to Descript?
       | 
       | What are the advantages/disadvantages?
        
         | rememberlenny wrote:
         | Thanks for asking this. This is like comparing Photoshop and
         | Canva.
         | 
         | Descript is hands down the leader in any transcript based
         | video/audio editing. They set the standard for detailed editing
         | and magically manipulating audio/video.
         | 
         | We are focused on the workflow around creating something
         | visually appealing, that uses a Zoom recording in it.
         | Specifically, the transcript-based interface is only for
         | speeding up the review process, but our main focus is on visual
         | templates to drop in the video/captions.
         | 
         | One way to think of it is that we took the Descript Audiogram
         | feature, and built out a workflow that creates a wider variety
         | of outputs applicable to marketing/sales-related needs.
         | 
         | We are solving the problem where you need to quickly take a
         | video recording and make something you/your team can proudly
         | share on social media, that reflects your company's brand
         | guidelines/visual aesthetic.
        
       | nate wrote:
       | Congrats y'all! Looks interesting. One use case I believe you're
       | talking about that I'd love to see even more fleshed out is the
       | tool to find the interesting clips for me? I'd love if after I
       | just did this upload, you were like: "here's 3 clips that seem
       | interesting to our AI." Maybe even a summarization algorithm
       | would suffice at finding the most relevant chunks in transcript?
       | Or maybe something more fancy if it's doable. But I'd love a best
       | effort stab at the clips so I don't even have to think about
       | finding them :)
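As a rough illustration of the "summarization algorithm" idea above, the sketch below ranks transcript segments with a simple word-frequency score, a classic extractive-summarization heuristic. The segment format, the stopword list, and the scoring are all assumptions made for the example; nothing here reflects what Milk Video does or plans to do.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
    "we", "you", "that", "this", "for", "on", "so", "our",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_segments(segments, k=3):
    """Return the k highest-scoring segments, in chronological order.

    segments: list of (start_seconds, text). A segment's score is the
    average transcript-wide frequency of its non-stopword tokens.
    """
    freq = Counter(w for _, text in segments for w in tokenize(text))

    def score(text):
        tokens = tokenize(text)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(segments, key=lambda s: score(s[1]), reverse=True)[:k]
    return sorted(ranked)  # back to chronological order


segments = [
    (0, "Thanks everyone for joining and welcome"),
    (30, "Captions make video clips accessible"),
    (60, "Styled captions make short video clips engaging"),
    (90, "Okay let me share my screen"),
]
for start, text in top_segments(segments, k=2):
    print(start, text)
```

Frequency scoring favors segments that repeat the talk's recurring vocabulary, which is only a crude proxy for "interesting"; it is a best-effort first pass of the kind the comment asks for, not a substitute for editorial judgment.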
        
         | rememberlenny wrote:
         | I love this idea in concept.
         | 
         | The point of the transcript is to lower the bar on who can
         | review video content. One layer on top of that is moving the
         | technical work (cutting video) into an editorial role (picking
         | the parts that are recommended).
         | 
         | We aren't trying to position ourselves to do the clip
         | picking/recommendation now, but we have already done some
         | machine learning based analysis to make this easier to find. We
         | have a video processing task that looks for "scene changes"
         | based on image threshold changes, so the metadata associated with
         | when a new person joins/slide changes/etc is present.
         | 
         | The original thinking here is that we can recommend "templates"
         | that correspond with certain video (i.e. multiple speakers vs
         | single presenter).
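The "scene changes based on image threshold changes" mentioned above can be sketched as frame differencing: compare consecutive frames and flag a change when the average pixel difference exceeds a cutoff. The flat grayscale frame representation, the function names, and the threshold of 30 are assumptions for illustration; the comment does not describe the actual processing task in any more detail.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two equal-size frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def detect_scene_changes(frames, threshold=30.0):
    """Return indices of frames that differ sharply from the previous frame.

    frames: list of equal-length sequences of grayscale values (0-255).
    """
    changes = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            changes.append(i)
    return changes


# Four tiny 4-pixel "frames": two dark, then two bright (e.g. a slide change).
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 210, 190, 205],
    [201, 209, 191, 204],
]
print(detect_scene_changes(frames))  # [2]
```

In practice the frames would be decoded and downsampled from the video first, and the threshold tuned so that a speaker moving does not register while a slide change or a new participant joining does.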
        
       | gramakri wrote:
       | In the mobile view of https://milk.video/pricing, all I see are
       | numbers/unlimited. I guess the row titles got scrolled down.
        
         | rememberlenny wrote:
         | Thanks! Will fix this. The app definitely doesn't work on a
         | small screen, but the homepage/pricing should.
         | 
         | FWIW - they are Webflow templates, but do a great job at making
         | it easy to manage.
        
       | gamesbrainiac wrote:
       | Just checked this out. I prefer Descript. Better editing, as well
       | as overdub.
        
         | rememberlenny wrote:
         | Thanks for signing up and trying it out.
         | 
         | We actually drive people to use Descript for most use cases
         | that aren't relevant to us.
         | 
         | Think Photoshop vs Canva.
         | 
         | Since speech-to-text APIs have become really good (props to
         | companies like AssemblyAI, https://www.assemblyai.com/), the
         | transcript-based interfaces are going to become much more
         | common.
         | 
         | Our product goal is to solve the use case around making the
         | visual output, when editing/correction isn't the goal. That
         | being said, the editor should be performant and work well, so
         | lots to improve there.
         | 
         | As an aside, there are a few evolving open-source libraries
         | that consume the output of these STT services
         | (https://github.com/bbc/react-transcript-editor) and make
         | turnkey transcript interfaces.
         | 
         | The newest/most developed one I like is based on Slate, and
         | made by a really amazing engineer at the Wall Street Journal
         | named Pietro.
         | 
         | Link: https://github.com/pietrop/slate-transcript-editor
        
       | yeldarb wrote:
       | We tried out the demo last summer (which was a bit of manu-mation
       | before the product was fully built out -- kudos to Lenny & crew
       | for doing things that don't scale) and had a great experience!
       | Here's an example of one of the videos we got via Milk:
       | https://www.youtube.com/watch?v=O4jOqVqyAo8
       | 
       | Excited to circle back and try it out again now that it's
       | software instead of humans doing the heavy lifting behind the
       | scenes!
        
         | rememberlenny wrote:
         | Thank you!
         | 
         | Context here - before we started working on the current
         | software, we planned to do an opaque marketplace for
         | post-production video work. To vet the idea, we reached out to
         | companies with webinars and manually edited their videos.
         | 
         | In the process, one finding was that it's hard to make styled
         | word-by-word highlighted captions. This resulted in a small
         | utility app that turned SRT and VTT caption files into
         | dynamically sized/styled caption videos, and later evolved into
         | today's product.
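Part of why word-by-word captions are hard to build from caption files is that an SRT cue only carries a start and end time for the whole line. One common workaround, sketched below, is to spread the cue's duration across its words in proportion to word length. This is an assumed heuristic for illustration, not a description of the utility app mentioned above, and it handles SRT timestamps only (WebVTT may omit the hours field).

```python
import re

# Matches SRT-style timestamps such as 00:00:01,000 (three-digit milliseconds).
TIME_RE = re.compile(r"(\d+):(\d+):(\d+)[,.](\d{3})")


def to_seconds(stamp):
    """Convert an HH:MM:SS,mmm timestamp to seconds."""
    h, m, s, ms = TIME_RE.match(stamp.strip()).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def cue_to_word_timings(cue):
    """Split one SRT cue into (word, start, end) tuples.

    Each word gets a share of the cue duration proportional to its length,
    which is an approximation since SRT has no per-word timing.
    """
    lines = cue.strip().splitlines()
    # lines[0] is the cue index, lines[1] the timing line, the rest is text.
    start_s, end_s = (to_seconds(part) for part in lines[1].split("-->"))
    words = " ".join(lines[2:]).split()
    total_chars = sum(len(w) for w in words)
    timings, cursor = [], start_s
    for w in words:
        duration = (end_s - start_s) * len(w) / total_chars
        timings.append((w, round(cursor, 3), round(cursor + duration, 3)))
        cursor += duration
    return timings


cue = """1
00:00:01,000 --> 00:00:03,000
Milk makes clips
"""
timings = cue_to_word_timings(cue)
for word, start, end in timings:
    print(word, start, end)
```

When a speech-to-text service supplies real per-word timestamps, those should be preferred over this interpolation, since speakers do not pace words by character count.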
        
       ___________________________________________________________________
       (page generated 2021-02-24 23:00 UTC)