[HN Gopher] Text2LIVE: Text-Driven Layered Image and Video Editing
       ___________________________________________________________________
        
       Text2LIVE: Text-Driven Layered Image and Video Editing
        
       Author : lnyan
       Score  : 96 points
       Date   : 2022-07-10 09:49 UTC (1 day ago)
        
 (HTM) web link (text2live.github.io)
 (TXT) w3m dump (text2live.github.io)
        
       | cube2222 wrote:
       | That's really cool, and could realistically end up being very
       | useful as an end-user product.
       | 
       | I'm just waiting for the GIF keyboard that creates GIFs based on
       | your prompt instead of searching through an existing database of
       | them.
       | 
       | That will be truly next-level.
        
         | egfx wrote:
        
           | [deleted]
        
           | moritonal wrote:
           | Wtf, after waiting about 2mins for your site to load all I
           | could tell from your product page is that I could somehow
           | spend $1,299.00 on whatever it was.
        
             | metadat wrote:
             | I had the same experience; this feels like a huckster
             | product plug for SEO.
             | 
             | The immediate redirect is a little odd as well, malware
             | payload delivery anyone?
        
       | upupandup wrote:
       | how long until we can write stuff like: make everybody nude in
       | this kpop video?
        
       | Existenceblinks wrote:
       | Looks promising. I think this would end up with some well-defined
       | schema + some DSL.
        
       | minimaxir wrote:
       | This just reinforces my hypothesis that OpenAI's release of
       | CLIP in 2021 was more impactful to image research than the
       | DALL-E paper.
        
         | zmgsabst wrote:
         | If you missed it, like I did, here's a blog post by OpenAI:
         | 
         | https://openai.com/blog/clip/
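(For context: CLIP's core trick is embedding images and captions into one shared vector space and scoring every pair by scaled cosine similarity; text-driven editing methods like Text2LIVE optimize an edit against that score. A rough numpy sketch of the scoring step, using random stand-in embeddings since the real model produces them with trained ViT/Transformer encoders and a learned temperature:)

```python
import numpy as np

def clip_style_logits(image_embs, text_embs, temperature=0.07):
    """Score every image against every text by cosine similarity,
    scaled by a temperature as in CLIP's contrastive objective."""
    # L2-normalize so the dot product below is cosine similarity.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=-1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    return (image_embs @ text_embs.T) / temperature

# Stand-in embeddings; real CLIP maps images and text into a
# shared ~512-d space with trained encoders.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(2, 512))
text_embs = rng.normal(size=(3, 512))

logits = clip_style_logits(image_embs, text_embs)  # shape (2, 3)
# Softmax over texts: per-image probability of each caption.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
```

This is only the similarity head; the blog post above covers how the encoders are trained contrastively on image-text pairs.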
        
       | metadat wrote:
       | Our best minds are working on this amazing near-magical new
       | technology.. which will end up being productized into an
       | Instagram Filter service to dynamically inject a stained glass
       | unicorn in place of a horse in a video.
       | 
       | That's cool and all, but also really stupid and a pointless
       | distraction compared to how novel the underlying mathematics and
       | science are. This will quickly become a commodity and humans will
       | acclimate to seeing such tricks. The content produced won't even
       | be considered particularly impressive.
       | 
       | Damn. I was hoping the singularity would be better.
        
         | skybrian wrote:
         | Yes, lowering the cost of special effects means that they're
         | not that special and there will be lots of crap. On the other
         | hand, it lowers the cost of filmmaking, so there should be
         | storytellers who use this to good effect, where the special
         | effect isn't that noticeable but serves the story?
         | 
         | Though, a question is whether the good storytellers can be
         | found easily. It seems like the situation is similar in fan
         | fiction.
        
         | naillo wrote:
         | I think now is a good time (now that they're out of research
         | zone and actually working) for a flood of new people to enter
         | this space and try to come up with more creative ideas than
         | that. It's an exciting time for cool ideas and cool projects if
         | we take off our cynicism hat for a bit.
        
           | metadat wrote:
           | How is this going to be accessible to the new wave of people
           | you're imagining?
           | 
           | I'd be all for it! Just not clear on a plausible path for how
           | this better future comes to pass.
        
             | naillo wrote:
             | Well, you will have to do some work to understand them
             | (read papers etc.). But distilled versions of these
             | things (from larger models) can run fairly well even on
             | the web. There's definitely cool low-hanging fruit here
             | (though it's not plug-and-play, one library import away).
             | The main thing is that these have been proven to be as
             | powerful as they are (last year it wasn't clear they'd
             | be able to get this good), so with some effort there's
             | definitely cool stuff to be built. I'm excited (and
             | working on it myself).
        
         | TaylorAlexander wrote:
         | I see it differently. As a robotics engineer I know the biggest
         | impediment to robotics development is getting computers to
         | understand the real world. The work on multimodal neurons,
         | which see the word cake and know to associate it with images of
         | cake, is a key stepping stone along the way to a fully
         | functional embodied AI that can solve difficult real world
         | problems. CLIP, DALL-E, and all these offshoots are
         | representations of what we can pull from these efforts today.
         | But long term this work will be incorporated into bigger and
         | more capable AI systems.
         | 
         | Just think: when I ask you "walk into the workshop, grab a
         | hammer and a box of nails, and meet me on the roof to help me
         | secure some loose shingles" your mind is already imagining the
         | path you will take to get there, what it will look like when
         | you locate and grab the hammer and nails, and you've filled in
         | that to get on the roof you have to meet me in the back yard to
         | climb the ladder, which I never mentioned.
         | 
         | All these tiny details your mind can do effortlessly take huge
         | efforts like CLIP to sort out how to make it work. And even
         | CLIP is only text and images. There is a lot more to go from
         | there.
         | 
         | A lot of people focus on DALL-E and the artifacts that come out
         | along the way, but these are not the destination, just little
         | stops showing the progress we are making on a much larger
         | journey.
        
         | [deleted]
        
         | zitterbewegung wrote:
         | The "best minds" thesis isn't even correct. If the "best
         | minds" hadn't created things like social networking and
         | Instagram, we wouldn't even have the data sources these new
         | algorithms need to work. Also, without a bunch of video game
         | makers pushing graphics cards forward, we wouldn't have the
         | hardware to do these things either. So the "best minds"
         | accidentally enabled more "best minds".
        
         | avgcorrection wrote:
         | So what?
         | 
         | 1. Their priorities are wrong so they are not the best minds
         | 
         | 2. If (1) is false because the best minds can have stupid
         | priorities, then The Best Minds are not the be-all and
         | end-all of everything
        
       | graiz wrote:
       | We're still in the first innings of this stuff. Zero-shot/CLIP
       | work will extend to audio, music, music videos and perhaps full
       | movies. I would love to see: "The Phantom Menace with Jar Jar
       | Binks replaced by Leonardo DiCaprio using the voice of James Earl
       | Jones"
        
       ___________________________________________________________________
       (page generated 2022-07-11 23:00 UTC)