[HN Gopher] Phenaki: A model for generating minutes-long, changi...
       ___________________________________________________________________
        
       Phenaki: A model for generating minutes-long, changing-prompt
       videos from text
        
       Author : vagabund
       Score  : 53 points
       Date   : 2022-09-29 18:36 UTC (4 hours ago)
        
 (HTM) web link (phenaki.video)
 (TXT) w3m dump (phenaki.video)
        
       | ag8 wrote:
       | Wow--this qualitatively feels a lot more impressive than the Meta
       | model. The two-minute video is better than anything I've seen in
       | video generation on that scale.
        
         | anigbrowl wrote:
          | I'm happy about it because all the celebs who paid $ to churn
          | out an 'AI music video!!?!' with Stable Diffusion and whatever
          | shitty demo they had lying around are suddenly revealed as
          | tryhards chasing the hype cycle rather than innovators.
        
       | anigbrowl wrote:
       | This addresses several of the shortcomings in the AI video
       | technology that's the current top story on HN. It's entertaining
       | to consider the possibility that the explosion of innovation is
       | partly due to artificially generated papers and business entities
       | that are busily iterating upon each others' capabilities while we
       | write micro-editorials about what that means.
        
         | blondin wrote:
         | this does not sound too far-fetched. the paper says anonymous
         | authors pending review...
        
       | GaggiX wrote:
        | I haven't read either paper deeply enough, but I think Make-A-
        | Scene, being conditioned only on image embeddings, is incapable
        | of generating videos that require a broader understanding that
        | cannot be encoded in an image embedding, like "Camera zooms
        | quickly into the eye of the cat". Make-A-Scene is more like
        | text-to-animated-image; this model seems more powerful.
        
       | Hard_Space wrote:
        | That long embedded video is the nearest T2V has come to breaking
        | my cynicism about how long it will take to become (at least)
        | coherent.
       | 
        | Check it out on the project page to see the prompt text
        | alongside, or just watch it here:
       | 
       | https://phenaki.video/stories/2_minute_movie.webp
       | 
        | However, there are some flourishes and timing choices that are
        | not indicated by the prompt text, and I think there is some
        | manual tweaking at play (which is okay; it's still impressive).
        
       | Hard_Space wrote:
       | This appears to be just one plank of a tripartite shock assault
       | on the October conference season.
       | 
       | The other two, also by anonymous authors using the same
       | formatting, are:
       | 
       | AudioGen: Textually Guided Audio Generation
       | https://openreview.net/forum?id=CYK7RfcOzQ4
       | 
       | and
       | 
       | Re-Imagen: Retrieval-Augmented Text-to-Image Generator
       | https://openreview.net/forum?id=XSEBx0iSjFQ
       | 
       | There is a samples site for AudioGen, but it is currently flooded
       | and inaccessible:
       | 
       | https://anonymous.4open.science/w/iclr2023_samples-CB68/repo...
        
         | abeppu wrote:
         | What's the commonality in formatting you're paying attention
         | to? I think the conference asks everyone to use their
         | template/style.
         | 
          | But the architecture figures look like they have different
          | styles. E.g. the Re-Imagen paper uses rows/stacks of small
          | colored circles to represent output tensors, and colored
          | rectangles of different ratios to indicate shape differences,
          | whereas the Phenaki paper uses stacks of squares for output
          | tensors, and differently shaped elements to distinguish
          | different kinds of components.
         | 
         | https://github.com/ICLR/Master-Template
        
       ___________________________________________________________________
       (page generated 2022-09-29 23:01 UTC)