[HN Gopher] Phenaki: A model for generating minutes-long,
changing-prompt videos from text
___________________________________________________________________
 
Phenaki: A model for generating minutes-long, changing-prompt
videos from text
 
Author : vagabund
Score  : 53 points
Date   : 2022-09-29 18:36 UTC (4 hours ago)
 
(HTM) web link (phenaki.video)
(TXT) w3m dump (phenaki.video)
 
| ag8 wrote:
| Wow -- this qualitatively feels a lot more impressive than the
| Meta model. The two-minute video is better than anything I've
| seen in video generation at that scale.
| 
| anigbrowl wrote:
| I'm happy about it, because all the celebs who paid $ to churn
| out an 'AI music video!!?!' with Stable Diffusion and whatever
| shitty demo they had lying around are suddenly revealed as
| tryhards chasing the hype cycle rather than innovators.
| 
| anigbrowl wrote:
| This addresses several of the shortcomings in the AI video
| technology that's the current top story on HN. It's entertaining
| to consider the possibility that the explosion of innovation is
| partly due to artificially generated papers and business
| entities that are busily iterating on each other's capabilities
| while we write micro-editorials about what that means.
| 
| blondin wrote:
| This does not sound too far-fetched. The paper says "anonymous
| authors, pending review"...
| 
| GaggiX wrote:
| I haven't read either paper deeply enough, but I think that
| Make-A-Video, being conditioned only on image embeddings, is
| incapable of generating videos that require a broader
| understanding that cannot be encoded in an image embedding,
| like "Camera zooms quickly into the eye of the cat".
| Make-A-Video is more like text-to-animated-image; this model
| seems more powerful.
| 
| Hard_Space wrote:
| That long embedded video is the nearest T2V has got to breaking
| my cynicism about how long it is going to take to become (at
| least) coherent.
| Check it out alongside the project page to see the text that
| formed it, or just watch it here:
| 
| https://phenaki.video/stories/2_minute_movie.webp
| 
| However, there are some flourishes and timing choices that are
| not indicated by the prompt text, and I think there is some
| manual tweaking at play (which is okay; it's still impressive).
| 
| Hard_Space wrote:
| This appears to be just one plank of a tripartite shock assault
| on the October conference season.
| 
| The other two, also by anonymous authors using the same
| formatting, are:
| 
| AudioGen: Textually Guided Audio Generation
| https://openreview.net/forum?id=CYK7RfcOzQ4
| 
| and
| 
| Re-Imagen: Retrieval-Augmented Text-to-Image Generator
| https://openreview.net/forum?id=XSEBx0iSjFQ
| 
| There is a samples site for AudioGen, but it is currently
| flooded and inaccessible:
| 
| https://anonymous.4open.science/w/iclr2023_samples-CB68/repo...
| 
| abeppu wrote:
| What commonality in formatting are you paying attention to? I
| think the conference asks everyone to use its template/style.
| 
| But the architecture figures look like they have different
| styles. E.g. the Re-Imagen paper uses rows/stacks of small
| colored circles to represent output tensors, and colored
| rectangles of different ratios to indicate shape differences,
| whereas the Phenaki paper uses stacks of squares for output
| tensors, and differently shaped elements to distinguish
| different kinds of components.
| 
| https://github.com/ICLR/Master-Template
___________________________________________________________________
(page generated 2022-09-29 23:01 UTC)