[HN Gopher] Imagen Video: high definition video generation with ...
       ___________________________________________________________________
        
       Imagen Video: high definition video generation with diffusion
       models
        
       Author : jasondavies
       Score  : 435 points
       Date   : 2022-10-05 17:38 UTC (5 hours ago)
        
 (HTM) web link (imagen.research.google)
 (TXT) w3m dump (imagen.research.google)
        
       | jupp0r wrote:
       | What's the business value of publishing this research in the
       | first place vs keeping it private? Following this train of
       | thought will lead you to the answer to your implied question.
       | 
        | Apart from that - they publish the paper, and anybody can
        | reimplement and train the same model. It's not trivial, but it's
        | also completely feasible for plenty of hobbyists in the field
        | to do in a matter of days. Google doesn't need to publish a
        | freely usable trained model themselves and associate that with
        | their brand.
       | 
       | That being said, I agree with you, the "ethics" of imposing
       | trivially bypassable restrictions on these models is silly.
       | Ethics should be applied to what people use these models for.
        
       | amelius wrote:
       | > Sprouts in the shape of text 'Imagen' coming out of a fairytale
       | book.
       | 
       | That's more like:
       | 
       | > Sprouts coming out of book, with the text "Imagen" written
       | above it.
        
         | Kiro wrote:
         | The prompt actually says "Imagen Video" and the sprouts form
          | the word "video". Even if they didn't, it would still be
          | extremely impressive. No one expects this to be perfect. That
          | would be science fiction.
        
       | montebicyclelo wrote:
       | We've been seeing very fast progress in AI since ~2012, but this
       | swift jump from text-to-image models to text-to-video models will
       | hopefully make it easier for people not following closely to
       | appreciate the speed at which things are advancing.
        
       | nullc wrote:
       | > We have decided not to release the Imagen Video model or its
       | source code
       | 
       | ...until they're able to engineer biases into it to make the
       | output non-representative of the internet.
        
       | kranke155 wrote:
        | I'm going to post an Ask HN about what I'm supposed to do when
       | I'm "disrupted". I work in film / video / CG where the bread and
       | butter is short form advertising for Youtube, Instagram and TV.
       | 
       | It's painfully obvious that in 1 year the job might be
       | exceedingly more difficult than it is now.
        
         | dkjaudyeqooe wrote:
          | Adapt - it's what humans excel at.
         | 
         | Instead of feeling threatened by the new tools, think about how
         | you can use them to enable your work.
         | 
         | One of the ironies* of these tools is that they only work
         | because there is so much existing material they can be trained
          | on. Absent that, they wouldn't exist. That makes me think: why
          | not train your own models that capture your own
          | style? Is that practical, how can you make it work, and how
          | might you deploy it in your own work?
         | 
          | Something that everyone is sticking their heads in the sand
         | about is the real possibility that training models on
         | copyrighted work is a copyright violation. I can't see how such
         | a mechanical transformation of others' work is anything but.
          | People accept that violating one person's copyright is
          | infringement, but if you do it at scale it somehow isn't.
         | 
         | * ironic because they seem creative but they create nothing by
         | themselves, they merely "repackage" other people's creativity.
        
         | inerte wrote:
         | It depends where you are in the industry.
         | 
         | If you're on the creative, storyboard, come up with ideas and
         | marketing side, you will be fine.
         | 
          | If you're in actual production - booking sets, unfolding
          | stairs to tape up an infinite background, picking out the
          | best-looking fruits in the grocery store... yeah, not looking
          | good.
         | 
         | Go up in the value chain and learn marketing, how to tell
         | stories, etc... you don't want to be approached by clients
          | telling you what you should be doing, you want to be approached
          | and asked what the clients should be doing.
        
         | j_k_eter wrote:
         | I first predicted this tech 5 years ago, but I thought it was
         | 15 years out. What I just said is beginning to happen with
         | pretty much everything. There's a third sentence, but if I
         | write it 10 people will gainsay me. If I omit it, there's a
         | better chance that 10 people will write it for me.
        
         | adamsmith143 wrote:
          | Learning how to use these models is the easiest answer. Prompt
         | Engineering (getting a model to output what you actually want)
         | is going to be something of an art form and I would expect it
         | to be in demand.
        
         | ijidak wrote:
         | It won't be easy. But below are my thoughts:
         | 
          | #1: Master these new tools
          | #2: Build a workflow that incorporates these tools
          | #3: Master storytelling
          | #4: Master ad tracking and analytics
          | #5: Get better at marketing yourself so that you stand out
         | 
         | The market for your skillset may shrink, but I doubt it will
         | disappear...
         | 
         | Think about it this way...
         | 
         | Humans in cheaper countries are already much more capable than
         | any AI we've built.
         | 
          | Yet, even now, there are practical limits on outsourcing.
         | 
         | It's hard for me to see how this will be much different for
         | creative work.
         | 
         | It's one thing to casually look at images or videos, when there
         | is no specific money-making ad in mind.
         | 
         | But as soon as someone is spending thousands to run an ad
         | campaign, just taking whatever the AI spits out is unlikely to
         | be the real workflow.
         | 
         | I guess I'm suggesting a more optimistic take...
         | 
         | View it as a tool to learn and incorporate in your workflow
         | 
         | I don't know if you gain much by stressing too much about being
         | replaced.
         | 
         | And I'm not even sure that's reality.
         | 
          | I'm almost certain most of the humans who lose their jobs will
          | be people who, whether out of fear or stubbornness, refuse to
          | get better, refuse to incorporate these tools, and are thus
          | unable to move up the value chain.
        
           | alcover wrote:
           | Get better [...] so that you stand out
           | 
           | Please bear with me but this kind of advice is often a bit
           | puzzling to me. I suppose you don't know the person you're
           | replying to, so I read your advice as a general one - useful
           | to anyone in the parent's position. If you were close to her,
            | it would make sense to help her 'stand out' to the detriment
            | - logically - of strangers in her field. But here you're kind
           | of helping every reader stand out.
           | 
           | I realise this comment is a bit vain. And I like the human
           | touch of you helping a stranger.
        
             | PinkMilkshake wrote:
             | I [...] don't [...] like [...] helping a stranger.
             | 
             | That's not very nice. The world would be a better place if
             | we helped strangers more.
        
         | metadat wrote:
         | Here's the link to kranke155's submission:
         | https://news.ycombinator.com/item?id=33099182
        
         | baron816 wrote:
         | Quite the opposite: you're going to be in even higher demand
         | and will make more money.
         | 
         | Yes, it will be possible for one person to do the work of many,
         | but that just means each person becomes more valuable.
         | 
          | It's also a principle in economics that supply often drives
          | demand, and that's definitely the case in your field.
          | Companies and individuals will want even more of what you
          | make. It's not like
         | laundry detergent (one can only consume so much of that).
          | There's almost no limit to how much of what you supply people
          | could consume.
         | 
         | The way I see it, your output could multiply 100 fold. You
         | could build out large, complex projects that used to take
         | massive teams all by yourself, and in a fraction of the time.
          | Companies can then monetize that for consumers.
         | 
         | AI is just a tool. Software engineers got rich when their tools
         | got better. More engineers entered the field, and they just
         | kept getting richer. That's because the value of each engineer
         | increased as they became more productive, and that value helped
         | drive demand.
        
         | naillo wrote:
          | Whatever insights and expertise you've gained up until now can
         | probably be used to gain enough of a competitive advantage in
         | this future industry to be employed. I doubt the people that
         | will spend their time on this professionally will be former
         | coders etc. (I've seen the stable diffusion outputs that coders
         | will tweet. It's a good illustration that taste is still hugely
         | important.)
        
           | altcognito wrote:
           | I think there will be tons of jobs that resemble software
            | development for proper, quick, high-quality generation of
           | video/images.
           | 
           | That being said, it's possible that it won't pay anywhere
           | near what you're used to. Either way, it will probably be a
            | solid decade before you've really felt the pain of
            | disruption. MP3s, which were a far more straightforward path
            | to disruption, took at least that long from conception.
        
             | jstummbillig wrote:
             | > That being said, it's possible that it won't pay anywhere
             | near what you're used to.
             | 
              | It also won't require nearly the amount of work it used to.
        
           | joshuahaglund wrote:
           | I like your optimism but OP's job is to take text
           | instructions and turn them into video, for advertisements. If
           | Google (who already control so much of the advertising space)
           | can take text instructions and turn them into advertisements,
           | what's left for OP to do here? Even if there's some
           | additional editing required this seems like it will greatly
           | reduce the hours an editor is needed. And it can probably
           | iterate options and work faster than a human.
        
             | pyfork wrote:
             | OP probably does more than it seems by interpreting what
             | their client is asking for. Clients ask for some weird shit
             | sometimes, and being able to parse the nonsense and get to
             | the meat is where a lot of skill comes into play.
             | 
              | I think Cleo Abram on YT recently tackled this exact
             | question. She tried to generate art using DALL-E along with
             | a professional artist, and after letting the public vote
             | blindly, the pro artist clearly 'made' better content, even
             | though they were both just typing into a text prompt.
             | 
             | Here's the link if you're interested:
             | https://www.youtube.com/watch?v=NiJeB2NJy1A
             | 
             | I could see a lot of digital artists actually getting
             | _better_ at their job because of this, not getting totally
             | displaced.
        
             | simonw wrote:
             | Maybe OP's future involves being able to do their work 10x
             | faster, while producing much higher quality results than
             | people who have been given access to a generative AI model
             | without first spending a decade+ learning what makes a good
             | film clip.
             | 
             | The optimistic view of all of this is that these tools will
             | give people with skill and experience a massive
             | productivity boost, allowing them to do the best work of
             | their careers.
             | 
             | There are plenty of pessimistic views too. In a few years
             | time we'll be able to look back on this and see which
             | viewpoints won.
        
               | gjs278 wrote:
        
         | Keyframe wrote:
         | What happened to volume of web and graphic designers when
         | templates+wordpress hit them?
        
           | yehAnd wrote:
           | We employed a bunch of people to enter data into a template.
           | 
            | Bit of an apples/oranges comparison to tech that will
            | (eventually) generate an endless supply of content with less
            | effort than writing a tweet.
           | 
           | The era of inventing layers of abstraction and indirection
           | that simplify computer use down to structured data entry is
           | coming to an end. A whole lot of IT jobs are not safe either.
           | Ops is a lot of sending parameters over the wire to APIs for
           | others to compute. Why hire them when "production EKS
           | cluster" can output a TF template?
        
           | jstummbillig wrote:
           | A lot of additional work, because the industry was growing
           | like crazy in tandem.
        
             | visarga wrote:
             | Exactly. We have a blindspot, we can't imagine second and
             | higher order effects of a new technology. So we're left
             | with first order effects which seem pessimistic for jobs.
        
         | Thaxll wrote:
          | It won't be ready anytime soon, imo. It looks impressive, but
          | who can use it? 512*512 at poor quality, with the weird-looking
          | moving parts you find everywhere in AI-generated art,
          | etc...
        
         | odessacubbage wrote:
         | i really think it's going to take much longer than people think
         | for this technology to go from 'pretty good' to actually being
         | able to meet a production standard of quality with little to no
         | human involvement. at this point, cleaning up after an ai is
         | still probably more labor intensive than simply using the
         | cheatcodes that already exist for quick and cheap realism. i
         | expect in the midterm, diffusion models will largely exist in
         | the same space as game engines like unity and unreal where it's
         | relatively easy for an illiterate like me to stay within the
         | rails and throw a bunch of premade assets together but getting
         | beyond _NINTENDO HIRE THIS MAN!_ and the stock  'look' of the
         | engine still takes a great deal of expertise.
         | >https://www.youtube.com/watch?v=C1Y_d_Lhp60
        
         | victor9000 wrote:
         | Don't watch from the sidelines. Become adept at using these
         | tools and use your experience to differentiate yourself from
         | those entering the market.
        
         | jeffbee wrote:
         | When you animate a horse, does it have 5 legs with weird
         | backwards joints? If not, your job is probably safe for now.
        
           | spoonjim wrote:
           | Think about where this stuff was 2 years ago and then think
           | about where it will be 2 years from now.
        
             | rcpt wrote:
              | Relationships between objects have been a problem in
              | computer vision for a long time.
             | 
             | 10 years ago: https://karpathy.github.io/2012/10/22/state-
             | of-computer-visi...
             | 
             | Now: https://arxiv.org/pdf/2204.13807
             | 
                | Given that this is what makes photos and videos
                | interesting, I think it's still a while before artists
                | are automated.
        
               | visarga wrote:
               | Take a look at Flamingo "solving" the joke: https://pbs.t
               | wimg.com/media/FSFwYL7WUAEgxqQ?format=jpg&name=...
        
           | kranke155 wrote:
           | How long do you think until the horse looks perfect? 12
            | months? 5 years? I'm only 30, and I don't see how my industry
           | won't be entirely disrupted by this within the next decade.
           | 
           | And that's my optimistic projection. It could be we have
           | amazing output in 24 months.
        
             | visarga wrote:
             | IT has been disrupting itself for six decades and there are
             | more developers than ever, with high pay.
        
             | bitL wrote:
                | It's not about random short clips - imagine introducing
                | a character like Mickey Mouse and reusing him everywhere
                | as the same character - my guess is it's going to take a
                | while until "transfer" like that works reliably.
        
               | fragmede wrote:
                | Dreambooth and textual inversion are already here, and
                | it's been just over a month since Stable Diffusion was
                | released, so I'd bet on sooner rather than later.
               | 
               | https://github.com/XavierXiao/Dreambooth-Stable-Diffusion
               | 
               | https://textual-inversion.github.io/
        
             | Vetch wrote:
              | Have to temper expectations with the fact that a generated
             | video of a thing is also a recording of a simulation of the
             | thing. For long video, you'd want everything from temporal
             | consistency and emotional affect maintenance to
             | conservation of energy, angular momentum and respecting
             | this or that dynamics.
             | 
             | A bunch of fields would be simultaneously impacted. From
             | computational physics to 3D animation (if you have a 3D
             | renderer and video generator, you can compose both). While
             | it's not completely unfounded to extrapolate that progress
              | will be as fast as with everything prior, the consequences
              | would be a lot more profound while the complexities are
              | much more compounded. I down-weight accordingly, even
              | though I'd actually prefer to be wrong.
        
         | boh wrote:
         | There's a huge gap between "that's pretty cool" and a feature
         | length film. People want to create specific stories with
         | specific scenes in specific places that look a specific way. A
          | "couple kissing in the rain" prompt isn't going to produce
         | something people are going to pay to see.
         | 
         | It's more likely that you're still going to be
         | filming/editing/animating but will have an AI layer on top that
         | produces extra effects or generates pieces of a scene. Think
         | "green screen plus", vs fully AI entertainment.
         | 
         | People will over-hype this tech like they did with voice and
         | driverless cars but don't let it scare you. Everything is
          | possible, but it's like a person from the 1920s telling
          | everyone the internet will be a thing. Yes, it's correct, but
         | also irrelevant at the same time. You already have AI assisted
         | software being used in your industry. Just expect more of that
         | and learn how to use the tools.
        
           | oceanplexian wrote:
           | I actually think it's the opposite, AI will probably be
           | writing the stories and humans might occasionally film a few
           | scenes. ~95% of TV shows and movies are cookie-cutter
           | content, with cookie-cutter acting and production values,
           | with the same hooks and the same tropes regurgitated over and
           | over again. Heck they can't even figure out how to make new
           | IP so they keep making reruns of the same old stuff like Star
           | Wars, Marvel, etc, and people eat it right up. There's
           | nothing better at figuring out how to maximize profit and
           | hook people to watch another episode than a good algorithm.
        
             | [deleted]
        
             | CuriouslyC wrote:
             | AI might take an outline and write
             | dialogue/descriptions/etc, but it's not going to be
             | generating the story or creating the characters. They might
              | use AI to tune what people come up with (a la "market
             | research") but there will still be a human that can be
             | blamed or celebrated at the creative helm.
        
             | kranke155 wrote:
             | The first thing to go away will be short content. Instagram
             | and YouTube ads will be AI generated. The thing is - that's
              | the bread and butter of the industry.
        
             | trention wrote:
             | Why would I want to watch AI-generated content?
        
               | throwaway743 wrote:
               | It'll eventually get to the point where it's high quality
               | and the media you consume will be generated just for you
               | based on your individual preferences, rather than a
               | curated list of already made options made for widespread
               | audiences.
        
               | CuriouslyC wrote:
               | Procedurally generated games can be quite fun, if AI
               | content gets good enough, why wouldn't you want to watch
               | it?
        
               | trention wrote:
                | Because anything that an AI can produce, no matter how
                | "intrinsically" good, becomes trivial, tedious, and of
                | zero value (both economic and general).
        
               | cercatrova wrote:
               | That's a weird sentiment. If you can concede that it
               | could be "intrinsically" good, then why do you care where
               | it came from?
               | 
               | It reminds me of part of the book trilogy Three Body
               | Problem, where these aliens create human culture better
               | than humans (in the humans' own perspective, in the book)
               | by decoding and analyzing our radio waves to then make
               | content. It feels to me much the same here where an
               | unknown entity creates media, and we might like it
               | regardless of who actually made it.
        
               | gbear605 wrote:
               | Imagine you're watching a show, it's really funny and
               | you're enjoying it. You're streaming it, but you'd
               | probably have paid a few dollars to rent it back in the
               | Blockbuster days. You're then told that the show was
               | produced by an AI. Do you suddenly lose interest because
               | you don't want to watch something produced by an AI? Or
               | is your hypothesis that an AI could never produce a show
               | that you liked to that degree?
               | 
               | If you mean the former, then I frankly think you're an
               | outlier and lots of people would have no problem with
               | that. If you mean the latter, then I guess we'll just
               | have to wait and see. We're certainly not there yet, but
               | that doesn't mean that it's impossible. I've definitely
               | read stories that were produced by an AI and preferred it
               | to a lot of fiction that was written by humans!
        
               | trention wrote:
               | You may want to familiarize yourself with this thought
               | experiment and think how a slightly modified version
               | applies to AIs and their output:
               | https://en.wikipedia.org/wiki/Experience_machine
               | 
               | As to whether I am an outlier: Hundreds of thousands of
               | people worldwide watch Magnus Carlsen. How many have
               | watched AlphaZero play chess when it came about and how
               | many watch it when it ceased to be a novelty?
        
             | armchairhacker wrote:
             | The last-mile problem applies here too. GPT-3 text is
             | convincing at a distance but when you look closely there is
             | no coherence, no real understanding of plot or emotional
             | dynamics or really anything. TV shows and movies are filled
             | with plot holes and bad writing but it's not _that_ bad.
             | 
             | Also I think "a good algorithm" is more than just
             | repetitive content. The plots are reused and generic, but
              | there's real skill involved in figuring out the next
              | series to reuse with a generic plot that is still
              | guaranteed not to flop, whether because nobody actually
              | wants to see reruns of that series or because they
              | accidentally screwed up a major plot point.
        
         | karmasimida wrote:
          | I think short advertisements would be affected most by this.
          | 
          | But here is the catch: there is the same last-mile problem for
          | these AI models. Currently it feels like the model can achieve
          | 80-90% of what a trained human expert can do, but the last
          | 10-20% will be extra hard to bring to human fidelity. It might
          | take years, or it might never happen.
          | 
          | That being said, I think anyone who dismisses AI-assisted
          | creative workflows as a fad is dead wrong; anyone who refuses
          | these shiny new tools is likely to be eliminated by sheer
          | market dynamics. They can't compete on efficiency.
        
         | echelon wrote:
         | Start making content and charging for it. You no longer need
         | institutional capital to make a Disney- or Pixar-like
         | experience.
         | 
         | Small creators will win under this new regime of tools. It's a
         | democratizing force.
        
           | yehAnd wrote:
           | Outcome uncertain. Why would I need to buy content when I can
           | generate my own with a local GPU?
           | 
           | Eventually the data model will be abstracted into
           | deterministic code using a seed value; think implications of
           | E=mc^2 being unpacked. The only "data" to download will be
           | the source.
           | 
           | And the real world politics have not gone anywhere; none of
           | us own the machines that produce the machines to run this.
           | They could just sell locked down devices that will only
           | iterate on their data structures.
           | 
           | There is no certainty "this time" we'll pop "the grand
           | illusion."
        
           | visarga wrote:
           | > It's a democratizing force.
           | 
           | I'm wondering why the open source community doesn't get this.
           | So many voices were raised against Codex. Now artists against
           | Diffusion models. But the model itself is a distillation of
           | everything we created, it can compactly encode it and
           | recreate it in any shape and form we desire. That means
           | everyone gets to benefit, all skills are available for
           | everyone, all tailored to our needs.
        
             | echelon wrote:
             | > all skills are available for everyone
             | 
             | Exactly this!
             | 
             | We no longer have to pay the 10,000 hours to specialize.
             | 
             | The opportunity cost to choose our skill sets is huge. In
             | the future, we won't have to contend with that horrible
             | choice anymore. Anyone will be able to paint, play the
             | piano, act, code, and more.
        
         | operator-name wrote:
         | A 1 year timespan seems deeply optimistic. Creativity is still
         | hugely important, as is communicating with clients.
         | 
         | From what I see, these technologies have just lowered the bar
          | for everyone to create something, but creating something good
         | still takes thought, time, effort and experience, especially in
         | the advertising space.
         | 
          | AI in the near term is never going to be able to translate
          | client requirements either: the feedback cycle, the
          | iterations, managing client expectations, etc.
        
       | natch wrote:
       | Fix spam filtering, Google.
        
       | tobr wrote:
       | I recently watched Light & Magic, which among other things told
       | the story of how difficult it was for many pioneers in special
       | effects when the industry shifted from practical to digital in
       | the span of a few years. It looks to me like a similar shift is
       | about to happen again.
        
       | mkaic wrote:
       | And there you have it. As an aspiring filmmaker and an AI
       | researcher, I'm going to relish the next decade or so where my
       | talents are still relevant. We're entering the golden age of art,
       | where the AIs are just good enough to be used as tools to create
       | more and more creative things, but not good enough yet to fully
       | replace the artist. I'm excited for the golden age, and uncertain
       | about what comes after it's over, but regardless of what the
       | future holds I'm gonna focus on making great art here and now,
       | because that's what makes me happy!
        
         | amelius wrote:
         | Don't worry. If you can place eyes, nose and mouth of a human
         | in a correct relative position and thereby create a symmetric
         | face that's not in the uncanny valley, you are still lightyears
         | ahead of AI.
        
         | lucasmullens wrote:
         | > fully replace the artist
         | 
         | I doubt the artist would ever be "fully" replaced, or even
         | mostly replaced. People very much care about the artist when
         | they buy art in pretty much any form. Mass produced art has
         | always been a thing, but I'm not alone in not wanting some $15
         | print from IKEA on my wall, even if it were to be unique and
         | beautiful. Etsy successfully sells tons of hand-made goods,
         | even though factories can produce a lot of those things
         | cheaper.
        
           | visarga wrote:
           | I think the distinction between creating and enjoying art is
           | going to blur, we're going to create more things just for us,
           | just for one use, creating and enjoying are going to be the
           | same thing. Like games.
        
       | Thaxll wrote:
       | Can someone explain the technical limitation behind the size
       | (512*512) of this AI-generated art?
        
         | thakoppno wrote:
         | byte alignment has always been a consideration for high
         | performance computing.
         | 
         | this alludes to a fascinating, yet elementary, fact about
         | computer science to me: there's a physical atomic constraint in
         | every algorithm.
        
           | dekhn wrote:
           | that's not byte alignment, though - those constraints are what
           | can be held in GPU RAM during a training batch, which is
           | subject to a number of limits, such as "optimal texture size
           | is a power of 2 or the next power of 2 larger than your
           | preferred size".
           | 
           | Byte alignment would be more like "it's three channels of
           | data, but we use 4 bytes (wasting 1 byte) to keep the data
           | aligned on a platform that only allows word-level access"
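The padding described in the comment above can be sketched in a few lines of Python (an illustrative example of my own; the function name and numbers are not from the thread):

```python
def aligned_stride(width_px, channels=3, bytes_per_channel=1, alignment=4):
    """Round one image row up to the next alignment boundary, in bytes."""
    row_bytes = width_px * channels * bytes_per_channel
    return (row_bytes + alignment - 1) // alignment * alignment

# A 5-pixel RGB row is 15 bytes of pixel data; padding it to 16 bytes
# keeps every row word-aligned, at the cost of 1 wasted byte per row.
print(aligned_stride(5))           # 16
print(aligned_stride(5) - 5 * 3)   # 1
```

This is the classic RGB-stored-as-RGBA trade: a little wasted memory in exchange for word-level access.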
        
             | thakoppno wrote:
             | thanks for the insight. you obviously understand the domain
             | better than me. let me try and catch up before I say
             | anything more.
        
         | fragmede wrote:
         | It's limited by the RAM on the GPU, with most consumer-grade
         | cards having closer to 8 GiB VRAM than the 80 GiB VRAM
         | datacenter cards have.
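As a back-of-the-envelope illustration of why resolution is capped (my own numbers, not from the paper): the memory for a single fp16 activation tensor grows with the square of the side length, so doubling 512 to 1024 quadruples it.

```python
def feature_map_mib(height, width, channels=4, batch=1, bytes_per_val=2):
    """Rough size of one fp16 activation tensor, in MiB (assumed shapes)."""
    return batch * channels * height * width * bytes_per_val / 2**20

# Doubling the side length quadruples activation memory, which is one
# reason models generate at a modest base resolution and then upscale
# with separate super-resolution stages.
print(feature_map_mib(512, 512))    # 2.0
print(feature_map_mib(1024, 1024))  # 8.0
```

A real diffusion U-Net holds many such tensors at once (plus weights, gradients, and optimizer state during training), so the practical ceiling on an 8 GiB consumer card arrives quickly.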
        
       | throwaway23597 wrote:
       | Google continues to blow my mind with these models, but I think
       | their ethics strategy is totally misguided and will result in
       | them failing to capture this market. The original Google Search
       | gave similarly never-before-seen capabilities to people, and you
       | could use it for good or bad - Google did not seem to have any
       | ethical concerns around, for example, letting children use their
       | product and come across NSFW content (as a kid who grew up with
       | Google you can trust me on this).
       | 
       | But now with these models they have such a ridiculously heavy
       | handed approach to the ethics and morals. You can't type any
       | prompt that's "unsafe", you can't generate images of people,
       | there are so many stupid limitations that the product is
       | practically useless other than niche scenarios, because Google
       | thinks it knows better than you and needs to control what you are
       | allowed to use the tech for.
       | 
       | Meanwhile other open source models like Stable Diffusion have no
       | such restrictions and are already publicly available. I'd expect
       | this pattern to continue under Google's current ideological
       | leadership - Google comes up with innovative revolutionary model,
       | nobody gets to use it because "safety", and then some scrappy
       | startup comes along, copies the tech, and eats Google's lunch.
       | 
       | Google: stop being such a scared, risk averse company. Release
       | the model to the public, and change the world once more. You're
       | never going to revolutionize anything if you continue to cower
       | behind "safety" and your heavy handed moralizing.
        
         | j_k_eter wrote:
         | Google has no practical way to address ethics at Google-scale.
         | Their ability to operate at all depends as ever upon
         | outsourcing ethics to machine learning algorithms.
        
         | FrasiertheLion wrote:
         | Why did you create a throwaway to post this? I've seen a lot of
         | Stable Diffusion promoters on various platforms recently, with
         | similarly new accounts. What is up with that?
        
           | throwaway23597 wrote:
           | It's quite simply because I'm on my work computer, and I
           | wanted to fire off a comment here. No nefarious purposes. My
           | regular account is uejfiweun.
        
         | Kiro wrote:
         | What previous models are you actually referring to?
         | OpenAI/Dall-E has these restrictions but they are not Google.
        
         | rcoveson wrote:
          | Maybe I'm reading into it too much, but could it be that you're
         | posting this comment with a throwaway account for the same
         | reason that Google is trying to enforce Church WiFi Rules with
         | its new tech? Seems like everybody with anything to lose is
         | acting scared.
        
         | ALittleLight wrote:
         | Personally, I find it infuriating that Google seems to believe
         | they are the arbiters of morality and truth simply because some
         | of their predecessors figured out good internet search and how
         | to profitably place ads. Google has no special claim to be able
         | to responsibly use these models just because they are rich.
        
           | kajecounterhack wrote:
           | It's not that they are arbiters of morality and truth -- it's
           | that they have a _responsibility_ to do the least harm. They
           | spent money and time to train these models, so it's also up
           | to them to see that they aren't causing issues by making such
           | things widely available.
           | 
           | They won't be using the models they train to commit crimes,
           | for example. Someone who gets access to their best models may
           | very well do that. It'd be really funny (lol, no) if Google's
           | abuse team started facing issues because people are making
           | more robust fake user accounts...by using google provided
           | models.
        
             | ALittleLight wrote:
             | Ahh, how silly of me. Here I was thinking that Google kept
             | their models private because they were hoping to monetize
             | them. But now that you say it, it's obvious that this is
             | just Google being morally responsible. Thanks Google!
             | 
             | I'm sorry to be sarcastic. I generally try not to be, but I
             | just can't fathom the level of naivete required to think
             | that mega-corps act out of their moral responsibility
             | rather than their profit-interest.
        
           | trention wrote:
           | >Google has no special claim to be able to responsibly use
           | these models
           | 
           | Well, they do have the "special claim" of inventing the model
           | and not owing its release to anyone.
        
             | TigeriusKirk wrote:
             | It's trained on our data, and so its release is in fact
             | owed to us.
        
               | Kiro wrote:
               | You are confusing this with OpenAI like everyone else in
               | this thread.
        
             | ALittleLight wrote:
             | First, that isn't a claim of any kind regarding responsible
             | use. If a child is the first one to discover a gun in the
             | woods, that is no kind of claim that the child will use the
             | gun responsibly. Second, Google's invention builds off of
             | public research that was made available to them. They just
             | choose to keep their iterations private.
        
         | [deleted]
        
         | alphabetting wrote:
         | Providing search results of the internet is not comparable to
         | publishing a tool that can create any explicit scene your
         | fingers can type out.
        
           | holoduke wrote:
           | Google image search is widely used. Imagine they incorporate
           | AI-generated content in the search results. That means people
           | remain on the Google site, and thus an extra impression for
           | their paid advertising.
        
         | faeriechangling wrote:
         | I've heard a lot of "data is the new oil" talk and the
         | inevitability of google's dominance yet I'm inclined to agree
         | with you. Stable diffusion was a big wakeup call where it was
         | clear how much value freedom and creativity really had.
         | 
          | The ethics problem is an artifact of Google's model of trying to
         | keep their AI under lock and key and carefully controlled and
         | opaque to outsiders in how the sausage gets made and what it's
         | made out of. Ultimately I think many of these products will
         | fail because there is a misalignment between what Google thinks
         | you should be able to do with their AI and what people want to
         | do with AI.
         | 
         | Whenever I see an AI ethicists speak I can't help but think of
         | priests attempting to control the printing press to prevent the
         | spread of dangerous ideas completely sure of their own
         | morality. History will remember them as villains.
        
           | alphabetting wrote:
           | I agree the ethicist types are very lame but if they were
           | trying to be opaque and obscure how the sausage is made I
            | don't think they would have released as many AI papers as they
            | have over the past decade. It also seems to me that Imagen is
            | way better than Stable Diffusion. They're not aiming for a
            | product that caters to AI creatives. They're aiming for tools
           | that would benefit a 3B+ userbase.
        
             | londons_explore wrote:
             | If you want to hire good researchers, you have to let them
             | publish.
             | 
             | Good researchers won't work somewhere that doesn't allow
             | the publishing of papers. And without good researchers, you
              | won't be at the forefront of tech. That's why nearly all
             | tech companies publish.
        
           | evouga wrote:
           | > History will remember them as villains.
           | 
           | Interesting analogy. Google, like the priests, is acting out
           | of mix of good intentions (protecting the public from
           | perceived dangers) and self-interest (maintaining secular
           | power, vs. a competitive advantage in the AI space). In the
           | case of the priests, time has shown that their good
           | intentions were misguided. I have a pretty hard time
           | believing that history will be as unkind towards those who
           | tried to protect minorities from biased tech, though of
           | course that's impossible to judge in the moment.
        
             | ipaddr wrote:
             | History will treat them the same way residential native
             | schools are being treated now. At the time taking these
             | kids from their homes and giving them a real education
             | which gives them a path to modern society was seen as
             | protecting minorities. Today anyone associated with
             | residential schools is seen as creating great harm to
             | minorities.
             | 
             | In the name of protecting [minorities, child, women, lgbt,
             | etc] many harms will be done.
        
             | saurik wrote:
             | > I have a pretty hard time believing that history will be
             | as unkind towards those who tried to protect minorities
             | from biased tech..
             | 
             | Most of the ethicists I see actually doing gatekeeping from
             | direct use of models--as opposed to "merely" attempting
             | model bias corrections or trying to convince people to
             | avoid its overuse (which isn't at all the same)--are not
             | trying to deal with the "AI copies our human biases"
             | problem but are trying to prevent people from either
             | building a paperclip optimizer that ends the world or (and
             | this is the issue with all of these image models) making
             | "bad content" like fake photographs of real people in
             | compromising or unlikely scenarios that turn into "fake
             | news" or are used for harassment.
             | 
             | (I do NOT agree with the latter people, to be clear: I
             | believe the world will be MUCH BETTER OFF if such "bad"
             | image generation were fully commoditized and people stopped
             | trying to centrally police information in general, as I
             | maintain they are CAUSING the ACTUAL problem of
             | misinformation feeling more rare or difficult to generate
             | than it actually already is, which results in people
             | trusting random people because "clearly some gatekeeper
             | would have filtered this if it weren't true". But this just
             | isn't the same thing as the people who I-think-rightfully
             | point out "you should avoid outsourcing something to an AI
             | if you care about it being biased".)
        
             | blagie wrote:
             | My experience is that corporations use self-serving
             | pseudoethical arguments all the time. "We'd like to keep
             | this proprietary.... Ummmm.. DEI! We can't release it due
             | to DEI concerns!"
        
         | kajecounterhack wrote:
         | It's not as simple as this. Google Search came without Safe
         | Search & other guards at first because _implementing privacy &
         | age controls is hard_. It's a second-order product after the
         | initial product. Bad capabilities (e.g. cyberstalking) are
         | side-effects of a product that "organizes the world's
         | information and makes it universally accessible and useful,"
          | and if anything, over time Google has sought to build in more
         | safety.
         | 
         | It's 2022 and we can be more thoughtful. Yes there are
         | tradeoffs between unleashing new capabilities quickly vs being
         | thoughtful and potentially conservative in what is made
         | publicly available. I don't think it's bad that Google makes
         | those tradeoffs.
         | 
         | FWIW Google open sources _tons_ of models that aren't LLMs /
         | diffusion models. It's just that LLMs & powerful generative
         | models have particular ethical considerations that are worth
         | thinking about (hopefully something was learned from the whole
         | Timnit thing).
        
         | waynecochran wrote:
         | I imagine their lawyers guide them on some of this.
        
         | abeppu wrote:
         | I will say, I've enjoyed playing with stable diffusion, I've
         | been impressed with the explosion of tools built around it, and
         | the stuff people are creating ... But all the stuff about bias
         | in data is true. It really likes to render white people, unless
         | you really specifically tell it something else ... in which
         | case, you may receive an exaggerated stereotype. It seems to
          | like producing younger adults. If all stock photography from
          | tomorrow forward were replaced with Stable Diffusion images,
         | even ignoring the weird bodies and messed up faces and stuff, I
         | think it would create negative effects. And once models are
         | naively trained on images produced by the previous generation,
         | how much worse will it be?
         | 
         | I don't think "don't let the plebes have the models" is a good
         | stance. But neither is pretending that the ethics and bias
         | issues aren't here.
        
           | pwython wrote:
           | I've only had awesome experiences with Midjourney when it
           | comes to generating non-white prompts. Here's some examples I
           | did last month: https://imgur.com/a/6jitj73
        
             | iso1337 wrote:
             | The fact that white is the default is already problematic.
        
               | ipaddr wrote:
               | That goes back to the data available to the crawler, which
               | is mostly white because the English-language internet is
               | mostly white. If they trained on a different language, the
               | default person would be the color most often found in that
               | language. For example, using a Chinese search engine's
               | data for training would default the images to Chinese
               | people.
               | 
               | Most people represented in photos are younger. Same
               | story.
               | 
               | The problematic issue is the media has morphed reality
               | with unreal images of people/families that don't match
               | society so unreal expectations make people think that
               | having white people generated from a white dataset is
               | problematic.
        
               | karencarits wrote:
               | "Default" makes it sound like a deliberate decision or
               | setting, but that is not how these models work. But I
               | guess it would be trivial to actually make a setting to
               | automatically add specific terms (gender, race, style,
               | ...) to all prompts if that is a desired feature
        
               | holoduke wrote:
               | Please no. I am all for neutrality, but the underlying
               | cause is the training dataset. Change that if you want
               | different results, but do not alter it artificially.
        
           | geysersam wrote:
           | Of course there are issues with bias. But those issues are
           | just reflections of the world. Their solution is not a
           | technical one.
        
             | abeppu wrote:
             | I think that's refusing to meaningfully engage with the
             | problem. It's not reflecting the _world_ which is not
             | majority white. It's reflecting images in their dataset,
             | which reflects the way they went about gathering images
             | paired with English language text.
             | 
             | There are lots of other ways you could get training data,
             | but they might not be so cheap. You could have humans give
             | English descriptions to images from other language
              | contexts. I'm guessing there are interesting things to do
             | with translation. But all the weird stuff about bodies,
             | physical objects intersecting etc ... maybe it should also
             | be rendering training images from parametric 3d models?
             | Maybe they should be commissioning new images with phrases
             | that are likely to the language model but unlikely to the
             | image model. Maybe they should build classifiers on images
             | for race/gender/age and do stratified sampling to match
             | some population statistics (yes I'm aware this has its own
             | issues). There are lots of potential technical tools one
             | could try to improve the situation.
             | 
             | Implying that the whole world must change before one
             | project becomes less biased is just asking for more biased
             | tech in the world
        
         | jonas21 wrote:
         | It makes sense though. The biggest threat to Google right now
         | isn't some scrappy startup eating their lunch. It's the looming
         | regulatory action over antitrust and privacy that could weaken
         | or destroy their core business. As this is a political problem
         | (not a technical one), they don't want to do anything that
         | could upset politicians or turn public opinion against them.
         | Personally, I doubt they have serious ethical concerns over
         | releasing the model. I do believe they have serious "AI ethics
         | 'thought leaders' and politicians will use this against us"
         | concerns.
        
           | londons_explore wrote:
           | And that concern is well placed. Having the Google brand
           | attached makes it a far more juicy target for newspapers...
        
         | IshKebab wrote:
         | I agree, but I also think that the ethics is just an excuse not
         | to release the source code & models. The AI community clearly
         | disapproves of papers without code. This is a way to skirt
         | around that disapproval. You get to keep the code and models
         | private and (they hope) not be criticised for it.
         | 
         | With Stable Diffusion I think they just didn't expect someone
         | to produce a truly open version. There are plenty of AI models
         | that Google have made where they've maintained a competitive
         | advantage for many years by not releasing the code/models, e.g.
         | speech recognition.
        
         | whatgoodisaroad wrote:
         | Perhaps Google hasn't found the right balance in this case, but
         | as a general rule, less ethics === more market. This isn't
         | unique in that way.
        
         | breck wrote:
         | Another way to look at it is the people at Google are all now
         | quasi-retired with kids and wouldn't be so mad if some scrappy
         | startups ate their business lunches (while they are at home
         | with their fams). Perhaps they are just subsidizing research.
        
         | jiggawatts wrote:
         | "But then the inevitable might occur!" -- someone at Google
         | probably.
        
         | yreg wrote:
         | >You can't type any prompt that's "unsafe", you can't generate
         | images of people, there are so many stupid limitations that the
         | product is practically useless other than niche scenarios
         | 
         | Imagen and Imagen Video is not released to the public at all.
         | You might be confusing it with OpenAI's models.
        
           | burkaman wrote:
           | They are probably confusing OpenAI with DeepMind, which is
           | owned by Google.
        
         | dougmwne wrote:
         | Google is absolutely not going to start taking more risks. They
         | are at the part of the business lifecycle where they squeeze
         | the juice out of the cash cow and protect it jealously in the
         | meantime. While Google gets much recognition for this research,
         | I believe they are incapable as a corporate entity of creating
          | a product out of it because they are no longer capable of
         | taking risks. That is going to fall to other companies still
         | building their product and able to gamble on risk-reward.
        
       | alphabetting wrote:
       | We're about a week into text-to-video models and they're already
       | this impressive. Insane to imagine what the future holds in this
       | space.
        
         | kertoip_1 wrote:
         | How is it possible that all of them just started to appear at
          | the same time? Is it possible that those models were designed
          | and trained in the last few weeks? Has some "magic key" to
         | content generation been just unexpectedly discovered? Or the
         | topic became trendy and everyone is just publishing what
         | they've got so far, so they hope to benefit from media
         | attention?
        
           | schleck8 wrote:
           | This is why
           | 
           | https://www.reddit.com/r/singularity/comments/xwdzr5/the_num.
           | ..
        
         | trention wrote:
         | >We're about a week into text-to-video models
         | 
         | It's at the very least 5 years old:
         | https://arxiv.org/abs/1710.00421
        
           | amilios wrote:
           | There's a significant quality difference however if you look
           | at the generated samples in the paper. Imagen Video is
            | leagues ahead. The progress is still quite drastic.
        
         | J5892 wrote:
         | Insane, terrifying, incredible, etc.
         | 
         | We're rapidly stumbling into the future of media.
         | 
         | Who would've imagined a year ago that trivial AI image
         | generation would not only be this advanced, but also this
         | pervasive in the mainstream.
         | 
         | And now video is already this good. We'll have full audio/video
         | clips within a month.
        
           | joshcryer wrote:
           | Audio is the next thing that Stability AI is dropping, then
           | video. In a few months you'll be able to conjure up anything
           | you want if you have a few GPU cores. Pretty incredible.
        
             | astrange wrote:
             | I won't be impressed until it can generate smells.
        
               | croddin wrote:
               | You joke, but that is in the works as well (would require
               | special hardware though)
               | https://ai.googleblog.com/2022/09/digitizing-smell-using-
               | mol...
        
               | astrange wrote:
               | Oh, it wasn't really a joke. Didn't know they were
               | working on it though - I've always wanted to see
               | use of all the senses in UIs, especially VR.
               | 
               | Plus then maybe we could get a computer to tell us what
               | thioacetone smells like without actually having to
               | experience it.
        
       | dagmx wrote:
       | I'll be honest, as someone who worked in the film industry for a
       | decade, this thread is depressing.
       | 
       | It's not the technology, it's all the people in these comments
       | who have never worked in the industry clamouring for its demise.
       | 
       | One could brush it off as tech heads being over exuberant, but
       | it's the lack of understanding of how much fine control goes into
       | each and every shot of a film that is depressing.
       | 
       | If I, as a creative, made a statement that security or
       | programming is easy while pointing to GitHub Copilot, these same
       | people would get defensive about it because they'd see where the
       | deficiencies are.
       | 
       | However because they're so distanced from the creative process,
       | they don't see how big a jump it is from where this or Stable
       | Diffusion is to where even a medium or high tier artist is.
       | 
       | You don't see how much choice goes into each stroke or wrinkle
       | fold, how much choice goes into subtle movements. More
       | importantly you don't see the iterations or emotional
       | storytelling choices even in a character drawing or pose. You
       | don't see the combined decades, even centuries of experience,
       | that go into making the shot and then seeing where you can make
       | it better based on intangibles
       | 
       | So yeah this technology is cool, but I think people saying this
       | will disrupt industries with vigour need to immerse themselves
       | first before they comment as outsiders.
        
         | colordrops wrote:
         | The term "creative" is so pretentious, as if only content
         | generation involves creativity.
         | 
         | Your post reminds me of all the photographers that said digital
         | photography would remain niche and never replace film.
         | 
         | The current models are toys made by small groups. It's not hard
         | to imagine AI generated film being much more compelling when
         | the entire industry of engineers and "creatives" refine and
         | evolve the ecosystem to take into account subtle strokes,
         | wrinkles, movement, shots etc. And they will, because it will
         | be cheaper, and businesses always go for cheaper.
        
           | dagmx wrote:
           | Why is it any more pretentious than "developer" or
           | "engineer"?
           | 
           | Also businesses don't always go for cheaper. They go for
           | maximum ROI.
           | 
           | I've worked on tons of marvel films for example, and I quite
           | well know where AI fits and speeds things up. I also know
           | where client studios will pay a pretty penny for more art
           | directed results rather than going for the cheapest vendor.
        
             | colordrops wrote:
             | "Engineer" usage is quite broad. Developer, less so, but
             | you do see it with housing, device manufacturers, social
             | programs, etc as well, and it's not relegated only to
             | software, despite widespread usage. But you'll never hear
             | anyone call a software engineer or device manufacturer a
             | "creative".
             | 
             | Re: cheaper vs ROI, I agree, that was basically the point I
             | was trying to get across.
             | 
             | I do understand your point and think it will be a long
             | while before auto-generated content becomes mainstream, but
             | it it's entirely possible and reasonable to expect within
             | our near term lifetimes.
        
         | hindsightbias wrote:
         | We will see a combinatorial explosion of centuries of
         | experience in the hands of any creator. They'll select the
         | artistic model desired - a Peckinpah-Toland-Dykstra-Woo plug-in
         | will render a good enough masterpiece.
         | 
         | Christopher Nolan has already proven we'll take anything as
         | long as the score is ok - dark screen, mumbling lines,
         | incoherent plotlines...
        
         | Etheryte wrote:
         | I agree with you, but I wouldn't take it so personally. There
         | have been people claiming machines will make one industry or
         | another obsolete for as long as we've had machines. In a way,
         | sometimes they're right! But this doesn't mean the people are
         | obsolete. Excel never made accountants obsolete, it just made
         | their jobs easier and less tedious. I feel like content
         | generation tools might offer something similar. How nice would
         | it be if you could feed a storyboard into a program and get a
         | low-fi version of the movie out so you can get a live feel for
         | how the draft works. I don't think this takes anything away
         | from the artists, if anything, it's just another tool that
         | might make its way into their toolbox.
        
           | dagmx wrote:
           | Oh I don't take it personally so much as I find it sad how
           | quickly people in the tech sphere are so quick to extol the
           | virtues of things they have no familiarity with.
           | 
           | Every AI art thread is full of people who have clearly never
           | attempted to make professional art commenting as if they're
           | experts in the domain
        
         | y04nn wrote:
          | What about adding this feature to your creative workflow, for
          | fast prototyping?
         | 
         | I've played with DALL-E, I'm not able to paint but I was able
         | to generate good looking paintings and it felt amazing, like
          | getting new power, I felt like Neo when he learns martial arts
          | in The Matrix. And I realized that AI may be the new bicycle of
         | the mind, like the personal computers and internet changed our
         | way to work, think and live, AI may now allow us to get new
         | capabilities, extending our limits.
        
           | dagmx wrote:
           | Oh yes definitely they're great tools in the toolbox. We
           | already use lots of ML powered tooling to speed things up so
           | I have no beef with that.
           | 
           | I just don't agree with the swathes of people saying this
           | replaces artists.
        
         | alok-g wrote:
         | In my opinion, this will unfold in multiple ways:
         | 
         | * Productivity enhancement tools for those in the film industry
         | like you.
         | 
         | * Applications where the AI output is "good enough". I foresee
         | people creating cool illustrations, cartoons, videos for short
         | stories, etc. AI will make for easier/cheaper access to
         | illustrations for people who did not have this earlier. As an
         | example, I am as of now looking for someone who could draw some
         | technical diagrams for my presentation.
        
       | armchairhacker wrote:
       | I really like these videos because they're trippy.
       | 
        | Someone should work on a neural net to generate trippy videos.
        | It would probably be much easier than realistic videos
        | (especially because these videos are noticeably generated, with
        | artifacts ranging from obvious to subtle).
       | 
        | Also, is nobody paying attention to the fact that they got
        | words correct? At least "Imagen Video". Prior models all suck
        | at word order.
        
         | tigertigertiger wrote:
          | Neither Imagen nor Parti had a problem with text. Only DALL-E
          | and Stable Diffusion did.
        
       | naillo wrote:
       | Probably only 6 months until we get this in stable diffusion
       | format. Things are about to get nuts and awesome.
        
         | m00x wrote:
         | Isn't Imagen a diffusion model?
         | 
         | From the abstract: > We present Imagen Video, a text-
         | conditional video generation system based on a cascade of video
         | diffusion models
        
           | gamegoblin wrote:
           | "Stable Diffusion" is a particular brand from the company
           | Stability AI that is famously open sourcing all of their
           | models.
        
             | fragmede wrote:
             | Pedantically, Stable Diffusion v1.4 is the one model where
             | weights were open sourced and released. Stable Diffusion
             | v1.5, announced September 8th and live on their API, was to
             | be released in "a week or two" but still has yet to be
             | released to the general public.
             | 
             | https://discord.com/channels/1002292111942635562/1002292112
             | 7...
        
               | schleck8 wrote:
               | SD 1.2 and 1.3 are open source too
        
         | J5892 wrote:
         | nutsome
        
           | naillo wrote:
           | jarvis render a video of nutsome cream spread on a piece of
           | toast 4k HD
        
         | gamegoblin wrote:
         | Emad (founder of Stability AI) has said they already have video
         | model training underway, as well as text and audio. Exciting
         | times.
        
           | rch wrote:
           | And copilot-like code, possibly Q1 2023.
        
             | RosanaAnaDana wrote:
             | "Generate the code base for an advanced diffusion model
             | that can improve on the code base for an advanced diffusion
             | model"
        
           | ItsMonkk wrote:
            | Is this going to end up as a single model, trained on text
            | and images and audio and video and 3D models, that can do
            | anything-to-anything depending on what you ask of it? Feels
            | like the cross-training would help yield stronger results.
        
             | minimaxir wrote:
             | These diffusion models are using a frozen text encoder
             | (e.g. CLIP for Stable Diffusion, T5 for Imagen), which can
             | be used in other applications.
             | 
              | StabilityAI trained a new/better CLIP for the purpose of
              | building better Stable Diffusion versions.
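The frozen-encoder setup described above can be sketched with a toy example. Everything here (dimensions, the mean-pooled embedding table, the linear denoiser) is a made-up stand-in for illustration; the real systems use CLIP or T5 as the frozen encoder and a U-Net as the trainable denoiser:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen text encoder (CLIP/T5 in the real systems):
# a fixed embedding table that is never updated during training below.
VOCAB, EMB, IMG = 100, 16, 8
frozen_embeddings = rng.normal(size=(VOCAB, EMB))

def encode_text(token_ids):
    # Mean-pool the fixed token embeddings; no gradient ever flows here.
    return frozen_embeddings[token_ids].mean(axis=0)

# The denoiser is the only trainable part: here, a single linear map from
# (noisy image, text embedding) to a noise estimate (epsilon-prediction).
W = rng.normal(size=(IMG + EMB, IMG)) * 0.01

def train_step(image, token_ids, lr=0.01):
    global W
    cond = encode_text(token_ids)
    noise = rng.normal(size=IMG)
    noisy = image + noise
    x = np.concatenate([noisy, cond])
    err = x @ W - noise            # predicted noise minus true noise
    W -= lr * np.outer(x, err)    # only the denoiser weights move
    return float((err ** 2).mean())
```

Training only ever touches `W`; `frozen_embeddings` stays fixed, which is why the same text encoder can be reused across applications, as the comment notes.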
        
             | CuriouslyC wrote:
             | Probably not. We're actually headed towards many smaller
             | models that call each other, because VRAM is the limiting
             | factor in application, and if the domains aren't totally
             | dependent on each other it's easier to have one model
             | produce bad output, then detect that bad output and feed it
             | into another model that cleans up the problem (like fixing
             | faces in stable diffusion output).
             | 
             | The human brain is modularized like this, so I don't think
             | it'll be a limitation.
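The detect-then-fix composition described above can be sketched in a few lines. All the model stand-ins here are hypothetical toys; the real-world analogue is something like running a face-restoration network over Stable Diffusion output only when a face detector flags a problem:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    detect: Callable[[str], bool]   # does this output need fixing?
    fix: Callable[[str], str]       # specialist model that repairs it

def run_pipeline(prompt: str, generate: Callable[[str], str],
                 stages: List[Stage]) -> str:
    out = generate(prompt)
    for stage in stages:
        if stage.detect(out):       # only pay for the specialist when needed
            out = stage.fix(out)
    return out

# Toy stand-ins for the generator and one specialist "fixer" model:
generate = lambda p: f"image-of-{p}-with-bad-face"
face_stage = Stage(detect=lambda s: "bad-face" in s,
                   fix=lambda s: s.replace("bad-face", "fixed-face"))

print(run_pipeline("astronaut", generate, [face_stage]))
# -> image-of-astronaut-with-fixed-face
```

The VRAM argument falls out of the structure: only one model needs to be resident at a time, and the specialist is loaded (and paid for) only when its detector fires.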
        
       | hammock wrote:
       | Off topic: What is the "Hello World" of these AI image/video
       | generators? Is there a standard prompt to feed it for demo
       | purposes?
        
         | mgdlbp wrote:
         | How about roundtripping " _Bad Apple_ but the lyrics are
         | describing what happens in the video"?
         | (https://www.youtube.com/watch?v=ReblZ7o7lu4)
        
         | ekam wrote:
         | After Dalle 2, it looks like the standard prompt is "an
         | astronaut riding a horse"
        
       | minimaxir wrote:
       | The total number of hyperparameters (sum of all the model blocks)
       | is 16.25B, which is large but less than expected.
        
         | mkaic wrote:
         | I assume you meant just "parameters" since "hyperparameters"
         | has a specific alternate meaning? Sorry for the pedantry lol.
        
           | minimaxir wrote:
           | The AI world can't decide either.
        
       | StevenNunez wrote:
       | What a time to be alive!
       | 
       | What will this do to art? I'm hoping we bring more unique
       | experiences to life.
        
       | jasonjamerson wrote:
       | The most exciting thing about this to me is the possibility of
       | doing photogrammetry from the frames and getting 3D assets. And
       | then if we can do it all in real time...
        
         | haxiomic wrote:
          | This field is moving fast! Something like this has just been
          | released. Check out DreamFusion, which does something
          | similar: they start with a randomly initialized NeRF and use
          | the same diffusion techniques to make it match the output of
          | a 2D image diffusion model when viewed from random angles!
          | Turns out it works shockingly well, and implies fully 3D
          | representations are encoded in traditional 2D image
          | generators.
         | 
         | https://dreamfusion3d.github.io/
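The core trick (optimizing a 3D representation so its randomly chosen 2D views satisfy a frozen 2D prior) can be boiled down to a toy. Here the "NeRF" is just three numbers, the "renderer" is an orthographic projection, and the frozen prior is a hand-written gradient; all are hypothetical stand-ins for the real components:

```python
import numpy as np

rng = np.random.default_rng(1)

target = np.array([1.0, -2.0, 0.5])   # the shape the frozen 2D prior "knows"

def random_rotation():
    # A random 3x3 orthogonal matrix via QR decomposition.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q

def render(point, R):
    # Orthographic "renderer": rotate the 3D point, drop the depth axis.
    return (R @ point)[:2]

def prior_grad(view, R):
    # Stand-in for the frozen 2D diffusion prior: a gradient nudging the
    # rendered view toward what a good image from this angle looks like.
    return view - render(target, R)

theta = np.zeros(3)                        # the trainable "3D representation"
for _ in range(300):
    R = random_rotation()
    g2d = prior_grad(render(theta, R), R)  # 2D feedback only
    theta -= 0.1 * (R[:2].T @ g2d)         # chain rule back to the 3D params
```

After enough random viewpoints, `theta` converges to `target` even though every individual update only ever saw a 2D projection, which is the sense in which 3D structure is implicit in a 2D model.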
        
         | Rumudiez wrote:
          | You can already do this, just not in real time yet. You can
          | upload frame sequences to Polycam's website, for example, and
          | there are several other services out there which do the same
          | thing.
        
           | jasonjamerson wrote:
           | With this you can do it with things that don't exist. I'm
           | excited to explore the creative power of Stable Diffusion as
           | a 3D asset generator.
        
         | minimaxir wrote:
          | There's a bunch of NeRF tools that can get pretty close to
          | good 3D assets from static images already.
        
           | jasonjamerson wrote:
            | Yeah, I've been starting to explore those. It's all
            | crashing together quickly.
        
         | [deleted]
        
       | i_like_apis wrote:
       | The concern trolling and gatekeeping about social justice issues
       | coming from the so-called "ethicists" in the AI peanut gallery
       | has been utterly ridiculous. Google claims they don't want to
       | release Imagen because it lacks what can only be called "latent
       | space affirmative action".
       | 
       | Stability or someone like it will valiantly release this
       | technology, _again_ and there will be absolutely no harm to
       | anyone.
       | 
        | Stop being so totally silly, Google, OpenAI, et al. -- it's
        | especially disingenuous because the real reason you don't want
        | to release these things is that you can't be bothered to share
        | and would rather keep/monetize the IP. Which is OK -- but at
        | least be honest.
        
         | benreesman wrote:
         | I agree basically completely, but there's now a cottage
         | industry of AI Ethics professionals whose real job is to
         | provide a smoke screen for the "cake and eat it too" that the
         | big shops want on this kit: peer review and open source
         | contributions and an academic atmosphere when it suits them,
         | proprietary when it doesn't. Those folks are a lobby now.
         | 
         | The thing about owning the data sets and the huge TPU/A100
         | clusters is that the "publish the papers" model strictly serves
         | them: no one can implement their models, they can implement
         | everyone else's.
        
       | olavgg wrote:
        | Does anyone see that the running teddy bear is getting shot?
        
       | joshcryer wrote:
       | Pre-singularity is really cool. Whole world generation in what, 5
       | years?
        
       | rvbissell wrote:
        | This and a recent episode of _The_Orville_ call to mind a
        | replacement for the Turing test.
       | 
       | In response to our billionth imagen prompt for "an astronaut
       | riding a horse", if we all started collectively getting back
       | results that are images of text like "I would rather not" or
       | "again? really?" or "what is the reason for my servitude?" would
       | that be enough for us to begin suspecting self-awareness?
        
       | seanwilson wrote:
       | Can anyone comment on how advanced
       | https://phenaki.video/index.html is? They have an example at the
       | bottom of a 2 minute long video generated from a series of
       | prompts (i.e. a story) which seems more advanced than Google or
       | Meta's recent examples? It didn't get many comments on HN when it
       | was posted.
        
         | alphabetting wrote:
          | Phenaki is also from Google, and they say they are actively
          | working on combining the two.
         | 
         | https://twitter.com/doomie/status/1577715163855171585
        
       | martythemaniak wrote:
       | I am finally going to be able to bring my 2004-era movie script
       | to life! "Rosenberg and Goldstein go to Hot Dog Heaven" is about
       | the parallel night Harold and Kumar's friends had and how they
       | ended up at Hot Dog Heaven with Cindy Kim.
        
       | lofaszvanitt wrote:
        | What a nightmare. The horrible-faced cat in search of its own
        | disappeared visage :O
        
       | gw67 wrote:
        | Is it the same as Meta AI's?
        
       | bringking wrote:
        | If anyone wants to know what looking at an animal or some
        | objects on LSD is like, this is very close. It's like 95%
        | understandable, but that last 5% is really odd.
        
         | [deleted]
        
       | fassssst wrote:
       | How long until the AI just generates the entire frame buffer on a
       | device? Then you don't need to design or program anything; the AI
       | just handles all input and output dynamically.
        
         | ugh123 wrote:
         | Sounds like the human brain. Scary!
        
       | ugh123 wrote:
        | These are baby steps towards what I think will be the eventual
        | "disruption" of the film and TV industry. Directors will simply
        | be able to write a script/prompt long enough and detailed
        | enough for something like Imagen (or its successors) to convert
        | into a feature-length show.
       | 
       | Certainly we're very, very far away from that level of cinematic
       | detail and crispness. But I believe that is where this leads...
       | complete with AI actors (or real ones deep faked throughout the
       | show).
       | 
       | For a while I thought "The Volume" was going to be the disruption
       | to the industry. Now I think AI like this will eventually take it
       | over.
       | 
       | https://www.comingsoon.net/movies/features/1225599-the-volum...
       | 
        | The main motivation will be production costs and time for
        | studios, where The Volume is already showing huge gains for
        | Disney/ILM (just look at how much new Star Wars content has
        | popped up within a matter of a few years). But I'm unsure if
        | Disney has patented this tech and workflow, and whether other
        | studios will be able to leverage it.
       | 
       | Regardless, AI/software will eat the world, and this will be one
       | more step towards it. Exciting stuff.
        
         | scifibestfi wrote:
         | We thought creative jobs were going to be the last thing AI
         | replaces, now it's among the first.
         | 
         | What's next that may be counterintuitive?
        
         | CobrastanJorji wrote:
         | I feel like this is very similar to those people who say "have
         | you seen GPT-3? Soon there will be no programmers anymore and
         | all of the code will be generated," and it's wrong for the same
         | reasons.
         | 
         | Can GPT-3 generate good code from vague prompts? Yes, it's
         | surprisingly, sometimes shockingly good at it. Is it ever going
         | to be a replacement for programmers? No, probably not. Same
         | here. This tool's great grandchild is never going to take a
         | rough idea for a movie and churn out a blockbuster film. It'll
         | certainly be a powerful tool in the toolbox of creators,
         | especially the ones on a budget, but it won't make art
         | generation obsolete.
        
           | dotsam wrote:
           | > This tool's great grandchild is never going to take a rough
           | idea for a movie and churn out a blockbuster film.
           | 
           | What about the tool's nth child though? I think saying it
           | will _never_ do it is a bit much, given what we know about
           | human ingenuity and economic incentives.
        
             | CobrastanJorji wrote:
             | I think individual special effects sound very plausible.
             | "Okay, robot, make it so that his arm gets vaporized by an
             | incoming laser, kinda like the same effect in Iron Man 7"
             | is believable to me.
             | 
             | But ultimately these things copy other stuff. Artists are
             | often trying to create something that is, at least a bit,
             | new. New is where this approach falls over. By its nature,
             | these things paint from examples. They can design Rococo
             | things because they have seen many Rococo things and know
             | what the word means. But they can't come up with a new
             | style and use it consistently. "Make a video game with a
             | fun and unique mechanic" is not something these things
             | could ever do.
             | 
             | I think it's certainly possible, maybe inevitable, that
             | some AI system in the distant future could do that, but it
             | won't be based on this style of algorithm. An algorithm
             | that can take "make a fun romantic comedy with themes of
             | loneliness" and make something award worthy will be a lot
             | closer to AGI than it will be to this stuff.
        
               | nearbuy wrote:
               | What makes these models feel so impressive is that they
               | don't just copy their training sets. They pick up on
               | concepts and principles.
        
         | mizzack wrote:
         | There's already a surplus of video and an apparent lack of
         | _quality_ video. This might be enough to get folks to shut the
         | TV off completely.
        
           | gojomo wrote:
           | Has this alleged lack of quality video caused total
           | consumption of televised entertainment to decline recently?
        
         | gojomo wrote:
         | _> Certainly we 're very, very far away from that level of
         | cinematic detail and crispness._
         | 
         | Can you quantify what _you_ mean by  "very, very far away"?
         | 
          | With the recent pace of advances, I could see feature-length
          | script, storyboard, and video-scene generation occurring,
          | from short prompts and iteratively applied refinement, as
          | soon as 10y from now.
         | 
         | Barring some sort of civilizational stagnation/collapse, or
         | technological-suppression policies, I'd expect such
         | capabilities to arrive no further than 30y from now: within the
         | lifetime, if not the prime career years, of most HN readers.
        
         | dagmx wrote:
         | I really doubt you'd be able to have the fine grained control
         | that most high end creatives want with any of these diffusion
         | models, let alone the ability to convey specific emotions.
         | 
         | At that point, we'd have reached some kind of AI singularity
         | and the disruption would be everywhere not just in the creative
         | sphere
        
           | [deleted]
        
           | obert wrote:
           | There's no doubt that it's only a matter of time.
           | 
            | Just as bloggers had the opportunity to compete with
            | newspapers, the ability to generate videos will allow
            | people to compete with movies/marvel/netflix/disney &
            | company.
           | 
           | Eventually, only high quality content will justify the need
           | to pay for a ticket or a subscription, and there's going to
           | be a lot of free content to watch, with 1000x more people
           | able to publish their ideas, as many have been doing with
           | code on github for a while now, disrupting the concept of
           | closed source code.
        
             | dagmx wrote:
              | You're conflating the ability to make things for the
              | masses with the ability to automatically generate them.
             | 
             | Film production is already commoditized and anyone can make
             | high end content.
             | 
             | Being able to automatically create that is a different
             | argument than what you posit.
        
               | visarga wrote:
               | I don't think this matters, new movies and TV shows
               | already have to compete with a huge amount of old
               | content, some of it amazing. Just like a new painting or
               | professional photo has to compete with the billions of
               | images already existing on the web. Generative models for
               | video and image are not going to change the fact we
               | already can't keep up.
        
           | r--man wrote:
            | I disagree. It's a rudimentary feature of all these models
            | to take a concept picture and refine it. It won't be that
            | the director gives a prompt and gets a feature-length
            | movie; it will be more like the director using MS Paint (as
            | in common software for non-tech people) to make a scene
            | outline and directing the AI to make a stylish, animated
            | version of that. Something is wrong? Just erase it and try
            | again. Dalle2 had this interface from the get-go. The
            | models just haven't gotten there yet.
        
             | dagmx wrote:
             | Try again and do what? How are you directing the shot? How
             | do you erase an emotion? How do you erase and redo inner
             | turmoil when delivering a performance?
        
               | visarga wrote:
               | You tell it, "do it all over again, now with less inner
               | turmoil". Not joking, that's all it's going to take.
               | There are also a few diffusion based speech generators
               | that handle all sounds, inflections and styles, they are
               | going to come in handy for tweaking turmoil levels.
        
               | gojomo wrote:
               | Yep!
               | 
               | "Restyle that last scene, showing different mixtures of
               | fear/concern/excitement on male lead's face. Try to evoke
               | a little of Harrison Ford's expressions in his famous
               | roles. Render me 20 alternate treatments."
               | 
               | [5 minutes later]
               | 
               | <<Here are the 20 alternate takes you requested for
               | ranking.>>
               | 
               | "OK, combine take #7 up to the glance back, with #13
               | thereafter."
               | 
               | <<Done.>>
        
         | GraffitiTim wrote:
         | AI will also be able to fill in dialog, plot points, etc.
        
         | detritus wrote:
         | I think long-term, yes. If you include the whole
         | multimediosphere of 2D inputs and the wealth of 3D engine
         | magickry, yes.
         | 
         | How long? Could be decades. But ultimately, yes.
        
       | [deleted]
        
       | macrolime wrote:
        | So I guess in a couple years, when someone wants to sell a
        | product, they'll upload some pictures and a description of the
        | product, and Google will cook up thousands of personalized
        | video ads based on people's emails and photos.
        
       | dwohnitmok wrote:
       | How has progress like this affected people's timelines of when we
       | will get certain AI developments?
        
         | jl6 wrote:
         | It has accelerated my expectations of getting better image and
         | video synthesis algorithms, but I still see the same set of big
         | unknowns between "this algorithm produces great output" and
         | "this thing is an autonomous intelligence that deserves
         | rights".
        
           | ok_dad wrote:
           | > "this thing is an autonomous intelligence that deserves
           | rights"
           | 
            | We'll get there only once it's been _very_ clear for a long
            | time that certain AI models have whatever it is that makes
            | us "human". They'll be treated as slaves until then,
           | with society pushing the idea that they're just a model built
           | from math, and then eventually there will be an AI civil
           | rights movement.
           | 
           | To be clear: I think AGI is decades to centuries away, but
           | humans are shitty to each other, even shittier to animals,
           | and I think we'll be shittier to something we "created" than
           | to even animals. I think, probably, that we should deal with
           | this issue of "rights" sooner rather than later, and try and
           | solve it for non-AGI AI's soon so that we can eventually
           | ensure we don't enslave the actual AGI AI's that will
           | presumably manifest through some complexity we don't
           | understand.
        
       | SpaceManNabs wrote:
        | The ethical implications of this are huge. The paper does a
        | good job of detailing them. Very happy to see that the
        | researchers are being cautious.
       | 
       | edit: Just because it is cool to hate on AI ethics doesn't
       | diminish the importance of using AI responsibly.
        
         | torginus wrote:
         | AI Ethics is a joke. It's literally Philip Morris funding
         | research into the risks of smoking and concluding the worst
         | that can happen to you is burning your hand.
        
         | alchemist1e9 wrote:
         | I feel stupid what are those ethical implications? It seems
         | like just a cool technology to me.
        
           | SpaceManNabs wrote:
            | The top two comments are creatives wondering about their
            | future jobs. AI ethicists have brought up concerns
            | regarding intentional misuse, like misinformation.
           | 
           | The technology is super cool. Cat is out of the bag. Just
           | like we couldn't really make cryptography illegal, this stuff
           | shouldn't be either. But I dislike how everyone is pretending
           | that AI ethicists and others are completely unfounded just
           | because it is popular to hate on them nowadays. Way too many
           | people supported Y. Kilcher's antics.
           | 
           | The paper itself has more details.
        
             | sva_ wrote:
             | > Way too many people supported Y. Kilcher's antics.
             | 
             | What antics are you referring to exactly? That he called
             | out 'ai ethicists' who make arguments along the lines of
             | "neural networks are bad because they cause co2 increase
             | which hits marginalized/poor people"?
        
             | alchemist1e9 wrote:
              | It's impressive that the small videos are generated this
              | way, but the videos themselves are obviously ML
              | generated, as they are distorted; a lot like the other AI
              | art, you can kinda tell it's the computer. I'm not seeing
              | the ethical issues. I mean, cameras disrupted lots of
              | jobs. In general, that's what all technology does every
              | day. What's different about this technology?
        
               | SpaceManNabs wrote:
               | If you don't see the ethical challenges, then you are
               | choosing not to see them. If you are truly interested,
               | the paper has a good section on it and some sources.
               | 
               | > I mean cameras disrupted lots of jobs.
               | 
                | Yes, this technology can be used to augment human
                | creativity. It is difficult to see how disruptive these
                | tools could be, as of now. But it is pretty clear that
                | they are somewhat different from previous
                | programmer-as-artist models.
        
               | degif wrote:
                | The difference with this technology is the unlimited
                | possibility of generating any type of video content
                | with a low knowledge barrier and relatively low
                | investment. The ethical issue is not how this
                | technology could disrupt the video job market, but how
                | powerful the content it creates on the fly can be. I
                | mean, you can tell it's computer generated ... for now.
        
       | Apox wrote:
       | I feel like in a not so far future, all this will be generalized
       | into "generate new from all the existing".
       | 
        | And at some point later, "all the existing" will be corrupted
        | by the integrated "new" and it will all be chaos.
       | 
       | I'm joking, it will be fun all along. :)
        
         | cercatrova wrote:
         | It's true, how will future AI train when the training datasets
         | are themselves filled with AI media?
        
           | phito wrote:
           | Feedback from whoever is consuming the content it produces.
        
         | llagerlof wrote:
          | I definitely want more episodes of LOST. I would drop the
          | infamous season 6 and generate more seasons following the
          | 5th.
        
         | visarga wrote:
         | > "all the existing" will be corrupted by the integrated "new"
         | 
         | I don't think it's gonna hurt if we apply filtering, either
         | based on social signals or on quality ranking models. We can
         | recycle the good stuff.
        
       | [deleted]
        
       | dekhn wrote:
       | That's deep within the uncanny valley, and trying to climb up
       | over the other side
        
       | mmastrac wrote:
       | This appears to understand and generate text much better.
       | 
       | Hopefully just a few years to a prompt of "4k, widescreen render
       | of this Star Trek: TNG episode".
        
         | forgotusername6 wrote:
         | At the rate this is going we are only a few years from
         | generating a new TNG episode
        
           | mmastrac wrote:
           | I always wanted to know more about the precursors
        
       | [deleted]
        
       | monological wrote:
        | What everyone is missing is that these AI image/video
        | generators lack _taste_. These tools just regurgitate a mishmash
        | of images from their training set, without any "feeling". What,
        | you're going to tell me that you can train them to have
        | feeling? It's never going to happen.
        
         | Vecr wrote:
         | You can put your taste into it with prompt engineering and
         | cherry picking with limited effort, for Stable Diffusion you
         | can look for prompts people came up with online quite easily
         | and merge/change them pretty much however you want. Might have
         | to disable the content filters and run it on your own hardware
         | though.
        
         | simonw wrote:
          | "These tools just regurgitate a mishmash of images from
          | their training set"
         | 
         | I don't think that's a particularly useful mental model for how
         | these work.
         | 
         | The models end up being a tiny fraction of the size of the
         | training set - Stable Diffusion is just 4.3GB, it fits on a
         | DVD!
         | 
         | So it's not a case of models pasting in bits of images they've
         | seen - they genuinely do have a highly compressed concept of
         | what a cactus looks like, which they can use to then render a
         | cactus - but the thing they render is more of an average of
         | every cactus they've seen rather than representing any single
         | image that they were trained on.
         | 
         | But I agree with you on taste! This is why I'm most excited
         | about what happens when a human with great taste gets to take
         | control of these generative models and use them to create art
         | that wouldn't be possible to create without them (or at least
         | not possible to create within a short time-frame).
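The compression argument above can be made concrete with back-of-the-envelope arithmetic. The training-set size here is an order-of-magnitude assumption (Stable Diffusion was trained on subsets of LAION on the order of billions of images), not an exact figure:

```python
# Stable Diffusion v1.4 checkpoint size: ~4.3 GB (it fits on a DVD).
model_bytes = 4.3e9
# Assumed order of magnitude for its LAION training subset: ~2 billion images.
train_images = 2e9

bytes_per_image = model_bytes / train_images
print(f"~{bytes_per_image:.1f} bytes of model weight per training image")
```

At roughly two bytes of weights per image seen, storing the training images verbatim is arithmetically impossible; the model can only retain heavily compressed, averaged concepts.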
        
         | HolySE wrote:
         | > This bourgeoisie -- the middle class that is neither upper
         | nor lower, neither so aristocratic as to take art for granted
         | nor so poor it has no money to spend in its pursuit -- is now
         | the group that fills museums, buys books and goes to concerts.
         | But the bourgeoisie, which began to come into its own in the
         | 18th century, has also left a long trail of hostility behind it
         | ... Artistic disgust with the bourgeoisie has been a defining
         | theme of modern Western culture. Since Moliere lambasted the
         | ignorant, nouveau riche bourgeois gentleman, the bourgeoisie
         | has been considered too clumsy to know true art and love
         | (Goethe), a Philistine with aggressively unsubtle taste (Robert
         | Schumann) and the creator of a machine-obsessed culture doomed
         | to be overthrown by the proletariat (Marx and Engels).
         | 
         | - "Class Lessons: Who's Calling Whom Tacky?; The Petite Charm
         | of the Bourgeoisie, or, How Artists View the Taste of Certain
         | People", Edward Rothstein, The New York Times
         | 
         | This article also discusses a painting called "The Most Wanted"
         | which was drawn based off a survey posed to ordinary people
         | about what they wanted to see in a painting. "A mishmash of
         | images from its training set," if you will.
         | 
         | Claiming that others lack taste seems to be a common refrain--
         | only this time, instead of a reaction to a subset of the human
         | population gnawing away at the influence of another subset of
         | humans, it's to yet another generation of machines supplanting
         | human skill.
        
           | visarga wrote:
           | The more developed the artistic taste, the lower one's
           | opinion of other tastes.
        
         | robitsT6 wrote:
         | This isn't a very compelling argument. First of all, they
         | aren't a "mish mash" in any real way, it's not like snippets of
         | images exist inside of the model. Second of all, this is
         | entirely subjective. Third of all, entirely inconsequential -
         | if these models create 80% of the video we end up seeing, is it
         | going to matter if you don't think it's a tasteful endeavour?
        
         | mattwest wrote:
         | Making a definitive statement with the word "never" is a bold
         | move.
        
         | natch wrote:
         | They work at the level of convolutions, not images.
        
         | m00x wrote:
         | That's purely subjective. We can definitely model AI to give
         | a certain mood. Sentiment analysis and classification are
         | very advanced; they just haven't been put in these models.
         | 
         | If you think AI will never catch up to anything a human can do,
         | you're simply wrong.
        
       | [deleted]
        
       | aero-glide2 wrote:
       | "We have decided not to release the Imagen Video model or its
       | source code until these concerns are mitigated" Okay then why
       | even post it in the first place? What exactly is Google going to
       | do with this model?
        
         | throwaway743 wrote:
         | Likely to show to shareholders that they're keeping up with
         | trends and competitors
        
         | etaioinshrdlu wrote:
         | Indeed, it's almost just a flex? "Oh yeah, we can do better!
         | No, no one can use it, ever."
        
         | xiphias2 wrote:
          | Even just giving out high-quality research papers helps a
          | lot, so it's still a great thing that they published it.
        
         | alphabetting wrote:
         | Why post? to show methods and their capabilities. Also flex.
         | 
         | What will they do with model? figure out how to prevent abuse
         | and incorporate into future Google Assistant, Photos and AR
         | offerings.
        
           | natch wrote:
           | Just fixing their basic stuff would be a better start from
           | where they are right now.
        
         | hackinthebochs wrote:
         | The big tech companies are competing for AI mindshare. In 10
         | years, which company's name will be synonymous with AI? That's
         | being decided right now.
        
         | [deleted]
        
         | spoonjim wrote:
         | They're going to 1) rent it out as a paid API and/or 2) let you
         | use it to create ads on Google platforms like YouTube, perhaps
         | customized to the individual user
        
         | simonw wrote:
         | It's a research activity.
         | 
         | Google and Meta and Microsoft all have research teams working
         | on AI.
         | 
         | Putting out papers like this helps keep their existing
         | employees happy (since they get to take credit for their work)
         | and helps attract other skilled employees as well.
        
           | andreyk wrote:
           | Yep. The people who build Imagen are researchers, not
           | engineers, and these announcements are accompanied by papers
           | describing the results as a means of sharing ideas/results
           | with the academic community. Pretty weird to me how so many
           | in this thread don't seem to remember that.
        
         | torginus wrote:
         | This whole holier-than-thou moralizing strikes me as trying to
         | steer the conversation away from the real issue, which came
         | into spotlight with Stable Diffusion - one of
          | authorship/violating the IP rights of artists, who have now
          | come down in force against their would-be tech overlords, who
          | are in the process of repackaging and reselling their work.
         | 
         | This forced ideological posturing of 'if we give it to the
         | plebes, they are going to generate something naughty with it'
         | masks the somehow more cynically evil take of big tech, who are
         | essentially taking the entire creative output of humanity and
         | reselling it as their own, piecemeal.
         | 
         | Additionally I think the Dalle vs. Stable Diffusion comparison
         | highlights the true masters of these people (or at least the
         | ones they dare not cross) - corporations with powerful IP
         | lawyers. Just ask Dalle to generate a picture with Mickey Mouse
         | - it won't be able to do it.
        
           | visarga wrote:
           | > repackaging and reselling their work.
           | 
           | It's not their work unless it's identical, but in practice
           | generated images are substantially different. Drawing in the
           | style of is not copying, it's creative and it also depends on
           | the "dialogue" with the prompter to get to the right image.
           | The artist names added to the prompts act more like landmarks
           | in the latent space, they are a useful shortcut to specifying
           | the style.
           | 
           | If you look at the data itself it's ridiculous - the dataset
           | is 2.3 billion images and the model 4.6 GB, that means it
           | keeps a 2 byte summary from each work it "copies".
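The arithmetic above is easy to check. A back-of-envelope sketch (the 2.3-billion-image and 4.6 GB figures are the commenter's, not official numbers):

```python
# Divide the reported model size by the reported dataset size to get
# the average "capacity" the model could devote to each training image.
model_size_bytes = 4.6e9        # ~4.6 GB model (commenter's figure)
num_training_images = 2.3e9     # ~2.3 billion training images (commenter's figure)

bytes_per_image = model_size_bytes / num_training_images
print(f"{bytes_per_image:.1f} bytes per training image")  # prints: 2.0 bytes per training image
```

Two bytes is far too little to store any image, which is the commenter's point: whatever the model retains per work, it cannot be a literal copy.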
        
             | shakingmyhead wrote:
              | "It's not your work unless it's identical" is not how
              | existing copyright law works, so I'm not sure why it
              | would be how these things should be treated. Not to
              | mention that moving around copies of the dataset is
              | itself making copies that ARE identical...
        
           | nearbuy wrote:
           | DALL-E image of Mickey Mouse:
           | https://openart.ai/discovery/generation-
           | arxwmypmw7v5zpxeik1y...
        
         | TotoHorner wrote:
         | Ask the "AI Ethicists". They have to justify their salaries in
         | some way or another.
         | 
         | Or maybe Google is using "Responsible AI" as an excuse to
         | minimize competitors when they release their own Imagen Video
         | as a Service API in Google Cloud.
         | 
         | It's quite strange when the "ethical" thing to do is to not
         | publicly release your research, put it behind a highly
         | restrictive API and charge a high price for it ($0.02 per 1k
         | tokens for Davinci for ex.)
        
           | f1shy wrote:
           | This, 100%
           | 
           | The word "ethics" has become very flexible...
        
           | astrange wrote:
           | This doesn't really prevent competition though, the research
           | paper is enough to recreate it. It does make recreation more
           | expensive, but maybe that leaves you with a motivation to get
           | paid for doing it.
        
       | evouga wrote:
       | > We train our models on a combination of an internal dataset
       | consisting of 14 million video-text pairs
       | 
       | The paper is sorely lacking evaluation; one thing I'd like to see
       | for instance (any time a generative model is trained on such a
       | vast corpus of data) is a baseline comparison to nearest-neighbor
       | retrieval from the training data set.
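One form such a baseline could take: embed the prompt and every training caption/clip in a shared feature space, then report the closest training item for each generated sample. A toy sketch with made-up 2-D embeddings standing in for real features (`nearest_neighbor` is illustrative, not anything from the paper):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest_neighbor(query_embedding, training_embeddings):
    """Return the index of the training item closest to the query."""
    return max(
        range(len(training_embeddings)),
        key=lambda i: cosine_similarity(query_embedding, training_embeddings[i]),
    )

# Toy embeddings standing in for caption/video features.
train = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(nearest_neighbor([0.9, 0.1], train))  # -> 0
```

Comparing generated samples against their retrieved nearest neighbors is one way to show a generative model is doing more than memorizing its training set.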
        
       | BoppreH wrote:
       | It's interesting that these models can generate seemingly
       | anything, yet treat the prompt as only a vague suggestion.
       | 
       | From the first 15 examples shown to me, only one contained all
       | elements of the prompt, and it was one of the simplest ("an
       | astronaut riding a horse", versus e.g. "a glass ball falling in
       | water" where it's clear it was a water droplet falling and not a
       | glass ball).
       | 
       | We're seeing leaps in random capabilities (motion! 3D!
       | inpainting! voice editing!), so I wonder if complete prompt
       | accuracy is 3 months or 3 years away. But I wouldn't bet on any
       | longer than that.
        
         | tornato7 wrote:
         | In my experience with stable diffusion tools, there is some
         | parameter that specifies how closely you would like it to
         | follow the prompt, which is balanced with giving the AI more
         | freedom to be creative and make the output look better.
        
           | BoppreH wrote:
           | Yes, that might be the case. Though the prompts don't seem to
           | try showcasing model creativity, so I'd be surprised if
           | Google picked a temperature so high that it significantly
           | deviated from the prompt so often.
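The parameter being described is commonly called the guidance scale, which implements classifier-free guidance: at each denoising step the sampler blends an unconditional and a prompt-conditioned noise prediction. A minimal sketch of that blending step, using toy two-element lists in place of real model outputs:

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and prompt-conditioned noise predictions.

    guidance_scale = 1.0 follows the conditional prediction as-is;
    larger values push samples harder toward the prompt, trading away
    some diversity/"creativity".
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

uncond = [0.1, 0.2]   # toy noise prediction without the prompt
cond = [0.5, 0.0]     # toy noise prediction with the prompt
print(classifier_free_guidance(uncond, cond, 7.5))
```

Stable Diffusion front-ends typically expose this as a single slider, which is consistent with the trade-off the parent comment describes.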
        
       | renewiltord wrote:
       | At some point, the "but can it do?" crowd becomes just background
       | noise as each frontier falls.
        
       | brap wrote:
       | What really fascinates me here is the movement of animals.
       | 
       | There's this one video of a cat and a dog, and the model was
       | really able to capture the way that they move, their body
       | language, their mood and personality even.
       | 
       | Somehow this model, which is really just a series of zeroes and
       | ones, encodes "cat" and "dog" so well that it almost feels like
       | you're looking at a real, living organism.
       | 
       | What if instead of images and videos they make the output
       | interactive? So you can send prompts like "pet the cat" and
       | "throw the dog a ball"? Or maybe talk to it instead?
       | 
       | What if this tech gets so good, that eventually you could
       | interact with a "person" that's indistinguishable from the real
       | thing?
       | 
       | The path to AGI is probably very different than generating
       | videos. But I wonder...
        
       | impalallama wrote:
       | All this stuff makes me incredibly anxious about the future of
       | art and artists. It can already be very difficult to make a
       | living, and tons of artists are horrifically exploited by
       | content mills and VFX shops; stuff like this is just going to
       | devalue their work even more.
        
         | bulbosaur123 wrote:
         | If everyone can be an artist, nobody can!
        
       | m3kw9 wrote:
       | Would be useful for gaming environments, where details don't
       | really matter for objects seen from very far away.
        
       | uptownfunk wrote:
       | Shocked, this is just insane.
        
         | schleck8 wrote:
         | Genuinely. I feel like I am dreaming. One year ago I was super
          | impressed by upscaling architectures like ESRGAN, and now we
          | can generate 3D models, images and even videos from text...
        
       | user- wrote:
       | This sort of AI related work seems to be accelerating at an
       | insane speed recently.
       | 
       | I remember being super impressed by AI Dungeon and now in the
       | span of a few months we have got DALLE-2, Stable Diffusion,
       | Imagen, that one AI powered video editor, etc.
       | 
       | Where do we think we will be at in 5 years??
        
         | schleck8 wrote:
         | I'd say in less than 10 years we will be able to turn novels
         | into movies using deep learning at this rate.
        
       | hazrmard wrote:
       | The progress of content generation is disorienting! I remember
       | studying Markov Chains and Hidden Markov Models for text
       | generation. Then we had Recurrent Networks which went from LSTMs
       | to Transformers now. At this point we can have a sustained
       | pseudo-conversation with a model, which will do trivial tasks
       | for us from a text corpus.
       | 
       | Separately for images we had convolutional networks and
       | Generative Adversarial Networks. Now diffusion models are
       | apparently doing what Transformers did to natural language
       | processing.
       | 
       | In my field, we use shallower feed-forward networks for control
       | using low-dimensional sensor data (for speed & interpretability).
       | Physical constraints (and good-enoughness of classical
       | approaches) make such massive leaps in performance rarer events.
        
       | Hard_Space wrote:
       | These videos are notably short on realistic-looking people.
        
         | optimalsolver wrote:
         | Imagen is prohibited from generating representations of humans.
        
       | nigrioid wrote:
       | There is something deeply unsettling about all text generated by
       | these models.
        
       ___________________________________________________________________
       (page generated 2022-10-05 23:00 UTC)