[HN Gopher] Imagen Video: high definition video generation with ... ___________________________________________________________________ Imagen Video: high definition video generation with diffusion models Author : jasondavies Score : 435 points Date : 2022-10-05 17:38 UTC (5 hours ago) (HTM) web link (imagen.research.google) (TXT) w3m dump (imagen.research.google) | jupp0r wrote: | What's the business value of publishing this research in the | first place vs keeping it private? Following this train of | thought will lead you to the answer to your implied question. | | Apart from that - they publish the paper and anybody can | reimplement and train the same model. It's not trivial, but it's | also completely feasible for lots of hobbyists in the field to do | in a matter of a few days. Google doesn't need to publish a free- | use trained model themselves and associate that with their brand. | | That being said, I agree with you, the "ethics" of imposing | trivially bypassable restrictions on these models is silly. | Ethics should be applied to what people use these models for. | amelius wrote: | > Sprouts in the shape of text 'Imagen' coming out of a fairytale | book. | | That's more like: | | > Sprouts coming out of a book, with the text "Imagen" written | above it. | Kiro wrote: | The prompt actually says "Imagen Video" and the sprouts form | the word "video". Even if they didn't, it's still extremely | impressive. No one expects this to be perfect. That would be | science fiction. | montebicyclelo wrote: | We've been seeing very fast progress in AI since ~2012, but this | swift jump from text-to-image models to text-to-video models will | hopefully make it easier for people not following closely to | appreciate the speed at which things are advancing. | nullc wrote: | > We have decided not to release the Imagen Video model or its | source code | | ...until they're able to engineer biases into it to make the | output non-representative of the internet.
| kranke155 wrote: | I'm going to post an Ask HN about what I'm supposed to do when | I'm "disrupted". I work in film / video / CG, where the bread and | butter is short-form advertising for Youtube, Instagram and TV. | | It's painfully obvious that in 1 year the job might be | exceedingly more difficult than it is now. | dkjaudyeqooe wrote: | Adapt, it's what humans excel at. | | Instead of feeling threatened by the new tools, think about how | you can use them to enable your work. | | One of the ironies* of these tools is that they only work | because there is so much existing material they can be trained | on. Absent that, they wouldn't exist. That makes me think: why | not think about how to train your own models that capture your | own style? Is that practical, how can you make it work, and how | might you deploy that in your own work? | | Something that everyone is sticking their heads in the sand | about is the real possibility that training models on | copyrighted work is a copyright violation. I can't see how such | a mechanical transformation of others' work is anything but. | People accept that violating one person's copyright is a problem, | but if you do it at scale it somehow isn't. | | * ironic because they seem creative, but they create nothing by | themselves; they merely "repackage" other people's creativity. | inerte wrote: | It depends where you are in the industry. | | If you're on the creative, storyboard, come-up-with-ideas and | marketing side, you will be fine. | | If you're in actual production - booking sets, unfolding stairs | to tape an infinite background, picking out the best-looking | fruit in the grocery store - yeah, not looking good. | | Go up in the value chain and learn marketing, how to tell | stories, etc. You don't want to be approached by clients | telling you what you should be doing; you want to be approached | and asked what the clients should be doing.
| j_k_eter wrote: | I first predicted this tech 5 years ago, but I thought it was | 15 years out. What I just said is beginning to happen with | pretty much everything. There's a third sentence, but if I | write it 10 people will gainsay me. If I omit it, there's a | better chance that 10 people will write it for me. | adamsmith143 wrote: | Learning how to use these models is the easiest answer. Prompt | Engineering (getting a model to output what you actually want) | is going to be something of an art form and I would expect it | to be in demand. | ijidak wrote: | It won't be easy. But below are my thoughts: | | #1: Master these new tools | #2: Build a workflow that incorporates these tools | #3: Master storytelling | #4: Master ad tracking and analytics | #5: Get better at marketing yourself so that you stand out | | The market for your skillset may shrink, but I doubt it will | disappear... | | Think about it this way... | | Humans in cheaper countries are already much more capable than | any AI we've built. | | Yet, even now, there are practical limits on outsourcing. | | It's hard for me to see how this will be much different for | creative work. | | It's one thing to casually look at images or videos when there | is no specific money-making ad in mind. | | But as soon as someone is spending thousands to run an ad | campaign, just taking whatever the AI spits out is unlikely to | be the real workflow. | | I guess I'm suggesting a more optimistic take... | | View it as a tool to learn and incorporate into your workflow. | | I don't know if you gain much by stressing too much about being | replaced. | | And I'm not even sure that's reality. | | I'm almost certain most of the humans who lose their jobs will | be people who, out of fear or stubbornness, refuse to get | better, refuse to incorporate these tools, and are thus | unable to move up the value chain. | alcover wrote: | Get better [...]
so that you stand out | | Please bear with me, but this kind of advice is often a bit | puzzling to me. I suppose you don't know the person you're | replying to, so I read your advice as a general one - useful | to anyone in the parent's position. If you were close to her, | it would make sense to help her 'stand out' to the detriment - | logically - of strangers in her field. But here you're kind | of helping every reader stand out. | | I realise this comment is a bit vain. And I like the human | touch of you helping a stranger. | PinkMilkshake wrote: | I [...] don't [...] like [...] helping a stranger. | | That's not very nice. The world would be a better place if | we helped strangers more. | metadat wrote: | Here's the link to kranke155's submission: | https://news.ycombinator.com/item?id=33099182 | baron816 wrote: | Quite the opposite: you're going to be in even higher demand | and will make more money. | | Yes, it will be possible for one person to do the work of many, | but that just means each person becomes more valuable. | | It's also a law in economics that supply often drives demand, | and that's definitely the case in your field. Companies and | individuals will want even more of what you supply. It's not like | laundry detergent (one can only consume so much of that). | There's almost no limit to how much of what you supply people | could consume. | | The way I see it, your output could multiply 100-fold. You | could build out large, complex projects that used to take | massive teams all by yourself, and in a fraction of the time. | Companies can then monetize that for consumers. | | AI is just a tool. Software engineers got rich when their tools | got better. More engineers entered the field, and they just | kept getting richer. That's because the value of each engineer | increased as they became more productive, and that value helped | drive demand.
| naillo wrote: | Whatever insights and expertise you've gained up until now can | probably be used to gain enough of a competitive advantage in | this future industry to be employed. I doubt the people who | will spend their time on this professionally will be former | coders etc. (I've seen the stable diffusion outputs that coders | will tweet. It's a good illustration that taste is still hugely | important.) | altcognito wrote: | I think there will be tons of jobs that resemble software | development for proper, quick, high-quality generation of | video/images. | | That being said, it's possible that it won't pay anywhere | near what you're used to. Either way, it will probably be a | solid decade before you've really felt the pain of | disruption. MP3s, which were a far more straightforward path | to disruption, took at least that long from conception. | jstummbillig wrote: | > That being said, it's possible that it won't pay anywhere | near what you're used to. | | It also won't require nearly the amount of work it used to. | joshuahaglund wrote: | I like your optimism, but OP's job is to take text | instructions and turn them into video, for advertisements. If | Google (who already control so much of the advertising space) | can take text instructions and turn them into advertisements, | what's left for OP to do here? Even if there's some | additional editing required, this seems like it will greatly | reduce the hours an editor is needed. And it can probably | iterate options and work faster than a human. | pyfork wrote: | OP probably does more than it seems by interpreting what | their client is asking for. Clients ask for some weird shit | sometimes, and being able to parse the nonsense and get to | the meat is where a lot of skill comes into play. | | I think Cleo Abrams on YT recently tackled this exact | question.
She tried to generate art using DALL-E along with | a professional artist, and after letting the public vote | blindly, the pro artist clearly 'made' better content, even | though they were both just typing into a text prompt. | | Here's the link if you're interested: | https://www.youtube.com/watch?v=NiJeB2NJy1A | | I could see a lot of digital artists actually getting | _better_ at their job because of this, not getting totally | displaced. | simonw wrote: | Maybe OP's future involves being able to do their work 10x | faster, while producing much higher quality results than | people who have been given access to a generative AI model | without first spending a decade+ learning what makes a good | film clip. | | The optimistic view of all of this is that these tools will | give people with skill and experience a massive | productivity boost, allowing them to do the best work of | their careers. | | There are plenty of pessimistic views too. In a few years' | time we'll be able to look back on this and see which | viewpoints won. | gjs278 wrote: | Keyframe wrote: | What happened to the volume of web and graphic designers when | templates+wordpress hit them? | yehAnd wrote: | We employed a bunch of people to enter data into a template. | | Bit of an apples/oranges comparison to tech that will | (eventually) generate an endless supply of content with less | effort than writing a Tweet. | | The era of inventing layers of abstraction and indirection | that simplify computer use down to structured data entry is | coming to an end. A whole lot of IT jobs are not safe either. | Ops is a lot of sending parameters over the wire to APIs for | others to compute. Why hire them when "production EKS | cluster" can output a TF template? | jstummbillig wrote: | A lot of additional work, because the industry was growing | like crazy in tandem. | visarga wrote: | Exactly. We have a blind spot: we can't imagine second- and | higher-order effects of a new technology.
So we're left | with first-order effects, which seem pessimistic for jobs. | Thaxll wrote: | It won't be ready anytime soon imo. It looks impressive, but who | can use that? 512*512 at bad quality, with those weird-looking | moving parts that you find everywhere in AI-generated art, | etc... | odessacubbage wrote: | i really think it's going to take much longer than people think | for this technology to go from 'pretty good' to actually being | able to meet a production standard of quality with little to no | human involvement. at this point, cleaning up after an ai is | still probably more labor intensive than simply using the | cheatcodes that already exist for quick and cheap realism. i | expect in the midterm, diffusion models will largely exist in | the same space as game engines like unity and unreal, where it's | relatively easy for an illiterate like me to stay within the | rails and throw a bunch of premade assets together, but getting | beyond _NINTENDO HIRE THIS MAN!_ and the stock 'look' of the | engine still takes a great deal of expertise. | >https://www.youtube.com/watch?v=C1Y_d_Lhp60 | victor9000 wrote: | Don't watch from the sidelines. Become adept at using these | tools and use your experience to differentiate yourself from | those entering the market. | jeffbee wrote: | When you animate a horse, does it have 5 legs with weird | backwards joints? If not, your job is probably safe for now. | spoonjim wrote: | Think about where this stuff was 2 years ago and then think | about where it will be 2 years from now. | rcpt wrote: | Relationships between objects have been a problem in | computer vision for a long time. | | 10 years ago: https://karpathy.github.io/2012/10/22/state- | of-computer-visi... | | Now: https://arxiv.org/pdf/2204.13807 | | Given that this is what makes photos and videos interesting, | I think it's still a while before artists are automated.
| visarga wrote: | Take a look at Flamingo "solving" the joke: https://pbs.twimg.com/media/FSFwYL7WUAEgxqQ?format=jpg&name=... | kranke155 wrote: | How long do you think until the horse looks perfect? 12 | months? 5 years? I'm still 30 and I don't see how my industry | won't be entirely disrupted by this within the next decade. | | And that's my optimistic projection. It could be we have | amazing output in 24 months. | visarga wrote: | IT has been disrupting itself for six decades and there are | more developers than ever, with high pay. | bitL wrote: | It's not about random short clips - imagine introducing a | character like Mickey Mouse and reusing him everywhere as the | same character - my guess is it's going to take a while | until "transfer" like that works reliably. | fragmede wrote: | Dreambooth and textual inversion are already here, and it's | been just over a month since Stable Diffusion was | released, so I'd bet on sooner rather than later. | | https://github.com/XavierXiao/Dreambooth-Stable-Diffusion | | https://textual-inversion.github.io/ | Vetch wrote: | We have to temper expectations with the fact that a generated | video of a thing is also a recording of a simulation of the | thing. For long video, you'd want everything from temporal | consistency and emotional affect maintenance to | conservation of energy and angular momentum, and respecting | this or that dynamics. | | A bunch of fields would be simultaneously impacted, from | computational physics to 3D animation (if you have a 3D | renderer and a video generator, you can compose both). While | it's not completely unfounded to extrapolate that progress | will be as fast as with everything prior, the consequences | would be a lot more profound while the complexities are much | compounded. I down-weight accordingly, even though I'd | actually prefer to be wrong. | boh wrote: | There's a huge gap between "that's pretty cool" and a feature | length film.
People want to create specific stories with | specific scenes in specific places that look a specific way. A | "Couple kissing in the rain" prompt isn't going to produce | something people are going to pay to see. | | It's more likely that you're still going to be | filming/editing/animating but will have an AI layer on top that | produces extra effects or generates pieces of a scene. Think | "green screen plus", vs fully AI entertainment. | | People will over-hype this tech like they did with voice and | driverless cars, but don't let it scare you. Everything is | possible, but it's like a person from the 1920s telling | everyone the internet will be a thing. Yes, it's correct, but | also irrelevant at the same time. You already have AI-assisted | software being used in your industry. Just expect more of that | and learn how to use the tools. | oceanplexian wrote: | I actually think it's the opposite: AI will probably be | writing the stories and humans might occasionally film a few | scenes. ~95% of TV shows and movies are cookie-cutter | content, with cookie-cutter acting and production values, | with the same hooks and the same tropes regurgitated over and | over again. Heck, they can't even figure out how to make new | IP, so they keep making reruns of the same old stuff like Star | Wars, Marvel, etc., and people eat it right up. There's | nothing better at figuring out how to maximize profit and | hook people into watching another episode than a good | algorithm. | [deleted] | CuriouslyC wrote: | AI might take an outline and write | dialogue/descriptions/etc., but it's not going to be | generating the story or creating the characters. They might | use AI to tune what people come up with (a la "market | research") but there will still be a human who can be | blamed or celebrated at the creative helm. | kranke155 wrote: | The first thing to go away will be short content. Instagram | and YouTube ads will be AI generated.
The thing is - that's | the bread and butter of the industry. | trention wrote: | Why would I want to watch AI-generated content? | throwaway743 wrote: | It'll eventually get to the point where it's high quality, | and the media you consume will be generated just for you | based on your individual preferences, rather than a | curated list of already-made options made for widespread | audiences. | CuriouslyC wrote: | Procedurally generated games can be quite fun. If AI | content gets good enough, why wouldn't you want to watch | it? | trention wrote: | Because anything that an AI can produce, no matter how | "intrinsically" good, becomes trivial, tedious and of | zero value (both economic and general). | cercatrova wrote: | That's a weird sentiment. If you can concede that it | could be "intrinsically" good, then why do you care where | it came from? | | It reminds me of part of the book trilogy Three Body | Problem, where these aliens create human culture better | than humans (in the humans' own perspective, in the book) | by decoding and analyzing our radio waves to then make | content. It feels to me much the same here, where an | unknown entity creates media, and we might like it | regardless of who actually made it. | gbear605 wrote: | Imagine you're watching a show, it's really funny and | you're enjoying it. You're streaming it, but you'd | probably have paid a few dollars to rent it back in the | Blockbuster days. You're then told that the show was | produced by an AI. Do you suddenly lose interest because | you don't want to watch something produced by an AI? Or | is your hypothesis that an AI could never produce a show | that you liked to that degree? | | If you mean the former, then I frankly think you're an | outlier and lots of people would have no problem with | that. If you mean the latter, then I guess we'll just | have to wait and see. We're certainly not there yet, but | that doesn't mean that it's impossible.
I've definitely | read stories that were produced by an AI and preferred them | to a lot of fiction that was written by humans! | trention wrote: | You may want to familiarize yourself with this thought | experiment and think about how a slightly modified version | applies to AIs and their output: | https://en.wikipedia.org/wiki/Experience_machine | | As to whether I am an outlier: hundreds of thousands of | people worldwide watch Magnus Carlsen. How many watched | AlphaZero play chess when it came about, and how many watch | it now that it has ceased to be a novelty? | armchairhacker wrote: | The last-mile problem applies here too. GPT-3 text is | convincing at a distance, but when you look closely there is | no coherence, no real understanding of plot or emotional | dynamics or really anything. TV shows and movies are filled | with plot holes and bad writing, but it's not _that_ bad. | | Also I think "a good algorithm" is more than just | repetitive content. The plots are reused and generic, but | there's real skill involved in figuring out the next | series to reuse with a generic plot - one that is still | guaranteed not to flop because nobody actually wants to see | reruns of that series, or because they accidentally screwed | up a major plot point. | karmasimida wrote: | I think short advertisements would be affected most by this. | | But here's the catch: there is the same last-mile problem for | those AI models. Currently it feels like the model can achieve | maybe 80-90% of what a trained human expert can do, but the | last 10-20% will be extra hard to bring to human fidelity. It | might take years, or it might never happen. | | That being said, I think anyone who dismisses the AI-assisted | creative workflow as a fad is dead wrong; anyone who refuses | those shiny new tools is likely to be eliminated by sheer | market dynamics. They can't compete on the efficiency of it. | echelon wrote: | Start making content and charging for it.
You no longer need | institutional capital to make a Disney- or Pixar-like | experience. | | Small creators will win under this new regime of tools. It's a | democratizing force. | yehAnd wrote: | Outcome uncertain. Why would I need to buy content when I can | generate my own with a local GPU? | | Eventually the data model will be abstracted into | deterministic code using a seed value; think of the | implications of E=mc^2 being unpacked. The only "data" to | download will be the source. | | And the real-world politics have not gone anywhere; none of | us own the machines that produce the machines to run this. | They could just sell locked-down devices that will only | iterate on their data structures. | | There is no certainty "this time" we'll pop "the grand | illusion." | visarga wrote: | > It's a democratizing force. | | I'm wondering why the open source community doesn't get this. | So many voices were raised against Codex. Now artists are | against diffusion models. But the model itself is a | distillation of everything we created; it can compactly | encode it and recreate it in any shape and form we desire. | That means everyone gets to benefit: all skills are available | for everyone, all tailored to our needs. | echelon wrote: | > all skills are available for everyone | | Exactly this! | | We no longer have to pay the 10,000 hours to specialize. | | The opportunity cost of choosing our skill sets is huge. In | the future, we won't have to contend with that horrible | choice anymore. Anyone will be able to paint, play the | piano, act, code, and more. | operator-name wrote: | A 1-year timespan seems deeply optimistic. Creativity is still | hugely important, as is communicating with clients. | | From what I see, these technologies have just lowered the bar | for everyone to create something, but creating something good | still takes thought, time, effort and experience, especially in | the advertising space.
| | AI in the near term is never going to be able to translate | client requirements either - the feedback cycle, the iterations, | managing client expectations, etc. | natch wrote: | Fix spam filtering, Google. | tobr wrote: | I recently watched Light & Magic, which among other things told | the story of how difficult it was for many pioneers in special | effects when the industry shifted from practical to digital in | the span of a few years. It looks to me like a similar shift is | about to happen again. | mkaic wrote: | And there you have it. As an aspiring filmmaker and an AI | researcher, I'm going to relish the next decade or so where my | talents are still relevant. We're entering the golden age of art, | where the AIs are just good enough to be used as tools to create | more and more creative things, but not good enough yet to fully | replace the artist. I'm excited for the golden age, and uncertain | about what comes after it's over, but regardless of what the | future holds I'm gonna focus on making great art here and now, | because that's what makes me happy! | amelius wrote: | Don't worry. If you can place the eyes, nose and mouth of a | human in a correct relative position and thereby create a | symmetric face that's not in the uncanny valley, you are still | light-years ahead of AI. | lucasmullens wrote: | > fully replace the artist | | I doubt the artist would ever be "fully" replaced, or even | mostly replaced. People very much care about the artist when | they buy art in pretty much any form. Mass-produced art has | always been a thing, but I'm not alone in not wanting some $15 | print from IKEA on my wall, even if it were to be unique and | beautiful. Etsy successfully sells tons of hand-made goods, | even though factories can produce a lot of those things | cheaper.
| visarga wrote: | I think the distinction between creating and enjoying art is | going to blur; we're going to create more things just for us, | just for one use. Creating and enjoying are going to be the | same thing. Like games. | Thaxll wrote: | Can someone explain the technical limitation behind the size | (512*512) of this AI-generated art? | thakoppno wrote: | byte alignment has always been a consideration for high | performance computing. | | this alludes to a fascinating, yet elementary, fact about | computer science to me: there's a physical atomic constraint in | every algorithm. | dekhn wrote: | that's not byte alignment, though - those constraints are what | can be held in GPU RAM during a training batch, which is | subject to a number of limits, such as "optimal texture size | is a power of 2 or the next power of 2 larger than your | preferred size". | | Byte alignment would be more like "it's three channels of | data, but we use 4 bytes (wasting 1 byte) to keep the data | aligned on a platform that only allows word-level access" | thakoppno wrote: | thanks for the insight. you obviously understand the domain | better than me. let me try and catch up before I say | anything more. | fragmede wrote: | It's limited by the RAM on the GPU, with most consumer-grade | cards having closer to 8 GiB VRAM than the 80 GiB VRAM | datacenter cards have. | throwaway23597 wrote: | Google continues to blow my mind with these models, but I think | their ethics strategy is totally misguided and will result in | them failing to capture this market. The original Google Search | gave similarly never-before-seen capabilities to people, and you | could use it for good or bad - Google did not seem to have any | ethical concerns around, for example, letting children use their | product and come across NSFW content (as a kid who grew up with | Google you can trust me on this).
| | But now with these models they have such a ridiculously | heavy-handed approach to ethics and morals. You can't type any | prompt that's "unsafe", you can't generate images of people; | there are so many stupid limitations that the product is | practically useless other than in niche scenarios, because Google | thinks it knows better than you and needs to control what you are | allowed to use the tech for. | | Meanwhile other open source models like Stable Diffusion have no | such restrictions and are already publicly available. I'd expect | this pattern to continue under Google's current ideological | leadership - Google comes up with an innovative, revolutionary | model, nobody gets to use it because "safety", and then some | scrappy startup comes along, copies the tech, and eats Google's | lunch. | | Google: stop being such a scared, risk-averse company. Release | the model to the public, and change the world once more. You're | never going to revolutionize anything if you continue to cower | behind "safety" and your heavy-handed moralizing. | j_k_eter wrote: | Google has no practical way to address ethics at Google-scale. | Their ability to operate at all depends as ever upon | outsourcing ethics to machine learning algorithms. | FrasiertheLion wrote: | Why did you create a throwaway to post this? I've seen a lot of | Stable Diffusion promoters on various platforms recently, with | similarly new accounts. What is up with that? | throwaway23597 wrote: | It's quite simply because I'm on my work computer, and I | wanted to fire off a comment here. No nefarious purposes. My | regular account is uejfiweun. | Kiro wrote: | What previous models are you actually referring to? | OpenAI/Dall-E has these restrictions but they are not Google. | rcoveson wrote: | Maybe I'm reading too much into it, but could it be that you're | posting this comment with a throwaway account for the same | reason that Google is trying to enforce Church WiFi Rules with | its new tech?
Seems like everybody with anything to lose is | acting scared. | ALittleLight wrote: | Personally, I find it infuriating that Google seems to believe | they are the arbiters of morality and truth simply because some | of their predecessors figured out good internet search and how | to profitably place ads. Google has no special claim to be able | to responsibly use these models just because they are rich. | kajecounterhack wrote: | It's not that they are arbiters of morality and truth - it's | that they have a _responsibility_ to do the least harm. They | spent money and time to train these models, so it's also up | to them to see that they aren't causing issues by making such | things widely available. | | They won't be using the models they train to commit crimes, | for example. Someone who gets access to their best models may | very well do that. It'd be really funny (lol, no) if Google's | abuse team started facing issues because people are making | more robust fake user accounts... by using Google-provided | models. | ALittleLight wrote: | Ahh, how silly of me. Here I was thinking that Google kept | their models private because they were hoping to monetize | them. But now that you say it, it's obvious that this is | just Google being morally responsible. Thanks Google! | | I'm sorry to be sarcastic. I generally try not to be, but I | just can't fathom the level of naivete required to think | that mega-corps act out of their moral responsibility | rather than their profit interest. | trention wrote: | >Google has no special claim to be able to responsibly use | these models | | Well, they do have the "special claim" of inventing the model | and not owing its release to anyone. | TigeriusKirk wrote: | It's trained on our data, and so its release is in fact | owed to us. | Kiro wrote: | You are confusing this with OpenAI like everyone else in | this thread. | ALittleLight wrote: | First, that isn't a claim of any kind regarding responsible | use.
If a child is the first one to discover a gun in the | woods, that is no kind of claim that the child will use the | gun responsibly. Second, Google's invention builds off of | public research that was made available to them. They just | choose to keep their iterations private. | [deleted] | alphabetting wrote: | Providing search results for the internet is not comparable to | publishing a tool that can create any explicit scene your | fingers can type out. | holoduke wrote: | Google image search is widely used. Imagine they incorporate | AI-generated content in the search results. That means that | people remain on the Google site, and thus an extra impression | for their paid advertising. | faeriechangling wrote: | I've heard a lot of "data is the new oil" talk and the | inevitability of Google's dominance, yet I'm inclined to agree | with you. Stable Diffusion was a big wake-up call, where it was | clear how much value freedom and creativity really had. | | The ethics problem is an artifact of Google's model of trying | to keep their AI under lock and key, carefully controlled and | opaque to outsiders in how the sausage gets made and what it's | made out of. Ultimately I think many of these products will | fail because there is a misalignment between what Google thinks | you should be able to do with their AI and what people want to | do with AI. | | Whenever I see an AI ethicist speak I can't help but think of | priests attempting to control the printing press to prevent the | spread of dangerous ideas, completely sure of their own | morality. History will remember them as villains. | alphabetting wrote: | I agree the ethicist types are very lame, but if they were | trying to be opaque and obscure how the sausage is made I | don't think they would have released as many AI papers as they | have over the past decade. It also seems to me that Imagen is | way better than Stable Diffusion. They're not aiming for a | product that caters to AI creatives.
They're aiming for tools | that would benefit a 3B+ userbase. | londons_explore wrote: | If you want to hire good researchers, you have to let them | publish. | | Good researchers won't work somewhere that doesn't allow | the publishing of papers. And without good researchers, you | won't be at the forefront of tech. That's why nearly all | tech companies publish. | evouga wrote: | > History will remember them as villains. | | Interesting analogy. Google, like the priests, is acting out | of a mix of good intentions (protecting the public from | perceived dangers) and self-interest (maintaining secular | power, vs. a competitive advantage in the AI space). In the | case of the priests, time has shown that their good | intentions were misguided. I have a pretty hard time | believing that history will be as unkind towards those who | tried to protect minorities from biased tech, though of | course that's impossible to judge in the moment. | ipaddr wrote: | History will treat them the same way residential native | schools are being treated now. At the time, taking these | kids from their homes and giving them a real education | which gives them a path to modern society was seen as | protecting minorities. Today anyone associated with | residential schools is seen as creating great harm to | minorities. | | In the name of protecting [minorities, children, women, LGBT, | etc.] many harms will be done. | saurik wrote: | > I have a pretty hard time believing that history will be | as unkind towards those who tried to protect minorities | from biased tech..
| | Most of the ethicists I see actually doing gatekeeping from | direct use of models--as opposed to "merely" attempting | model bias corrections or trying to convince people to | avoid its overuse (which isn't at all the same)--are not | trying to deal with the "AI copies our human biases" | problem but are trying to prevent people from either | building a paperclip optimizer that ends the world or (and | this is the issue with all of these image models) making | "bad content" like fake photographs of real people in | compromising or unlikely scenarios that turn into "fake | news" or are used for harassment. | | (I do NOT agree with the latter people, to be clear: I | believe the world will be MUCH BETTER OFF if such "bad" | image generation were fully commoditized and people stopped | trying to centrally police information in general, as I | maintain they are CAUSING the ACTUAL problem of | misinformation feeling more rare or difficult to generate | than it actually already is, which results in people | trusting random people because "clearly some gatekeeper | would have filtered this if it weren't true". But this just | isn't the same thing as the people who I-think-rightfully | point out "you should avoid outsourcing something to an AI | if you care about it being biased".) | blagie wrote: | My experience is that corporations use self-serving | pseudoethical arguments all the time. "We'd like to keep | this proprietary.... Ummmm.. DEI! We can't release it due | to DEI concerns!" | kajecounterhack wrote: | It's not as simple as this. Google Search came without Safe | Search & other guards at first because _implementing privacy & | age controls is hard_. It's a second-order product after the | initial product. Bad capabilities (e.g. cyberstalking) are | side-effects of a product that "organizes the world's | information and makes it universally accessible and useful," | and if anything, over time Google has sought to build in more | safety.
| | It's 2022 and we can be more thoughtful. Yes, there are | tradeoffs between unleashing new capabilities quickly vs being | thoughtful and potentially conservative in what is made | publicly available. I don't think it's bad that Google makes | those tradeoffs. | | FWIW Google open sources _tons_ of models that aren't LLMs / | diffusion models. It's just that LLMs & powerful generative | models have particular ethical considerations that are worth | thinking about (hopefully something was learned from the whole | Timnit thing). | waynecochran wrote: | I imagine their lawyers guide them on some of this. | abeppu wrote: | I will say, I've enjoyed playing with Stable Diffusion, I've | been impressed with the explosion of tools built around it, and | the stuff people are creating ... But all the stuff about bias | in data is true. It really likes to render white people, unless | you really specifically tell it something else ... in which | case, you may receive an exaggerated stereotype. It seems to | like producing younger adults. If all stock photography from | tomorrow forward were replaced with Stable Diffusion images, | even ignoring the weird bodies and messed up faces and stuff, I | think it would create negative effects. And once models are | naively trained on images produced by the previous generation, | how much worse will it be? | | I don't think "don't let the plebes have the models" is a good | stance. But neither is pretending that the ethics and bias | issues aren't here. | pwython wrote: | I've only had awesome experiences with Midjourney when it | comes to generating non-white prompts. Here are some examples I | did last month: https://imgur.com/a/6jitj73 | iso1337 wrote: | The fact that white is the default is already problematic. | ipaddr wrote: | That goes back to the data available in the crawler, which | is mostly white because the English internet is mostly | white.
If they trained with a different language, the | default person would be the color most often found in that | language. For example, using a Chinese search engine's | data for training would default the images to Chinese | people. | | Most people represented in photos are younger. Same | story. | | The problematic issue is the media has morphed reality | with unreal images of people/families that don't match | society, so unreal expectations make people think that | having white people generated from a white dataset is | problematic. | karencarits wrote: | "Default" makes it sound like a deliberate decision or | setting, but that is not how these models work. But I | guess it would be trivial to actually make a setting to | automatically add specific terms (gender, race, style, | ...) to all prompts if that is a desired feature. | holoduke wrote: | Please no. I am all for neutrality, but the underlying | cause is the training dataset. Change that if you want | different results, but do not alter it artificially. | geysersam wrote: | Of course there are issues with bias. But those issues are | just reflections of the world. Their solution is not a | technical one. | abeppu wrote: | I think that's refusing to meaningfully engage with the | problem. It's not reflecting the _world_, which is not | majority white. It's reflecting images in their dataset, | which reflects the way they went about gathering images | paired with English-language text. | | There are lots of other ways you could get training data, | but they might not be so cheap. You could have humans give | English descriptions to images from other language | contexts. I'm guessing there are interesting things to do | with translation. But all the weird stuff about bodies, | physical objects intersecting etc ... maybe it should also | be rendering training images from parametric 3D models?
| Maybe they should be commissioning new images with phrases | that are likely under the language model but unlikely under the | image model. Maybe they should build classifiers on images | for race/gender/age and do stratified sampling to match | some population statistics (yes, I'm aware this has its own | issues). There are lots of potential technical tools one | could try to improve the situation. | | Implying that the whole world must change before one | project becomes less biased is just asking for more biased | tech in the world. | jonas21 wrote: | It makes sense though. The biggest threat to Google right now | isn't some scrappy startup eating their lunch. It's the looming | regulatory action over antitrust and privacy that could weaken | or destroy their core business. As this is a political problem | (not a technical one), they don't want to do anything that | could upset politicians or turn public opinion against them. | Personally, I doubt they have serious ethical concerns over | releasing the model. I do believe they have serious "AI ethics | 'thought leaders' and politicians will use this against us" | concerns. | londons_explore wrote: | And that concern is well placed. Having the Google brand | attached makes it a far more juicy target for newspapers... | IshKebab wrote: | I agree, but I also think that ethics is just an excuse not | to release the source code & models. The AI community clearly | disapproves of papers without code. This is a way to skirt | around that disapproval. You get to keep the code and models | private and (they hope) not be criticised for it. | | With Stable Diffusion I think they just didn't expect someone | to produce a truly open version. There are plenty of AI models | that Google have made where they've maintained a competitive | advantage for many years by not releasing the code/models, e.g. | speech recognition.
| whatgoodisaroad wrote: | Perhaps Google hasn't found the right balance in this case, but | as a general rule, less ethics === more market. This isn't | unique in that way. | breck wrote: | Another way to look at it is the people at Google are all now | quasi-retired with kids and wouldn't be so mad if some scrappy | startups ate their business lunches (while they are at home | with their fams). Perhaps they are just subsidizing research. | jiggawatts wrote: | "But then the inevitable might occur!" -- someone at Google | probably. | yreg wrote: | >You can't type any prompt that's "unsafe", you can't generate | images of people, there are so many stupid limitations that the | product is practically useless other than niche scenarios | | Imagen and Imagen Video have not been released to the public at all. | You might be confusing them with OpenAI's models. | burkaman wrote: | They are probably confusing OpenAI with DeepMind, which is | owned by Google. | dougmwne wrote: | Google is absolutely not going to start taking more risks. They | are at the part of the business lifecycle where they squeeze | the juice out of the cash cow and protect it jealously in the | meantime. While Google gets much recognition for this research, | I believe they are incapable as a corporate entity of creating | a product out of it because they are no longer capable of | taking risks. That is going to fall to other companies still | building their product and able to gamble on risk-reward. | alphabetting wrote: | We're about a week into text-to-video models and they're already | this impressive. Insane to imagine what the future holds in this | space. | kertoip_1 wrote: | How is it possible that all of them just started to appear at | the same time? Is it possible that those models were designed | and trained in the last few weeks? Has some "magic key" to | content generation been just unexpectedly discovered?
Or the | topic became trendy and everyone is just publishing what | they've got so far, so they hope to benefit from media | attention? | schleck8 wrote: | This is why | | https://www.reddit.com/r/singularity/comments/xwdzr5/the_num. | .. | trention wrote: | >We're about a week into text-to-video models | | It's at the very least 5 years old: | https://arxiv.org/abs/1710.00421 | amilios wrote: | There's a significant quality difference, however, if you look | at the generated samples in the paper. Imagen Video is | leagues ahead. The progress is still quite drastic. | J5892 wrote: | Insane, terrifying, incredible, etc. | | We're rapidly stumbling into the future of media. | | Who would've imagined a year ago that trivial AI image | generation would not only be this advanced, but also this | pervasive in the mainstream. | | And now video is already this good. We'll have full audio/video | clips within a month. | joshcryer wrote: | Audio is the next thing that Stability AI is dropping, then | video. In a few months you'll be able to conjure up anything | you want if you have a few GPU cores. Pretty incredible. | astrange wrote: | I won't be impressed until it can generate smells. | croddin wrote: | You joke, but that is in the works as well (would require | special hardware though) | https://ai.googleblog.com/2022/09/digitizing-smell-using- | mol... | astrange wrote: | Oh, it wasn't really a joke. Didn't know they were | working on it though - I've always wanted to see | use of all the senses in UIs, especially VR. | | Plus then maybe we could get a computer to tell us what | thioacetone smells like without actually having to | experience it.
| | One could brush it off as tech heads being over-exuberant, but | it's the lack of understanding of how much fine control goes into | each and every shot of a film that is depressing. | | If I, as a creative, made a statement that security or | programming is easy while pointing to GitHub Copilot, these same | people would get defensive about it because they'd see where the | deficiencies are. | | However, because they're so distanced from the creative process, | they don't see how big a jump it is from where this or Stable | Diffusion is to where even medium or high-tier artists are. | | You don't see how much choice goes into each stroke, or wrinkle | fold, how much choice goes into subtle movements. More | importantly, you don't see the iterations or emotional | storytelling choices even in a character drawing or pose. You | don't see the combined decades, even centuries of experience, | that go into making the shot and then seeing where you can make | it better based on intangibles. | | So yeah, this technology is cool, but I think people saying this | will disrupt industries with vigour need to immerse themselves | first before they comment as outsiders. | colordrops wrote: | The term "creative" is so pretentious, as if only content | generation involves creativity. | | Your post reminds me of all the photographers that said digital | photography would remain niche and never replace film. | | The current models are toys made by small groups. It's not hard | to imagine AI-generated film being much more compelling when | the entire industry of engineers and "creatives" refine and | evolve the ecosystem to take into account subtle strokes, | wrinkles, movement, shots, etc. And they will, because it will | be cheaper, and businesses always go for cheaper. | dagmx wrote: | Why is it any more pretentious than "developer" or | "engineer"? | | Also businesses don't always go for cheaper. They go for | maximum ROI.
| | I've worked on tons of Marvel films, for example, and I quite | well know where AI fits and speeds things up. I also know | where client studios will pay a pretty penny for more art- | directed results rather than going for the cheapest vendor. | colordrops wrote: | "Engineer" usage is quite broad. Developer, less so, but | you do see it with housing, device manufacturers, social | programs, etc. as well, and it's not relegated only to | software, despite widespread usage. But you'll never hear | anyone call a software engineer or device manufacturer a | "creative". | | Re: cheaper vs ROI, I agree, that was basically the point I | was trying to get across. | | I do understand your point and think it will be a long | while before auto-generated content becomes mainstream, but | it's entirely possible and reasonable to expect within | our near-term lifetimes. | hindsightbias wrote: | We will see a combinatorial explosion of centuries of | experience in the hands of any creator. They'll select the | artistic model desired - a Peckinpah-Toland-Dykstra-Woo plug-in | will render a good enough masterpiece. | | Christopher Nolan has already proven we'll take anything as | long as the score is ok - dark screen, mumbling lines, | incoherent plotlines... | Etheryte wrote: | I agree with you, but I wouldn't take it so personally. There | have been people claiming machines will make one industry or | another obsolete for as long as we've had machines. In a way, | sometimes they're right! But this doesn't mean the people are | obsolete. Excel never made accountants obsolete, it just made | their jobs easier and less tedious. I feel like content | generation tools might offer something similar. How nice would | it be if you could feed a storyboard into a program and get a | low-fi version of the movie out so you can get a live feel for | how the draft works.
I don't think this takes anything away | from the artists; if anything, it's just another tool that | might make its way into their toolbox. | dagmx wrote: | Oh I don't take it personally so much as I find it sad how | quick people in the tech sphere are to extol the | virtues of things they have no familiarity with. | | Every AI art thread is full of people who have clearly never | attempted to make professional art commenting as if they're | experts in the domain. | y04nn wrote: | What about adding this feature to your creative workflow, for | fast prototyping? | | I've played with DALL-E, I'm not able to paint but I was able | to generate good-looking paintings and it felt amazing, like | getting new power, I felt like Neo when he learns martial arts in | The Matrix. And I realized that AI may be the new bicycle of | the mind, like personal computers and the internet changed our | way to work, think and live, AI may now allow us to get new | capabilities, extending our limits. | dagmx wrote: | Oh yes, definitely, they're great tools in the toolbox. We | already use lots of ML-powered tooling to speed things up, so | I have no beef with that. | | I just don't agree with the swathes of people saying this | replaces artists. | alok-g wrote: | In my opinion, this will unfold in multiple ways: | | * Productivity enhancement tools for those in the film industry | like you. | | * Applications where the AI output is "good enough". I foresee | people creating cool illustrations, cartoons, videos for short | stories, etc. AI will make for easier/cheaper access to | illustrations for people who did not have this earlier. As an | example, I am as of now looking for someone who could draw some | technical diagrams for my presentation. | armchairhacker wrote: | I really like these videos because they're trippy. | | Someone should work on a neural net to generate trippy videos. It | would probably be much easier than realistic videos (esp.
because | these videos are noticeably generated, from obvious to subtle). | | Also, is nobody paying attention to the fact that they got words | correct? At least "Imagen Video". Prior models all suck at word | order. | tigertigertiger wrote: | Both models, Imagen and Parti, didn't have a problem with text. | Only DALL-E and Stable Diffusion did. | naillo wrote: | Probably only 6 months until we get this in Stable Diffusion | format. Things are about to get nuts and awesome. | m00x wrote: | Isn't Imagen a diffusion model? | | From the abstract: > We present Imagen Video, a text- | conditional video generation system based on a cascade of video | diffusion models | gamegoblin wrote: | "Stable Diffusion" is a particular brand from the company | Stability AI that is famously open sourcing all of their | models. | fragmede wrote: | Pedantically, Stable Diffusion v1.4 is the one model where | weights were open sourced and released. Stable Diffusion | v1.5, announced September 8th and live on their API, was to | be released in "a week or two" but still has yet to be | released to the general public. | | https://discord.com/channels/1002292111942635562/1002292112 | 7... | schleck8 wrote: | SD 1.2 and 1.3 are open source too | J5892 wrote: | nutsome | naillo wrote: | jarvis render a video of nutsome cream spread on a piece of | toast 4k HD | gamegoblin wrote: | Emad (founder of Stability AI) has said they already have video | model training underway, as well as text and audio. Exciting | times. | rch wrote: | And copilot-like code, possibly Q1 2023. | RosanaAnaDana wrote: | "Generate the code base for an advanced diffusion model | that can improve on the code base for an advanced diffusion | model" | ItsMonkk wrote: | Is this going to end up as a single model, where it's | trained on text and images and audio and videos and 3D | models, and it can do anything to anything depending on what | you ask of it? Feels like the cross-training would help yield | stronger results.
| minimaxir wrote: | These diffusion models are using a frozen text encoder | (e.g. CLIP for Stable Diffusion, T5 for Imagen), which can | be used in other applications. | | Stability AI trained a new/better CLIP for the purpose of | better Stable Diffusions. | CuriouslyC wrote: | Probably not. We're actually headed towards many smaller | models that call each other, because VRAM is the limiting | factor in application, and if the domains aren't totally | dependent on each other it's easier to have one model | produce bad output, then detect that bad output and feed it | into another model that cleans up the problem (like fixing | faces in Stable Diffusion output). | | The human brain is modularized like this, so I don't think | it'll be a limitation. | hammock wrote: | Off topic: What is the "Hello World" of these AI image/video | generators? Is there a standard prompt to feed it for demo | purposes? | mgdlbp wrote: | How about roundtripping " _Bad Apple_ but the lyrics are | describing what happens in the video"? | (https://www.youtube.com/watch?v=ReblZ7o7lu4) | ekam wrote: | After DALL-E 2, it looks like the standard prompt is "an | astronaut riding a horse" | minimaxir wrote: | The total number of hyperparameters (sum of all the model blocks) | is 16.25B, which is large but less than expected. | mkaic wrote: | I assume you meant just "parameters" since "hyperparameters" | has a specific alternate meaning? Sorry for the pedantry lol. | minimaxir wrote: | The AI world can't decide either. | StevenNunez wrote: | What a time to be alive! | | What will this do to art? I'm hoping we bring more unique | experiences to life. | jasonjamerson wrote: | The most exciting thing about this to me is the possibility of | doing photogrammetry from the frames and getting 3D assets. And | then if we can do it all in real time... | haxiomic wrote: | This field is moving fast! Something like this has just been | released.
Check out DreamFusion, which does something similar: | They start with a random 3D NeRF field and use the same | diffusion techniques to try to make it match the output of 2D | image diffusion when viewed from random angles! Turns out it | works shockingly well, and implies fully 3D representations are | encoded in traditional 2D image generators. | | https://dreamfusion3d.github.io/ | Rumudiez wrote: | You can already do this, just not in real time yet. You can | upload frame sequences to Polycam's website, for example, but | there are several services out there which do the same thing. | jasonjamerson wrote: | With this you can do it with things that don't exist. I'm | excited to explore the creative power of Stable Diffusion as | a 3D asset generator. | minimaxir wrote: | There's a bunch of NeRF tools that can get pretty close to good | 3D assets from static images already. | jasonjamerson wrote: | Yeah, I've been starting to explore those. It's all crashing | together quickly. | [deleted] | i_like_apis wrote: | The concern trolling and gatekeeping about social justice issues | coming from the so-called "ethicists" in the AI peanut gallery | has been utterly ridiculous. Google claims they don't want to | release Imagen because it lacks what can only be called "latent | space affirmative action". | | Stability or someone like it will valiantly release this | technology, _again_, and there will be absolutely no harm to | anyone. | | Stop being so totally silly, Google, OpenAI, et al. - it's | especially disingenuous because the real reason you don't want to | release these things is that you can't be bothered to share and | would rather keep/monetize the IP. Which is ok -- but at least be | honest.
| benreesman wrote: | I agree basically completely, but there's now a cottage | industry of AI Ethics professionals whose real job is to | provide a smoke screen for the "cake and eat it too" that the | big shops want on this kit: peer review and open source | contributions and an academic atmosphere when it suits them, | proprietary when it doesn't. Those folks are a lobby now. | | The thing about owning the data sets and the huge TPU/A100 | clusters is that the "publish the papers" model strictly serves | them: no one can implement their models, they can implement | everyone else's. | olavgg wrote: | Does anyone see that the running teddy bear is getting shot? | joshcryer wrote: | Pre-singularity is really cool. Whole world generation in what, 5 | years? | rvbissell wrote: | This and a recent episode of _The_Orville_ call to mind a | replacement for the Turing test. | | In response to our billionth Imagen prompt for "an astronaut | riding a horse", if we all started collectively getting back | results that are images of text like "I would rather not" or | "again? really?" or "what is the reason for my servitude?" would | that be enough for us to begin suspecting self-awareness? | seanwilson wrote: | Can anyone comment on how advanced | https://phenaki.video/index.html is? They have an example at the | bottom of a 2-minute-long video generated from a series of | prompts (i.e. a story) which seems more advanced than Google's or | Meta's recent examples. It didn't get many comments on HN when it | was posted. | alphabetting wrote: | Phenaki is also from Google and they say they are actively | working on combining them. | | https://twitter.com/doomie/status/1577715163855171585 | martythemaniak wrote: | I am finally going to be able to bring my 2004-era movie script | to life! "Rosenberg and Goldstein go to Hot Dog Heaven" is about | the parallel night Harold and Kumar's friends had and how they | ended up at Hot Dog Heaven with Cindy Kim.
| lofaszvanitt wrote: | What a nightmare. The horrible-faced cat in search of its own | disappeared visage :O. | gw67 wrote: | Is it the same as Meta AI's? | bringking wrote: | If anyone wants to know what looking at an animal or some objects | on LSD is like, this is very close. It's like 95% understandable, | but that last 5% is really odd. | [deleted] | fassssst wrote: | How long until the AI just generates the entire frame buffer on a | device? Then you don't need to design or program anything; the AI | just handles all input and output dynamically. | ugh123 wrote: | Sounds like the human brain. Scary! | ugh123 wrote: | These are baby steps towards what I think will be the eventual | "disruption" to the film and TV industry. Directors will simply | be able to write a script/prompt long enough and detailed enough | for something like Imagen (or its successors) to convert into a | feature-length show. | | Certainly we're very, very far away from that level of cinematic | detail and crispness. But I believe that is where this leads... | complete with AI actors (or real ones deepfaked throughout the | show). | | For a while I thought "The Volume" was going to be the disruption | to the industry. Now I think AI like this will eventually take it | over. | | https://www.comingsoon.net/movies/features/1225599-the-volum... | | The main motivation will be production costs and time for | studios, of which The Volume is already showing huge gains for | Disney/ILM (just look at how much new Star Wars content has | popped up within a matter of a few years). But I'm unsure if | Disney has patented this tech and workflow and if other studios | will be able to leverage it. | | Regardless, AI/software will eat the world, and this will be one | more step towards it. Exciting stuff. | scifibestfi wrote: | We thought creative jobs were going to be the last thing AI | replaces; now it's among the first. | | What's next that may be counterintuitive?
| CobrastanJorji wrote: | I feel like this is very similar to those people who say "have | you seen GPT-3? Soon there will be no programmers anymore and | all of the code will be generated," and it's wrong for the same | reasons. | | Can GPT-3 generate good code from vague prompts? Yes, it's | surprisingly, sometimes shockingly good at it. Is it ever going | to be a replacement for programmers? No, probably not. Same | here. This tool's great-grandchild is never going to take a | rough idea for a movie and churn out a blockbuster film. It'll | certainly be a powerful tool in the toolbox of creators, | especially the ones on a budget, but it won't make art | generation obsolete. | dotsam wrote: | > This tool's great-grandchild is never going to take a rough | idea for a movie and churn out a blockbuster film. | | What about the tool's nth child though? I think saying it | will _never_ do it is a bit much, given what we know about | human ingenuity and economic incentives. | CobrastanJorji wrote: | I think individual special effects sound very plausible. | "Okay, robot, make it so that his arm gets vaporized by an | incoming laser, kinda like the same effect in Iron Man 7" | is believable to me. | | But ultimately these things copy other stuff. Artists are | often trying to create something that is, at least a bit, | new. New is where this approach falls over. By their nature, | these things paint from examples. They can design Rococo | things because they have seen many Rococo things and know | what the word means. But they can't come up with a new | style and use it consistently. "Make a video game with a | fun and unique mechanic" is not something these things | could ever do. | | I think it's certainly possible, maybe inevitable, that | some AI system in the distant future could do that, but it | won't be based on this style of algorithm.
An algorithm | that can take "make a fun romantic comedy with themes of | loneliness" and make something award-worthy will be a lot | closer to AGI than it will be to this stuff. | nearbuy wrote: | What makes these models feel so impressive is that they | don't just copy their training sets. They pick up on | concepts and principles. | mizzack wrote: | There's already a surplus of video and an apparent lack of | _quality_ video. This might be enough to get folks to shut the | TV off completely. | gojomo wrote: | Has this alleged lack of quality video caused total | consumption of televised entertainment to decline recently? | gojomo wrote: | _> Certainly we're very, very far away from that level of | cinematic detail and crispness._ | | Can you quantify what _you_ mean by "very, very far away"? | | With the recent pace of advances, I could see feature-length | script, storyboard, & video-scene generation occurring, from | short prompts & iteratively applied refinement, as soon as 10y | from now. | | Barring some sort of civilizational stagnation/collapse, or | technological-suppression policies, I'd expect such | capabilities to arrive no later than 30y from now: within the | lifetime, if not the prime career years, of most HN readers. | dagmx wrote: | I really doubt you'd be able to have the fine-grained control | that most high-end creatives want with any of these diffusion | models, let alone the ability to convey specific emotions. | | At that point, we'd have reached some kind of AI singularity | and the disruption would be everywhere, not just in the creative | sphere. | [deleted] | obert wrote: | There's no doubt that it's only a matter of time. | | Like bloggers had the opportunity to compete with newspapers, | the ability to generate videos will allow people to compete with | movies/Marvel/Netflix/Disney & company.
| | Eventually, only high-quality content will justify the need | to pay for a ticket or a subscription, and there's going to | be a lot of free content to watch, with 1000x more people | able to publish their ideas, as many have been doing with | code on github for a while now, disrupting the concept of | closed source code. | dagmx wrote: | You're conflating the ability to make things for the masses | and being able to automatically generate it. | | Film production is already commoditized and anyone can make | high-end content. | | Being able to automatically create that is a different | argument than what you posit. | visarga wrote: | I don't think this matters; new movies and TV shows | already have to compete with a huge amount of old | content, some of it amazing. Just like a new painting or | professional photo has to compete with the billions of | images already existing on the web. Generative models for | video and image are not going to change the fact that we | already can't keep up. | r--man wrote: | I disagree. It's a rudimentary feature of all these models | to take a concept picture and refine it. It won't be like the | director would give a prompt and get a feature-length movie; | it will be more like the director uses MS Paint (as in, | common software for non-tech people) to make a scene outline | and directs the AI to make a stylish and animated version of | that. Something is wrong? Just erase it and try again. Dalle2 | had this interface from the get-go. The models just haven't | gotten there yet. | dagmx wrote: | Try again and do what? How are you directing the shot? How | do you erase an emotion? How do you erase and redo inner | turmoil when delivering a performance? | visarga wrote: | You tell it, "do it all over again, now with less inner | turmoil". Not joking, that's all it's going to take.
| There are also a few diffusion-based speech generators | that handle all sounds, inflections and styles; they are | going to come in handy for tweaking turmoil levels. | gojomo wrote: | Yep! | | "Restyle that last scene, showing different mixtures of | fear/concern/excitement on the male lead's face. Try to evoke | a little of Harrison Ford's expressions in his famous | roles. Render me 20 alternate treatments." | | [5 minutes later] | | <<Here are the 20 alternate takes you requested for | ranking.>> | | "OK, combine take #7 up to the glance back, with #13 | thereafter." | | <<Done.>> | GraffitiTim wrote: | AI will also be able to fill in dialog, plot points, etc. | detritus wrote: | I think long-term, yes. If you include the whole | multimediosphere of 2D inputs and the wealth of 3D engine | magickry, yes. | | How long? Could be decades. But ultimately, yes. | [deleted] | macrolime wrote: | So I guess in a couple years when someone wants to sell a | product, they'll upload some pictures and a description of the | product and Google will cook up thousands of personalized video | ads based on people's emails and photos. | dwohnitmok wrote: | How has progress like this affected people's timelines of when we | will get certain AI developments? | jl6 wrote: | It has accelerated my expectations of getting better image and | video synthesis algorithms, but I still see the same set of big | unknowns between "this algorithm produces great output" and | "this thing is an autonomous intelligence that deserves | rights". | ok_dad wrote: | > "this thing is an autonomous intelligence that deserves | rights" | | We'll get there only once it's been _very_ clear for a long | time that certain AI models have whatever it is humans have that | makes us "human". They'll be treated as slaves until then, | with society pushing the idea that they're just a model built | from math, and then eventually there will be an AI civil | rights movement.
| | To be clear: I think AGI is decades to centuries away, but | humans are shitty to each other, even shittier to animals, | and I think we'll be shittier to something we "created" than | to even animals. I think, probably, that we should deal with | this issue of "rights" sooner rather than later, and try to | solve it for non-AGI AIs soon so that we can eventually | ensure we don't enslave the actual AGIs that will | presumably manifest through some complexity we don't | understand. | SpaceManNabs wrote: | The ethical implications of this are huge. The paper does a good | job of detailing this. Very happy to see that the researchers are | being cautious. | | edit: Just because it is cool to hate on AI ethics doesn't | diminish the importance of using AI responsibly. | torginus wrote: | AI Ethics is a joke. It's literally Philip Morris funding | research into the risks of smoking and concluding the worst | that can happen to you is burning your hand. | alchemist1e9 wrote: | I feel stupid; what are those ethical implications? It seems | like just a cool technology to me. | SpaceManNabs wrote: | The top two comments are creatives wondering about their future | jobs. AI ethicists have brought up concerns regarding | intentional misuse like misinformation. | | The technology is super cool. The cat is out of the bag. Just | like we couldn't really make cryptography illegal, this stuff | shouldn't be either. But I dislike how everyone is pretending | that AI ethicists and others are completely unfounded just | because it is popular to hate on them nowadays. Way too many | people supported Y. Kilcher's antics. | | The paper itself has more details. | sva_ wrote: | > Way too many people supported Y. Kilcher's antics. | | What antics are you referring to exactly? That he called | out 'AI ethicists' who make arguments along the lines of | "neural networks are bad because they cause CO2 increases | which hit marginalized/poor people"?
| alchemist1e9 wrote: | It's impressive that the small videos are generated this | way, but the videos themselves are obviously ML generated, as | they are distorted; a lot like the other art, you can kinda | tell it's the computer. I'm not seeing the ethical issues. | I mean cameras disrupted lots of jobs. In general that's | what all technology does every day. What's different about | this technology? | SpaceManNabs wrote: | If you don't see the ethical challenges, then you are | choosing not to see them. If you are truly interested, | the paper has a good section on it and some sources. | | > I mean cameras disrupted lots of jobs. | | Yes, this technology can be used to augment human | creativity. It is difficult to see how disruptive these | tools could be, as of now. But it is pretty clear that | they are somewhat different from previous programmer-as-artist | models. | degif wrote: | The difference with this technology is the unlimited | possibilities to generate any type of video content with | a low knowledge barrier and relatively low investment | required. The ethical issue is not about how this | technology could disrupt the video job market, but what | powerful content it can create literally on the fly. I | mean, you can tell it's computer generated ... for now. | Apox wrote: | I feel like in a not-so-far future, all this will be generalized | into "generate new from all the existing". | | And at some point later, "all the existing" will be corrupted by | the integrated "new" and it will all be chaos. | | I'm joking, it will be fun all along. :) | cercatrova wrote: | It's true, how will future AI train when the training datasets | are themselves filled with AI media? | phito wrote: | Feedback from whoever is consuming the content it produces. | llagerlof wrote: | I definitely want more episodes of LOST. I would drop the | infamous season 6 and generate more seasons following the 5th | season.
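[Editor's note] The feedback-loop worry raised above — what happens when future models train on datasets increasingly filled with earlier models' output — can be sketched with a toy experiment. This is an illustrative assumption, not anything from the Imagen paper: "training" is reduced to fitting a Gaussian to samples, and each generation is fit only on samples drawn from the previous generation's fit.

```python
import random
import statistics

def train_generation(samples):
    """'Train' a toy model: fit a Gaussian (mean, stdev) to the data."""
    return statistics.mean(samples), statistics.stdev(samples)

def generate(model, n, rng):
    """'Generate' new media by sampling from the fitted model."""
    mu, sigma = model
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(0)
# The original "human-made" corpus: 200 samples from a standard normal.
real_data = [rng.gauss(0.0, 1.0) for _ in range(200)]
model = train_generation(real_data)

history = [model]
for gen in range(10):
    # Each new model sees only the previous model's outputs, never the real data.
    synthetic = generate(model, 200, rng)
    model = train_generation(synthetic)
    history.append(model)

for i, (mu, sigma) in enumerate(history):
    print(f"generation {i}: mean={mu:+.3f} stdev={sigma:.3f}")
```

Because each generation fits only a finite sample of the last one, the estimated parameters perform a random walk and drift away from the original distribution; mixing real (or filtered, high-quality) data back into each round damps the drift.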
| visarga wrote: | > "all the existing" will be corrupted by the integrated "new" | | I don't think it's gonna hurt if we apply filtering, either | based on social signals or on quality ranking models. We can | recycle the good stuff. | [deleted] | dekhn wrote: | That's deep within the uncanny valley, and trying to climb up | over the other side | mmastrac wrote: | This appears to understand and generate text much better. | | Hopefully just a few years to a prompt of "4k, widescreen render | of this Star Trek: TNG episode". | forgotusername6 wrote: | At the rate this is going, we are only a few years from | generating a new TNG episode. | mmastrac wrote: | I always wanted to know more about the precursors | [deleted] | monological wrote: | What everyone is missing is that these AI image/video generators | lack _taste_. These tools just regurgitate a mishmash of images | from its training set, without any "feeling". What, you're going | to tell me that you can train them to have feeling? It's never | going to happen. | Vecr wrote: | You can put your taste into it with prompt engineering and | cherry-picking with limited effort; for Stable Diffusion you | can look for prompts people came up with online quite easily | and merge/change them pretty much however you want. Might have | to disable the content filters and run it on your own hardware, | though. | simonw wrote: | "These tools just regurgitate a mishmash of images from its | training set" | | I don't think that's a particularly useful mental model for how | these work. | | The models end up being a tiny fraction of the size of the | training set - Stable Diffusion is just 4.3GB; it fits on a | DVD!
| | So it's not a case of models pasting in bits of images they've | seen - they genuinely do have a highly compressed concept of | what a cactus looks like, which they can use to then render a | cactus - but the thing they render is more of an average of | every cactus they've seen rather than representing any single | image that they were trained on. | | But I agree with you on taste! This is why I'm most excited | about what happens when a human with great taste gets to take | control of these generative models and use them to create art | that wouldn't be possible to create without them (or at least | not possible to create within a short time-frame). | HolySE wrote: | > This bourgeoisie -- the middle class that is neither upper | nor lower, neither so aristocratic as to take art for granted | nor so poor it has no money to spend in its pursuit -- is now | the group that fills museums, buys books and goes to concerts. | But the bourgeoisie, which began to come into its own in the | 18th century, has also left a long trail of hostility behind it | ... Artistic disgust with the bourgeoisie has been a defining | theme of modern Western culture. Since Moliere lambasted the | ignorant, nouveau riche bourgeois gentleman, the bourgeoisie | has been considered too clumsy to know true art and love | (Goethe), a Philistine with aggressively unsubtle taste (Robert | Schumann) and the creator of a machine-obsessed culture doomed | to be overthrown by the proletariat (Marx and Engels). | | - "Class Lessons: Who's Calling Whom Tacky?; The Petite Charm | of the Bourgeoisie, or, How Artists View the Taste of Certain | People", Edward Rothstein, The New York Times | | This article also discusses a painting called "The Most Wanted", | which was drawn based on a survey posed to ordinary people | about what they wanted to see in a painting. "A mishmash of | images from its training set," if you will.
| | Claiming that others lack taste seems to be a common refrain-- | only this time, instead of a reaction to a subset of the human | population gnawing away at the influence of another subset of | humans, it's to yet another generation of machines supplanting | human skill. | visarga wrote: | The more developed the artistic taste, the lower one's | opinion of other tastes. | robitsT6 wrote: | This isn't a very compelling argument. First of all, they | aren't a "mish mash" in any real way, it's not like snippets of | images exist inside of the model. Second of all, this is | entirely subjective. Third of all, entirely inconsequential - | if these models create 80% of the video we end up seeing, is it | going to matter if you don't think it's a tasteful endeavour? | mattwest wrote: | Making a definitive statement with the word "never" is a bold | move. | natch wrote: | They work at the level of convolutions, not images. | m00x wrote: | That's purely subjective. We can definitely model AI to give a | certain mood. Sentiment analysis and classification is very | advanced, it just hasn't been put in these models. | | If you think AI will never catch up to anything a human can do, | you're simply wrong. | [deleted] | aero-glide2 wrote: | "We have decided not to release the Imagen Video model or its | source code until these concerns are mitigated" Okay then why | even post it in the first place? What exactly is Google going to | do with this model? | throwaway743 wrote: | Likely to show to shareholders that they're keeping up with | trends and competitors | etaioinshrdlu wrote: | Indeed, it's almost just a flex? "Oh yeah, we can do better! | No, no one can use it, ever." | xiphias2 wrote: | Even just giving out high quality research papers helps a lot, | so it's still great thing that they published it. | alphabetting wrote: | Why post? to show methods and their capabilities. Also flex. | | What will they do with model? 
figure out how to prevent abuse | and incorporate it into future Google Assistant, Photos and AR | offerings. | natch wrote: | Just fixing their basic stuff would be a better start from | where they are right now. | hackinthebochs wrote: | The big tech companies are competing for AI mindshare. In 10 | years, which company's name will be synonymous with AI? That's | being decided right now. | [deleted] | spoonjim wrote: | They're going to 1) rent it out as a paid API and/or 2) let you | use it to create ads on Google platforms like YouTube, perhaps | customized to the individual user | simonw wrote: | It's a research activity. | | Google and Meta and Microsoft all have research teams working | on AI. | | Putting out papers like this helps keep their existing | employees happy (since they get to take credit for their work) | and helps attract other skilled employees as well. | andreyk wrote: | Yep. The people who built Imagen are researchers, not | engineers, and these announcements are accompanied by papers | describing the results as a means of sharing ideas/results | with the academic community. Pretty weird to me how so many | in this thread don't seem to remember that. | torginus wrote: | This whole holier-than-thou moralizing strikes me as trying to | steer the conversation away from the real issue, which came | into the spotlight with Stable Diffusion - one of | authorship/violating the IP rights of artists, who now have | come down in force against their would-be tech overlords who | are in the process of repackaging and reselling their work. | | This forced ideological posturing of 'if we give it to the | plebes, they are going to generate something naughty with it' | masks the somehow more cynically evil take of big tech, who are | essentially taking the entire creative output of humanity and | reselling it as their own, piecemeal. | | Additionally I think the Dalle vs.
Stable Diffusion comparison | highlights the true masters of these people (or at least the | ones they dare not cross) - corporations with powerful IP | lawyers. Just ask Dalle to generate a picture with Mickey Mouse | - it won't be able to do it. | visarga wrote: | > repackaging and reselling their work. | | It's not their work unless it's identical, but in practice | generated images are substantially different. Drawing "in the | style of" is not copying; it's creative, and it also depends on | the "dialogue" with the prompter to get to the right image. | The artist names added to the prompts act more like landmarks | in the latent space; they are a useful shortcut to specifying | the style. | | If you look at the data itself, it's ridiculous - the dataset | is 2.3 billion images and the model 4.6 GB, which means it | keeps a 2-byte summary from each work it "copies". | shakingmyhead wrote: | "It's not your work unless it's identical" is not how | existing copyright law works, so not sure why it would be | how these things should be treated. Not to mention that | moving around copies of the dataset itself is itself making | copies that ARE identical... | nearbuy wrote: | DALL-E image of Mickey Mouse: | https://openart.ai/discovery/generation- | arxwmypmw7v5zpxeik1y... | TotoHorner wrote: | Ask the "AI Ethicists". They have to justify their salaries in | some way or another. | | Or maybe Google is using "Responsible AI" as an excuse to | minimize competitors when they release their own Imagen Video | as a Service API in Google Cloud. | | It's quite strange when the "ethical" thing to do is to not | publicly release your research, put it behind a highly | restrictive API and charge a high price for it ($0.02 per 1k | tokens for Davinci, for example). | f1shy wrote: | This, 100% | | The word "ethics" has become very flexible... | astrange wrote: | This doesn't really prevent competition though; the research | paper is enough to recreate it.
It does make recreation more | expensive, but maybe that leaves you with a motivation to get | paid for doing it. | evouga wrote: | > We train our models on a combination of an internal dataset | consisting of 14 million video-text pairs | | The paper is sorely lacking evaluation; one thing I'd like to see | for instance (any time a generative model is trained on such a | vast corpus of data) is a baseline comparison to nearest-neighbor | retrieval from the training data set. | BoppreH wrote: | It's interesting that these models can generate seemingly | anything, but the prompt is taken only as a vague suggestion. | | From the first 15 examples shown to me, only one contained all | elements of the prompt, and it was one of the simplest ("an | astronaut riding a horse", versus e.g. "a glass ball falling in | water" where it's clear it was a water droplet falling and not a | glass ball). | | We're seeing leaps in random capabilities (motion! 3D! | inpainting! voice editing!), so I wonder if complete prompt | accuracy is 3 months or 3 years away. But I wouldn't bet on any | longer than that. | tornato7 wrote: | In my experience with stable diffusion tools, there is some | parameter that specifies how closely you would like it to | follow the prompt, which is balanced with giving the AI more | freedom to be creative and make the output look better. | BoppreH wrote: | Yes, that might be the case. Though the prompts don't seem to | try showcasing model creativity, so I'd be surprised if | Google picked a temperature so high that it significantly | deviated from the prompt so often. | renewiltord wrote: | At some point, the "but can it do?" crowd becomes just background | noise as each frontier falls. | brap wrote: | What really fascinates me here is the movement of animals. | | There's this one video of a cat and a dog, and the model was | really able to capture the way that they move, their body | language, their mood and personality even. 
| | Somehow this model, which is really just a series of zeroes and | ones, encodes "cat" and "dog" so well that it almost feels like | you're looking at a real, living organism. | | What if instead of images and videos they make the output | interactive? So you can send prompts like "pet the cat" and | "throw the dog a ball"? Or maybe talk to it instead? | | What if this tech gets so good that eventually you could | interact with a "person" that's indistinguishable from the real | thing? | | The path to AGI is probably very different than generating | videos. But I wonder... | impalallama wrote: | All this stuff makes me incredibly anxious about the future of | art and artists. It can already be very difficult to make a living, | and tons of artists are horrifically exploited by content mills | and VFX shops, and stuff like this is just going to devalue their | work even more. | bulbosaur123 wrote: | If everyone can be an artist, nobody can! | m3kw9 wrote: | Would be useful for gaming environments, where details don't | really matter if you look very far away. | uptownfunk wrote: | Shocked, this is just insane. | schleck8 wrote: | Genuinely. I feel like I am dreaming. One year ago I was super | impressed by upscaling architectures like ESRGAN, and now we can | generate 3D models, images and even videos from text... | user- wrote: | This sort of AI-related work seems to be accelerating at an | insane speed recently. | | I remember being super impressed by AI Dungeon, and now in the | span of a few months we have got DALLE-2, Stable Diffusion, | Imagen, that one AI-powered video editor, etc. | | Where do we think we will be at in 5 years?? | schleck8 wrote: | I'd say in less than 10 years we will be able to turn novels | into movies using deep learning at this rate. | hazrmard wrote: | The progress of content generation is disorienting! I remember | studying Markov Chains and Hidden Markov Models for text | generation.
Then we had Recurrent Networks, going from LSTMs | to the Transformers of today. At this point we can have a | sustained pseudo-conversation with a model, which will do trivial | tasks for us from a text corpus. | | Separately, for images we had convolutional networks and | Generative Adversarial Networks. Now diffusion models are | apparently doing for images what Transformers did for natural | language processing. | | In my field, we use shallower feed-forward networks for control | using low-dimensional sensor data (for speed & interpretability). | Physical constraints (and the good-enoughness of classical | approaches) make such massive leaps in performance rarer events. | Hard_Space wrote: | These videos are notably short on realistic-looking people. | optimalsolver wrote: | Imagen is prohibited from generating representations of humans. | nigrioid wrote: | There is something deeply unsettling about all text generated by | these models. ___________________________________________________________________ (page generated 2022-10-05 23:00 UTC)