[HN Gopher] DreamFusion: Text-to-3D using 2D Diffusion ___________________________________________________________________ DreamFusion: Text-to-3D using 2D Diffusion Author : going_ham Score : 327 points Date : 2022-09-29 18:57 UTC (4 hours ago) (HTM) web link (dreamfusion3d.github.io) (TXT) w3m dump (dreamfusion3d.github.io) | etaioinshrdlu wrote: | Huh, it's a pretty similar technique to what I outlined a couple | days ago: https://news.ycombinator.com/item?id=32965139 | | Although they start with random initialization and a text prompt. | It seems to work well. I now see no reason we can't start with | image initialization! | efrank3 wrote: | The version that you proposed wouldn't have worked | ToJans wrote: | "Those who say it cannot be done should not interrupt the | people doing it." | parasj wrote: | Correct link with full demo: https://dreamfusion3d.github.io/ | kentlyons wrote: | That link also has a link to the authors and the paper | preprint. | dang wrote: | Changed now. Thanks! | MitPitt wrote: | Coincidentally, this came out the same day as Meta's text-to-video. I | wonder if Google deliberately held back the release to make a | bigger impact somehow? | astrange wrote: | I think it's because of ICLR deadlines. | bm-rf wrote: | Would they publish it anonymously? I'd bet they'd want to take | credit somehow. | kmonsen wrote: | Someone posted the "correct" URL that has names: | https://dreamfusion3d.github.io/ | blondin wrote: | NVIDIA also released GET3D[1] a few days ago. Research seems to | be heading towards similar goals. | | [1]: https://github.com/nv-tlabs/GET3D | coolca wrote: | This is like magic to me. The pace at which we are getting these | tools amazes me. | edgartaor wrote: | I don't see a person in the gallery. Is it capable of generating a | 3D model of me from only a photo? | sirianth wrote: | Is there code for any of these models? Or a Colab? Ajay Jain's | Colab doesn't work, but I would love to see one for this.
| _just7_ wrote: | My guess is we have to wait a good three months before someone | makes an open-source version | sirianth wrote: | Silly replying to myself, I know, but I had more thoughts. I'm | an architect for 3D worlds and I am desperate, lol, for this | kind of tool. I use both Blender and Grasshopper, but I use | Midjourney to think and prototype all the time. It's obvious, but it | would be astonishing to have something like this for game | worlds. I used another version of this to create "a forest | emerging from an aircraft carrier" | https://www.instagram.com/p/CiRfXKzpnLC/ but the technique | didn't have good resolution yet (high fidelity). | jaggs wrote: | You should try the AUTOMATIC1111 version of Stable Diffusion. | It's crazy fast and has great results - | https://github.com/AUTOMATIC1111/stable-diffusion- | webui/wiki... | achr2 wrote: | The thing that frightens me is that we are rapidly developing broadly | humanity-disrupting ML technologies without any of the social or | societal frameworks to cope with them. | narrator wrote: | There were bigger disruptions in the past. The telegraph, | railroads, explosives. "The Devils" by Dostoevsky is a great | fictional account of what all these technological disruptions | did to the fragile social order of the late-19th-century Russian | countryside. All of a sudden, all these foreign people, ideas, | technology and commerce start streaming into these once- | isolated communities. | throwaway675309 wrote: | I'm usually not a fan of the general hand-wringing and fear- | mongering around ML that a lot of people with too much time and | not enough STEM background constantly bring up. | | Stable Diffusion has been available to the public for | quite a while now and, if anything, has disproved a lot of the | ungrounded nonsense that made companies like OpenAI censor | their generative models.
| modeless wrote: | The most incredible thing here is that this demonstrates a level | of 3D understanding that I didn't believe existed in 2D image | models yet. All of the 3D information in the output was inferred | from the training set, which is exclusively uncurated and | unsorted 2D still images. No 3D models, no camera parameters, no | depth maps. No information about picture content other than a | text label (scraped from the web and often incorrect!). | | From a pile of random, undifferentiated images the model has | learned the detailed 3D structure and plausible poses and | variants of thousands (millions?) of everyday objects. And all we | needed to get that 3D information out of the model was the right | sampling procedure. | bmpoole wrote: | Co-author here - we were also surprised :) The breadth of | knowledge of the visual world embedded in these 2D models and | what they unlock is astounding. | Vt71fcAqt7 wrote: | Any word about how the NeRF -> marching cubes step works? I thought | that was still an open problem. Is that another discovery in | this research paper? | adamredwoods wrote: | So I wonder if unusual angles that normally do not get | photographed will be distorted? For example, underneath a table | looking up. | jacobr1 wrote: | They reapply noise to the potentially distorted image and | then predict the de-noised version, like the originally | rendered first frame. So the image is at least internally | consistent for the frame (to the extent that the system | generates consistency at all). | | The example with a squirrel wearing a hoodie demonstrates an | interesting edge case: the "front" of the squirrel (with the | hoodie over the head) shows a normal hooded face as expected, | but when you rotate to the "back" you get another face where | the hoodie is low over the eyes. Each looks fine in | isolation, but in aggregate it seems like we have a two-faced | squirrel. | yarg wrote: | It'll be delusions and guesses, rather than distortions.
| | It'll just make up some colours and geometries that don't | contradict anything it already knows from the defined | perspectives. | | Or leave it empty. | bmpoole wrote: | Yes, this is often a problem. We use view-dependent prompts | (e.g. "cat wearing sunglasses, back view") but the pretrained | 2D model often does not do a good job of interpreting non- | canonical views and will put sunglasses on the back of the | cat's head (as well as the front). | salawat wrote: | > "cat wearing sunglasses, back view" | | Bad prompt; missing implied antecedent/ambiguous subject... | | You may want: | | "Back view of a cat which is wearing sunglasses", "back view | of a cat, but the view is wearing sunglasses", etc... I | actually tried using projective terms from drafting books, | and didn't get great results. Nor anatomical terms either. | LordDragonfang wrote: | Some of the "mesh exports" used as examples on the page | actually show this, to some extent. Look specifically at the | plush corgi's belly and the weird geometry on the underside | of the lemur's book, and to a lesser extent the underside of | the bedsheet-ghost. | GistNoesis wrote: | As far as I understand from a quick read of the paper, the 2D | diffusion model doesn't have a 3D understanding. It probably has | some sort of local neighborhood understanding, i.e. small | geometric transformations of objects map close to each other in | the diffusion space (that's why, as with latent spaces, you can | "interpolate" (https://replicate.com/andreasjansson/stable- | diffusion-animat...) in the diffusion space). | | But that's not really surprising, because when you have enough | data, even simple clustering methods group objects like faces | by the direction they are looking. With enough views, even a | simple L2 distance in pixel space allows t-SNE to do that. | | They are injecting the 3D constraints via the NeRF and an | optimization process to add consistency between the frames.
| | It's a DeepDream-like process that optimizes by alternating updates | for 3D consistency and updates for text-to-2D-image | correspondence. It's searching for a solution that satisfies | these two constraints at the same time. | | Even though they only need to run a single diffusion step to | get the update direction, this optimization process is quite | long: 1h30 (but they are not using things like Instant NeRF, | or even simple voxel grids). | | But this will allow for the creation of a dataset of 3D objects | with corresponding text, which will then make it possible to train a | diffusion model that has a 3D understanding and can | generate 3D objects directly with a single diffusion | process. | samuell wrote: | Gives a new perspective on a classic verse: | | "For he spoke, and it came to be; | | he commanded, and it stood firm." | | Psalm 33:9, NIV | | :) | IshKebab wrote: | Not really though. | macawfish wrote: | So does this mean I can use DreamBooth to create plausible NeRFs | of myself in any scenario? The future is looking weird. | parasj wrote: | @dang The link should be updated to | https://dreamfusion3d.github.io | joewhatkins wrote: | This is crazy good - most prior text-to-3D models produced weird | amorphous blobs that would kind of look like the prompt from some | angles, but had no actual spatial consistency. | | Blown away by how quickly this stuff is advancing, even as | someone who's relatively cynical about AI art. | drKarl wrote: | Amazing! How long until we get photorealistic AI-generated | 3D VR games and experiences in the metaverse? | drKarl wrote: | Why the downvote? I wasn't being sarcastic, it was an honest | question. I'm really impressed by how far this technology has come, | from GPT-3 two years ago, to DALL-E and Stable Diffusion, to | Meta's text-to-video, to this... | inerte wrote: | I was wondering the same thing in the other thread about Text | to Video.
Someone asked about 3D Blender models, which made | me think about animating Blender models. Bang, now on this | thread we see animated images... it does feel like we could get | to asking for a 3D environment, putting on a VR headset, and | experiencing it. And with outpainting, we could even change | it in real time. | | It's totally sci-fi, and at the same time seems to be | possible? I'm amazed by how much image generation has evolved over | the last year, but that's just me daydreaming. | macrolime wrote: | Or in-painting with AR glasses. Change things in the real | world just by looking at them (with eye tracking) and saying | what you want them changed into. | macrolime wrote: | This sounds like something that could be made to work with Stable | Diffusion if someone just implements the code based on the paper. | coolspot wrote: | Give it a week or two... | naillo wrote: | It's funny that the authors are 'anonymous' but they have access | to Imagen, so obviously it's by Google. | cuuupid wrote: | A large portion of the ML community (rightly) discounts Google | papers because: | | - they rarely provide the data or code used, so it's basically | "i swear it works bro" research | | - what they achieve is usually through having the most pristine | dataset on the planet and is often not reproducible by other | researchers | | - other times they publish papers that are basically "we | slightly modified this excellent open source paper, slapped an | internal name on it and trained it on our proprietary dataset" | | - sometimes they achieve remarkably little but their papers | still get a shiny spot because they're a big name and sponsor | all the conferences | | - they've also been caught trying to patent/copyright ML | techniques; disregarding that this is the same as privatizing | math, these are often techniques they plainly didn't come up | with | | Also, ever since OpenAI did their "we have to go closed-source | for-profit to save humanity" PR campaign, every company that | releases models
that can achieve a lot in NLP/CV gets | dragged by the media and equated to Skynet. | alphabetting wrote: | A ton of big advances in AI that the community has benefited | from have come from published Google research. | oldgradstudent wrote: | > Paper under double-blind review | | Once the paper is accepted (or rejected) the names may be | revealed. | | Though, in reality, the reviewers can often easily tell who | wrote the paper. | joewhatkins wrote: | This is par for the course - there have been other instances | where an 'anonymous' paper mentioned training on a cluster of | TPUs that weren't publicly available yet - a dead giveaway it was | Google. | VikingCoder wrote: | Dead giveaway... Dead giveaway... | googlryas wrote: | Lots of reasons to stay anonymous besides hiding what org | is behind the paper. Maybe they don't want to be kidnapped by | the North Koreans and forced to produce new paeans with "lost | footage" to Kim Il-Sung. | [deleted] | parasj wrote: | The full author list is on the updated link at: | https://dreamfusion3d.github.io/ | dang wrote: | Url changed from https://dreamfusionpaper.github.io/ to the page | that names the authors. | RosanaAnaDana wrote: | This is getting asymptotic. | layer8 wrote: | Progress often happens in waves. There will be a trough again. | sva_ wrote: | Seems a bit like a tsunami currently. But I wonder how we'll | think about it 10 years from now. | gfodor wrote: | AI might be different - as has been predicted for many years | now - due to the compounding effects on intelligence. | arisAlexis wrote: | Is it a light version of the script where the AGI comes fast? | ml_basics wrote: | Awesome! I wonder how long it will be until there is an open | source implementation compatible with Stable Diffusion. | owenpalmer wrote: | Source? | jonas21 wrote: | Can someone explain what's going on in this example from the | gallery?
The prompt is "a humanoid robot using a rolling pin to | roll out dough": | | https://dreamfusion-cdn.ajayj.com/gallery_sept28/crf20/a_DSL... | | But if you look closely, the pin looks like it's actually rolling | across the dough as the camera orbits. | WithinReason wrote: | The rolling pin is above the table, but the shading is wrong | because they don't render shadows. | ajayjain wrote: | Hi! Ajay here. Correct, our shading model doesn't compute | intersections, since that's a bit challenging with a NeRF | scene representation. | chaps wrote: | Super interesting work. Do you think that's a solvable | problem and something you'll work on? | golemotron wrote: | Anonymously authored research is very ominous. | parasj wrote: | The full author list is on the updated link at: | https://dreamfusion3d.github.io/ | VikingCoder wrote: | We're quickly approaching HNFusion: Text-to-HN-Article-That- | Implements-That-Idea ... | O__________O wrote: | Unclear to me what is going on, but there's another URL that | lists the authors' names. Given it's possible this change was done | for a reason, I'm not linking to it, but it strikes me as odd it's still | up. Anyone know what's going on, without causing problems for the | authors? | parasj wrote: | This link was from OpenReview, which must be anonymous (double- | blind). The full author list is on the updated link at: | https://dreamfusion3d.github.io | O__________O wrote: | Aware of the link, though you have not provided any | clarification for why there are two links; strikes me as odd, | if the authors are trying to post it anonymously, that a simple | Google search finds the authors' names. | gersh wrote: | Is code available? | yarg wrote: | Cool. | | The samples are lacking definition, but they're otherwise | spatially stable across perspectives. | | That's something that's been struggled with for years.
| jianshen wrote: | Did we hit some sort of technical inflection point in the last | couple of weeks, or is it just coincidence that all of these ML | papers around high-quality procedural generation are | dropping every other day? | the8472 wrote: | This has been going on for years. The applications are just | crossing thresholds now that are more salient for people, e.g. | doing art. | sirianth wrote: | yay | alphabetting wrote: | Maybe the NeurIPS deadline, which is coming up? | grandmczeb wrote: | This was submitted to ICLR. | macrolocal wrote: | Conference season? | [deleted] | dr_dshiv wrote: | It's called the technological singularity. Pretty fun so far! | AStrangeMorrow wrote: | This isn't what is usually meant by "technological | singularity". That is an inflection point where technological | growth becomes uncontrollable and unpredictable, usually | theorized to be caused by a self-improving agent (AI) that | becomes smarter with each of its iterations. This is still | standard, human-controlled technological progress, even if very | fast. | layer8 wrote: | From the abstract: "We introduce a loss based on probability | density distillation that enables the use of a 2D diffusion | model as a prior for optimization of a parametric image | generator. Using this loss in a DeepDream-like procedure, we | optimize a randomly-initialized 3D model (a Neural Radiance | Field, or NeRF) via gradient descent such that its 2D | renderings from random angles achieve a low loss." | | This seems like basically plugging together a couple of techniques | that already existed, making it possible to turn 2D text-to- | image into text-to-3D. | macawfish wrote: | Time and time again these ML techniques are proving to be | wildly modular and pluggable. Maybe sooner or later someone | will build a framework for end-to-end text-to-effective-ML- | architecture that will just plug different things together | and optimize them.
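[Editor's note] The procedure in the abstract quoted above boils down to a short optimization loop: render the NeRF from a random camera, add noise to the rendering, ask the frozen 2D diffusion model to estimate that noise, and push the estimate-minus-injected-noise difference back through the renderer as a gradient (the paper's Score Distillation Sampling loss). The sketch below is a hypothetical toy, not the authors' code: a linear map stands in for the NeRF renderer, and a stand-in "diffusion model" that steers images toward an all-ones target stands in for Imagen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "NeRF": a parameter vector whose "rendering" from a camera is a
# linear map. In the real method this is a neural radiance field rendered
# by volumetric ray marching.
theta = np.zeros(8)                                   # 3D scene parameters
cameras = [np.eye(8), np.roll(np.eye(8), 1, axis=0)]  # two toy viewpoints

def render(theta, cam):
    return cam @ theta  # hypothetical differentiable renderer

def predict_noise(noisy_img, t):
    # Stand-in frozen text-conditioned 2D diffusion model: it implicitly
    # steers images toward an all-ones "target image". A real diffusion
    # model predicts the injected noise from the noisy image.
    target = np.ones_like(noisy_img)
    return noisy_img - np.sqrt(1.0 - t) * target

def sds_step(theta, cam, lr=0.05):
    """One Score Distillation Sampling update (a single diffusion step)."""
    img = render(theta, cam)
    t = rng.uniform(0.02, 0.98)            # random noise level
    eps = rng.normal(size=img.shape)       # noise we inject
    noisy = np.sqrt(1.0 - t) * img + np.sqrt(t) * eps
    eps_hat = predict_noise(noisy, t)      # frozen model's noise estimate
    # SDS gradient: (eps_hat - eps), pushed back through the renderer only,
    # skipping the diffusion model's Jacobian entirely.
    return theta - lr * cam.T @ (eps_hat - eps)

for step in range(1000):
    theta = sds_step(theta, cameras[step % 2])

# The renders from every viewpoint drift toward the denoiser's preference.
print(float(np.abs(render(theta, cameras[0]) - 1.0).mean()))
```

With a real diffusion model, renderer, and view-dependent prompts, this same loop is what takes ~1.5 hours per asset in the paper.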
| lbotos wrote: | I think this is what Hugging Face (GitHub for machine | learning) is trying with the diffusers lib: | https://huggingface.co/docs/diffusers/index | | They have others as well. | sdan wrote: | > This seems like basically plugging a couple of techniques | together that already existed | | as with a majority of ML research | WithinReason wrote: | Isn't that what the Singularity was described as a few | decades ago? Progress so fast it's unpredictable even in | the short term. | rileyphone wrote: | Same as it ever was: scientific revolutions arrive all at | once, punctuating otherwise uneventful periods. As I | understand it, the present one is the product of the paper | "Attention Is All You Need": | https://arxiv.org/pdf/1706.03762.pdf | ramesh31 wrote: | >as with a majority of ML research | | Plus "we did the same thing, but with 10x the compute | resources". | | But yeah. | [deleted] | anigbrowl wrote: | True (I made such a proposal myself a few hours ago, albeit | in vaguer terms). The thing is, deployment infrastructure is | good enough now that we can just treat it as modular signal | flows and experiment a lot without having to engineer a | whole pile of custom infrastructure for each impulsive | experiment. | beambot wrote: | > This seems like basically plugging a couple of techniques | together that already existed [...] | | In his Lex Fridman interview, John Carmack makes similar | assertions about this prospect for AGI: that it will likely | be the clever combination of existing primitives (plus maybe | a couple of novel new ones) that makes the first AGI feasible in | just a couple thousand lines of code. | aliqot wrote: | That's a great example that reminds me of another one: | there was nothing new about Bitcoin conceptually; it was | all concepts we already had, just in a new combination. IRC, | hashing, proof of work, distributed consensus, difficulty | algorithms, you name it.
Aside from Base58 there wasn't | much original other than the combination of those elements. | stavros wrote: | Base58 really should have been Base57. | aliqot wrote: | Hello Stavros, I agree. When I look at the goals that | Base58 sought to achieve (eliminating visually similar | characters), I couldn't help but wonder why more | characters were not eliminated. There is quite a bit of | typeface androgyny when you consider case and face. | stavros wrote: | Yeah, I don't know why 1 was left in there; seems like a | lost opportunity. Discarding l, I, 0, O, but then leaving | 1? I wonder why. | aliqot wrote: | I can only assume it was for a superstitious reason, so | that the original address prefixes could be a 1. This is | the only sense I can make of it. | ImprobableTruth wrote: | My hot take is that we're merely catching up on until-recently | unutilized hardware improvements. There's nothing 'self- | improving'; it's largely "just" scaled-up methods or new, | clever applications of scaled-up methods. | | The pace at which methods scale up is currently a lot faster | than hardware improvements, so unless these scaled-up methods | become incredibly lucrative (not impossible), I think it's | quite likely we'll soon-ish (a couple years from now) see a | slowdown. | darkhorse222 wrote: | I think DALL-E really kicked things into high gear. | gitfan86 wrote: | It has become clear since AlphaGo that intelligence is an | emergent property of neural networks. Since then, the time and | cost requirements to create a useful intelligence have been | coming down. The big change was in August, when Stable Diffusion | became able to run on consumer hardware. Things were already | accelerating before August, but that has really kicked up the | speed, because millions of people can play around with it and | discover intelligence applications, especially in the latent | space.
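[Editor's note] For the Base58 exchange above: Bitcoin's alphabet is the 62 alphanumeric characters minus the four visually ambiguous ones (0, O, I, l), and, as stavros notes, 1 is kept. Encoding is just repeated division by 58. A minimal stdlib-only sketch:

```python
# Bitcoin's Base58 alphabet: 62 alphanumerics minus the four visually
# ambiguous characters (0, O, I, l) -- note that 1 is kept.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(data: bytes) -> str:
    """Encode bytes as Base58 by repeated division by 58."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is encoded as the first alphabet character,
    # which is why Bitcoin address prefixes show up as leading 1s.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return ALPHABET[0] * pad + out

print(len(ALPHABET))            # 58
print(base58_encode(b"hello"))  # Cn8eVZg
```

The leading-zero rule is the superstition-free answer to the "address prefixes could be a 1" guess: zero bytes must map to something, and they map to the alphabet's first character.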
| anigbrowl wrote: | They're AI generated; the singularity already happened, but the | machines are trying to ease us into it. | samstave wrote: | Scary fn thought! | | and I agree with you! | | And the OP comment is by the magnanimous/infamous anigbrowl. | | You need to start doing AI legal admin (I don't have the | terms, but you may - we need legal language to control how we | deal with AI) | | and @dang - kill the gosh darn "posting too fast" thing. | | Jiminy Crickets, I have talked to you about this so many | times... | bhedgeoser wrote: | > Scary fn thought! | | I'm a Kotlin programmer, so it's a scary fun thought for me. | LordDragonfang wrote: | "You're posting too fast" is a limit that's manually | applied to accounts that have a history of posting too | many low-quality comments too quickly and/or getting into | flame wars. You can email dang (hn@ycombinator.com) if you | think it was applied in error, but if it's been applied to | you more than once... you probably have a continuing | problem with proliferating flame wars or posting otherwise | combative comments. | samstave wrote: | THIS | | WTF - the singularity is closer than we thought!!! | Workaccount2 wrote: | SD is open source (for real open source) and the community has | been having a field day with it. | [deleted] ___________________________________________________________________ (page generated 2022-09-29 23:00 UTC)