[HN Gopher] DreamFusion: Text-to-3D using 2D Diffusion
       DreamFusion: Text-to-3D using 2D Diffusion
       Author : going_ham
       Score  : 327 points
       Date   : 2022-09-29 18:57 UTC (4 hours ago)
 (HTM) web link (dreamfusion3d.github.io)
 (TXT) w3m dump (dreamfusion3d.github.io)
       | etaioinshrdlu wrote:
       | Huh, it's a pretty similar technique to what I outlined a couple
       | days ago: https://news.ycombinator.com/item?id=32965139
       | Although they start with random initialization and a text prompt.
       | It seems to work well. I now see no reason we can't start with
       | image initialization!
         | efrank3 wrote:
         | The version that you proposed wouldn't have worked
         | ToJans wrote:
         | "those who say it cannot be done should not interrupt the
         | people doing it"
       | parasj wrote:
       | Correct link with full demo: https://dreamfusion3d.github.io/
         | kentlyons wrote:
         | That link also has a link to the authors and the paper
         | preprint.
         | dang wrote:
         | Changed now. Thanks!
       | MitPitt wrote:
       | Coincidentally came out the same day as Meta's text-to-video. I
       | wonder if Google deliberately held out the release to make a
       | bigger impact somehow?
         | astrange wrote:
         | I think it's because of ICLR deadlines.
         | bm-rf wrote:
         | Would they publish it anonymously? I'd bet they'd want to take
         | credit somehow.
           | kmonsen wrote:
           | Someone posted the "correct" URL that has names:
           | https://dreamfusion3d.github.io/
         | blondin wrote:
         | nvidia also released GET3D[1] a few days ago. research seems to
         | be heading towards similar goals.
         | [1]: https://github.com/nv-tlabs/GET3D
       | coolca wrote:
       | This is like magic to me. The pace at which we are getting these
       | tools amazes me.
       | edgartaor wrote:
       | I don't see a person in the gallery. Is it capable of generating a
       | 3D model of me from only a photo?
       | sirianth wrote:
       | Is there code for any of these models? Or a colab? Ajay Jain's
       | colab doesn't work, but I would love to see a colab for this.
         | _just7_ wrote:
         | My guess is we have to wait a good 3 months before someone
         | makes an open-source version
         | sirianth wrote:
         | Silly replying to myself I know, but I had more thoughts. I'm
         | an architect for 3D worlds and I am desperate, lol, for this
         | kind of tool. I use both blender and grasshopper, but I use
         | midjourney to think and prototype all the time. Obvious but it
         | would be astonishing to have something like this for game
         | worlds. I used another version of this to create "a forest
         | emerging from an aircraft carrier"
         | https://www.instagram.com/p/CiRfXKzpnLC/ but the technique
         | didn't have good resolution yet (high fidelity).
           | jaggs wrote:
           | You should try the AUTOMATIC1111 version of stable diffusion.
           | It's crazy fast and has great results -
           | https://github.com/AUTOMATIC1111/stable-diffusion-
           | webui/wiki...
       | achr2 wrote:
       | The thing that frightens me is that we are rapidly reaching
       | broadly humanity-disrupting ML technologies without any of the
       | social or societal frameworks to cope with them.
         | narrator wrote:
         | There were bigger disruptions in the past. The telegraph,
         | railroads, explosives. "The Devils" by Dostoevsky is a great
         | fictional account of what all these technological disruptions
         | do to the fragile social order in the late 19th century Russian
         | countryside. All of a sudden all these foreign people, ideas,
         | technology and commerce start streaming in to these once
         | isolated communities.
         | throwaway675309 wrote:
         | I'm usually not a fan of this general hand wringing / fear
         | mongering around ML that a lot of people with too much time and
         | not enough STEM background constantly bring up.
         | Stable diffusion has been made available to the public for
         | quite a while now and if anything has disproved a lot of the
         | ungrounded nonsense that made companies like OpenAI censor
         | their generative models.
       | modeless wrote:
       | The most incredible thing here is that this demonstrates a level
       | of 3D understanding that I didn't believe existed in 2D image
       | models yet. All of the 3D information in the output was inferred
       | from the training set, which is exclusively uncurated and
       | unsorted 2D still images. No 3D models, no camera parameters, no
       | depth maps. No information about picture content other than a
       | text label (scraped from the web and often incorrect!).
       | From a pile of random undifferentiated images the model has
       | learned the detailed 3D structure and plausible poses and
       | variants of thousands (millions?) of everyday objects. And all we
       | needed to get that 3D information out of the model was the right
       | sampling procedure.
         | bmpoole wrote:
         | Co-author here - we were also surprised :) The breadth of
         | knowledge of the visual world embedded in these 2D models and
         | what they unlock is astounding.
           | Vt71fcAqt7 wrote:
           | Any word about how NeRF -> marching cubes works? I thought
           | that was still an open problem. Is that another discovery in
           | this research paper?
         | adamredwoods wrote:
         | So I wonder if unusual angles that normally do not get
         | photographed will be distorted? For example, underneath a table
         | looking up.
           | jacobr1 wrote:
           | They reapply noise to the potentially distorted image and
           | then predict the de-noised version like the originally
           | rendered first frame. So the image is at least internally
           | consistent for the frame (to the extent that the system
           | generates consistency whatsoever).
           | The example with a squirrel wearing a hoodie demonstrates an
           | interesting edge case: the "front" of the squirrel (with
           | hoodie over the head) shows a normal hooded face as expected,
           | but when you rotate to the "back" you get another face where
           | the hoodie is low over the eyes. Each looks fine in
           | isolation, but in aggregate it seems like we have a two-faced
           | squirrel.
           | yarg wrote:
           | It'll be delusions and guesses, rather than distortions.
           | It'll just make up some colours and geometries that don't
           | contradict anything it already knows from the defined
           | perspectives.
           | Or leave it empty.
           | bmpoole wrote:
           | Yes, this is often a problem. We use view-dependent prompts
           | (e.g. "cat wearing sunglasses, back view") but the pretrained
           | 2D model often does not do a good job of interpreting non-
           | canonical views and will put sunglasses on the back of the
           | cat's head (as well as the front).
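A minimal sketch of how a view-dependent prompt like the one quoted above could be built from camera angles. The function name `view_prompt`, the angle convention (azimuth 0 means the camera faces the front of the object), and every threshold are assumptions made for illustration; the thread does not state the values the authors use.

```python
def view_prompt(prompt: str, azimuth_deg: float, elevation_deg: float) -> str:
    """Append a coarse view description to a text prompt.

    All thresholds are illustrative assumptions, not values from the thread.
    """
    # Steep camera angles get a single "overhead" label regardless of azimuth.
    if elevation_deg > 60.0:
        return f"{prompt}, overhead view"

    # Normalise the azimuth to [0, 360) so negative angles also work.
    az = azimuth_deg % 360.0
    if az < 45.0 or az >= 315.0:
        view = "front view"
    elif 135.0 <= az < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"


print(view_prompt("cat wearing sunglasses", azimuth_deg=180.0, elevation_deg=0.0))
```

The suffix only nudges the 2D model; as the comment says, nothing forces the model to honour it for non-canonical views.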
             | salawat wrote:
             | > "cat wearing sunglasses, back view"
             | Bad prompt, missing implied antecedent/ambiguous subject...
             | You may want:
             | Back view of a cat which is wearing sunglasses, back view
             | of a cat, but the view is wearing sunglasses, etc... I
             | actually tried using projective terms from drafting books,
             | and didn't get great results. Nor anatomicals either.
           | LordDragonfang wrote:
           | Some of the "mesh exports" used as examples on the page
           | actually show this, to some extent. Look specifically at the
           | plush corgi's belly and the weird geometry on the underside
           | of the lemur's book, and to a lesser extent the underside of
           | the bedsheet-ghost.
         | GistNoesis wrote:
         | As far as I understand from a quick read of the paper, the 2D
         | diffusion doesn't have a 3D understanding. It probably has
         | some sort of local neighborhood understanding, aka small
         | geometric transformations of objects map close to each other in
         | the diffusion space (That's why like with latent spaces you can
         | "interpolate" (https://replicate.com/andreasjansson/stable-
         | diffusion-animat...) in the diffusion space).
         | But that's not really surprising because when you have enough
         | data, even simple clustering methods group objects like faces
         | by the direction they are looking to. With enough views even a
         | simple L2 distance in pixel space allows t-SNE to do that.
         | They are injecting the 3D constraints via the NeRF and an
         | optimization process to add the consistency between the frames.
         | It's a deep dream process that optimizes by alternating updates
         | for 3D consistency, and updates for text-to-2D-image
         | correspondence. It's searching for a solution that satisfies
         | these two constraints at the same time.
         | Even though they only need to run a single diffusion step to
         | get the update direction, this optimization process is quite
         | long: 1h30 (but they are not using things like instant NeRF
         | (or even simple voxel grids)).
         | But this will allow for creation of a dataset of 3D objects
         | with corresponding text, which will then allow training a
         | diffusion model that will have a 3D understanding and will be
         | able to generate 3D objects directly with a single diffusion
         | process.
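The optimization loop described in this comment can be sketched on a toy problem. This is not the paper's implementation. The "renderer" just copies each parameter into two pixels (standing in for a NeRF), and the "diffusion model" is the closed-form optimal noise predictor for a Gaussian image prior (standing in for a trained text-to-image network). All names (`render`, `render_vjp`, `predict_noise`, `sds_step`), the noise schedule, the noise-level range, and the learning rate are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior: images cluster around render(target) with small spread.
target = np.linspace(-1.0, 1.0, 8)
PRIOR_STD = 0.1


def render(theta):
    """Toy differentiable renderer: each parameter paints two pixels."""
    return np.repeat(theta, 2)


def render_vjp(pixel_grad):
    """Transpose of render's Jacobian: sum the gradient over each pixel pair."""
    return pixel_grad.reshape(-1, 2).sum(axis=1)


def predict_noise(x_t, alpha, sigma):
    """Stand-in for the frozen 2D diffusion model.

    For a Gaussian prior N(mu, PRIOR_STD^2) the optimal noise prediction is
    E[eps | x_t] = sigma * (x_t - alpha * mu) / (alpha^2 * PRIOR_STD^2 + sigma^2).
    """
    mu = render(target)
    var_t = alpha**2 * PRIOR_STD**2 + sigma**2
    return sigma * (x_t - alpha * mu) / var_t


def sds_step(theta, lr=0.02):
    t = rng.uniform(0.02, 0.98)            # random noise level (assumed range)
    sigma, alpha = t, np.sqrt(1.0 - t**2)  # assumed variance-preserving schedule
    x = render(theta)
    eps = rng.standard_normal(x.shape)
    x_t = alpha * x + sigma * eps          # re-noise the current rendering
    eps_hat = predict_noise(x_t, alpha, sigma)  # one denoiser call per update
    # The residual is pushed through the renderer only, not the denoiser.
    grad = render_vjp(eps_hat - eps)       # weighting w(t) = 1 for simplicity
    return theta - lr * grad


theta = np.zeros(8)
for _ in range(3000):
    theta = sds_step(theta)

print(np.round(theta, 2))  # should end up close to `target`
```

Each update needs only one call to the noise predictor, which matches the "single diffusion step" remark; the cost in the real system comes from running many such updates through a much larger renderer and network.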
       | samuell wrote:
       | Gives a new perspective on a classic verse:
       | "For he spoke, and it came to be;
       | he commanded, and it stood firm."
       | Psalm 33:9, NIV
       | :)
         | IshKebab wrote:
         | Not really though.
       | macawfish wrote:
       | So does this mean I can use DreamBooth to create plausible NeRFs
       | of myself in any scenario? The future is looking weird.
       | parasj wrote:
       | @dang The link should be updated to
       | https://dreamfusion3d.github.io
       | joewhatkins wrote:
       | This is crazy good - most prior text-to-3d models produced weird
       | amorphous blobs that would kind of look like the prompt from some
       | angles, but had no actual spatial consistency.
       | Blown away by how quickly this stuff is advancing, even as
       | someone who's relatively cynical about AI art.
       | drKarl wrote:
       | Amazing! How long then until we get photorealistic AI generated
       | 3D VR games and experiences in the metaverse?
         | drKarl wrote:
         | Why the downvote? I wasn't being sarcastic, it was an honest
         | question, I'm really impressed how far this technology has come
         | since GPT-3 2 years ago to DALL-E and Stable Diffusion to
         | Meta's text to video to this...
           | inerte wrote:
           | I was wondering the same thing in the other thread about Text
           | to Video. Someone asked about 3D Blender models, which made
           | me think about animating blender models. Bang, now on this
           | thread we see animated images... it does feel like we can get
           | to asking for a 3D environment, putting on VR glasses and
           | experiencing it. And with outpainting, that we can even change
           | it in real time.
           | It's totally sci-fi, and at the same time seems to be
           | possible? I am amazed how even image generation evolved over
           | the last year, but that's just me daydreaming.
             | macrolime wrote:
             | Or in-painting with AR glasses. Change things in the real
             | world just by looking at them (with eye tracking) and saying
             | what you want them changed into.
       | macrolime wrote:
       | This sounds like something that could be made to work with stable
       | diffusion if someone just implements the code based on the paper.
         | coolspot wrote:
         | Give it a week or two...
       | naillo wrote:
       | It's funny that the authors are 'anonymous' but they have access
       | to Imagen so obviously it's by Google.
         | cuuupid wrote:
         | A large portion of the ML community (rightly) discredits Google
         | papers because:
         | - they rarely provide the data or code used so it's basically
         | "i swear it works bro" research
         | - what they achieve is usually through having the most pristine
         | dataset on the planet and is often unusable by other
         | researchers
         | - other times they publish papers that are basically "we
         | slightly modified this excellent open source paper, slapped an
         | internal name on it and trained it on our proprietary dataset"
         | - sometimes they achieve remarkably little but their papers
         | still get a shiny spot because they're a big name and sponsor
         | all the conferences
         | - they've also been caught trying to patent/copyright ML
         | techniques; disregarding that this is the same as privatizing
         | math, these are often techniques they plainly didn't come up
         | with
         | Also ever since OpenAI did their "we have to go closed-source
         | for-profit to save humanity" PR campaign, every company that
         | releases models that can achieve a large amount in NLP/CV gets
         | dragged by the media and equated to Skynet.
           | alphabetting wrote:
           | A ton of big advances in AI that the community has benefited
           | from have come from published Google research
         | oldgradstudent wrote:
         | > Paper under double-blind review
         | Once the paper is accepted (or rejected) the names may be
         | revealed.
         | Though, in reality, the reviewers can often easily tell who
         | wrote the paper.
         | joewhatkins wrote:
         | This is par for the course - there have been other instances
         | where an 'anonymous' paper mentioned training on a cluster of
         | TPUs that weren't publicly available yet - dead giveaway it was
         | Google.
           | VikingCoder wrote:
           | Dead giveaway... Dead giveaway...
         | googlryas wrote:
         | Lots of reasons to stay anonymous besides hiding what org
         | is behind the paper. Maybe they don't want to be kidnapped by
         | the North Koreans and forced to produce new paeans with "lost
         | footage" to Kim Il-Sung.
         | [deleted]
         | parasj wrote:
         | The full author list is on the updated link at:
         | https://dreamfusion3d.github.io/
       | dang wrote:
       | Url changed from https://dreamfusionpaper.github.io/ to the page
       | that names the authors.
       | RosanaAnaDana wrote:
       | This is getting asymptotic.
         | layer8 wrote:
         | Progress often happens in waves. There will be a trough again.
           | sva_ wrote:
           | Seems a bit like a tsunami currently. But I wonder how we'll
           | think about it 10 years from now.
           | gfodor wrote:
           | AI might be different - as has been predicted for many years
           | now - due to the compounding effects on intelligence.
       | arisAlexis wrote:
       | Is this a light version of the script where AGI arrives fast?
       | ml_basics wrote:
       | Awesome! I wonder how long it will be until there is an open
       | source implementation compatible with Stable Diffusion
       | owenpalmer wrote:
       | Source?
       | jonas21 wrote:
       | Can someone explain what's going on in this example from the
       | gallery? The prompt is "a humanoid robot using a rolling pin to
       | roll out dough":
       | https://dreamfusion-cdn.ajayj.com/gallery_sept28/crf20/a_DSL...
       | But if you look closely, the pin looks like it's actually rolling
       | across the dough as the camera orbits.
         | WithinReason wrote:
         | The rolling pin is above the table but the shading is wrong
         | because they don't render shadows.
           | ajayjain wrote:
           | Hi! Ajay here. Correct, our shading model doesn't compute
           | intersections, since that's a bit challenging with a NeRF
           | scene representation.
             | chaps wrote:
             | Super interesting work. Do you think that's a solvable
             | problem and something you'll work on?
       | golemotron wrote:
       | Anonymously authored research is very ominous.
         | parasj wrote:
         | The full author list is on the updated link at:
         | https://dreamfusion3d.github.io/
       | VikingCoder wrote:
       | We're quickly approaching HNFusion: Text-to-HN-Article-That-
       | Implements-That-Idea ...
       | O__________O wrote:
       | Unclear to me what is going on, but there's another URL that
       | lists the authors' names. Given it's possible this change was done
       | for a reason, I'm not linking to it, but it strikes me as odd it's
       | still up. Anyone know what's going on without causing problems for
       | the authors?
         | parasj wrote:
         | This link was from OpenReview which must be anonymous (double
         | blind). The full author list is on the updated link at:
         | https://dreamfusion3d.github.io
           | O__________O wrote:
           | Aware of the link, though you have not provided any
           | clarification for why there are two links; strikes me as odd
           | if authors are trying to post it anonymously that a simple
           | Google search finds the authors' names.
       | gersh wrote:
       | Is code available?
       | yarg wrote:
       | Cool.
       | The samples are lacking definition, but they're otherwise
       | spatially stable across perspectives.
       | That's something that's been struggled with for years.
       | jianshen wrote:
       | Did we hit some sort of technical inflection point in the last
       | couple of weeks or is this just coincidence that all of these ML
       | papers around high quality procedural generation are just
       | dropping every other day?
         | the8472 wrote:
         | This has been going on for years. The applications are just
         | crossing thresholds now that are more salient for people, e.g.
         | doing art.
         | sirianth wrote:
         | yay
         | alphabetting wrote:
         | Maybe deadline for NeurIPS which is coming up?
           | grandmczeb wrote:
           | This was submitted to ICLR
         | macrolocal wrote:
         | Conference season?
         | [deleted]
         | dr_dshiv wrote:
         | It's called the technological singularity. Pretty fun so far!
           | AStrangeMorrow wrote:
           | This isn't what is usually meant by "technological
           | singularity". It is an inflection point where technology
           | growth becomes uncontrollable and unpredictable, usually
           | theorized to be caused by a self-improving agent (/AI) that
           | becomes smarter with each of its iterations. This is still
           | standard technological progress under human control, even if
           | very fast
         | layer8 wrote:
         | From the abstract: "We introduce a loss based on probability
         | density distillation that enables the use of a 2D diffusion
         | model as a prior for optimization of a parametric image
         | generator. Using this loss in a DeepDream-like procedure, we
         | optimize a randomly-initialized 3D model (a Neural Radiance
         | Field, or NeRF) via gradient descent such that its 2D
         | renderings from random angles achieve a low loss."
         | This seems like basically plugging a couple of techniques
         | together that already existed, making it possible to turn 2D
         | text-to-image into text-to-3D.
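The loss described in the quoted abstract is usually written as a gradient rather than as a loss value. In the form below, the symbols are notation introduced here for illustration: g is the differentiable renderer, θ the 3D model's parameters, y the text prompt, ε̂_φ the frozen diffusion model's noise prediction, w(t) a weighting over noise levels, and α_t, σ_t the noise schedule.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = g(\theta), \quad x_t = \alpha_t x + \sigma_t \epsilon .
```

The derivative of the noise predictor with respect to its input is left out, so each update needs one forward pass of the 2D model and a backward pass through the renderer only.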
           | macawfish wrote:
           | Time and time again these ML techniques are proving to be
           | wildly modular and pluggable. Maybe sooner or later someone
           | will build a framework for end to end text-to-effective-ML-
           | architecture that will just plug different things together
           | and optimize them.
             | lbotos wrote:
             | I think this is what huggingface (github for machine
             | learning) is trying with diffusers lib:
             | https://huggingface.co/docs/diffusers/index
             | They have others as well.
           | sdan wrote:
           | > This seems like basically plugging a couple of techniques
           | together that already existed
           | as with a majority of ML research
             | WithinReason wrote:
             | Isn't that what the Singularity was described as a few
             | decades ago? Progress so fast it's unpredictable even in
             | the short term.
               | rileyphone wrote:
               | Same as it ever was, scientific revolutions arrive all at
               | once, punctuating otherwise uneventful periods. As I
               | understand, the present one is the product of the paper
               | "Attention is all you need":
               | https://arxiv.org/pdf/1706.03762.pdf.
             | ramesh31 wrote:
             | >as with a majority of ML research
             | Plus "we did the same thing, but with 10x the compute
             | resources".
             | But yeah.
             | [deleted]
             | anigbrowl wrote:
             | True (I made such a proposal myself a few hours ago, albeit
             | in vaguer terms). The thing is deployment infrastructure is
             | good enough now that we can just treat it as modular signal
             | flows and experiment a lot without having to engineer a
             | whole pile of custom infrastructure for each impulsive
             | experiment.
           | beambot wrote:
           | > This seems like basically plugging a couple of techniques
           | together that already existed [...]
           | In his Lex Fridman interview, John Carmack makes similar
           | assertions about this prospect for AGI: That it will likely
           | be the clever combination of existing primitives (plus maybe
           | a couple novel new ones) that make the first AGI feasible in
           | just a couple thousand lines of code.
             | aliqot wrote:
             | That's a great example that reminds me of another one:
             | there was nothing new about Bitcoin conceptually, it was
             | all concepts we already had just in a new combination. IRC,
             | Hashing, Proof of Work, Distributed Consensus, Difficulty
             | algorithms, you name it. Aside from Base58 there wasn't
             | much original other than the combination of those elements.
               | stavros wrote:
               | Base58 really should have been base57.
               | aliqot wrote:
               | Hello Stavros, I agree. When I look at the goals that
               | base58 sought to achieve, (eliminating visually similar
               | characters) I couldn't help but wonder why more
               | characters were not eliminated. There is quite a bit of
               | typeface androgyny when you consider case and face.
               | stavros wrote:
               | Yeah, I don't know why 1 was left in there, seems like a
               | lost opportunity. Discarding l, I, 0, O, but then leaving
               | 1? I wonder why.
               | aliqot wrote:
               | I can only assume it was for a superstitious reason so
               | that the original address prefixes could be a 1. This is
               | the only sense I can make from it.
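For reference on the exchange above, a minimal sketch of plain Base58 with the alphabet Bitcoin uses. The helper names `b58encode` and `b58decode` are chosen here for illustration, and the checksum step of Base58Check is left out. One detail bears on the question about "1": it is the alphabet's symbol for the value zero, and each leading zero byte is written as a literal "1". A payload that begins with a 0x00 version byte, as original Bitcoin addresses do, therefore encodes with a leading "1".

```python
# Bitcoin's Base58 alphabet: digits and letters without 0, O, I and l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as Base58 (no checksum)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Leading zero bytes vanish in the integer, so write each one as "1".
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    """Decode a Base58 string produced by b58encode."""
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    # Each leading "1" stands for one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body


encoded = b58encode(b"\x00" + b"hello world")
print(encoded)             # starts with "1" because of the leading zero byte
print(b58decode(encoded))  # round-trips to the original bytes
```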
         | ImprobableTruth wrote:
         | My hot take is that we're merely catching up on until recently
         | unutilized hardware improvements. There's nothing 'self-
         | improving', it's largely "just" scaled up methods or new,
         | clever applications of scaled up methods.
         | The pace at which methods scale up is currently a lot faster
         | than hardware improvements, so unless these scaled up methods
         | become incredibly lucrative (not impossible), I think it's
         | quite likely we'll soon-ish (a couple years from now) see a
         | slowdown.
         | darkhorse222 wrote:
         | I think DALLE really kicked things into high gear.
         | gitfan86 wrote:
         | It has become clear since AlphaGo that intelligence is an
         | emergent property of neural networks. Since then the time and
         | cost requirements to create a useful intelligence have been
         | coming down. The big change was in August when Stable Diffusion
         | was able to run on consumer hardware. Things were already
         | accelerating before August, but that has really kicked up the
         | speed because millions of people can play around with it and
         | discover intelligence applications, especially in the latent
         | space.
         | anigbrowl wrote:
         | They're AI generated, the singularity already happened but the
         | machines are trying to ease us into it.
           | samstave wrote:
           | Scary fn thought!
           | and I agree with you!
           | And the OP comment is by the magnanimous/infamous AnigBrowl
           | You need to start doing AI legal admin (I don't have the
           | terms, but you may - we need legal language to control how we
           | deal with AI)
           | and @dang - kill the gosh darn "posting too fast" thing
           | Jiminey Crickets I have talked to you abt this so many
           | times...
             | bhedgeoser wrote:
             | > Scary fn thought!
             | I'm a kotlin programmer so it's a scary fun thought for me.
             | LordDragonfang wrote:
             | "You're posting too fast" is a limit that's manually
             | applied to accounts that have a history of "posting too
             | many low-quality comments too quickly and/or getting into
             | flame wars". You can email dang (hn@ycombinator.com) if you
             | think it was applied in error, but if it's been applied to
             | you more than once... you probably have a continuing
             | problem with proliferating flame wars or posting otherwise
             | combative comments.
         | samstave wrote:
         | THIS
         | WTF - the singularity is closer than we thought!!!
         | Workaccount2 wrote:
         | SD is open source (for real open source) and the community has
         | been having a field day with it.
         | [deleted]
       (page generated 2022-09-29 23:00 UTC)