[HN Gopher] A working implementation of text-to-3D DreamFusion, ...
       ___________________________________________________________________
        
       A working implementation of text-to-3D DreamFusion, powered by
       Stable Diffusion
        
       Author : nopinsight
       Score  : 203 points
       Date   : 2022-10-06 15:12 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | xwdv wrote:
        | Feels like AI-generated art is approaching a sort of singularity
        | at this point. Progress is getting exponential.
        
         | antegamisou wrote:
         | Only if you have low standards of what constitutes research on
         | artificial intelligence and art perception.
        
         | hwers wrote:
          | These things take months and months to train (hardly fast
          | progress). Any new model that's coming out is generally known
          | about in advance (not unpredictable), and these applications
          | were pretty much expected the day Stable Diffusion came out.
        
       | whazor wrote:
        | I would love an AI model that does text-to-SVG.
        
         | WASDx wrote:
         | OpenAI Codex can do that.
        
         | fsiefken wrote:
          | That would be lovely! Perhaps it could also dream up
          | interesting Game of Life patterns.
        
       | nobbis wrote:
       | Took a week, as predicted:
       | https://twitter.com/DvashElad/status/1575614411834011651
        
         | nobbis wrote:
          | Key step in generating 3D - ask Stable Diffusion to score views
          | from different angles:
          | 
          |     for d in ['front', 'side', 'back', 'side', 'overhead', 'bottom']:
          |         text = f"{ref_text}, {d} view"
          | 
          | https://github.com/ashawkey/stable-dreamfusion/blob/0cb8c0e0...
        
           | dwallin wrote:
            | Given the way the language model works, these words could
            | have multiple meanings. I wonder if training a form of
            | textual inversion to more directly represent these concepts
            | might improve the results. You could even try teaching it to
            | represent more fine-grained degree adjustments.
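            | 
            | A sketch of the textual-inversion idea (all names here are
            | hypothetical, not an existing API): add one trainable token
            | embedding per view and optimize only those vectors.
            | 
            |     import torch
            | 
            |     # one learnable embedding per view concept, matching the
            |     # width of SD v1's CLIP text encoder (768)
            |     views = ['front', 'side', 'back', 'overhead', 'bottom']
            |     view_tokens = {v: torch.nn.Parameter(torch.randn(768) * 0.01)
            |                    for v in views}
            | 
            |     # During training, splice view_tokens[d] into the prompt's
            |     # token embeddings in place of the word, and optimize only
            |     # these vectors with the usual diffusion loss on images
            |     # known to be from that viewpoint.
            |     opt = torch.optim.Adam(view_tokens.values(), lr=5e-4)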
        
           | shadowgovt wrote:
            | I'm modestly surprised that those few angles give us enough
            | data to build out a full 3D render, but I guess I shouldn't
            | be too surprised, as that's tech that has been in high demand
            | and understood for years (those kinds of front-cut / side-cut
            | images are what 3D artists use for their initial prototypes
            | of objects when they're working from real-life models).
        
             | nobbis wrote:
             | DreamFusion doesn't directly build a 3D model from those
             | generated images. It starts with a completely random 3D
             | voxel model, renders it from 6 different angles, then asks
             | Stable Diffusion how plausible an image of "X, side view"
             | it is.
             | 
              | It then sprinkles some noise on the rendering, makes Stable
              | Diffusion improve it a little, then adjusts the voxels to
              | produce that image (using differentiable rendering).
             | 
             | Rinse and repeat for hours.
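              | 
              | As a rough sketch of that loop (the helper names here are
              | made up, not the actual stable-dreamfusion API):
              | 
              |     import random
              |     import torch
              | 
              |     nerf = NeRF()  # randomly initialized 3D model
              |     opt = torch.optim.Adam(nerf.parameters(), lr=1e-3)
              | 
              |     for step in range(10_000):
              |         d = random.choice(['front', 'side', 'back', 'side',
              |                            'overhead', 'bottom'])
              |         image = render(nerf, view=d)  # differentiable render
              |         text = f"{ref_text}, {d} view"
              | 
              |         # Add noise, then have Stable Diffusion predict it;
              |         # the gap between predicted and injected noise is
              |         # the gradient pushed back into the NeRF parameters.
              |         t = torch.randint(0, 1000, (1,))
              |         noise = torch.randn_like(image)
              |         noisy = add_noise(image, noise, t)
              |         with torch.no_grad():
              |             pred = sd_predict_noise(noisy, text, t)
              |         image.backward(gradient=pred - noise)
              |         opt.step()
              |         opt.zero_grad()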
        
               | shadowgovt wrote:
               | Thank you for the clarification; I hadn't grokked the
               | algorithm yet.
               | 
                | That's interesting for a couple of reasons. I can see why
                | that works. It also implies that for closed objects, the
                | voxel data on the interior (where no images can see it)
                | will be complete noise, as there's no signal to pick any
                | particular color, or even whether a voxel exists at all.
        
               | FeepingCreature wrote:
               | text = f"{ref_text}, front cutaway drawing"
               | 
               | Maybe?
        
               | nobbis wrote:
                | Yes, although not complete noise - probably empty. I
                | haven't checked, but I assume there's regularization of
                | the NeRF parameters.
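                | 
                | For example, a sparsity penalty on accumulated opacity
                | would push unseen interior regions toward "empty" (a
                | common NeRF regularizer; whether this repo uses exactly
                | this one is my assumption):
                | 
                |     # weights_sum: per-ray accumulated opacity from the
                |     # volume renderer (hypothetical variable name)
                |     loss = sds_loss + 1e-3 * weights_sum.mean()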
        
             | mhuffman wrote:
              | I don't think NeRFs require very many images to produce
              | impressive results.
        
             | [deleted]
        
       | bergenty wrote:
       | I can't wait to generate novel 3D models to CNC/3D print! Can
       | these be exported out as STL/OBJs?
        
         | samstave wrote:
         | Heckin A!
         | 
         | I just got my 3D printer and was a bit too tipsy to assemble it
         | the day it arrived - and have several things I want to print...
         | 
          | It will be interesting to experiment with describing the thing
          | I want to print in text instead of designing it in Solid Edge,
          | and seeing what the AI thinks....
         | 
         | I wonder if you can feed it specific dimensions?
         | 
         | "A holder for a power supply for an e bike with two mounting
         | holes 120mm apart with a carry capacity that is 5 inches long
         | and 1.5 inches deep"
        
           | wongarsu wrote:
            | Well, here are 16 attempts by regular Stable Diffusion with
            | that prompt [1], and here's what it thinks a technical
            | drawing of it might look like [2].
            | 
            | Maybe two papers down the line :D For now you might have more
            | luck with something less specific.
           | 
           | 1: https://i.imgur.com/RPNCwyM.png
           | 
           | 2: https://i.imgur.com/c9pfM8U.png
        
             | samstave wrote:
              | Still dope... but are those also OBJ or STL?
              | 
              | I like my DALL-E expressions of "Master Chief as Vitruvian
              | Man as drawn by da Vinci"
              | 
              | And my "technical exploded diagrams of cybernetic
              | exoskeleton suits in blueprint"
              | 
              | Try those out?
        
         | moron4hire wrote:
          | In the usage notes, there's a line that mentions
          | 
          |     # test (exporting 360 video, and an obj mesh with png texture)
          |     python main_nerf.py --text "a hamburger" --workspace trial -O --test
          | 
          | So I guess so. That's pretty awesome.
        
       | etaioinshrdlu wrote:
        | I think it would be interesting to convert to a polygon mesh
        | periodically, in the loop. It could end up producing more precise
        | models.
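        | 
        | A sketch of the extraction step (assumes scikit-image is
        | available; nerf_density_grid is a hypothetical helper that
        | samples the model's density field on a regular grid):
        | 
        |     from skimage import measure
        | 
        |     density = nerf_density_grid()  # (X, Y, Z) float array
        |     verts, faces, normals, _ = measure.marching_cubes(
        |         density, level=10.0)  # level = density iso-threshold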
        
       | Cook4986 wrote:
       | Currently working with a student group to build out a 3D scene
       | generator (https://github.com/Cook4986/Longhand), and the
       | prospect of arbitrary, hyper-specific mesh arrays on demand is
       | thrilling.
       | 
       | Right now, we are relying on the Sketchfab API to populate our
       | (Blender) scenes, which is an imperfect lens through which to
       | visualize the contents of texts that our non-technical
       | "clientele" are studying.
       | 
        | Since we are publishing these scenes via WebXR (Hubs), we have
        | specific criteria related to poly counts (latency, bandwidth,
        | etc.) and usability. Regarding the latter concern, it's not clear
        | that our end users will want to wait/pay for compute.
       | 
       | *copyedited
        
         | sirianth wrote:
         | wow
        
       | Geee wrote:
        | Would be cool to see it adapted to an img2img scenario, using one
        | or more 'seed images'. It would be closer to a standard NeRF, but
        | would also be able to imagine novel angles, with guidance from
        | the prompt.
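        | 
        | One way to sketch that (names hypothetical): keep the usual
        | score-distillation loss on random views, and add a plain
        | reconstruction loss on the seed image's known viewpoint.
        | 
        |     recon = ((render(nerf, view=seed_pose) - seed_image) ** 2).mean()
        |     loss = sds_loss + recon_weight * recon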
        
       | thehappypm wrote:
       | Omg, imagine how useful this would be for video games or movies.
       | Whipping up an asset in a matter of hours of computer time?
       | Amazing
        
       | sdwvit wrote:
        | All this news about image/3D/video generation just shows that we
        | are living in the middle of an AI/ML breakthrough. Incredible to
        | see news of extreme progress in the field like this popping up
        | every day.
        
         | dmingod666 wrote:
          | Browse Lexica.art; your mind will be blown by the range and
          | amount of detail in some of the art.
          | 
          | Like this (NSFW content):
          | https://lexica.art/?q=Intricate+goddess
          | 
          | There is an addictive and trippy quality to this, and it has
          | yet to hit the mainstream. The art itself is stunning, but it
          | goes beyond that: the ability to nudge it around and make
          | variations is incredible. Now add the fact that you can train
          | it with your own content. People are going to go bonkers with
          | this, and it's going to open up a lot of debates too.
        
       | wongarsu wrote:
       | There's a small gallery of success and failure cases here [1].
       | 
        | It certainly doesn't look as good as the original yet. I wonder
        | if that's due to the implementation differences noted, less
        | cherry-picking in what they show, or inherent differences between
        | Imagen and Stable Diffusion.
        | 
        | Maybe Imagen just has a much better grasp of how images translate
        | to actual objects, whereas Stable Diffusion is stuck more on the
        | 2D image plane.
       | 
       | 1: https://github.com/ashawkey/stable-dreamfusion/issues/1
        
         | shadowgovt wrote:
         | I feel like the Cthulhu head is extra-successful, given the
         | subject matter.
         | 
         | Non-Euclidean back-polygon imaging? Good work, algorithm. ;)
        
         | nopenopenopeno wrote:
         | One cannot help but notice the success cases are expected to
         | have symmetry along at least one axis, whereas the failure
         | cases are not.
        
           | namarie wrote:
           | Aren't squirrels and frogs expected to have an axis of
           | symmetry? I think the reason for the failures is the presence
           | of faces; it seems to be trying to make a face visible from
           | all angles.
        
             | gpm wrote:
              | Which probably has a lot to do with us taking nearly all
              | our pictures of things with faces from the direction the
              | face is pointing.
        
         | codeflo wrote:
         | I guess a lot can be done to force the model to create properly
         | connected 3D shapes instead of these thin protruding 2D slices.
         | But I noticed something else. Some of the angles "in between"
         | the frog faces have three eyes. I wonder if part of the issue
         | might be that those don't look especially wrong to Stable
         | Diffusion. It's often surprisingly confused about the number of
         | limbs it should generate.
        
       | jrmann100 wrote:
       | [Edited] The original Dreamfusion project was discussed here a
       | few days ago: https://news.ycombinator.com/item?id=33025446
        
         | naet wrote:
          | This is a different project/implementation, based on the open-
          | source Stable Diffusion instead of Google's proprietary Imagen.
        
           | jrmann100 wrote:
           | Edited; thanks for clarifying!
           | 
           | That makes this Stable-Dreamfusion adaptation even more
           | promising.
        
       | hwers wrote:
        | The only downside with this is that each mesh takes around 5
        | hours to generate (on a V100, no less). Obviously it'll speed up,
        | but we're far from the panacea.
        
       | etaioinshrdlu wrote:
       | How long does it take? 5 hours?
        
       | egypturnash wrote:
       | Jesus. Well thanks for your contribution to putting the entire
       | creative industry out of work, I guess, little anime girl icon
       | person. Ugh.
        
         | yieldcrv wrote:
         | tent cities by the beach can use the showers
        
         | ccity88 wrote:
          | How very pessimistic. We should never shun technological
          | progress for fear of upsetting the status quo or the
          | established agenda. All of this is only a matter of time away
          | from emerging. Have fun being on the forgotten side of history.
        
         | uni_rule wrote:
          | This is a force multiplier. It doesn't take the place of
          | artistic intent, dingus. Besides, you can't accomplish much
          | with just "a model". This is an asset generator, hardly a
          | threat to anyone, especially when these things will likely need
          | some weight painting to touch up anyway.
        
           | mrtranscendence wrote:
            | I don't think the commenter is upset that this _particular_
            | model will be deployed, putting creative professionals out of
            | work. It's clearly a janky proof of concept. I think they're
            | upset about what follow-on work could eventually mean.
        
         | axg11 wrote:
         | A new technology is developed with the potential to make you
         | 100x more efficient at your job. Today, a creative artist can
         | only contribute to a project through a narrow slice. Tomorrow,
         | the same creative artist can single-handedly orchestrate an
         | entire project.
        
           | astrange wrote:
           | It's more like there were tasks that were previously so
           | unproductive they couldn't be done at all, and now they're
           | productive enough you might be able to be employed doing
           | them.
           | 
           | Automation creates jobs rather than destroying them. What
           | destroys jobs is mainly bad macroeconomic conditions.
        
         | cercatrova wrote:
         | Another day, another AI media generation project, and yet
         | another comment by egypturnash lamenting the "death of the
         | creative industry."
        
           | egypturnash wrote:
           | representative of the industry currently under threat of
           | disruption is not happy about this and continues to be vocal
           | about her unhappiness, film at 11
        
             | Gabriel_Martin wrote:
             | A representative of the luddite contingent perhaps.
        
         | tluyben2 wrote:
          | I have a friend, an old-school professional artist from before
          | affordable computing, who has been using AI (and computers
          | before that) to aid his creations for many years now. He runs
          | everything himself on his own machines (a pretty expensive
          | setup), experimenting and training, and he loves every
          | iteration.
          | 
          | But I guess it depends on what "creative industry" means to
          | you. Pumping out web UIs or 3D gaming models was never, for the
          | most part, the creative industry; learning to see what people
          | like and copying it for different situations is not necessarily
          | creative, and is thus what AI does easily. Anything that
          | doesn't require a lot of learning, practice, and talent beyond
          | manual work will be replaced by AI soon; the other stuff will
          | take somewhat longer.
          | 
          | If you think this can replace you, you weren't/aren't in the
          | creative industry. Same goes for coders afraid of no-code.
        
           | egypturnash wrote:
           | So how many shitty "not creative industry" jobs did your
           | friend take on the way to where he could have "a pretty
           | expensive setup" to do this? What did he crank out solely to
           | earn a paycheck with his art skills?
        
             | tluyben2 wrote:
              | Your tone is not great, but he never had those jobs; he
              | was born into a poor family (for NL), but his talent was
              | recognised by H.R. Giger when he sent him an airbrushed
              | work (via DHL, with a frame and all, on a whim), and that
              | was enough. He is not rich but makes a nice living. Note
              | that this is the EU; there is not much risk of dying under
              | a bridge even if you don't succeed. But he did succeed, as
              | far as he is concerned, and he never compromised anything
              | like you imply he must have done.
              | 
              | Edit: but you are also implying you think your job is gone
              | with stuff like this? What do you do? Also, I am hoping I
              | will be replaced: I have been thinking I will be replaced
              | since the early 80s, as my work as a programmer is not so
              | exciting (I love it and will keep doing it even if it's no
              | longer viable, which for the 20% of people who do niche
              | work I believe is very far off AI-wise; same, I think, for
              | creatives), but it seems closer now than ever.
              | 
              | Edit2: looking at the work in your profile, it doesn't
              | seem you will be replaced by anything soon; what is the
              | anger about? Do you have public blogs/tweets about your
              | feelings on this? Looking at your work (in your HN
              | profile), you seem to be in the group not touched by this
              | at all.
        
               | mrtranscendence wrote:
               | What is "soon" here? Admittedly I'm not particularly
               | sanguine about the prospects of AI generated art or code
               | taking many jobs in the near future, but at some point it
               | could well happen even to talented engineers and artists.
               | It's nice of _you_ to not mind being replaced, but of
               | course not everyone will be happy about existential
               | threats to their hard-earned livelihoods.
        
       | GistNoesis wrote:
       | >Stable-Diffusion is a latent diffusion model, which diffuses in
       | a latent space instead of the original image space. Therefore, we
       | need the loss to propagate back from the VAE's encoder
       | 
        | There is also an alternative way to handle this latent-space
        | difference from the original paper that should also work:
        | 
        | Instead of working in voxel color space, you push the latent into
        | the voxels. That is, instead of having a voxel grid of 3D RGB
        | colors, you have a voxel grid of dim_latent-dimensional latents
        | (you can also use spherical harmonics if you want, as they work
        | just the same in n dimensions).
        | 
        | Only the color prediction network differs; the density is kept
        | the same.
        | 
        | The NeRF then renders directly into the latent space (so there
        | are fewer rays to render), which means you only need to decode
        | with the VAE for visualization purposes, not in the training
        | loop.
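        | 
        | A sketch of the idea (names hypothetical): the NeRF's color head
        | emits a 4-channel SD latent instead of RGB, so rendering happens
        | at latent resolution (64x64 for a 512x512 image).
        | 
        |     import torch
        |     import torch.nn as nn
        | 
        |     LATENT_DIM = 4  # SD's VAE latent has 4 channels
        | 
        |     class LatentColorHead(nn.Module):
        |         def __init__(self, feat_dim=64):
        |             super().__init__()
        |             self.mlp = nn.Sequential(
        |                 nn.Linear(feat_dim, 64), nn.ReLU(),
        |                 nn.Linear(64, LATENT_DIM))  # latent, not RGB
        | 
        |         def forward(self, features):
        |             return self.mlp(features)
        | 
        |     # Volume rendering is unchanged: alpha-composite these
        |     # 4-channel "colors" along each ray; density is as before.
        |     # Decode only to visualize: image = vae.decode(latents)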
        
         | hwers wrote:
          | This sounds really interesting, but I'm not sure I follow. I'm
          | having a hard time expressing how I'm confused (maybe it's
          | unfamiliar NeRF terminology), but if you have the time, I'd be
          | very interested if you could reformulate this alternative
          | method somehow (I've been stuck on this very issue for two days
          | now, trying to implement it myself).
        
       | baxtr wrote:
        | Can someone explain the significance of this? I'm not familiar
        | with what DreamFusion is.
        
       ___________________________________________________________________
       (page generated 2022-10-06 23:00 UTC)