[HN Gopher] A working implementation of text-to-3D DreamFusion,
powered by Stable Diffusion
___________________________________________________________________
 
A working implementation of text-to-3D DreamFusion, powered by
Stable Diffusion
 
Author : nopinsight
Score  : 203 points
Date   : 2022-10-06 15:12 UTC (7 hours ago)
 
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
 
| xwdv wrote:
| Feels like AI-generated art is approaching a sort of singularity
| at this point. Progress is getting very exponential.
| 
| antegamisou wrote:
| Only if you have low standards for what constitutes research on
| artificial intelligence and art perception.
| 
| hwers wrote:
| These things take months and months to train (hardly fast
| progress). Any new model that's coming out is generally known in
| the atmosphere (not unpredictable), and these applications were
| pretty expected the day Stable Diffusion came out.
| 
| whazor wrote:
| I would love an AI model that does text-to-SVG.
| 
| WASDx wrote:
| OpenAI Codex can do that.
| 
| fsiefken wrote:
| That would be lovely! Perhaps it can also dream up interesting
| Game of Life patterns.
| 
| nobbis wrote:
| Took a week, as predicted:
| https://twitter.com/DvashElad/status/1575614411834011651
| 
| nobbis wrote:
| Key step in generating 3D - ask Stable Diffusion to score views
| from different angles:
| 
|     for d in ['front', 'side', 'back', 'side', 'overhead', 'bottom']:
|         text = f"{ref_text}, {d} view"
| 
| https://github.com/ashawkey/stable-dreamfusion/blob/0cb8c0e0...
| 
| dwallin wrote:
| Given the way the language model works, these words could have
| multiple meanings. I wonder if training a form of textual
| inversion to more directly represent these concepts might
| improve the results. You could even try teaching it to represent
| more fine-grained degree adjustments.
| 
| shadowgovt wrote:
| I'm modestly surprised that those few angles give us enough data
| to build out a full 3D render, but I guess I shouldn't be too
| surprised, as that's tech that has had high demand and been
| understood for years (those kinds of front-cut / side-cut images
| are what 3D artists use to do their initial prototypes of
| objects if they're working from real-life models).
| 
| nobbis wrote:
| DreamFusion doesn't directly build a 3D model from those
| generated images. It starts with a completely random 3D voxel
| model, renders it from 6 different angles, then asks Stable
| Diffusion how plausible an image of "X, side view" it is.
| 
| It then sprinkles some noise on the rendering, makes Stable
| Diffusion improve it a little, then adjusts the voxels to
| produce that image (using differentiable rendering).
| 
| Rinse and repeat for hours.
| 
| shadowgovt wrote:
| Thank you for the clarification; I hadn't grokked the algorithm
| yet.
| 
| That's interesting for a couple of reasons. I can see why that
| works. It also implies that for closed objects, the voxel data
| on the interior (where no images can see it) will be complete
| noise, as there's no signal to pick any color or lack of a
| voxel.
| 
| FeepingCreature wrote:
| 
|     text = f"{ref_text}, front cutaway drawing"
| 
| Maybe?
| 
| nobbis wrote:
| Yes, although not complete noise - probably empty. Haven't
| checked, but I assume there's regularization of the NeRF
| parameters.
| 
| mhuffman wrote:
| I don't think that NeRFs require too many images to produce
| impressive results.
| 
| [deleted]
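 
A rough sketch of the loop nobbis describes above - DreamFusion's
"score distillation" - in PyTorch-flavored pseudocode. Every name
here (render_nerf, vae_encode, clip_embed, sd_unet, scheduler, w)
is an assumed stand-in, not stable-dreamfusion's actual API:
 
    import random
    import torch

    # Stand-ins: render_nerf differentiably renders the current 3D
    # model, vae_encode/clip_embed/sd_unet wrap Stable Diffusion's
    # components, scheduler implements DDPM noising, w is a
    # timestep-dependent weighting.
    optimizer = torch.optim.Adam(nerf_params, lr=1e-3)

    for step in range(10_000):
        # Render the (initially random) 3D model from a random view.
        d = random.choice(['front', 'side', 'back', 'overhead', 'bottom'])
        image = render_nerf(nerf_params, view=d)   # differentiable
        latents = vae_encode(image)                # SD works on latents

        # Sprinkle noise on the rendering at a random diffusion step.
        t = torch.randint(20, 980, (1,))
        noise = torch.randn_like(latents)
        noisy_latents = scheduler.add_noise(latents, noise, t)

        # Ask Stable Diffusion how it would denoise "X, {d} view".
        noise_pred = sd_unet(noisy_latents, t, clip_embed(f"{ref_text}, {d} view"))

        # Score distillation: inject (noise_pred - noise) as the
        # gradient on the latents, skipping backprop through the U-Net.
        latents.backward(gradient=w(t) * (noise_pred - noise))
        optimizer.step()
        optimizer.zero_grad()
 
The point of the trick is that the U-Net is only run forward; its
denoising error is used directly as a gradient on the rendering, and
only the NeRF/voxel parameters are updated.
 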
| bergenty wrote:
| I can't wait to generate novel 3D models to CNC/3D print! Can
| these be exported as STLs/OBJs?
| 
| samstave wrote:
| Heckin A!
| 
| I just got my 3D printer and was a bit too tipsy to assemble it
| the day it arrived - and have several things I want to print...
| 
| It will be interesting to experiment with describing the thing I
| want to print with text, instead of designing it in Solid Edge,
| and seeing what the AI thinks...
| 
| I wonder if you can feed it specific dimensions?
| 
| "A holder for a power supply for an e-bike with two mounting
| holes 120mm apart, with a carry capacity that is 5 inches long
| and 1.5 inches deep"
| 
| wongarsu wrote:
| Well, here are 16 attempts by regular Stable Diffusion with that
| prompt [1], and here's what it thinks a technical drawing of it
| might look like [2].
| 
| Maybe two papers down the line :D For now you might have more
| luck with something less specific.
| 
| 1: https://i.imgur.com/RPNCwyM.png
| 
| 2: https://i.imgur.com/c9pfM8U.png
| 
| samstave wrote:
| Still dope... but are those also OBJ or STL?
| 
| I like my DALL-E expressions of "Master Chief as Vitruvian Man
| as drawn by da Vinci"
| 
| And my "technical exploded diagrams of cybernetic exoskeleton
| suits in blueprint"
| 
| Try those out?
| 
| moron4hire wrote:
| In the usage notes, there's a line that mentions
| 
|     # test (exporting 360 video, and an obj mesh with png texture)
|     python main_nerf.py --text "a hamburger" --workspace trial -O --test
| 
| So I guess so. That's pretty awesome.
| 
| etaioinshrdlu wrote:
| I think it would be interesting to convert to a polygon mesh
| periodically in the loop. It could end up with more precise
| models.
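 
On the CNC/3D-printing question: the --test run above already
writes an OBJ, and converting that to an STL for a slicer is a
one-liner in any mesh library. A sketch using trimesh, assuming it
is installed; the path is illustrative (check your --workspace
directory for the actual filename), and stable-dreamfusion itself
has no STL exporter:
 
    import trimesh

    # Load the OBJ exported by --test and re-save it as STL.
    # force='mesh' flattens any multi-part OBJ scene into one mesh;
    # note that STL stores geometry only, so the png texture is lost.
    mesh = trimesh.load("trial/mesh.obj", force="mesh")  # illustrative path
    mesh.export("hamburger.stl")
 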
| Cook4986 wrote:
| Currently working with a student group to build out a 3D scene
| generator (https://github.com/Cook4986/Longhand), and the
| prospect of arbitrary, hyper-specific mesh arrays on demand is
| thrilling.
| 
| Right now, we are relying on the Sketchfab API to populate our
| (Blender) scenes, which is an imperfect lens through which to
| visualize the contents of texts that our non-technical
| "clientele" are studying.
| 
| Since we are publishing these scenes via WebXR (Hubs), we have
| specific criteria related to poly counts (latency, bandwidth,
| etc.) and usability. Regarding the latter concern, it's not
| clear that our end users will want to wait/pay for compute.
| 
| *copyedited
| 
| sirianth wrote:
| wow
| 
| Geee wrote:
| Would be cool to see it adapted to an img2img scenario, using
| one or more 'seed images'. It would be closer to a standard
| NeRF, but it would also be able to imagine novel angles, with
| guidance from a prompt.
| 
| thehappypm wrote:
| Omg, imagine how useful this would be for video games or movies.
| Whipping up an asset in a matter of hours of computer time?
| Amazing.
| 
| sdwvit wrote:
| All this news about image/3D/video generation just shows that we
| are living in the middle of an AI/ML breakthrough. Incredible to
| see news of extreme progress in the field like this popping up
| every day.
| 
| dmingod666 wrote:
| Browse Lexica.art; your mind will be blown by the range and
| amount of detail in some of the art.
| 
| Like this (NSFW content):
| https://lexica.art/?q=Intricate+goddess
| 
| There is an addictive and trippy quality to this, and it is yet
| to hit the mainstream. The art itself is stunning, but it goes
| beyond that: the ability to nudge it around and make variations
| of it is incredible. Now add the fact that you can train it with
| your own content. People are going to go bonkers with this, and
| it's going to open up a lot of debates too.
| 
| wongarsu wrote:
| There's a small gallery of success and failure cases here [1].
| 
| It certainly doesn't look as good as the original yet. I wonder
| if that's due to the implementation differences noted, less
| cherry-picking in what they show, or inherent differences
| between Imagen and Stable Diffusion.
| 
| Maybe Imagen just has a much better grasp of how images
| translate to actual objects, where Stable Diffusion is stuck
| more on the 2D image plane.
| 
| 1: https://github.com/ashawkey/stable-dreamfusion/issues/1
| 
| shadowgovt wrote:
| I feel like the Cthulhu head is extra-successful, given the
| subject matter.
| 
| Non-Euclidean back-polygon imaging? Good work, algorithm. ;)
| 
| nopenopenopeno wrote:
| One cannot help but notice the success cases are expected to
| have symmetry along at least one axis, whereas the failure cases
| are not.
| 
| namarie wrote:
| Aren't squirrels and frogs expected to have an axis of symmetry?
| I think the reason for the failures is the presence of faces; it
| seems to be trying to make a face visible from all angles.
| 
| gpm wrote:
| Which probably has a lot to do with us taking nearly all our
| pictures of things with faces from the direction the face is
| facing.
| 
| codeflo wrote:
| I guess a lot can be done to force the model to create properly
| connected 3D shapes instead of these thin protruding 2D slices.
| But I noticed something else. Some of the angles "in between"
| the frog faces have three eyes. I wonder if part of the issue
| might be that those don't look especially wrong to Stable
| Diffusion. It's often surprisingly confused about the number of
| limbs it should generate.
| 
| jrmann100 wrote:
| [Edited] The original DreamFusion project was discussed here a
| few days ago: https://news.ycombinator.com/item?id=33025446
| 
| naet wrote:
| This is a different project/implementation, based on the open-
| source Stable Diffusion instead of Google's proprietary Imagen.
| 
| jrmann100 wrote:
| Edited; thanks for clarifying!
| 
| That makes this Stable-Dreamfusion adaptation even more
| promising.
| 
| hwers wrote:
| The only downside with this is that each mesh takes like 5 hours
| to generate (on a V100, too). Obviously it'll speed up, but
| we're far from the panacea.
| 
| etaioinshrdlu wrote:
| How long does it take? 5 hours?
| 
| egypturnash wrote:
| Jesus. Well, thanks for your contribution to putting the entire
| creative industry out of work, I guess, little anime girl icon
| person. Ugh.
| 
| yieldcrv wrote:
| tent cities by the beach can use the showers
| 
| ccity88 wrote:
| How very pessimistic. We should never shirk technological
| progress for fear of upsetting the status quo or the established
| agenda. All of this is only a matter of time away from emerging.
| Have fun being on the forgotten side of history.
| 
| uni_rule wrote:
| This is a force multiplier. It doesn't take the place of
| artistic intent, dingus. Besides, you can't accomplish much with
| just "a model". This is an asset generator, hardly a threat to
| anyone, especially when these things will likely need some
| weight painting to touch up anyway.
| 
| mrtranscendence wrote:
| I don't think the commenter is upset that this _particular_
| model will be deployed, putting creative professionals out of
| work. It's clearly a janky proof of concept. I think they're
| upset about what follow-on work could eventually mean.
| 
| axg11 wrote:
| A new technology is developed with the potential to make you
| 100x more efficient at your job. Today, a creative artist can
| only contribute to a project through a narrow slice. Tomorrow,
| the same creative artist can single-handedly orchestrate an
| entire project.
| astrange wrote:
| It's more like there were tasks that were previously so
| unproductive they couldn't be done at all, and now they're
| productive enough that you might be able to be employed doing
| them.
| 
| Automation creates jobs rather than destroying them. What
| destroys jobs is mainly bad macroeconomic conditions.
| 
| cercatrova wrote:
| Another day, another AI media generation project, and yet
| another comment by egypturnash lamenting the "death of the
| creative industry."
| 
| egypturnash wrote:
| representative of the industry currently under threat of
| disruption is not happy about this and continues to be vocal
| about her unhappiness, film at 11
| 
| Gabriel_Martin wrote:
| A representative of the luddite contingent, perhaps.
| 
| tluyben2 wrote:
| I have a friend who is an old-school professional artist from
| before affordable computing who has been using AI (and computers
| before that) to aid his creations for many years now. He runs
| everything himself on his own machines (which is a pretty
| expensive setup), experimenting and training, and he loves every
| iteration.
| 
| But I guess it depends on what the creative industry means to
| you. Pumping out web UIs or 3D gaming models was never, for the
| most part, the creative industry; learning to see what people
| like and copying it for different situations is not necessarily
| creative, and thus is what AI does easily. Anything that doesn't
| come with a lot of learning, practice, and talent beyond manual
| work will be replaced by AI soon; the other stuff will take
| somewhat longer.
| 
| If you think this can replace you, you weren't/aren't in the
| creative industry. The same goes for coders afraid of no-code.
| 
| egypturnash wrote:
| So how many shitty "not creative industry" jobs did your friend
| take on the way to where he could have "a pretty expensive
| setup" to do this? What did he crank out solely to earn a
| paycheck with his art skills?
| 
| tluyben2 wrote:
| Your tone is not great, but he never had those jobs; he was born
| into a poor family (for NL), but his talent was recognised by
| H.R. Giger when he sent him a paintbrushed work (via DHL, with a
| frame and all, on a whim) and that was enough. He is not rich
| but makes a nice living. Note that this is the EU; there is not
| much risk of dying under a bridge even if you don't succeed. But
| he did succeed, as far as he is concerned. He never compromised
| anything like you imply he must have done.
| 
| Edit: but you are also implying you think your job is gone with
| stuff like this? What do you do? Also, I am hoping I will be
| replaced: I have been thinking I will be replaced since the
| early 80s, as my work as a programmer is not so exciting (I love
| it and will keep doing it even if it's not viable anymore -
| which I believe is very far off, AI-wise, for the 20% of people
| who do niche work; I think the same holds for creative work),
| but it seems closer now than ever.
| 
| Edit2: looking at the work in your profile, you don't seem like
| you will be replaced by anything soon; what is the anger about?
| Do you have public blogs/tweets about your feelings on this?
| Looking at your work (in your HN profile), you seem to be in the
| group not touched by this at all.
| 
| mrtranscendence wrote:
| What is "soon" here? Admittedly I'm not particularly sanguine
| about the prospects of AI-generated art or code taking many jobs
| in the near future, but at some point it could well happen, even
| to talented engineers and artists. It's nice of _you_ to not
| mind being replaced, but of course not everyone will be happy
| about existential threats to their hard-earned livelihoods.
| 
| GistNoesis wrote:
| > Stable-Diffusion is a latent diffusion model, which diffuses
| in a latent space instead of the original image space.
| Therefore, we need the loss to propagate back from the VAE's
| encoder
| 
| There is also an alternative way to handle this latent
| difference from the original paper that should also work:
| 
| Instead of working in voxel color space, you push the latent
| into the voxel (i.e. instead of having a voxel grid of 3D RGB
| colors, you have a voxel grid of dim_latent latents; you can
| also use spherical harmonics if you want, as they work just the
| same in n dimensions).
| 
| Only the color prediction network differs; the density is kept
| the same.
| 
| The NeRF then renders directly to the latent space (so there are
| fewer rays to render), which means you need to decode with the
| VAE only for visualization purposes, not in the training loop.
| 
| hwers wrote:
| This sounds really interesting, but I'm not sure I follow. I'm
| having a hard time expressing how I'm confused, though (maybe
| it's unfamiliar NeRF terminology), but if you have the time I'd
| be very interested if you could reformulate this alternative
| method somehow (I've been stuck on this very issue for two days
| now trying to implement it myself).
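 
A sketch of the alternative GistNoesis outlines, with invented
names: the NeRF's color head is widened from 3 RGB channels to
Stable Diffusion's 4 latent channels, so volume rendering happens
directly in latent space (at 64x64 instead of 512x512) and the VAE
decoder is only needed to inspect results, not inside the training
loop:
 
    import torch

    class LatentNeRF(torch.nn.Module):
        """Color head emits a 4-channel SD latent instead of RGB."""

        def __init__(self, hidden=64, latent_dim=4):
            super().__init__()
            self.density_head = torch.nn.Linear(hidden, 1)           # unchanged
            self.latent_head = torch.nn.Linear(hidden, latent_dim)   # was 3 (RGB)

        def forward(self, features):
            sigma = torch.relu(self.density_head(features))
            latent = self.latent_head(features)  # no sigmoid: latents are unbounded
            return sigma, latent
 
Compositing `latent` along each ray exactly as one would composite
RGB yields a 4x64x64 "image" that can be fed straight to the U-Net
for the score-distillation step - no VAE encode in the loop, and
64x fewer rays than rendering at 512x512.
 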
| baxtr wrote:
| Can someone explain the significance of this? I am not familiar
| with what DreamFusion is.
___________________________________________________________________
(page generated 2022-10-06 23:00 UTC)