[HN Gopher] GET3D: A Generative Model of High Quality 3D Textured
Shapes Learned from Images
___________________________________________________________________
 
GET3D: A Generative Model of High Quality 3D Textured Shapes
Learned from Images
 
Author : lnyan
Score  : 129 points
Date   : 2022-09-24 13:49 UTC (9 hours ago)
 
(HTM) web link (nv-tlabs.github.io)
(TXT) w3m dump (nv-tlabs.github.io)
 
| ummonk wrote:
| Still nowhere near good enough to be able to generate a VFX or
| video game asset from some pictures, which is what we'd really
| want for a practical application of such a tool.
| mgraczyk wrote:
| Generating good video game assets from pictures is solved, but
| this does more than that. It generates modified versions from
| words.
| aaaaaaaaaaab wrote:
| >Generating good video game assets from pictures is solved
|
| lol no, not at all. It still needs tons of manual work to get
| it up to quality in terms of topology, material, etc.
| mgraczyk wrote:
| In terms of practical engineering it's not solved; I mean
| that the SOTA in photogrammetry is good enough to create
| high-quality textures and meshes directly from pictures.
| aaaaaaaaaaab wrote:
| Those meshes and textures are far from usable for
| realtime rendering in a 3D game.
| etaioinshrdlu wrote:
| On a somewhat related topic, I think we can just use Stable
| Diffusion to help convert single photos to 3D NeRFs.
|
| 1. find the prompt that best generates the image
|
| 2. generate a (crude) NeRF from your starting image and render
| views from other angles
|
| 3. use Stable Diffusion with the views from other angles as seed
| images, refining them using the prompt from 1 combined with added
| view descriptions ("view from back", "view from top", etc.)
|
| 4. feed the refined views back to the NeRF generator, keeping the
| initial photo view constant
|
| 5. generate new views from the NeRF, which should now be much
| more realistic.
|
| Run steps 2-5 in a loop indefinitely. Eventually you should end
| up with a highly accurate, realistic NeRF which is full 3D from
| any angle, all from a single photo.
|
| Similar techniques could be used to extend the scene in all
| directions.
| eutectic wrote:
| I have my doubts that this will converge to anything
| meaningful.
| aliqot wrote:
| In the short term you may be right, but in the long run it's
| a certainty you won't be.
| rsp1984 wrote:
| The problem with such an approach would be that NeRFs require a
| set of input images _with their exact poses_, and exact poses
| are only available if the underlying geometry is static.
| However, if you use SD to generate new views it's only an
| approximation, and you wouldn't be able to get the exact poses.
|
| Not all hope is lost though. I'm pretty sure in a few years
| (perhaps sooner) we'll be able to generate entire 3D scenes
| directly, without going through 2D images as an intermediate
| step.
| londons_explore wrote:
| I don't see it as a blocker... Especially if you alternate
| NeRF and SD iterations - i.e. don't generate a whole image
| each time; instead just choose a random angle, do NeRF
| reconstruction, then run a single SD iteration on that NeRF
| render, and do another training step of the NeRF.
|
| That way, you know the exact pose for each image, because you
| chose it when rendering the NeRF.
| eutectic wrote:
| You can optimize the poses as part of the model.
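 
A minimal sketch of the loop etaioinshrdlu proposes, in Python-style
pseudocode. All of the helpers here (invert_prompt, train_nerf,
render_view, sd_img2img, random_pose, describe) are hypothetical
stand-ins for a real NeRF trainer and a Stable Diffusion img2img
pipeline; none of these names come from the paper or thread:
 
    # Step 1: recover a prompt that best reproduces the input photo.
    prompt = invert_prompt(input_photo)
 
    # Seed the view set with the single known photo and its pose.
    views = {front_pose: input_photo}
 
    for _ in range(num_rounds):
        nerf = train_nerf(views)            # steps 2/4: fit a (crude) NeRF
        pose = random_pose()                # pick a fresh camera angle
        crude = render_view(nerf, pose)     # render the crude novel view
        # Step 3: refine the crude render with SD, appending a view
        # hint ("view from back", "view from top", ...) to the prompt.
        refined = sd_img2img(crude, prompt + describe(pose))
        views[pose] = refined               # step 4: feed it back
        views[front_pose] = input_photo     # keep the photo view fixed
 
Note that because each pose is chosen before rendering (londons_explore's
point), every refined image comes with an exact camera pose for free.
 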
| bno1 wrote:
| An AI that does good UV unwrapping would be much more interesting
| and useful.
| smoldesu wrote:
| And I'd love to see an AI-generated rigging tool for auto-
| generating bone structures so you don't have to do it by hand.
|
| Baby steps, though. The data required to train an
| unwrapping/rigging tool is a lot more domain-specific than
| correlating an OBJ file with its completed render.
| rytill wrote:
| Can you help me understand how you are imagining that? As in,
| you have texture images already and you want to apply them to
| the 3D object intelligently? Is that the case you're talking
| about? Or texture generation and UV unwrapping in one?
| caenorst wrote:
| I think what they mean by UV unwrapping is generating a UV
| map from a textured model (here the texture is generated by a
| tri-plane network).
|
| It's interesting for compression purposes, but kinda
| orthogonal to this method (a good UV unwrapping can be applied
| once the model is generated).
| bno1 wrote:
| Generating a UV map from an untextured mesh. The UV map is
| stored in the vertices as texture coordinates, and together with
| the topology of the mesh it defines how the texture (images)
| gets mapped onto the mesh. A good UV map preserves surface area
| (i.e. every region of the mesh maps to a region of the texture
| of proportional size, otherwise you get stretching), has few
| seams, and leaves little empty space around the texture islands,
| to reduce wasted texture size.
|
| There are ways to do this automatically, but they're far from
| perfect. Artists usually take the mesh and literally unwrap
| it until it's planar, then convert this transformed mesh into
| the UV mapping. The advantage of this method is that it gives
| you very good control over seams and texture islands, but it's
| tricky to preserve surface area.
|
| Those neural rendering methods are very cool because they use
| light/color fields, but they still have a lot of catching up
| to do compared to modern 3D graphics.
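 
One way to make bno1's "preserve surface area" criterion concrete: for
each triangle, compare its area in UV space to its area on the mesh
surface; in a distortion-free unwrap that ratio is the same for every
face. A small self-contained Python sketch (NumPy only; the mesh
arrays are illustrative placeholders, not tied to any tool's API):
 
    import numpy as np
 
    def uv_stretch_ratios(verts, uvs, faces):
        """Per-face ratio of UV-space area to 3D surface area.
        verts: (V, 3) positions, uvs: (V, 2) coords, faces: (F, 3) indices.
        An area-preserving unwrap keeps these ratios near-constant."""
        ratios = []
        for i, j, k in faces:
            # 3D triangle area via the cross product.
            a3 = 0.5 * np.linalg.norm(np.cross(verts[j] - verts[i],
                                               verts[k] - verts[i]))
            # 2D (UV) triangle area via the scalar cross product.
            e1, e2 = uvs[j] - uvs[i], uvs[k] - uvs[i]
            a2 = 0.5 * abs(e1[0] * e2[1] - e1[1] * e2[0])
            ratios.append(a2 / a3 if a3 > 0 else 0.0)
        return np.array(ratios)
 
    # High variance across faces means visible texture stretching:
    # stretch = uv_stretch_ratios(verts, uvs, faces)
    # print(stretch.std() / stretch.mean())
 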
| TOMDM wrote:
| OK, so on the generative-model modality landscape I'm now aware
| of:
|
| - speech
|
| - images
|
| - audio samples
|
| - text
|
| - code
|
| - 3d models
|
| I've seen basic attempts at music and video, and based on
| everything else we've seen, getting good results there seems to
| be mostly a matter of scaling.
|
| What content generation modalities are left? Will all corporate
| generation of these fall to progressively larger models, leaving
| a (relatively) niche "Made by humans!" industry in its wake?
| christiangenco wrote:
| HN comments.
| r1chardnl wrote:
| Management positions replaced by a supervising AI.
| TOMDM wrote:
| I wonder what your input data would need to be for a
| competent AI in this space.
|
| Finance, goals, delivery timelines, capabilities of the team,
| employee availability, which employees work well with each
| other, office politics, regulatory constraints...
| Filligree wrote:
| It doesn't need to work as well as current management, just
| well enough to be cheaper.
|
| Perhaps not even that, factoring in loyalty.
| thanatos519 wrote:
| The problem is finding a good training set.
| monkeydust wrote:
| Don't see why not, especially for low-impact, high-frequency
| decisions. Some AI-guided assistance with the option to
| automate. The next level of autocomplete, I guess.
| Keyframe wrote:
| Animation, at least the 'background' kind.
| snek_case wrote:
| It's not just a matter of different modalities. It's still a
| matter of sophistication.
|
| The end game is endless generative music or video streaming
| customized to your preferences: being able to describe a story,
| or having the AI model take a guess at what you might find
| interesting/entertaining and generate a whole TV show or
| movie for you to watch. Or generating background music while
| you work and automatically adjusting it to your tastes, as well
| as adjusting if you're finding it hard to concentrate or you
| need to take a call.
| tetris11 wrote:
| Except it won't be, will it? Such things were promised for the
| internet, and we had maybe a good 10 years or so before corps
| caught up and told us what to watch through their channels.
|
| I imagine this being much the same: AI trained on corp-
| approved training sets to give suggestions for the
| preferences they want you to have.
|
| Sure, you could spin up your own and buy a machine that
| trains on its own training data, but watch how no one will
| do that, because of the cost, or the diminishing access to
| untainted resources.
| theptip wrote:
| This seems like a weird take to me. We just saw Stable
| Diffusion land, an open-source, community-trained SOTA
| model. There is an open reimplemented version of GPT.
|
| How is the correct extrapolation that corps will control
| all content generation?
|
| Sure, Google will always be an OOM or two (more?) ahead in
| terms of compute dedicated to the problem. And so the best-
| quality stuff will likely come from big corps; Netflix (or
| their successor) will have the best-quality video-generation
| AI. That is how it always has been, though; movies are
| heavily capital-intensive.
|
| But this tech raises the quality of hobbyist-generated
| content vs. highly capitalized studio content. So I think
| it's reasonable to extrapolate to even more content at the
| long tail, instead of consolidation.
| suby wrote:
| It's going to be akin to content creation on YouTube, perhaps
| even with people using YouTube as their distribution medium.
| Anyone can make a YouTube video, but we don't see everyone
| creating content.
|
| We should see a proliferation of the tech such that lots of
| small (even one-man) studios pop up, pumping out high-quality
| content, but with the content released on a schedule, similar
| to how YouTube videos are now. Your preferences come into play
| through your suggested watch list; it'll be populated from this
| pre-created media based on whatever preference machine-learning
| algorithm the distribution platform (YouTube?) decides on. The
| feedback from watch metrics will then be used by these micro-
| studios to decide what to create next. It's basically what
| already happens now with YouTube content creation, but the
| quality of what people produce will be better than Hollywood
| movies / TV shows, and the pace of release will be much quicker.
|
| Not everyone needs to be training and generating their own
| content in order for your content preferences to be
| absolutely saturated with things you'd enjoy watching.
| thanatos519 wrote:
| I'm waiting for the "literal video" generator, which writes and
| sings new lyrics describing what is happening in the video.
| eezurr wrote:
| Music will continue to be made by humans because of strong
| copyright law. It's illegal to sample (and distribute) > 0
| seconds of recorded music. If that makes it into any AI-
| generated music, it's game over if you distribute it.
|
| Source on sampling: the head audio engineer at Juilliard School
| of Music.
| dinobones wrote:
| Why would being a "head audio engineer" at Juilliard give you
| any credibility on AI-generated music and sampling/copyright
| law? Lol.
| eezurr wrote:
| Because they work with/teach electronic music and sampling in
| addition to recording classical acoustic music.
| Geee wrote:
| It's not sampling. Sampling is copying & pasting, but that
| doesn't happen, technically. Just like Stable Diffusion
| doesn't copy any artworks, AI learns from previous works but
| doesn't copy them. It's quite similar to how humans learn and
| make adaptations based on other work.
| eezurr wrote:
| While technically true, the unspoken premise of my argument
| is that it can and will output distinctive samples derived
| from the source. E.g. you can't change the pitch and tempo,
| add other effects, and call the output your own, legally.
|
| It's a landmine.
| skybrian wrote:
| That seems like good advice for professionals, but I'm
| wondering if it's going to hold up with new ways of
| distribution.
|
| Would distributing a generative model that can sometimes
| generate such music also be considered illegal? Will it
| actually stop people from doing it in practice?
|
| Would it be illegal to share seeds and prompts?
|
| Through these alternative methods, you could have a lot of
| people listening to music that's never distributed as audio
| or video files. And if there's an API for it, games could use
| such generated music via a plugin.
|
| And then, I suppose, people start sharing on YouTube, and we
| see how good their copyright-violation detection actually is.
| shadowfoxx wrote:
| Well, you're gonna need folks to sift through all the generated
| images and curate the results into something coherent. Taste is
| still a thing, after all.
| TOMDM wrote:
| Visions of the future, where the consumer has to pick apart
| "human curated", "human assembled" and "human made" much the
| same way we do now for cage-free and free-range eggs at the
| grocer.
| gersh wrote:
| That would be the same as curating social media feeds.
| brnaftr361 wrote:
| To me these models look worthless. They'd be useless for
| anything other than BG props with high DOF and complementary
| lighting; you can see on the rear windows in particular that
| there are artifacts from the topology. If you hit most of that
| shit with a light from the side it would look horrendous.
|
| You can get away with a lot, but I think this is too much. I
| think future iterations could be promising, but this definitely
| isn't challenging any pipeline I'm aware of.
| smoldesu wrote:
| To be fair, these models are no worse than the ones the
| iPhone makes with LIDAR. It's pretty impressive for being
| generated from a single static image.
| jokethrowaway wrote:
| Great, now we can get the unreleased code for this paper and use
| it with the unreleased code for generating animations (really
| impressive stuff by Sebastian Starke, presented at various
| SIGGRAPHs) and build a video game generator.
|
| I wouldn't even be mad if it were a paid product and not free
| code; just release something to the world so we can start using
| it.
| calibas wrote:
| Some of the videos aren't working in Firefox. Here's the error:
|
| > Can't decode H.264 stream because its resolution is out of the
| maximum limitation
| Jipazgqmnm wrote:
| They all work for me on Firefox. Btw, it's using the system's
| decoder.
| corscans wrote:
| Hecking man
| wokwokwok wrote:
| https://github.com/nv-tlabs/GET3D
|
| > News
|
| > 2022-09-22: Code will be uploaded next week!
|
| Not really that interesting at this point; the 5-page paper has a
| lot of hand-waving, and without the code to see how they actually
| implemented it...
|
| ...I'm left totally underwhelmed.
|
| No weights.
|
| No model.
|
| No code.
|
| The pictures were very pretty.
|
| /shrug
| _visgean wrote:
| That's very dismissive. The paper is 39 pages; most of the
| details are in the appendix, which I think is fairly standard.
| I think they describe the network quite well (page 16).
| caenorst wrote:
| Disclaimer: this work is done by some of my colleagues.
|
| As someone pointed out, there are 25 pages of text (not
| including the bibliography, of course), not 5.
|
| Most publications come with a delay of several months before
| code release (if any); here you literally have a written soft
| deadline of one week. So maybe you can wait a few days before
| posting such a negative comment?
| egnehots wrote:
| You're barking up the wrong tree; NVIDIA labs have a history
| of releasing their code and models quickly.
| incrudible wrote:
| Spoiler: the results are not high quality, at all.
| sk0g wrote:
| Higher quality than what I can whip up in Blender!
|
| But yeah, calling this high quality is quite disingenuous. I
| don't think this kind of mislabelling is helpful or
| productive. The results are what they are, and a massive step
| forward from what was available/possible some years ago.
___________________________________________________________________
(page generated 2022-09-24 23:00 UTC)