[HN Gopher] GET3D: A Generative Model of High Quality 3D Texture...
       ___________________________________________________________________
        
       GET3D: A Generative Model of High Quality 3D Textured Shapes
       Learned from Images
        
       Author : lnyan
       Score  : 129 points
       Date   : 2022-09-24 13:49 UTC (9 hours ago)
        
 (HTM) web link (nv-tlabs.github.io)
 (TXT) w3m dump (nv-tlabs.github.io)
        
       | ummonk wrote:
       | Still nowhere near good enough to be able to generate a VFX or
       | video game asset from some pictures, which is what we'd really
       | want for a practical application of such a tool.
        
         | mgraczyk wrote:
         | Generating good video game assets from pictures is solved, but
         | this does more than that. It generates modified versions from
         | words.
        
           | aaaaaaaaaaab wrote:
           | >Generating good video game assets from pictures is solved
           | 
           | lol no, not at all. It still needs tons of manual work to get
           | it up to quality in terms of topology, material, etc.
        
             | mgraczyk wrote:
              | In terms of practical engineering it's not solved; I mean
              | that the SOTA in photogrammetry is good enough to create
              | high-quality textures and meshes directly from pictures.
        
               | aaaaaaaaaaab wrote:
               | Those meshes and textures are far from usable for
               | realtime rendering in a 3D game.
        
       | etaioinshrdlu wrote:
        | On a somewhat related topic, I think we can just use stable
        | diffusion to help convert single photos to 3D NeRFs.
       | 
       | 1. find the prompt that best generates the image
       | 
        | 2. generate a (crude) NeRF from your starting image and render
       | views from other angles
       | 
        | 3. use stable diffusion with the views from other angles as seed
        | images, refining them using the prompt from 1 combined with added
        | view descriptions ("view from back", "view from top", etc.)
       | 
        | 4. feed the refined views back to the NeRF generator, keeping the
       | initial photo view constant
       | 
        | 5. generate new views from the NeRF, which should now be much
       | more realistic.
       | 
        | Run steps 2-5 above in a loop indefinitely (sketched in code
        | below). Eventually you should end up with a highly accurate,
        | realistic NeRF which is fully 3D from any angle, all from a
        | single photo.
       | 
       | Similar techniques could be used to extend the scene in all
       | directions.
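        | 
        | A rough sketch of that loop in Python. fit_nerf, render_view
        | and sd_img2img are hypothetical stand-ins for a NeRF trainer,
        | a NeRF renderer and an img2img diffusion call:
        | 
        |     def refine(photo, prompt, rounds=10):
        |         # Step 2: fit a crude NeRF to the single input photo.
        |         views = {"front": photo}
        |         nerf = fit_nerf(views)
        |         for _ in range(rounds):
        |             for pose in ["back", "top", "left", "right"]:
        |                 # Render this angle from the current NeRF...
        |                 crude = render_view(nerf, pose)
        |                 # Step 3: ...then refine it with img2img,
        |                 # adding a view hint to the step-1 prompt.
        |                 views[pose] = sd_img2img(
        |                     crude, prompt + ", view from " + pose)
        |             # Step 4: refit, keeping the initial photo fixed.
        |             views["front"] = photo
        |             nerf = fit_nerf(views)
        |         # Step 5: the NeRF now renders more realistic views.
        |         return nerf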
        
         | eutectic wrote:
         | I have my doubts that this will converge to anything
         | meaningful.
        
           | aliqot wrote:
            | In the short term you may be right, but in the long run it's
            | a certainty you won't be.
        
         | rsp1984 wrote:
          | The problem with such an approach would be that NeRFs require a
          | set of input images _with their exact poses_ and exact poses
          | are only available if the underlying geometry is static.
          | However if you use SD to generate new views it's only an
          | approximation and you wouldn't be able to get the exact poses.
         | 
         | Not all hope is lost though. I'm pretty sure in a few years
         | (perhaps sooner) we'll be able to generate entire 3D scenes
         | directly without going through 2D images as an intermediate
         | step.
        
           | londons_explore wrote:
            | I don't see it as a blocker... Especially if you alternate
            | NeRF and SD iterations - i.e. don't generate a whole image
            | each time; instead just choose a random angle, render a NeRF
            | reconstruction, run a single SD iteration on that render,
            | and then do another training step of the NeRF.
           | 
           | That way, you know the exact pose for each image, because you
           | chose it when rendering the NeRF.
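            | 
            | A sketch of one interleaved step (helper names are
            | hypothetical):
            | 
            |     pose = random_pose()           # chosen by us, so known
            |     img = render_view(nerf, pose)  # NeRF render at that pose
            |     img = sd_step(img, prompt)     # one SD denoising step
            |     nerf_train_step(nerf, pose, img)  # one NeRF update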
        
           | eutectic wrote:
           | You can optimize the poses as part of the model.
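            | 
            | E.g. a sketch in PyTorch of per-image pose corrections
            | learned jointly with the scene, in the spirit of BARF /
            | NeRF-- (the NeRF itself is stubbed out):
            | 
            |     import torch
            | 
            |     num_images = 8                 # number of input views
            |     nerf = torch.nn.Linear(60, 4)  # stand-in for a NeRF MLP
            | 
            |     # One learnable 6-DoF pose correction per input image,
            |     # optimized together with the scene weights.
            |     pose_deltas = torch.nn.Parameter(
            |         torch.zeros(num_images, 6))
            |     optim = torch.optim.Adam(
            |         [pose_deltas, *nerf.parameters()], lr=1e-3)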
        
       | bno1 wrote:
       | An AI that does good UV unwrapping would be much more interesting
       | and useful.
        
         | smoldesu wrote:
          | And I'd love to see an AI rigging tool for auto-generating
          | bone structures so you don't have to do it by hand.
         | 
         | Baby steps, though. The data required to train an
         | unwrapping/rigging tool is a lot more domain-specific than
          | correlating an OBJ file with its completed render.
        
         | rytill wrote:
         | Can you help me understand how you are imagining that? As in,
         | you have texture images already and you want to apply them to
         | the 3D object intelligently? Is that the case you're talking
         | about? Or texture generation and UV unwrapping in one?
        
           | caenorst wrote:
            | I think what they mean by UV unwrapping is generating a UV
            | map from a textured model (here the texture is generated by a
            | tri-plane network).
            | 
            | It's interesting for compression purposes, but kinda
            | orthogonal to this method (a good UV unwrapping can be applied
            | once the model is generated).
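            | 
            | For reference, a minimal sketch of the tri-plane lookup
            | idea in PyTorch (illustrative, not the paper's code):
            | 
            |     import torch
            |     import torch.nn.functional as F
            | 
            |     def triplane_features(planes, xyz):
            |         # planes: (3, C, H, W) feature images for the XY,
            |         # XZ and YZ planes; xyz: (N, 3) points in [-1, 1].
            |         feats = 0
            |         for plane, axes in zip(planes,
            |                                [[0, 1], [0, 2], [1, 2]]):
            |             # Project points onto the plane and bilinearly
            |             # sample its feature image.
            |             grid = xyz[:, axes].view(1, -1, 1, 2)
            |             out = F.grid_sample(plane[None], grid,
            |                                 align_corners=False)
            |             feats = feats + out[0, :, :, 0].T  # (N, C)
            |         return feats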
        
           | bno1 wrote:
            | Generating a UV map from an untextured mesh. The UV map is
            | stored in the vertices as texture coordinates, and together
            | with the topology of the mesh it defines how the texture
            | (images) gets mapped onto the mesh. A good UV map preserves
            | surface area (i.e. every region of the mesh maps to a region
            | of the texture proportionally, otherwise you get stretching),
            | has few seams, and has little empty space around the texture
            | islands, to reduce size.
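            | 
            | Concretely, a sketch of how the mapping is stored
            | (illustrative Python):
            | 
            |     # Each vertex: a 3D position plus a 2D texture
            |     # coordinate; u, v in [0, 1] index into the texture
            |     # image, and a triangle's texels are interpolated
            |     # from its vertices' UVs.
            |     vertices = [
            |         #  x    y    z     u    v
            |         (0.0, 0.0, 0.0,  0.0, 0.0),
            |         (1.0, 0.0, 0.0,  1.0, 0.0),
            |         (0.0, 1.0, 0.0,  0.0, 1.0),
            |     ]
            |     triangles = [(0, 1, 2)]  # indices into vertices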
           | 
           | There are ways to do this automatically but they're far from
           | perfect. Artists usually take the mesh and literally unwrap
           | it until it's planar, and convert this transformed mesh to
           | the UV mapping. The advantage of this method is that it gives
           | you very good control of seams and texture islands, but it's
           | tricky to preserve surface area.
           | 
           | Those neural rendering methods are very cool because they use
           | light/color fields, but they still have a lot of catching up
           | to do compared to modern 3D graphics.
        
       | TOMDM wrote:
        | Ok, so the generative model modalities I'm now aware of:
       | 
       | - speech
       | 
       | - images
       | 
       | - audio samples
       | 
       | - text
       | 
       | - code
       | 
       | - 3d models
       | 
        | I've seen basic attempts at music and video, and based on
        | everything else we've seen, getting good results there seems to
        | be mostly a matter of scaling.
       | 
       | What content generation modalities are left? Will all corporate
       | generation of these fall to progressively larger models, leaving
        | a (relatively) niche "Made by humans!" industry in its wake?
        
         | christiangenco wrote:
         | HN comments.
        
         | r1chardnl wrote:
         | Management positions replaced by a supervising AI
        
           | TOMDM wrote:
           | I wonder what your input data would need to be for a
            | competent AI in this space.
           | 
           | Finance, goals, delivery timelines, capabilities of the team,
           | employee availability, which employees work well with each
           | other, office politics, regulatory constraints...
        
             | Filligree wrote:
             | It doesn't need to work as well as current management, just
             | well enough to be cheaper.
             | 
             | Perhaps not even that, factoring in loyalty.
        
           | thanatos519 wrote:
           | The problem is finding a good training set.
        
           | monkeydust wrote:
           | Don't see why not, especially for low impact, high frequency
           | decisions. Some AI guided assistance with option to automate.
           | The next level for auto complete I guess.
        
         | Keyframe wrote:
         | Animation, at least the 'background one'.
        
         | snek_case wrote:
         | It's not just a matter of different modalities. It's still a
         | matter of sophistication.
         | 
         | The end game is endless generative music or video streaming
         | customized to your preferences. Being able to describe a story,
          | or having the AI model take a guess at what you might find
         | interesting/entertaining and generating a whole TV show or
         | movie for you to watch. Or generating background music while
         | you work and automatically adjusting to your tastes as well as
         | adjusting if you're finding it hard to concentrate or you need
         | to take a call.
        
           | tetris11 wrote:
            | Except it won't be, will it. Such things were promised for the
           | internet and we had maybe a good 10 years or so before corps
           | caught up and told us what to watch through their channels.
           | 
           | I imagine this being much of the same: AI trained on corp-
           | approved training sets to give suggestions to your
           | preferences that they want.
           | 
           | Sure, you could spin up your own and buy a machine that
            | trains on its own training data, but watch how no one will
           | do that because of the cost, or the diminishing access to
           | untainted resources.
        
             | theptip wrote:
             | This seems like a weird take to me. We just saw Stable
             | Diffusion land, an open-source community-trained SOTA
             | model. There is an open reimplemented version of GPT.
             | 
             | How is the correct extrapolation that corps will control
             | all content generation?
             | 
             | Sure, Google will always be an OOM or two (more?) ahead in
             | terms of compute dedicated to the problem. And so the best-
             | quality stuff will likely come from big corps; Netflix (or
             | their successor) will have the best quality video-
             | generation AI. That is how it always has been though;
             | movies are heavily capital intensive.
             | 
             | But this tech raises the quality of hobbyist-generated
             | content vs. highly-capitalized studio content. So I think
             | it's reasonable to extrapolate to even more content at the
             | long tail, instead of consolidation.
        
             | suby wrote:
              | It's going to be akin to content creation on YouTube,
              | perhaps even just people using YouTube as their
              | distribution medium. Anyone can make a YouTube video but we
             | don't see everyone creating content.
             | 
             | We should see a proliferation of the tech such that lots of
             | small (even one-man) studios pop up pumping out high
             | quality content, but the content is released on a schedule
              | similar to how YouTube videos are now. Your preferences
              | come into play through your suggested watch list, which will
              | be populated from this pre-created media based on whatever
              | preference machine learning algorithm the distribution
              | platform (YouTube?) decides. The feedback through watch-
             | metrics will then be used by these micro studios to decide
             | what to create next. It's basically what already happens
              | now with YouTube content creation, but the quality of what
              | people will produce will be better than Hollywood movies /
              | TV shows, and the pace of release will be much quicker.
             | 
             | Not everyone needs to be training and generating their own
             | content in order for your content preferences to be
             | absolutely saturated with things you'd enjoy watching.
        
         | thanatos519 wrote:
         | I'm waiting for the "literal video" generator, which writes and
         | sings new lyrics describing what is happening in the video.
        
         | eezurr wrote:
         | Music will continue to be made by humans because of strong
          | copyright law. It's illegal to sample (and distribute) > 0
          | seconds of recorded music. If that makes it into any AI-
          | generated music, it's game over if you distribute it.
         | 
         | Source on sampling: the head audio engineer at Juilliard School
         | of Music.
        
           | dinobones wrote:
            | Why would being a "head audio engineer" at Juilliard give you
           | any credibility on AI generated music and sampling/copyright
           | law? Lol.
        
             | eezurr wrote:
             | Because they work with/teach electronic music/sampling in
             | addition to recording classical acoustic music.
        
           | Geee wrote:
           | It's not sampling. Sampling is copying & pasting, but that
           | doesn't happen, technically. Just like stable diffusion
           | doesn't copy any artworks. AI learns from previous works, but
           | doesn't copy them. It's quite similar to how humans learn and
           | make adaptations based on other work.
        
             | eezurr wrote:
             | While technically true, the unspoken premise of my argument
             | is that it can and will output distinctive samples derived
            | from the source. E.g. you can't change the pitch and tempo,
            | add other effects, and call the output your own, legally.
            | 
            | It's a landmine.
        
           | skybrian wrote:
           | That seems like good advice for professionals, but I'm
           | wondering if it's going to hold up with new ways of
           | distribution.
           | 
           | Would distributing a generative model that can sometimes
          | generate such music also be considered illegal? Will it
           | actually stop people from doing it in practice?
           | 
           | Would it be illegal to share seeds and prompts?
           | 
          | Through these alternative methods, you could have a lot of
           | people listening to music that's never distributed as audio
           | or video files. And if there's an API for it, games could use
           | such generated music via a plugin.
           | 
           | And then I suppose people start sharing on YouTube, and we
           | see how good their copyright violation detection actually is.
        
         | shadowfoxx wrote:
         | Well, you're gonna need folks to sift through all the generated
         | images and curate the results into something coherent. Taste is
          | still a thing after all.
        
           | TOMDM wrote:
           | Visions of the future where the consumer has to pick apart
           | "human curated", "human assembled" and "human made" much the
            | same way we do now for cage-free and free-range eggs at the
           | grocer.
        
           | gersh wrote:
           | That would be the same as curating social media feeds.
        
         | brnaftr361 wrote:
         | To me these models look worthless. They'd be useless for
          | anything other than BG props with high DOF and complementary
          | lighting; you can see on the rear windows in particular that
          | there are artifacts from the topology. If you hit most of that
          | shit with a light from the side it would look horrendous.
         | 
         | You can get away with a lot, but I think this is too much. I
         | think future iterations could be promising, but this definitely
         | isn't challenging any pipeline I'm aware of.
        
           | smoldesu wrote:
           | To be fair, these models are no worse than the ones the
            | iPhone makes with LiDAR. It's pretty impressive for being
           | generated from a single static image.
        
       | jokethrowaway wrote:
       | Great, now we can get the unreleased code for this paper and use
       | it with the unreleased code for generating animations (really
       | impressive stuff by Sebastian Starke, presented at various
        | SIGGRAPHs) and build a videogame generator.
       | 
        | I wouldn't even be mad if it were a paid product and not free
        | code; just release something to the world so we can start using
        | it.
        
       | calibas wrote:
       | Some of the videos aren't working in Firefox. Here's the error:
       | 
       | > Can't decode H.264 stream because its resolution is out of the
       | maximum limitation
        
         | Jipazgqmnm wrote:
          | They all work for me on Firefox. Btw, it's using the system's
          | decoder.
        
       | corscans wrote:
       | Hecking man
        
       | wokwokwok wrote:
       | https://github.com/nv-tlabs/GET3D
       | 
       | > News
       | 
       | > 2022-09-22: Code will be uploaded next week!
       | 
       | Not really that interesting at this point; the 5 page paper has a
       | lot of hand waving, and without the code to see how they actually
       | implemented it...
       | 
       | ...I'm left totally underwhelmed.
       | 
       | No weights.
       | 
       | No model.
       | 
       | No code.
       | 
       | The pictures were very pretty.
       | 
       | /shrug
        
         | _visgean wrote:
          | That's very dismissive. The paper is 39 pages. Most of the
          | detail is in the appendix, which I think is fairly standard. I
          | think they describe the network quite well (page 16).
        
         | caenorst wrote:
         | Disclaimer: This work is done by some of my colleagues.
         | 
         | As someone pointed out, there are 25 pages of text (not
         | including bibliography of course), not 5.
         | 
          | Most publications come with a multi-month delay before code
          | release (if any); here you literally have a written soft
          | deadline of one week. So maybe you can wait a few days before
          | posting such a negative comment?
        
         | egnehots wrote:
          | You're barking up the wrong tree; NVIDIA labs have a history
          | of releasing their code and models quickly.
        
       | incrudible wrote:
       | Spoiler: The results are not high quality, at all.
        
         | sk0g wrote:
         | Higher quality than what I can whip up in Blender!
         | 
         | But yeah, calling this high quality is quite disingenuous. I
          | don't think this kind of mislabelling is helpful or
         | productive. The results are what they are, and a massive step
         | forward from what was available/possible some years ago.
        
       ___________________________________________________________________
       (page generated 2022-09-24 23:00 UTC)