[HN Gopher] MeshGPT: Generating triangle meshes with decoder-onl...
       ___________________________________________________________________
        
       MeshGPT: Generating triangle meshes with decoder-only transformers
        
       Author : jackcook
       Score  : 381 points
       Date   : 2023-11-28 17:56 UTC (5 hours ago)
        
 (HTM) web link (nihalsid.github.io)
 (TXT) w3m dump (nihalsid.github.io)
        
       | chongli wrote:
       | This looks really cool! Seems like it would be an incredible boon
       | for an indie game developer to generate a large pool of assets!
        
         | stuckinhell wrote:
         | I think indie game development is dead with these techniques.
         | Instead big companies will create "make your own game" games.
         | 
          | Indie games already seem pretty derivative these days. I think
          | this tech will kill them in the mid-term as big companies adopt
          | these techniques.
        
           | CamperBob2 wrote:
           | For values of "dead" equal to "Now people who aren't 3D
           | artists and can't afford to hire them will be able to make
           | games," maybe.
           | 
           | User name checks out.
        
             | stuckinhell wrote:
             | AI is already taking video game illustrators' jobs in China
             | https://restofworld.org/2023/ai-image-china-video-game-
             | layof...
             | 
             | It feels like a countdown until every creative in the
             | videogame industry is automated.
        
           | owenpalmer wrote:
           | People who use "make your own game" games aren't good at
           | making games. They might enjoy a simplified process to feel
           | the accomplishment of seeing quick results, but I find it
           | unlikely they'll be competing with indie developers.
        
             | CaptainFever wrote:
             | Yeah, and if there was going to be such a tool, people who
             | invest more time in it would be better than those casually
             | using it. In other words, professionals.
        
             | CamperBob2 wrote:
             | Careful with that generalization. Game-changing FPS mods
             | like Counterstrike were basically "make your own game"
             | projects, built with the highest-level toolkits imaginable
             | (editors for existing commercial games.)
        
           | chongli wrote:
           | "Make your own game" games will never replace regular games.
           | They target totally different interests. People who play
           | games (vast majority) just want to play an experience created
           | by someone else. People who like "make your own game" games
           | are creative types who just use that as a jumping off point
           | to becoming a game designer.
           | 
           | It's no different than saying "these home kitchen appliances
           | are really gonna kill off the restaurant industry."
        
             | stuckinhell wrote:
             | Hmm I think it will destroy the market in a couple ways.
             | 
             | AI creating video games would drastically increase the
             | volume of games available in the market. This surge in
             | supply could make it harder for indie games to stand out,
             | especially if AI-generated games are of high quality or
              | novelty. It could also lead to even more indie saturation
              | (the average indie makes less than $1,000).
             | 
             | As the market expectations shift, I think most indie
             | development dies unless you are already rich or basically
             | have patronage from rich clients.
        
           | dexwiz wrote:
           | The platform layer of the "make your own game" game is always
           | too heavy and too limited to compete with a dedicated engine
           | in the long run. Also the monetization strategy is bad for
           | professionals.
        
           | angra_mainyu wrote:
           | I couldn't disagree more. RPGMaker didn't kill RPGs,
           | Unity/Godot/Unreal didn't kill games, Minecraft didn't kill
           | games, and Renpy didn't kill VNs.
           | 
           | Far more people prefer playing games than making them.
           | 
           | We'll probably see a new boom of indie games instead. Don't
           | forget, a large part of what makes the gaming experience
           | unique is the narrative elements, gameplay, and aesthetics -
           | none of which are easily replaceable.
           | 
           | This empowers indie studios to hit a faster pace on one of
           | the most painful areas of indie game dev: asset generation
           | (or at least for me as a solo dev hobbyist).
        
             | stuckinhell wrote:
              | Sorry, I guess I wasn't clear. None of those things made
              | games automatically. The future is buying a game-making
              | game and saying "I want a Zelda clone, but funnier."
              | 
              | The AI game framework handles the full game creation
              | pipeline.
        
           | Vegenoid wrote:
           | There are more amazing, innovative and interesting indie
           | games being created now than ever before. There's just also
           | way more indie games that aren't those things.
        
       | airstrike wrote:
       | This is revolutionary
        
       | shaileshm wrote:
       | This is what a truly revolutionary idea looks like. There are so
       | many details in the paper. Also, we know that transformers can
       | scale. Pretty sure this idea will be used by a lot of companies
       | to train the general 3D asset creation pipeline. This is just too
       | great.
       | 
       | "We first learn a vocabulary of latent quantized embeddings,
       | using graph convolutions, which inform these embeddings of the
       | local mesh geometry and topology. These embeddings are sequenced
       | and decoded into triangles by a decoder, ensuring that they can
       | effectively reconstruct the mesh."
       | 
       | This idea is simply beautiful and so obvious in hindsight.
       | 
       | "To define the tokens to generate, we consider a practical
       | approach to represent a mesh M for autoregressive generation: a
       | sequence of triangles."
       | 
       | More from paper. Just so cool!
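        | 
        | To make the "sequence of triangles" idea concrete, here is a
        | minimal, purely illustrative Python sketch of flattening a mesh
        | into discrete tokens for an autoregressive model (the paper's
        | actual ordering and codebook scheme are more involved):
        | 
        |     import numpy as np
        | 
        |     def mesh_to_tokens(vertices, faces, bins=128):
        |         """vertices: (V, 3) floats scaled to [0, 1); faces: (F, 3) vertex indices."""
        |         # quantize each coordinate onto a discrete grid
        |         quant = np.clip((vertices * bins).astype(int), 0, bins - 1)
        |         # each triangle becomes nine coordinate tokens, concatenated in face order
        |         return quant[faces].reshape(-1).tolist()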
        
         | tomcam wrote:
         | Can someone explain quantized embeddings to me?
        
           | _hark wrote:
           | NNs are typically continuous/differentiable so you can do
           | gradient-based learning on them. We often want to use some of
           | the structure the NN has learned to represent data
           | efficiently. E.g., we might take a pre-trained GPT-type
           | model, and put a passage of text through it, and instead of
           | getting the next-token prediction probability (which GPT was
           | trained on), we just get a snapshot of some of the
           | activations at some intermediate layer of the network. The
           | idea is that these activations will encode semantically
           | useful information about the input text. Then we might e.g.
           | store a bunch of these activations and use them to do
           | semantic search/lookup to find similar passages of text, or
           | whatever.
           | 
           | Quantized embeddings are just that, but you introduce some
           | discrete structure into the NN, such that the representations
           | there are not continuous. A typical way to do this these days
           | is to learn a codebook VQ-VAE style. Basically, we take some
           | intermediate continuous representation learned in the normal
           | way, and replace it in the forward pass with the nearest
           | "quantized" code from our codebook. It biases the learning
           | since we can't differentiate through it, and we just pretend
           | like we didn't take the quantization step, but it seems to
           | work well. There's a lot more that can be said about why one
           | might want to do this, the value of discrete vs continuous
           | representations, efficiency, modularity, etc...
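            | 
            | A minimal PyTorch sketch of the codebook lookup described
            | above, with the "pretend we didn't quantize" backward pass
            | (straight-through); names and shapes are illustrative, not
            | the paper's code:
            | 
            |     import torch
            | 
            |     def quantize(z, codebook):
            |         """z: (batch, dim) encoder outputs; codebook: (num_codes, dim)."""
            |         dists = torch.cdist(z, codebook)  # distance to every code
            |         idx = dists.argmin(dim=-1)        # discrete token ids
            |         z_q = codebook[idx]               # nearest quantized embeddings
            |         # forward uses z_q; gradients flow as if quantization were identity
            |         z_q = z + (z_q - z).detach()
            |         return z_q, idx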
        
             | enjeyw wrote:
             | If you're willing, I'd love your insight on the "why one
             | might want to do this".
             | 
             | Conceptually I understand embedding quantization, and I
             | have some hint of why it works for things like WAV2VEC -
             | human phonemes are (somewhat) finite so forcing the
             | representation to be finite makes sense - but I feel like
             | there's a level of detail that I'm missing regarding whats
             | really going on and when quantisation helps/harms that I
             | haven't been able to gleam from papers.
        
               | visarga wrote:
               | Maybe it helps to point out that the first version of
               | Dall-E (of 'baby daikon radish in a tutu walking a dog'
               | fame) used the same trick, but they quantized the image
               | patches.
        
         | hedgehog wrote:
         | Another thing to note here is this looks to be around seven
         | total days of training on at most 4 A100s. Not all really
          | cutting-edge work requires a data-center-sized cluster.
        
       | sram1337 wrote:
       | What is the input? Is it converting a text query like "chair" to
       | a mesh?
       | 
       | edit: Seems like mesh completion is the main input-output method,
       | not just a neat feature.
        
         | CamperBob2 wrote:
         | That's what I was wondering. From the diagram it looks like the
         | input is other chair meshes, which makes it somewhat less
         | interesting.
        
           | tayo42 wrote:
            | Really, the hardest thing with art is the details, which
            | usually separate good from bad. So if you can roughly sketch
            | what you want without skill and have the details generated,
            | that's extremely useful. And image-to-image with the existing
            | diffusion models is useful and popular.
        
             | nullptr_deref wrote:
             | I have no idea about your background when I am commenting
             | here. But these are my two cents.
             | 
              | NO. Details are mostly icing on the cake. Sure, good
              | details can make good art, but that is not always the case.
              | True and beautiful art requires form + shape. What you are
              | describing is merely something visually appealing. The
              | reason diffusion models feel so bland is that they are good
              | with details but do not have precise forms and shapes.
              | They are getting better nowadays; however, it still remains
              | an issue.
             | 
             | Form + shape > details is something they teach in Art 101.
        
           | treyd wrote:
           | There's also examples of tables, lamps, couches, etc in the
           | video.
        
         | all2 wrote:
          | You prompt this LLM using 3D meshes for it to complete, in the
          | same manner that you use language to prompt language-specific
          | LLMs.
        
           | owenpalmer wrote:
           | That's what it seems like. Although this is not an LLM.
           | 
           | > Inspired by recent advances in powerful large language
           | models, we adopt a sequence-based approach to
           | autoregressively generate triangle meshes as sequences of
           | triangles.
           | 
           | It's only inspired by LLMs
        
             | adw wrote:
             | This is sort of a distinction without a difference. It's an
             | autoregressive sequence model; the distinction is how
             | you're encoding data into (and out of) a sequence of
             | tokens.
             | 
             | LLMs are autoregressive sequence models where the "role" of
             | the graph convolutional encoder here is filled by a BPE
             | tokenizer (also a learned model, just a much simpler one
             | than the model used here). That this works implies that you
             | can probably port this idea to other domains by designing
             | clever codecs which map their feature space into discrete
             | token sequences, similarly.
             | 
             | (Everything is feature engineering if you squint hard
             | enough.)
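              | 
              | A hedged sketch of that shared pattern (every name here is
              | hypothetical, not a real API): any domain can plug into an
              | autoregressive transformer once you supply a codec mapping
              | it to and from discrete token ids.
              | 
              |     from typing import Protocol, Sequence
              | 
              |     class Codec(Protocol):
              |         def encode(self, item) -> Sequence[int]: ...  # text -> BPE ids, mesh -> triangle codes
              |         def decode(self, tokens: Sequence[int]): ...  # token ids -> domain object
              | 
              |     def generate(model, codec: Codec, prompt, max_tokens=512):
              |         tokens = list(codec.encode(prompt))
              |         for _ in range(max_tokens):
              |             tokens.append(model.next_token(tokens))  # hypothetical AR step
              |         return codec.decode(tokens)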
        
             | ShamelessC wrote:
             | The only difference is the label, really. The underlying
             | transformer architecture and the approach of using a
             | codebook is identical to a large language model. The same
             | approach was also used originally for image generation in
             | DALL-E 1.
        
         | anentropic wrote:
         | Yeah it's hard to tell.
         | 
         | It looks like the input is itself a 3D mesh? So the model is
         | doing "shape completion" (e.g. they show generating a chair
         | from just some legs)... or possibly generating "variations"
         | when the input shape is more complete?
         | 
         | But I guess it's a starting point... maybe you could use
         | another model that does worse quality text-to-mesh as the input
         | and get something more crisp and coherent from this one.
        
       | carbocation wrote:
       | On my phone so I've only read this promo page - could this
       | approach be modified for surface reconstruction from a 3D point
       | cloud?
        
       | kranke155 wrote:
       | My chosen profession (3D / filmmaking) feels like being in some
        | kind of combat trench at the moment. Both fascinating and scary.
        
         | nextworddev wrote:
          | What do you see as the use case for this in your field? Does
          | it seem high quality? (I have no context)
        
           | zavertnik wrote:
           | I'm not a professional in VFX, but I work in television and
           | do a lot of VFX/3D work on the side. The quality isn't
           | amazing, but it looks like this could be the start of a
           | Midjourney-tier VFX/3D LLM, which would be awesome. For me,
           | this would help bridge the gap between having to use/find
           | premade assets and building what I want.
           | 
           | For context, building from scratch in a 3D pipeline requires
           | you to wear a lot of different hats (modeling, materials,
            | lighting, framing, animating, etc.). It takes a lot of time
            | to not only learn these hats but also use them together.
           | The individual complexity of those skill sets makes it
           | difficult to experiment and play around, which is how people
           | learn with software.
           | 
           | The shortcut is using premade assets or addons. For instance,
            | being able to use the Source game assets in Source Filmmaker,
            | combined with SFM using a familiar game engine, makes it easy
            | to build an intuition for the workflow. This makes Source
            | Filmmaker accessible and it's why there's so much content out
           | there made with it. So if you have gaps in your skillset or
           | need to save time, you'll buy/use premade assets. This comes
           | at a cost of control, but that's always been the tradeoff
           | between building what you want and building with what you
           | have.
           | 
           | Just like GPT and DALL-E built a bridge between building what
           | you want and building with what you have, a high fidelity GPT
           | for the 3D pipeline would make that world so much more
           | accessible and would bring the kind of attention NLE video
           | editing got in the post-Youtube world. If I could describe in
           | text and/or generate an image of a scene I want and have a
           | GPT create the objects, model them, generate textures, and
           | place them in the scene, I could suddenly just open blender,
           | describe a scene, and just experimenting with shooting in it,
           | as if I was playing in a sandbox FPS game.
           | 
           | I'm not sure if MeshGPT is the ChatGPT of the 3D pipeline,
            | but I do think this kind of content generation is the
           | conduit for the DALL-E of video that so many people are
           | terrified and/or excited for.
        
             | gavinray wrote:
             | On an unrelated note, could I ask your opinion?
             | 
             | My wife is passionate about film/TV production and VFX.
             | 
             | She's currently in school for this but is concerned about
             | the difficulty of landing a job afterwards.
             | 
             | Do you have any recommendations on breaking into the
             | industry without work experience?
        
         | bsenftner wrote:
          | So you're probably familiar with the role of a Bidding
          | Producer; imagine the difficulty they are facing: on one side
          | they have filmmakers saying they just read that such-and-such
          | is now created by AI, while that is news to the bidding
          | producer, and their VFX/animation studio clients are scrambling
          | because everything they do is new again.
        
         | sheepscreek wrote:
         | Perhaps one way to look at this could be auto-scaffolding. The
         | typical modelling and CAD tools might include this feature to
         | get you up and running faster.
         | 
         | Another massive benefit is composability. If the model can
         | generate a cup and a table, it also knows how to generate a cup
         | on a table.
         | 
         | Think of all the complex gears and machine parts this could
         | generate in the blink of an eye, while being relevant to the
          | project - rotated and positioned exactly where you want it. Very
         | similar to how GitHub Copilot works.
        
         | worldsayshi wrote:
          | I don't see that LLMs have come much further in 3D animation
          | than in programming in this regard: they can spit out bits and
          | pieces that look okay in isolation, but a human needs to solve
          | the puzzle. And often solving the puzzle means
          | rewriting/redoing most of the pieces.
         | 
         | We're safe for now but we should learn how to leverage the new
         | tech.
        
           | andkenneth wrote:
           | This is the "your job won't be taken away by AI, it will be
           | taken away by someone who knows how to leverage AI better
           | than you"
        
       | trostaft wrote:
        | Seems like the BibTeX on the page is broken? Or it might just be
        | one of my extensions.
        
       | alexose wrote:
       | It sure feels like every remaining hard problem (i.e., the ones
       | where we haven't made much progress since the 90s) is in line to
       | be solved by transformers in some fashion. What a time to be
       | alive.
        
       | mclanett wrote:
       | This is very cool. You can start with an image, generate a mesh
       | for it, render it, and then compare the render to the image.
       | Fully automated training.
        
       | j7ake wrote:
        | I love this field. The paper includes a nice website, examples,
        | and videos.
       | 
       | So much more refreshing than the dense abstract, intro, results
       | paper style.
        
       | valine wrote:
       | Even if this is "only" mesh autocomplete, it is still massively
       | useful for 3D artists. There's a disconnect right now between how
       | characters are sculpted and how characters are animated. You'd
       | typically need a time consuming step to retopologize your model.
       | Transformer based retopology that takes a rough mesh and gives
       | you clean topology would be a big time saver.
       | 
       | Another application: take the output of your gaussian splatter or
       | diffusion model and run it through MeshGPT. Instant usable assets
       | with clean topology from text.
        
         | toxik wrote:
         | What you have to understand is that these methods are very
         | sensitive to what is in distribution and out of distribution.
         | If you just plug in user data, it will likely not work.
        
       | toxik wrote:
       | This was done years ago, with transformers. It was then dubbed
       | Polygen.
        
         | Sharlin wrote:
         | You might want to RTFA. Polygen and other prior art are
         | mentioned. This approach is superior.
        
           | toxik wrote:
           | I read the article. It has exactly the same limitations as
           | Polygen from what I can tell.
        
             | dymk wrote:
             | Their comparison against PolyGen looks like it's a big
             | improvement. What are the limitations that this has in
             | common with PolyGen that make it still not useful?
        
               | toxik wrote:
               | I don't think it's as widely applicable as they try to
               | make it seem. I have worked specifically with PolyGen,
               | and the main problem is "out of distribution" data.
               | Basically anything you want to do will likely be outside
                | the training distribution. This surfaces in the sequencing.
               | How do you determine which triangle or vertex to place
               | first? Why would a user do it that way? What if I want to
               | draw a table with the legs last? Cannot be done. The
               | model is autoregressive.
        
       | mlsu wrote:
        | The next breakthrough will be the UX to create 3D scenes in front
       | of a model like this, in VR. This would basically let you
       | _generate_ a permanent, arbitrary 3D environment, for any
       | environment for which we have training data.
       | 
       | Diffusion models could be used to generate textures.
       | 
        | Mark is right, and so, so early.
        
       | amelius wrote:
       | Is this limited to shapes that have mostly flat faces?
        
       | catapart wrote:
       | Dang, this is getting so good! Still got a ways to go, with the
       | weird edges, but at this point, that feels like 'iteration
       | details' rather than an algorithmic or otherwise complex problem.
       | 
       | It's really going to speed up my pipeline to not have to pipe all
       | of my meshes into a procgen library with a million little mesh
       | modifiers hooked up to drivers. Instead, I can just pop all of my
       | meshes into a folder, train the network on them, and then start
       | asking it for other stuff in that style, knowing that I won't
       | have to re-topo or otherwise screw with the stuff it makes,
       | unless I'm looking for more creative influence.
       | 
       | Of course, until it's all the way to that point, I'm still better
       | served by the procgen; but I'm very excited by how quickly this
       | is coming together! Hopefully by next year's Unreal showcase,
       | they'll be talking about their new "Asset Generator" feature.
        
         | truckerbill wrote:
         | Do you have a recommended procgen lib?
        
           | catapart wrote:
            | Oh man, sorry, I wish! I've been using cobbled-together bits
            | of Python plugins that handle Blender's geometry nodes, and
           | the geometry scripts tools in Unreal. I haven't even ported
           | over to their new proc-gen tools, which I suspect can be
           | pretty useful.
        
       | circuit10 wrote:
       | Can this handle more organic shapes?
        
       | LarsDu88 wrote:
       | As a machine learning engineer who dabbles with Blender and hobby
       | gamedev, this is pretty impressive, but not quite to the point of
       | being useful in any practical manner (as far as the limited
        | furniture examples are concerned).
       | 
       | A competent modeler can make these types of meshes in under 5
       | minutes, and you still need to seed the generation with polys.
       | 
       | I imagine the next step will be to have the seed generation
       | controlled by an LLM, and to start adding image models to the
       | autoregressive parts of the architecture.
       | 
       | Then we might see truly mobile game-ready assets!
        
         | th0ma5 wrote:
          | This is a very underrated comment... As with any tech demo, if
          | they don't show it, it can't do it. It is very, very easy to
          | imagine a generalization of these things to other purposes,
          | but if it could do that, it would be a different presentation.
        
           | rawrawrawrr wrote:
           | It's research, not meant for commercialization. The main
           | point is in the process, not necessarily the output.
        
         | empath-nirvana wrote:
         | > A competent modeler can make these types of meshes in under 5
         | minutes.
         | 
         | I don't think this general complaint about AI workflows is that
         | useful. Most people are not a competent <insert job here>. Most
         | people don't know a competent <insert job here> or can't afford
         | to hire one. Even something that takes longer than a
          | professional would, at worse quality, is for many things better
          | than _nothing_, which is the realistic alternative for most people
         | who would use something like this.
        
           | cannonpalms wrote:
           | Is the target market really "most people," though? I would
           | say not. The general goal of all of this economic investment
           | is to improve the productivity of labor--that means first and
           | foremost that things need to be useful and practical for
           | those trained to make determinations such as "useful" and
           | "practical."
        
             | taneq wrote:
             | Millions of people generating millions of images (some of
             | them even useful!) using Dall-E and Stable Diffusion would
             | say otherwise. A skilled digital artist could create most
             | of these images in an hour or two, I'd guess... but 'most
             | people' certainly could not, and it turns out that these
             | people really want to.
        
           | chefandy wrote:
           | > I don't think this general complaint about AI workflows is
           | that useful
           | 
           | Maybe not to you, but it's useful if you're in these fields
            | professionally. The difference between a neat
           | hobbyist toolkit and a professional toolkit has gigantic
           | financial implications, even if the difference is minimal to
           | "most people."
        
         | Kaijo wrote:
          | The mesh topology here would see these rejected as assets in
          | basically any professional context. A competent modeler
         | could make much higher quality models, more suited to texturing
         | and deformation, in under five minutes. A speed modeler could
         | make the same in under a minute. And a procedural system in
         | something like Blender geonodes can already spit out an endless
         | variety of such models. But the pace of progress is staggering.
        
       | frozencell wrote:
       | Not reproducible with code = Not research.
        
       ___________________________________________________________________
       (page generated 2023-11-28 23:00 UTC)