[HN Gopher] MeshGPT: Generating triangle meshes with decoder-only transformers
___________________________________________________________________
 
MeshGPT: Generating triangle meshes with decoder-only transformers
 
Author : jackcook
Score  : 381 points
Date   : 2023-11-28 17:56 UTC (5 hours ago)
 
(HTM) web link (nihalsid.github.io)
(TXT) w3m dump (nihalsid.github.io)
 
| chongli wrote:
| This looks really cool! Seems like it would be an incredible boon
| for an indie game developer to generate a large pool of assets!
| stuckinhell wrote:
| I think indie game development is dead with these techniques.
| Instead, big companies will create "make your own game" games.
| 
| Indie games already seem pretty derivative these days. I think
| this tech will kill them in the mid-term as big companies use it.
| CamperBob2 wrote:
| For values of "dead" equal to "Now people who aren't 3D
| artists and can't afford to hire them will be able to make
| games," maybe.
| 
| User name checks out.
| stuckinhell wrote:
| AI is already taking video game illustrators' jobs in China:
| https://restofworld.org/2023/ai-image-china-video-game-layof...
| 
| It feels like a countdown until every creative in the video game
| industry is automated.
| owenpalmer wrote:
| People who use "make your own game" games aren't good at
| making games. They might enjoy a simplified process to feel
| the accomplishment of seeing quick results, but I find it
| unlikely they'll be competing with indie developers.
| CaptainFever wrote:
| Yeah, and if there were going to be such a tool, people who
| invest more time in it would be better at it than those casually
| using it. In other words, professionals.
| CamperBob2 wrote:
| Careful with that generalization. Game-changing FPS mods
| like Counter-Strike were basically "make your own game"
| projects, built with the highest-level toolkits imaginable
| (editors for existing commercial games).
| chongli wrote:
| "Make your own game" games will never replace regular games.
| They target totally different interests. People who play
| games (the vast majority) just want to play an experience created
| by someone else. People who like "make your own game" games
| are creative types who use them as a jumping-off point
| to becoming game designers.
| 
| It's no different than saying "these home kitchen appliances
| are really gonna kill off the restaurant industry."
| stuckinhell wrote:
| Hmm, I think it will destroy the market in a couple of ways.
| 
| AI creating video games would drastically increase the
| volume of games available in the market. This surge in
| supply could make it harder for indie games to stand out,
| especially if AI-generated games are of high quality or
| novelty. It could also lead to even more indie saturation
| (the average indie game makes less than $1,000).
| 
| As market expectations shift, I think most indie
| development dies unless you are already rich or basically
| have patronage from rich clients.
| dexwiz wrote:
| The platform layer of a "make your own game" game is always
| too heavy and too limited to compete with a dedicated engine
| in the long run. The monetization strategy is also bad for
| professionals.
| angra_mainyu wrote:
| I couldn't disagree more. RPGMaker didn't kill RPGs,
| Unity/Godot/Unreal didn't kill games, Minecraft didn't kill
| games, and Renpy didn't kill VNs.
| 
| Far more people prefer playing games to making them.
| 
| We'll probably see a new boom of indie games instead.
| 
| Don't forget, a large part of what makes the gaming experience
| unique is the narrative elements, gameplay, and aesthetics - none
| of which are easily replaceable.
| 
| This empowers indie studios to move faster on one of the most
| painful areas of indie game dev: asset generation (or at least it
| is for me as a solo dev hobbyist).
| stuckinhell wrote:
| Sorry, I guess I wasn't clear. None of those things makes games
| automatically. The future is buying a game-making game and
| saying "I want a Zelda clone, but funnier."
| 
| The AI game framework handles the full game-creation pipeline.
| Vegenoid wrote:
| There are more amazing, innovative, and interesting indie
| games being created now than ever before. There are also just
| way more indie games that aren't those things.
| airstrike wrote:
| This is revolutionary
| shaileshm wrote:
| This is what a truly revolutionary idea looks like. There are so
| many details in the paper. Also, we know that transformers can
| scale. Pretty sure this idea will be used by a lot of companies
| to train general 3D asset creation pipelines. This is just too
| great.
| 
| "We first learn a vocabulary of latent quantized embeddings,
| using graph convolutions, which inform these embeddings of the
| local mesh geometry and topology. These embeddings are sequenced
| and decoded into triangles by a decoder, ensuring that they can
| effectively reconstruct the mesh."
| 
| This idea is simply beautiful, and so obvious in hindsight.
| 
| "To define the tokens to generate, we consider a practical
| approach to represent a mesh M for autoregressive generation: a
| sequence of triangles."
| 
| More from the paper. Just so cool!
| tomcam wrote:
| Can someone explain quantized embeddings to me?
| _hark wrote:
| NNs are typically continuous/differentiable so you can do
| gradient-based learning on them. We often want to use some of
| the structure the NN has learned to represent data
| efficiently. E.g., we might take a pre-trained GPT-type
| model, put a passage of text through it, and instead of
| getting the next-token prediction probability (which GPT was
| trained on), just grab a snapshot of some of the
| activations at some intermediate layer of the network. The
| idea is that these activations will encode semantically
| useful information about the input text. Then we might, e.g.,
| store a bunch of these activations and use them to do
| semantic search/lookup to find similar passages of text, or
| whatever.
| 
| Quantized embeddings are just that, but you introduce some
| discrete structure into the NN, such that the representations
| there are not continuous. A typical way to do this these days
| is to learn a codebook, VQ-VAE style. Basically, we take some
| intermediate continuous representation learned in the normal
| way and replace it in the forward pass with the nearest
| "quantized" code from our codebook. This biases the learning,
| since we can't differentiate through the quantization step - we
| just pretend it didn't happen - but it seems to work well.
| There's a lot more that can be said about why one might want to
| do this: the value of discrete vs. continuous representations,
| efficiency, modularity, etc...
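 
  [A minimal PyTorch sketch of the codebook quantization described
  above, using the straight-through gradient trick: the forward pass
  uses the nearest code, the backward pass pretends quantization
  never happened. Class and parameter names are illustrative
  assumptions, not taken from the MeshGPT paper.]
 
      import torch
 
      class VectorQuantizer(torch.nn.Module):
          def __init__(self, num_codes=512, dim=64):
              super().__init__()
              # Learnable codebook: num_codes discrete embeddings.
              self.codebook = torch.nn.Embedding(num_codes, dim)
 
          def forward(self, z):
              # z: (batch, dim) continuous encoder activations.
              # Find the nearest codebook entry for each vector.
              dists = torch.cdist(z, self.codebook.weight)  # (batch, num_codes)
              idx = dists.argmin(dim=1)                     # discrete token ids
              z_q = self.codebook(idx)                      # quantized vectors
              # Straight-through estimator: output z_q in the forward
              # pass, but route gradients to z as if unquantized.
              z_q = z + (z_q - z).detach()
              return z_q, idx
 
      vq = VectorQuantizer()
      z = torch.randn(8, 64)   # stand-in encoder activations
      z_q, tokens = vq(z)      # tokens are the discrete codes
 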
| enjeyw wrote:
| If you're willing, I'd love your insight on the "why one might
| want to do this".
| 
| Conceptually I understand embedding quantization, and I have
| some hint of why it works for things like WAV2VEC - human
| phonemes are (somewhat) finite, so forcing the representation to
| be finite makes sense - but I feel like there's a level of
| detail that I'm missing regarding what's really going on and
| when quantization helps/harms that I haven't been able to glean
| from papers.
| visarga wrote:
| Maybe it helps to point out that the first version of
| DALL-E (of 'baby daikon radish in a tutu walking a dog'
| fame) used the same trick, but they quantized the image
| patches.
| hedgehog wrote:
| Another thing to note here is this looks to be around seven
| total days of training on at most 4 A100s. Not all really
| cutting-edge work requires a data-center-sized cluster.
| sram1337 wrote:
| What is the input? Is it converting a text query like "chair"
| into a mesh?
| 
| edit: Seems like mesh completion is the main input-output method,
| not just a neat feature.
| CamperBob2 wrote:
| That's what I was wondering. From the diagram it looks like the
| input is other chair meshes, which makes it somewhat less
| interesting.
| tayo42 wrote:
| Really, the hardest thing with art is details; that's usually
| what separates good from bad. So if you can sketch what you want
| roughly, without skill, and have the details generated, that's
| extremely useful. And image-to-image with the existing
| diffusion models is useful and popular.
| nullptr_deref wrote:
| I have no idea about your background when I am commenting
| here. But these are my two cents.
| 
| NO. Details are mostly like icing on top of the cake. Sure,
| good details make good art, but that is not always the case.
| True and beautiful art requires form + shape. What you are
| describing is something visually appealing. The reason why
| diffusion models feel so bland is that they are good
| with details but do not have precise forms and shapes.
| Nowadays they are getting better; however, it still remains
| an issue.
| 
| Form + shape > details is something they teach in Art 101.
| treyd wrote:
| There are also examples of tables, lamps, couches, etc. in the
| video.
| all2 wrote:
| You prompt this LLM with 3D meshes for it to complete, in the
| same manner you use language to prompt language-specific LLMs.
| owenpalmer wrote:
| That's what it seems like. Although this is not an LLM.
| 
| > Inspired by recent advances in powerful large language
| models, we adopt a sequence-based approach to
| autoregressively generate triangle meshes as sequences of
| triangles.
| 
| It's only inspired by LLMs.
| adw wrote:
| This is sort of a distinction without a difference. It's an
| autoregressive sequence model; the distinction is how
| you're encoding data into (and out of) a sequence of
| tokens.
| 
| LLMs are autoregressive sequence models where the "role" of
| the graph convolutional encoder here is filled by a BPE
| tokenizer (also a learned model, just a much simpler one
| than the model used here). That this works implies that you
| can probably port this idea to other domains by designing
| clever codecs which similarly map their feature space into
| discrete token sequences.
| 
| (Everything is feature engineering if you squint hard
| enough.)
| ShamelessC wrote:
| The only difference is the label, really. The underlying
| transformer architecture and the approach of using a
| codebook are identical to a large language model. The same
| approach was also used originally for image generation in
| DALL-E 1.
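 
  [To make the "clever codec" idea concrete, here is a toy Python
  sketch of the naive kind of mesh tokenization: flattening faces
  into one discrete token stream that a decoder-only transformer
  could predict token by token. The uniform-grid quantization and
  9-tokens-per-triangle layout are simplifying assumptions for
  illustration; the paper's actual vocabulary is learned with a
  graph-convolutional VQ autoencoder.]
 
      import numpy as np
 
      def mesh_to_tokens(vertices, faces, bins=128):
          """Flatten a triangle mesh into a discrete token sequence.
 
          vertices: (V, 3) float array in [0, 1]^3
          faces:    (F, 3) int array of vertex indices
          Returns a 1-D array of F * 9 integer tokens in [0, bins).
          """
          # Quantize each coordinate onto a uniform grid.
          quantized = np.clip((vertices * bins).astype(int), 0, bins - 1)
          # One triangle -> 9 tokens (3 vertices x 3 coordinates),
          # concatenated in face order.
          return quantized[faces].reshape(-1)
 
      # A unit square split into two triangles.
      verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]],
                       dtype=float)
      faces = np.array([[0, 1, 2], [0, 2, 3]])
      print(mesh_to_tokens(verts, faces))  # 18 tokens for 2 triangles
 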
| anentropic wrote:
| Yeah, it's hard to tell.
| 
| It looks like the input is itself a 3D mesh? So the model is
| doing "shape completion" (e.g. they show generating a chair
| from just some legs)... or possibly generating "variations"
| when the input shape is more complete?
| 
| But I guess it's a starting point... maybe you could use
| another model that does worse-quality text-to-mesh as the input
| and get something more crisp and coherent from this one.
| carbocation wrote:
| I'm on my phone so I've only read this promo page - could this
| approach be modified for surface reconstruction from a 3D point
| cloud?
| kranke155 wrote:
| My chosen profession (3D / filmmaking) feels like being in some
| kind of combat trench at the moment. Both fascinating and scary.
| nextworddev wrote:
| What do you see as the use case for this in your field? Does
| it seem high quality? (I have no context.)
| zavertnik wrote:
| I'm not a professional in VFX, but I work in television and
| do a lot of VFX/3D work on the side. The quality isn't
| amazing, but it looks like this could be the start of a
| Midjourney-tier VFX/3D LLM, which would be awesome. For me,
| this would help bridge the gap between having to use/find
| premade assets and building what I want.
| 
| For context, building from scratch in a 3D pipeline requires
| you to wear a lot of different hats (modeling, materials,
| lighting, framing, animating, etc.). It costs a lot of time to
| not only learn these hats but also use them together. The
| individual complexity of those skill sets makes it difficult
| to experiment and play around, which is how people learn
| software.
| 
| The shortcut is using premade assets or addons. For instance,
| being able to use the Source game assets in Source Filmmaker,
| combined with SFM using a familiar game engine, makes it easy
| to build an intuition for the workflow. This makes Source
| Filmmaker accessible, and it's why there's so much content out
| there made with it. So if you have gaps in your skillset or
| need to save time, you'll buy/use premade assets. This comes
| at a cost of control, but that's always been the tradeoff
| between building what you want and building with what you
| have.
| 
| Just like GPT and DALL-E built a bridge between building what
| you want and building with what you have, a high-fidelity GPT
| for the 3D pipeline would make that world so much more
| accessible and would bring the kind of attention NLE video
| editing got in the post-YouTube world. If I could describe in
| text and/or generate an image of a scene I want, and have a
| GPT create the objects, model them, generate textures, and
| place them in the scene, I could just open Blender, describe a
| scene, and start experimenting with shooting in it, as if I
| were playing in a sandbox FPS game.
| 
| I'm not sure if MeshGPT is the ChatGPT of the 3D pipeline,
| but I do think this kind of content generation is the
| conduit for the DALL-E of video that so many people are
| terrified of and/or excited for.
| gavinray wrote:
| On an unrelated note, could I ask your opinion?
| 
| My wife is passionate about film/TV production and VFX.
| 
| She's currently in school for this but is concerned about
| the difficulty of landing a job afterwards.
| 
| Do you have any recommendations on breaking into the
| industry without work experience?
| bsenftner wrote:
| So you're probably familiar with the role of a Bidding
| Producer; imagine the difficulty they are facing: on one side
| they have filmmakers saying they just read that so-and-so is now
| created by AI, while that is news to the bidding producer and
| their VFX/animation studio clients, who are scrambling as
| everything they do is new again.
| sheepscreek wrote:
| Perhaps one way to look at this is auto-scaffolding. The
| typical modelling and CAD tools might include this feature to
| get you up and running faster.
| 
| Another massive benefit is composability. If the model can
| generate a cup and a table, it also knows how to generate a cup
| on a table.
| 
| Think of all the complex gears and machine parts this could
| generate in the blink of an eye, while being relevant to the
| project - rotated and positioned exactly where you want them.
| Very similar to how GitHub Copilot works.
| worldsayshi wrote:
| I don't see that LLMs have come much further in 3D
| animation than in programming in this regard: they can spit out
| bits and pieces that look okay in isolation, but a human needs
| to solve the puzzle. And often solving the puzzle means
| rewriting/redoing most of the pieces.
| 
| We're safe for now, but we should learn how to leverage the new
| tech.
| andkenneth wrote:
| This is a case of "your job won't be taken away by AI, it will
| be taken away by someone who knows how to leverage AI better
| than you."
| trostaft wrote:
| Seems like the BibTeX on the page is broken? Or it might just be
| an extension of mine.
| alexose wrote:
| It sure feels like every remaining hard problem (i.e., the ones
| where we haven't made much progress since the '90s) is in line to
| be solved by transformers in some fashion. What a time to be
| alive.
| mclanett wrote:
| This is very cool. You can start with an image, generate a mesh
| for it, render it, and then compare the render to the image.
| Fully automated training.
| j7ake wrote:
| I love this field. Papers include a nice website, examples, and
| videos.
| 
| So much more refreshing than the dense abstract-intro-results
| paper style.
| valine wrote:
| Even if this is "only" mesh autocomplete, it is still massively
| useful for 3D artists. There's a disconnect right now between how
| characters are sculpted and how characters are animated. You'd
| typically need a time-consuming step to retopologize your model.
| Transformer-based retopology that takes a rough mesh and gives
| you clean topology would be a big time-saver.
| 
| Another application: take the output of your Gaussian splatting or
| diffusion model and run it through MeshGPT. Instant usable assets
| with clean topology, from text.
| toxik wrote:
| What you have to understand is that these methods are very
| sensitive to what is in-distribution and out-of-distribution.
| If you just plug in user data, it will likely not work.
| toxik wrote:
| This was done years ago, with transformers. It was then dubbed
| PolyGen.
| Sharlin wrote:
| You might want to RTFA. PolyGen and other prior art are
| mentioned. This approach is superior.
| toxik wrote:
| I read the article. It has exactly the same limitations as
| PolyGen, from what I can tell.
| dymk wrote:
| Their comparison against PolyGen looks like it's a big
| improvement. What are the limitations that this has in
| common with PolyGen that make it still not useful?
| toxik wrote:
| I don't think it's as widely applicable as they try to
| make it seem. I have worked specifically with PolyGen,
| and the main problem is "out of distribution" data.
| Basically anything you want to do will likely be outside
| the training distribution. This surfaces as sequencing:
| how do you determine which triangle or vertex to place
| first? Why would a user do it that way? What if I want to
| draw a table with the legs last? It cannot be done. The
| model is autoregressive.
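 
  [A toy Python illustration of the sequencing issue described
  above, assuming a PolyGen-style canonical ordering: vertices
  sorted lexicographically, faces rotated and sorted by lowest
  index. Since training meshes are always sorted this way, a user
  ordering like "legs last" is out of distribution by construction.
  The exact sort keys are an assumption for illustration.]
 
      import numpy as np
 
      def canonicalize(vertices, faces):
          """Sort a mesh into one fixed ordering, as autoregressive
          mesh models typically do at training time."""
          # Sort vertices lexicographically (by z, then y, then x).
          order = np.lexsort(
              (vertices[:, 0], vertices[:, 1], vertices[:, 2]))
          remap = np.empty(len(vertices), dtype=int)
          remap[order] = np.arange(len(vertices))
          new_faces = remap[faces]
          # Rotate each face so its lowest vertex index comes first,
          # then sort the faces themselves lexicographically.
          roll = new_faces.argmin(axis=1)
          new_faces = np.take_along_axis(
              new_faces, (roll[:, None] + np.arange(3)) % 3, axis=1)
          new_faces = new_faces[np.lexsort(new_faces.T[::-1])]
          return vertices[order], new_faces
 
  Whatever order the artist drew the triangles in, the model only
  ever sees (and only ever learns to produce) the canonical one.
 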
| mlsu wrote:
| The next breakthrough will be the UX for creating 3D scenes in
| front of a model like this, in VR. This would basically let you
| _generate_ a permanent, arbitrary 3D environment - any
| environment for which we have training data.
| 
| Diffusion models could be used to generate textures.
| 
| Mark is right, and so, so early.
| amelius wrote:
| Is this limited to shapes that have mostly flat faces?
| catapart wrote:
| Dang, this is getting so good! It's still got a ways to go, with
| the weird edges, but at this point that feels like 'iteration
| details' rather than an algorithmic or otherwise complex problem.
| 
| It's really going to speed up my pipeline to not have to pipe all
| of my meshes into a procgen library with a million little mesh
| modifiers hooked up to drivers. Instead, I can just pop all of my
| meshes into a folder, train the network on them, and then start
| asking it for other stuff in that style, knowing that I won't
| have to re-topo or otherwise screw with the stuff it makes,
| unless I'm looking for more creative influence.
| 
| Of course, until it's all the way to that point, I'm still better
| served by the procgen; but I'm very excited by how quickly this
| is coming together! Hopefully by next year's Unreal showcase
| they'll be talking about their new "Asset Generator" feature.
| truckerbill wrote:
| Do you have a recommended procgen lib?
| catapart wrote:
| Oh man, sorry, I wish! I've been using cobbled-together bits
| of Python plugins that handle Blender's geometry nodes, and
| the geometry-scripting tools in Unreal. I haven't even ported
| over to their new procgen tools, which I suspect can be
| pretty useful.
| circuit10 wrote:
| Can this handle more organic shapes?
| LarsDu88 wrote:
| As a machine learning engineer who dabbles with Blender and hobby
| gamedev, this is pretty impressive, but not quite to the point of
| being useful in any practical manner (as far as the limited
| furniture examples are concerned).
| 
| A competent modeler can make these types of meshes in under 5
| minutes, and you still need to seed the generation with polys.
| 
| I imagine the next step will be to have the seed generation
| controlled by an LLM, and to start adding image models to the
| autoregressive parts of the architecture.
| 
| Then we might see truly mobile-game-ready assets!
| th0ma5 wrote:
| This is a very underrated comment... As with any tech demo, if
| they don't show it, it can't do it. It is very, very easy to
| imagine a generalization of these things to other purposes
| which, if it could do them, would be a different presentation.
| rawrawrawrr wrote:
| It's research, not meant for commercialization. The main
| point is in the process, not necessarily the output.
| empath-nirvana wrote:
| > A competent modeler can make these types of meshes in under 5
| minutes.
| 
| I don't think this general complaint about AI workflows is that
| useful. Most people are not a competent <insert job here>.
| Most people don't know a competent <insert job here> or can't
| afford to hire one.
| Even something that takes longer than a professional would, at
| worse quality, is for many things better than _nothing_, which is
| the realistic alternative for most people who would use something
| like this.
| cannonpalms wrote:
| Is the target market really "most people," though? I would
| say not. The general goal of all of this economic investment
| is to improve the productivity of labor - that means first and
| foremost that things need to be useful and practical for
| those trained to make determinations such as "useful" and
| "practical."
| taneq wrote:
| Millions of people generating millions of images (some of
| them even useful!) using DALL-E and Stable Diffusion would
| say otherwise. A skilled digital artist could create most
| of these images in an hour or two, I'd guess... but 'most
| people' certainly could not, and it turns out that these
| people really want to.
| chefandy wrote:
| > I don't think this general complaint about AI workflows is
| that useful
| 
| Maybe not to you, but it's useful if you're in these fields
| professionally. The difference between a neat hobbyist toolkit
| and a professional toolkit has gigantic financial implications,
| even if the difference is minimal to "most people."
| Kaijo wrote:
| The mesh topology here would see these rejected as assets in
| basically any professional context. A competent modeler could
| make much higher-quality models, better suited to texturing
| and deformation, in under five minutes. A speed modeler could
| make the same in under a minute. And a procedural system in
| something like Blender geonodes can already spit out an endless
| variety of such models. But the pace of progress is staggering.
| frozencell wrote:
| Not reproducible with code = not research.
___________________________________________________________________
(page generated 2023-11-28 23:00 UTC)