[HN Gopher] Nvidia Research Turns 2D Photos into 3D Scenes in th...
___________________________________________________________________
Nvidia Research Turns 2D Photos into 3D Scenes in the Blink of an AI
Author : bcaulfield
Score : 144 points
Date : 2022-03-25 20:51 UTC (2 hours ago)
(HTM) web link (blogs.nvidia.com)
(TXT) w3m dump (blogs.nvidia.com)
| XorNot wrote:
| So the part which makes this interesting to me is the speed. My
| new desire in our video-conferencing world these days has been to
| have my camera on but running a corrected model of myself, so I
| can sustain apparent eye contact without needing to look directly
| at the camera.
| aaron695 wrote:
| siavosh wrote:
| I'm curious, for those who work with NeRFs, what their results
| look like for random images as opposed to the 'nice' ones that
| are selected for publications/demos.
| jrib wrote:
| Just want to say I appreciate the cleverness of the title.
| daenz wrote:
| >The model requires just seconds to train on a few dozen still
| photos -- plus data on the camera angles they were taken from --
| and can then render the resulting 3D scene within tens of
| milliseconds.
|
| Generating the novel viewpoints is almost fast enough for VR,
| assuming you're tethered to a desktop computer with whatever GPUs
| they're using (probably the best setup possible).
|
| The holy grail (by my estimation) is getting both the training
| and the rendering to fit into a VR frame budget. They'll probably
| achieve it soon with some very clever techniques that only
| require differential re-training as the scene changes. The result
| will be a VR experience with live people and objects that feels
| photorealistic, because it essentially is based on real photos.
| simsla wrote:
| > plus data on the camera angles they were taken from
|
| Doesn't seem like much of a stretch to determine the angles as
| well.
|
| E.g., a semi-brute-forced approach with GANs.
| riotnrrd wrote:
| You don't even need anything that fancy.
Traditional structure-from-motion or visual odometry gives
| accurate enough pose estimates.
|
| If you want to experiment, take a bunch (~100) of photos of
| an object, and use COLMAP to generate the poses. COLMAP's
| standard pipeline is incremental SfM with global bundle
| adjustment, so it will be very accurate but very slow.
| c4wrd wrote:
| I've spent a lot of time thinking about this (i.e. taking a
| video and creating a 3D scene) and I don't think that it is
| feasible in most cases to get good accuracy. If you need to
| infer the angle, you need to make a lot of biased assumptions
| about things like velocity, position, etc., of the camera, and
| even if you were 99.9% accurate, that 0.1% inaccuracy is
| compounded over time. Now I'm not saying it's not possible,
| but I'd argue that if you want an accurate 3D scene, you'd
| rather be spending your computation budget on things other
| than determining those angles when they can simply be
| provided by hardware.
| krasin wrote:
| https://github.com/NVLabs/instant-ngp has a script that
| converts a video into frames and then uses COLMAP ([1]) to
| compute camera poses. You can then train a NeRF model
| within a few seconds.
|
| It all works pretty well. Trying it on your own video is
| pretty straightforward.
|
| 1. https://colmap.github.io/
| riotnrrd wrote:
| You're far too pessimistic (or maybe you don't know the
| field well). The problem of estimating the relative poses
| of the cameras responsible for a set of photos is a
| long-standing and essentially "solved" problem in computer
| vision. I say "solved" because there is still active
| research (increasing accuracy, speed, robustness, etc.),
| but there are decades-old, well-known techniques that any
| dedicated programmer could implement in a week.
|
| If you're genuinely curious, look into structure from
| motion, visual odometry, or SLAM.
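To make the SfM route concrete, here is a minimal sketch of the standard COLMAP CLI pipeline (feature extraction, pairwise matching, incremental mapping) driven from Python. The subcommands are COLMAP's documented ones; the directory layout and helper names are assumptions for illustration. NVlabs/instant-ngp (linked above) ships a colmap2nerf.py script that wraps essentially these same steps.

```python
import subprocess

def colmap_pipeline(image_dir, workspace):
    """Build the three standard COLMAP CLI calls that recover camera
    poses from a folder of photos (sketch; verify flags against your
    COLMAP version)."""
    db = f"{workspace}/database.db"
    return [
        # Detect local features in every image.
        ["colmap", "feature_extractor",
         "--database_path", db, "--image_path", image_dir],
        # Match features between all image pairs (slow but accurate).
        ["colmap", "exhaustive_matcher", "--database_path", db],
        # Incremental SfM: recover camera poses and a sparse point cloud.
        ["colmap", "mapper",
         "--database_path", db, "--image_path", image_dir,
         "--output_path", f"{workspace}/sparse"],
    ]

def run(image_dir, workspace):
    # Execute the pipeline; requires COLMAP installed on PATH.
    for cmd in colmap_pipeline(image_dir, workspace):
        subprocess.run(cmd, check=True)
```

With the poses recovered, instant-ngp's conversion script turns the COLMAP output into the transforms.json that its NeRF trainer consumes.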
| doliveira wrote:
| > even if you were 99.9% accurate, that 0.1% inaccuracy is
| compounded over time
|
| Not really: with SLAM there are various algorithms to keep
| inaccuracy in check. Basically it works as a feedback loop of
| guessing an estimate for the position and then correcting it
| using landmarks.
| anyfactor wrote:
| Tangent
|
| I wonder what happens to most people when they see innovation
| such as this. Over the years I have seen numerous mind-blowing AI
| achievements, which essentially feel like miracles. Yet literally
| an hour later I forget what I even saw. I don't find these
| innovations to have a lasting impression on me or on the
| internet, except for the times when these solutions are released
| to the public for tinkering and they end up failing
| catastrophically.
|
| I remember having the same feeling about chatbots and TTS
| technology literally ages ago, but at present the practical use
| of these innovations feels very mediocre.
| tomatowurst wrote:
| Hmmm, I really find this to be different from chatbots. In fact,
| I had a lot of skepticism to overcome before using GitHub
| Copilot, and then I saw a new reality where it became part of the
| process -- albeit not as prolific, but enough to make me ponder
| what the next evolution might be.
|
| For 3D modelers, this is huge, since it takes a lot of
| experience and grunt work to put the right touches on even a
| boilerplate 3D model. So much so that many game companies have
| outsourced non-human 3D modeling; this would certainly impact
| those markets.
|
| 1) It could further lower the cost and improve quality.
|
| 2) Studios could move those time-consuming tasks back on-shore
| and put an experienced in-house artist/modeler in charge of the
| production.
|
| 3) A hybrid of both.
|
| What I see here is that NeRF will have far more impact on the 3D
| modeling/animating industry than GitHub Copilot. Another
| certainty is that we are going to see a faster rate of
| innovation.
| We are at a point where papers released merely months ago are
| being completely outpaced by newer ones. The improvement in
| training time that NeRF offers is insane, especially given how
| quickly this new approach came out.
|
| We could be headed for a future where AI achievements are
| released faster than the published literature can keep up. It
| would be as fast as somebody tweeting a new technique, only to be
| outdone by somebody else within weeks or possibly days.
|
| Truly exciting times.
| rilezg wrote:
| I think with any tech demo (or other corporate PR piece), it is
| good to assume the worst, because companies spin things to be
| as ducky as possible. This is a self-reinforcing cycle, because
| if two companies have identical products, then the best liar--
| er, marketer--will win.
|
| (Not to say this sort of behavior is exclusive to corporate PR.
| As the best and smartest person ever, I would never need to
| exaggerate my achievements on a job application, but others
| may.)
| TulliusCicero wrote:
| The problem is that when the thing is initially announced, it's
| not useful to anyone yet, because it's not productionized and
| released to the general public.
|
| But then once it IS released to the general public, it's
| probably been at least several months, maybe even multiple
| years since the announcement, so people are like, " _yawn_,
| this is old news."
| fleischhauf wrote:
| I have the impression that some of them are now really ending
| up in practical applications. Funnily enough, someone just
| today showed me a feature of his phone where you can select
| some undesired objects in your photo and it will just replace
| them with a fitting background, indistinguishable from the
| original photo.
| ksec wrote:
| Nvidia is really turning into an AI powerhouse. The moat around
| CUDA helps, and their target customers aren't as stringent about
| budget, especially when the hardware cost is tiny compared to
| what they do with it.
|
| I wonder if they could reach a trillion market cap.
| alanwreath wrote:
| I'm probably going to ramp up the number of photos I take in the
| hope that Google Photos auto-applies this tech.
| danamit wrote:
| I am kinda skeptical; AI demos are impressive but the real-world
| results are underwhelming.
|
| How many resources does it take to generate images like that? Is
| this the most ideal situation?
|
| Can you take images from the web and, based on metadata, make a
| better street view?
|
| With all this AI, where is an accessible translation service? Or
| even an accent-adjusting service? Or just good auto-subtitles?
| woah wrote:
| Are there examples of this being used on large outdoor spaces?
| krasin wrote:
| Yes, Waymo did a whole San Francisco block:
| https://waymo.com/research/block-nerf/
| noduerme wrote:
| There's something very uncanny-valley about that video. I
| can't decide if it's the smoothness of the shading on the
| textures or if it's the way the parallax perspective on the
| buildings is sometimes just a tiny bit off. I don't generally
| get motion sickness from VR, but I feel like this would cause
| it.
| jowday wrote:
| You'll find this is true of all NeRFs if you spend time
| playing around with them. If a NeRF is trying to render
| part of an object that wasn't observed in the input
| images, it's going to look strange, since it's ultimately
| just guessing at the appearance. The Nvidia example in the
| link has the benefit of focusing on a single entity that's
| centered in all of the input photographs -- the effect is
| much more pronounced in large-scale scenes with tons of
| objects, like the Waymo one. You can still see some of this
| distortion in the Nvidia one -- pay close attention to the
| backside of the woman's left shoulder. You'll see a faint
| haze or blur near her shoulder -- the input images didn't
| contain a clear shot of it from multiple angles, so the
| model has to guess when rendering it.
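The "guessing" is a direct consequence of how a NeRF renders: each pixel is an alpha-composite of densities and colors sampled along a camera ray, so wherever the training images never constrained those samples, the optimizer is free to put any density there. A minimal numpy sketch of the compositing step (the sample values are invented for illustration):

```python
import numpy as np

def composite_ray(sigmas, rgbs, deltas):
    """NeRF-style volume rendering along one ray.
    sigmas: (N,) densities; rgbs: (N, 3) colors; deltas: (N,) segment lengths."""
    # Opacity contributed by each segment: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: how much light survives to reach sample i.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = trans * alphas
    return (weights[:, None] * rgbs).sum(axis=0), weights

# A ray crossing empty space, then hitting a nearly opaque red surface:
sigmas = np.array([0.0, 0.0, 50.0])
rgbs = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
deltas = np.array([0.5, 0.5, 0.5])
color, w = composite_ray(sigmas, rgbs, deltas)   # color is ~[1, 0, 0]
```

Training pushes the densities and colors to reproduce the input photos along observed rays; along unobserved rays, nothing pins these values down, which is where the haze comes from.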
| gundmc wrote:
| Woah, this video is way more interesting than the Nvidia
| polaroid teaser in the original link.
| krasin wrote:
| Still, NVIDIA's achievement (and Thomas Müller's in
| particular) is amazing. Thomas and his collaborators
| achieved an almost 1000x performance improvement through a
| combination of algorithmic and implementation tricks.
|
| I highly recommend trying this at home:
|
| https://nvlabs.github.io/instant-ngp/
|
| https://github.com/NVlabs/instant-ngp
|
| Very straightforward, and it gives better insight into what
| NeRF is than any shiny marketing demo.
| cinntaile wrote:
| Waymo needed 2.8 million images to create that scene; I
| wonder how many Nvidia would need? Or was the focus only
| on speed? I skimmed the article and didn't really find
| info on that.
| krasin wrote:
| Waymo essentially trained several NeRF models for
| Block-NeRF that are rendered together. It's conceivable that
| NVIDIA's instant-ngp could be used for that.
| xrd wrote:
| This NeRF project is cool too:
|
| https://github.com/bmild/nerf
|
| I've been trying to get GANs to do this for a while, but NeRFs
| look like the perfect fit.
| maybelsyrup wrote:
| Is anyone else kinda terrified?
| sorenjan wrote:
| I don't really understand why NeRFs would be particularly useful
| in more than a few niche cases, perhaps because I don't fully
| understand what they really are.
|
| My impression is that you take a bunch of photos in various
| places and directions, then you use those as samples of a 3D
| function that describes the full scene, and optimize a neural
| network to minimize the difference between the true light field
| and what's described by the network. An approximation of the
| actual function that fits the training data. The millions of
| coefficients are seen as a black box that somehow describes the
| scene when combined in a certain way -- I guess mapping a camera
| pose to a rendered image?
But why would that be better than some other data structure, like
| a mesh, a point cloud, or a signed distance field, where you have
| the scene as structured data you can reason about? What happens
| if you want to animate part of a NeRF, or crop it, or change it
| in any way? Do you have to throw away all the trained
| coefficients and start again from the training data?
|
| Can you use this method as part of a more traditional
| photogrammetry pipeline and extract the result as a regular mesh?
| Nvidia seems to suggest that NeRFs are in some way better than
| meshes, but according to my flawed understanding they just seem
| unwieldy.
| sennight wrote:
| I know that taste in comedy is seasonal (yes, there was a time
| when people thought vaudeville was the cat's pajamas), but has
| anyone ever greeted a pun with anything other than a pained sigh?
| noduerme wrote:
| It's ones like this that make me shake my head and go "Aiaiai."
| ModernMech wrote:
| Watch Bob's Burgers. The whole show is basically puns. I
| chuckle.
| cogman10 wrote:
| Puns aren't meant to make people laugh; the pained sigh is the
| point. It's schadenfreude for the person making the pun.
| sennight wrote:
| > It's schadenfreude for the person making the pun.
|
| Nah, if it is a joke at their own expense then it is
| "self-deprecating humor", something which is definitely designed
| to get a laugh. A humiliation fetish, maybe? Obviously nothing is
| funny past a certain point of deconstruction... especially if
| you find yourself defending the distinguishing difference of
| the "meta". Just stop making puns. Easy.
| bogwog wrote:
| Nvidia is leaving us all behind.
| [deleted]
| syspec wrote:
| Is there a video of this? I'm not sure what the connection is to
| the top photo/video/matrix-360-effect.
|
| Was that created from a few photos?
I didn't see any additional imagery below.
|
| --- Update
|
| It looks like these are the four source photos:
| https://blogs.nvidia.com/wp-content/uploads/2022/03/NVIDIA-R...
|
| Then it creates this 360 video from them:
| https://blogs.nvidia.com/wp-content/uploads/2022/03/2141864_...
| elil17 wrote:
| My prediction/hope is that NeRFs will totally revolutionize the
| film/TV industry. I can imagine:
|
| - Shooting a movie from a few cameras, creating a movie version
| of a NeRF using those angles, and then dynamically adding in
| other shots in post
|
| - Using lighting and depth information embedded in NeRFs to
| assist in lighting/integrating CG elements
|
| - Using NeRFs to generate virtual sets on LED walls (like those
| on The Mandalorian) from just a couple of photos of a location or
| a couple of renders of a scene (currently, the sets have to be
| built in a game engine and optimized for real-time performance).
| jowday wrote:
| This sort of stuff (generating 3D assets from photographs of
| real objects) has been common for quite a while via
| photogrammetry. NeRFs are interesting because (in some cases)
| they can create renders that look higher quality with fewer
| photos, and they hint at the potential of future learned
| rendering models.
| cogman10 wrote:
| Perhaps even making non-gimmicky live-action 3D films.
|
| Having 3D renders of the entire film without needing green
| screens and a bunch of tracking balls seems like it would have
| to make some of the post-processing work easier. You can add or
| remove elements. Adjust the camera angles. More effectively
| de-age actors. Heck, even create scenes from whole cloth if an
| actor unexpectedly dies (since you still have their model).
|
| Seems like you could also save some time by having fewer takes.
| What you can fix in post would be dramatically expanded.
|
| Best part for filmmakers: they are often using multiple
| cameras anyway. So this doesn't seem like it'd be too much of
| a stretch.
| ALittleLight wrote:
| My, maybe too extreme, future-fantasy version of this is
| turning existing movies into 3D movies you could watch in VR.
| andy_ppp wrote:
| Computer games, VR, and AR could also be pretty amazing uses for
| this technique.
| teaearlgraycold wrote:
| RIP photo-realistic modelers
| tomatowurst wrote:
| Hmmm, well, I still think they will be in demand for the same
| reason software developers will not be automated away. NeRF
| is really mind-bogglingly good, but there are still artifacts,
| something that modelers have a good eye for.
|
| Having said that, it might be the end for any junior type of
| role, the same way GitHub Copilot really takes a bite out of
| the need to have a junior developer.
|
| I'm very curious what will happen, because it will become a
| sort of trend across other industries, apart from the legal and
| medical professions (peace of mind from a human-in-the-loop).
| teaearlgraycold wrote:
| Maybe we'll have people spend their time building IRL
| sculptures and spaces to get digitized.
| EugeneOZ wrote:
| It will boost cut-scenes in games as well.
| usrusr wrote:
| > - Using lighting and depth information embedded in NeRFs to
| assist in lighting/integrating CG elements
|
| > - Using NeRFs to generate virtual sets on LED walls
|
| Sounds like a powerful set of tools to defeat a number of
| image-manipulation detection tricks, with limited effort once the
| process is set up as routine. State-actor-level information
| warfare will soon be a class of its own. Not just in terms of
| getting harder to detect, but more importantly in terms of
| becoming able to produce "quality" in high volume.
| gareth_untether wrote:
| AI and 3D content making are becoming so exciting. Soon we'll
| have an idea and be able to make it with automated tools. Sure,
| having a deeper understanding of how 3D works will be beneficial,
| but it will no longer be the entry requirement.
| PaulHoule wrote:
| If you have a graphics card, which is unobtainable.
| baron816 wrote:
| It would be really great to recreate loved ones in some sort of
| digital space after they have passed.
|
| As I've gotten older, and my parents get older as well, I've been
| thinking more about what my life will be like in old age (and
| beyond, too). I've also been thinking about what I would want
| "heaven" to be. Eternal life doesn't appeal to me much. Imagine
| living a quadrillion years. Even as a god, that would be
| miserable. That would be (by my rough estimate) the equivalent of
| 500 times the cumulative lifespans of all humans who have ever
| lived.
|
| What I would really like is to see my parents and my beloved dog
| again, decades after they have passed (along with any still
| living at that time). Being able to see them and speak to them
| one last time at the end of my life before fading into eternal
| darkness would be how I would want to go.
|
| Anyway, there's a free startup idea for anyone -- recreate loved
| ones in VR so people can see them again.
| rilezg wrote:
| This reminds me a lot of Black Mirror season 2, episode 1.
|
| Always good to treasure the time we are given.
| the_mar wrote:
| And then you have to put your loved ones in the attic.
| olladecarne wrote:
| There are so many things we invent with good intentions that in
| the end go terribly wrong, and I think this is one of those
| things. I think it's ok to mourn and remember the past, but
| moving on and accepting reality is important to a healthy life.
|
| Let's be real though: the startup that makes this but appeals
| to our worst instincts will make bank. I can't imagine how much
| more messed up future generations will be as we keep making more
| dangerous technology that appeals to our primal instincts.
| lowdest wrote:
| >Imagine living a quadrillion years. Even as a god, that would
| be miserable.
|
| This seems very subjective; I don't agree at all.
| luckydata wrote:
| I'm really looking forward to this technology getting applied to
| home improvement.
___________________________________________________________________ (page generated 2022-03-25 23:00 UTC)