[HN Gopher] Nvidia Research Turns 2D Photos into 3D Scenes in th...
       ___________________________________________________________________
        
       Nvidia Research Turns 2D Photos into 3D Scenes in the Blink of an
       AI
        
       Author : bcaulfield
       Score  : 144 points
       Date   : 2022-03-25 20:51 UTC (2 hours ago)
        
 (HTM) web link (blogs.nvidia.com)
 (TXT) w3m dump (blogs.nvidia.com)
        
       | XorNot wrote:
       | So the part which makes this interesting to me is the speed. My
       | new desire in our video conferencing world these days has been to
       | have my camera on but running a corrected model of myself so I
        | can sustain apparent eye contact without needing to look
       | directly at the camera.
        
       | aaron695 wrote:
        
       | siavosh wrote:
        | I'm curious, for those that work with NeRFs, what their results
       | look like for random images as opposed to the 'nice' ones that
       | are selected for publications/demos.
        
       | jrib wrote:
       | Just want to say I appreciate the cleverness of the title.
        
       | daenz wrote:
       | >The model requires just seconds to train on a few dozen still
       | photos -- plus data on the camera angles they were taken from --
       | and can then render the resulting 3D scene within tens of
       | milliseconds.
       | 
       | Generating the novel viewpoints is almost fast enough for VR,
       | assuming you're tethered to a desktop computer with whatever GPUs
       | they're using (probably the best setup possible).
       | 
        | The holy grail (by my estimation) is getting both the training
       | and the rendering to fit into a VR frame budget. They'll probably
       | achieve it soon with some very clever techniques that only
       | require differential re-training as the scene changes. The result
       | will be a VR experience with live people and objects that feels
       | photorealistic, because it essentially is based on real photos.
        
         | simsla wrote:
         | > plus data on the camera angles they were taken from
         | 
         | Doesn't seem like much of a stretch to determine the angles as
         | well.
         | 
          | E.g. a semi-brute-forced way with GANs.
        
           | riotnrrd wrote:
            | You don't even need anything that fancy. Traditional
            | structure-from-motion or visual odometry gives accurate
            | enough pose estimates.
            | 
            | If you want to experiment, take a bunch (~100) of photos of
            | an object, and use COLMAP to generate the poses. COLMAP
            | runs an exhaustive, offline SfM pipeline, so it will be
            | very accurate but very slow.
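            | 
            | Roughly, in Python (a minimal sketch assuming the colmap
            | CLI is installed and on PATH; directory names are
            | placeholders):
            | 
            |   import subprocess
            |   from pathlib import Path
            |   
            |   images = Path("photos")   # ~100 photos of the object
            |   work = Path("colmap_out")
            |   work.mkdir(exist_ok=True)
            |   db = work / "database.db"
            |   sparse = work / "sparse"
            |   sparse.mkdir(exist_ok=True)
            |   
            |   # 1. Detect local features in every image.
            |   subprocess.run(["colmap", "feature_extractor",
            |                   "--database_path", str(db),
            |                   "--image_path", str(images)], check=True)
            |   
            |   # 2. Match features between all image pairs.
            |   subprocess.run(["colmap", "exhaustive_matcher",
            |                   "--database_path", str(db)], check=True)
            |   
            |   # 3. Incremental SfM: recovers camera poses plus a
            |   #    sparse point cloud.
            |   subprocess.run(["colmap", "mapper",
            |                   "--database_path", str(db),
            |                   "--image_path", str(images),
            |                   "--output_path", str(sparse)], check=True)
            |   
            |   # Poses land in colmap_out/sparse/0/ (cameras.bin,
            |   # images.bin), ready to convert for a NeRF pipeline.
            | 
            | The exhaustive matching is what makes it slow but robust
            | for an unordered set of photos.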
        
           | c4wrd wrote:
           | I've spent a lot of time thinking about this (i.e. taking a
           | video and creating a 3D scene) and I don't think that it is
           | feasible in most cases to have good accuracy. If you need to
            | infer the angle, you need to make a lot of biased assumptions
           | about things like velocity, position, etc., of the camera and
           | even if you were 99.9% accurate, that 0.1% inaccuracy is
           | compounded over time. Now I'm not saying it's not possible,
           | but I'd believe that if you want an accurate 3D scene, you'd
           | rather be spending your computation budget on things other
            | than determining those angles when they can simply be
           | provided by hardware.
        
             | krasin wrote:
             | https://github.com/NVLabs/instant-ngp has a script that
             | converts a video into frames and then uses COLMAP ([1]) to
             | compute camera poses. You can then train a NeRF model
             | within a few seconds.
             | 
             | It all works pretty well. Trying it on your own video is
             | pretty straightforward.
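              | 
              | The video-to-frames step it automates is essentially
              | just this (a sketch; assumes ffmpeg is installed, and
              | the paths and frame rate are placeholders):
              | 
              |   import subprocess
              |   from pathlib import Path
              |   
              |   Path("frames").mkdir(exist_ok=True)
              |   
              |   # Pull ~2 frames per second out of the walkaround
              |   # video; a few hundred frames is usually plenty.
              |   subprocess.run(["ffmpeg", "-i", "walkaround.mp4",
              |                   "-vf", "fps=2",
              |                   "frames/%04d.jpg"], check=True)
              |   
              |   # The frames then go through COLMAP (as in the
              |   # sketch above) to get per-frame camera poses,
              |   # which is what the NeRF training step consumes.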
             | 
             | 1. https://colmap.github.io/
        
             | riotnrrd wrote:
             | You're far too pessimistic (or maybe you don't know the
             | field well). The problem of estimating the relative poses
             | of the cameras responsible for a set of photos is a long
             | standing and essentially "solved" problem in computer
             | vision. I say "solved" because there is still active
             | research (increasing accuracy, faster, more robust, etc.)
             | but there are decades-old, well known techniques that any
             | dedicated programmer could implement in a week.
             | 
             | If you're genuinely curious, look into structure from
             | motion, visual odometry, or SLAM.
        
             | doliveira wrote:
             | > even if you were 99.9% accurate, that 0.1% inaccuracy is
             | compounded over time
             | 
             | Not really, with SLAM there are various algorithms to keep
             | inaccuracy in check. Basically it works by a feedback loop
             | of guessing an estimate for position and then updating it
             | using landmarks.
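              | 
              | A toy 1D version of that loop (just to illustrate why
              | the error stays bounded; not a real SLAM system):
              | 
              |   import random
              |   
              |   true_pos, pos_est, var = 0.0, 0.0, 0.0
              |   MOTION_NOISE, SENSOR_NOISE = 0.05, 0.5
              |   LANDMARK = 50.0   # known landmark position
              |   
              |   for step in range(40):
              |       # Predict: integrate noisy odometry
              |       # (this is where the estimate drifts).
              |       true_pos += 1.0
              |       pos_est += 1.0 + random.gauss(0, MOTION_NOISE)
              |       var += MOTION_NOISE ** 2
              |   
              |       # Update: a noisy range reading to the known
              |       # landmark pulls the estimate back in line.
              |       rng = (LANDMARK - true_pos
              |              + random.gauss(0, SENSOR_NOISE))
              |       observed_pos = LANDMARK - rng
              |       gain = var / (var + SENSOR_NOISE ** 2)
              |       pos_est += gain * (observed_pos - pos_est)
              |       var *= 1.0 - gain
              |   
              |   # Error stays around the sensor-noise level instead
              |   # of compounding with every step.
              |   print(abs(true_pos - pos_est))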
        
       | anyfactor wrote:
       | Tangent
       | 
        | I wonder what happens to most people when they see innovation
        | such as this. Over the years I have seen numerous mind-blowing
        | AI achievements, which essentially feel like miracles. Yet
        | literally an hour later I forget what I even saw. These
        | innovations don't seem to leave a lasting impression on me or
        | on the internet, except for the times when they are released to
        | the public for tinkering and end up failing catastrophically.
        | 
        | I remember having the same feeling about chatbots and TTS
        | technology literally ages ago, but at present the practical use
        | of those innovations feels very mediocre.
        
         | tomatowurst wrote:
          | hmmm I really find this to be different from chatbots. In
          | fact, I had a lot of skepticism to overcome before using
          | github copilot, and then I saw a new reality where it became
          | part of the process, albeit not as prolific, but enough to
          | make me ponder what the next evolution might be.
          | 
          | For 3D modelers, this is huge, since it takes a lot of
          | experience and grunt work to put the right touches on even a
          | boilerplate 3D model. So much so that many game companies
          | have outsourced non-human 3D modeling; this would certainly
          | impact those markets.
          | 
          | 1) It could further lower the cost and improve quality.
          | 
          | 2) Studios could move those time-consuming tasks back
          | on-shore and put an experienced in-house artist/modeler in
          | charge of the production.
          | 
          | 3) A hybrid of both.
          | 
          | What I see here is that NeRF will have a far greater impact
          | on the 3D modeling/animating industry than github copilot.
          | Another certainty is that we are going to see a faster rate
          | of innovation. We are at a point where a paper released
          | merely months ago is being completely outpaced by another.
          | The improvement in training time that NeRF offers is insane,
          | especially given how quickly this new approach came out.
          | 
          | We could be heading toward a future where productized AI
          | releases can't keep up with the published work. It could be
          | as fast as somebody tweeting a new technique, only to be
          | outdone by somebody else weeks or possibly days later.
         | 
         | Truly exciting times.
        
         | rilezg wrote:
         | I think with any tech demo (or other corporate PR piece), it is
         | good to assume the worst, because companies spin things to be
         | as ducky as possible. This is a self-reinforcing cycle, because
         | if two companies have identical products, then the best liar--
         | er, marketer--will win.
         | 
         | (not to say this sort of behavior is exclusive to corporate PR.
         | as the best and smartest person ever, I would never need to
         | exaggerate my achievements on a job application, but others
         | may)
        
         | TulliusCicero wrote:
         | The problem is that when the thing is initially announced, it's
         | not useful to anyone yet, because it's not productionized and
         | released to the general public.
         | 
         | But then once it IS released to the general public, it's
         | probably been at least several months, maybe even multiple
          | years since the announcement, so people are like, "_yawn_,
         | this is old news."
        
         | fleischhauf wrote:
          | I have the impression that some of them are now really ending
          | up in practical applications. Funnily enough, someone just
          | today showed me a feature of his phone where you can select
          | undesired objects in your photo and it replaces them with a
          | fitting background, indistinguishable from the original photo.
        
       | ksec wrote:
        | Nvidia is really turning into an AI powerhouse. The moat around
        | CUDA helps, and their target customers aren't as stringent
        | about budget, especially when the hardware cost is tiny
        | compared to what they do with it.
        | 
        | I wonder if they could reach a trillion-dollar market cap.
        
       | alanwreath wrote:
        | I'm probably going to ramp up the number of photos I take in
        | the hope that Google Photos auto-applies this tech.
        
       | danamit wrote:
        | I am kinda skeptical; AI demos are impressive but the real-world
        | results are underwhelming.
        | 
        | How many resources does it take to generate images like that?
        | Is this the most ideal situation?
        | 
        | Can you take images from the web and, based on metadata, make a
        | better street view?
        | 
        | With all this AI, where is one accessible translation service?
        | Or even an accent-adjusting service? Or just good auto-subtitles?
        
       | woah wrote:
       | Are there examples of this being used on large outdoor spaces?
        
         | krasin wrote:
          | Yes, Waymo did a whole San Francisco neighborhood:
         | https://waymo.com/research/block-nerf/
        
           | noduerme wrote:
           | There's something very uncanny-valley about that video. I
           | can't decide if it's the smoothness of the shading on the
           | textures or if it's the way the parallax perspective on the
           | buildings sometimes is just a tiny bit off. I don't generally
           | get motion sickness from VR but I feel like this would cause
           | it.
        
             | jowday wrote:
             | You'll find this is true of all NeRFs if you spend time
             | playing around with them. If a NeRF is trying to render
              | part of an object that wasn't observed in the input
             | images, it's going to look strange, since it's ultimately
             | just guessing at the appearance. The NVidia example in the
             | link has the benefit of focusing on a single entity that's
             | centered in all of the input photographs - the effect is
             | much more pronounced in large scale scenes with tons of
             | objects, like the Waymo one. You can still see some of this
             | distortion in the NVidia one - pay close attention to the
             | backside of the woman's left shoulder. You'll see a faint
             | haze or blur near her shoulder - the input images didn't
             | contain a clear shot of it from multiple angles, so the
             | model has to guess when rendering it.
        
           | gundmc wrote:
           | Woah, this video is way more interesting than the Nvidia
           | polaroid teaser in the original link.
        
             | krasin wrote:
              | Still, NVIDIA's achievement (and Thomas Müller's in
              | particular) is amazing. Thomas and his collaborators
              | achieved an almost 1000x performance improvement through a
              | combination of algorithmic and implementation tricks.
             | 
             | I highly recommend trying this at home:
             | 
             | https://nvlabs.github.io/instant-ngp/
             | 
             | https://github.com/NVlabs/instant-ngp
             | 
             | Very straightforward and gives better insight into what
             | NeRF is than any shiny marketing demo.
        
               | cinntaile wrote:
                | Waymo needed 2.8 million images to create that scene; I
               | wonder how many Nvidia would need? Or was the focus only
               | on speed? I skimmed the article and didn't really find
               | info on that.
        
               | krasin wrote:
               | Waymo essentially trained several NeRF models for Block-
               | NeRF that are rendered together. It's conceivable that
               | NVIDIA's instant-ngp could be used for that.
        
       | xrd wrote:
       | This nerf project is cool too.
       | 
       | https://github.com/bmild/nerf
       | 
       | I've been trying to get GANs to do this for a while, but NeRFs
       | look like the perfect fit.
        
       | maybelsyrup wrote:
       | Is anyone else kinda terrified?
        
       | sorenjan wrote:
       | I don't really understand why NeRFs would be particularly useful
       | in more than a few niche cases, perhaps because I don't fully
       | understand what they really are.
       | 
       | My impression is that you take a bunch of photos in various
       | places and directions, then you use those as samples of a 3D
       | function that describes the full scene, and optimize a neural
       | network to minimize the difference between the true light field
       | and what's described by the network. An approximation of the
       | actual function, that fits the training data. The millions of
       | coefficients are seen as a black box that somehow describes the
       | scene when combined in a certain way, I guess mapping a camera
       | pose to a rendered image? But why would that be better than some
        | other data structure, like a mesh, a point cloud, or a signed
       | distance field, where you have the scene as structured data you
       | can reason about? What happens if you want to animate part of a
       | NeRF, or crop it, or change it in any way? Do you have to throw
       | away all trained coefficients and start again from training data?
       | 
       | Can you use this method as a part of a more traditional
       | photogrammetry pipeline and extract the result as a regular mesh?
       | Nvidia seems to suggest that NeRFs are in some way better than
       | meshes, but according to my flawed understanding they just seem
       | unwieldy.
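        | 
        | My rough mental model of the training loop, for what it's
        | worth (a toy sketch assuming PyTorch; it skips positional
        | encoding, hierarchical sampling and everything else a real
        | NeRF needs, and sample_training_rays is a made-up stand-in
        | for rays derived from the photos and their poses):
        | 
        |   import torch
        |   import torch.nn as nn
        |   
        |   class TinyNeRF(nn.Module):
        |       # Maps a 3D point + view direction to colour/density.
        |       def __init__(self, hidden=128):
        |           super().__init__()
        |           self.mlp = nn.Sequential(
        |               nn.Linear(6, hidden), nn.ReLU(),
        |               nn.Linear(hidden, hidden), nn.ReLU(),
        |               nn.Linear(hidden, 4))   # (r, g, b, sigma)
        |   
        |       def forward(self, xyz, viewdir):
        |           out = self.mlp(torch.cat([xyz, viewdir], dim=-1))
        |           rgb = torch.sigmoid(out[..., :3])  # colour in [0,1]
        |           sigma = torch.relu(out[..., 3])    # density >= 0
        |           return rgb, sigma
        |   
        |   def render_rays(model, origins, dirs, n=64, near=0.0,
        |                   far=4.0):
        |       # Sample points along each ray, then alpha-composite
        |       # colour weighted by density (volume rendering).
        |       t = torch.linspace(near, far, n)
        |       pts = (origins[:, None]
        |              + dirs[:, None] * t[None, :, None])
        |       rgb, sigma = model(pts, dirs[:, None].expand_as(pts))
        |       alpha = 1.0 - torch.exp(-sigma * (far - near) / n)
        |       trans = torch.cumprod(
        |           torch.cat([torch.ones_like(alpha[:, :1]),
        |                      1.0 - alpha + 1e-10], dim=-1),
        |           dim=-1)[:, :-1]
        |       weights = alpha * trans
        |       return (weights[..., None] * rgb).sum(dim=1)  # pixel rgb
        |   
        |   def sample_training_rays(batch=1024):
        |       # Made-up stand-in: real rays come from the photos and
        |       # their camera poses (origin = camera centre,
        |       # direction = through each pixel).
        |       o = torch.zeros(batch, 3)
        |       d = nn.functional.normalize(torch.randn(batch, 3),
        |                                   dim=-1)
        |       return o, d, torch.rand(batch, 3)
        |   
        |   model = TinyNeRF()
        |   opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        |   for step in range(1000):
        |       origins, dirs, target = sample_training_rays()
        |       pred = render_rays(model, origins, dirs)
        |       loss = ((pred - target) ** 2).mean()  # photometric MSE
        |       opt.zero_grad(); loss.backward(); opt.step()
        | 
        | If that's roughly right, it would explain why editing one
        | feels awkward: the scene lives in the weights, not in any
        | structure you can select and move.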
        
       | sennight wrote:
        | I know that taste in comedy is seasonal (yes, there was a time
        | when people thought vaudeville was the cat's pajamas), but has
        | anyone ever greeted a pun with anything other than a pained sigh?
        
         | noduerme wrote:
         | It's ones like this that make me shake my head and go "Aiaiai."
        
         | ModernMech wrote:
         | Watch Bob's Burgers. The whole show is basically puns. I
         | chuckle.
        
         | cogman10 wrote:
          | Puns aren't to make people laugh; the pained sigh is the point.
         | It's schadenfreude for the person making the pun.
        
           | sennight wrote:
           | > It's schadenfreude for the person making the pun.
           | 
            | Nah, if it is a joke at their own expense then it is "self-
            | deprecating humor", something which is definitely designed to
           | get a laugh. Humiliation fetish, maybe? Obviously nothing is
           | funny past a certain point of deconstruction... especially if
           | you find yourself defending the distinguishing difference of
           | the "meta". Just stop making puns, easy.
        
       | bogwog wrote:
       | Nvidia is leaving us all behind
        
       | [deleted]
        
       | syspec wrote:
        | Is there a video of this? I'm not sure what the connection is
        | to the top photo/video/matrix-360 effect.
       | 
       | Was that created from a few photos? I didn't see any additional
       | imagery below
       | 
       | --- Update
       | 
       | It looks like these are the four source photos:
       | https://blogs.nvidia.com/wp-content/uploads/2022/03/NVIDIA-R...
       | 
       | Then it creates this 360 video from them:
       | https://blogs.nvidia.com/wp-content/uploads/2022/03/2141864_...
        
       | elil17 wrote:
        | My prediction/hope is that NeRFs will totally revolutionize the
        | film/TV industry. I can imagine:
       | 
       | - Shooting a movie from a few cameras, creating a movie version
       | of a NeRF using those angles, and then dynamically adding in
       | other shots in post
       | 
       | - Using lighting and depth information embedded in NeRFs to
       | assist in lighting/integrating CG elements
       | 
       | - Using NeRFs to generate virtual sets on LED walls (like those
       | on The Mandalorian) from just a couple of photos of a location or
       | a couple of renders of a scene (currently, the sets have to be
       | built in a game engine and optimized for real time performance).
        
         | jowday wrote:
         | This sort of stuff (generating 3D assets from photographs of
         | real objects) has been common for quite a while via
         | photogrammetry. NeRFs are interesting because (in some cases)
         | they can create renders that look higher quality with fewer
         | photos, and they hint at the potential of future learned
         | rendering models.
        
         | cogman10 wrote:
         | Perhaps even making non-gimmicky live action 3d films.
         | 
         | Having 3d renders of the entire film without needing green
         | screens and a bunch of balls seems like it would have to make
         | some of the post processing work easier. You can add or remove
         | elements. Adjust the camera angles. More effectively de-age
          | actors. Heck, even create scenes out of whole cloth if an actor
         | unexpectedly dies (since you still have their model).
         | 
         | Seems like you could also save some time having fewer takes.
         | What you can fix in post would be dramatically expanded.
         | 
         | Best part for film makers, they are often using multiple
         | cameras anyways. So this doesn't seem like it'd be too much of
         | a stretch.
        
         | ALittleLight wrote:
         | My, maybe too extreme, future fantasy version of this is
         | turning existing movies into 3d movies you could watch in VR.
        
         | andy_ppp wrote:
         | Computer games, VR and AR could also be pretty amazing uses for
         | this technique too.
        
           | teaearlgraycold wrote:
           | RIP photo-realistic modelers
        
             | tomatowurst wrote:
              | hmmm well I still think they will be in demand, for the
              | same reason software developers will not be automated
              | away. NeRF is mind-bogglingly good, but there are still
              | artifacts, which is something modelers have a good eye
              | for.
              | 
              | Having said that, it might be the end for junior-type
              | roles, for the same reason that github copilot really
              | takes a bite out of the need for a junior developer.
              | 
              | I'm very curious what will happen, because it will become
              | a sort of trend across other industries, apart from the
              | legal and medical professions (where the human-in-the-loop
              | gives peace of mind).
        
               | teaearlgraycold wrote:
               | Maybe we'll have people spend their time building IRL
               | sculptures and spaces to get digitized.
        
         | EugeneOZ wrote:
         | It will boost cut-scenes in games as well.
        
         | usrusr wrote:
         | > - Using lighting and depth information embedded in NeRFs to
         | assist in lighting/integrating CG elements
         | 
         | > - Using NeRFs to generate virtual sets on LED walls
         | 
         | Sounds like a powerful set of tools to defeat a number of image
         | manipulation detection tricks, with limited effort once the
          | process is set up as routine. State-actor-level information
          | warfare will soon be in a class of its own. Not just in terms
         | getting harder to detect, but more importantly in terms of
         | becoming able to produce "quality" in high volume.
        
       | gareth_untether wrote:
        | AI and 3D content creation are becoming so exciting. Soon we'll
        | have an idea and be able to make it with automated tools. Sure,
        | having a deeper understanding of how 3D works will be
        | beneficial, but it will no longer be the entry requirement.
        
       | PaulHoule wrote:
        | If you have a graphics card, which is unobtainable.
        
       | baron816 wrote:
       | It would be really great to recreate loved ones after they have
        | passed, in some sort of digital space.
       | 
       | As I've gotten older, and my parents get older as well, I've been
       | thinking more about what my life will be like in old age (and
       | beyond too). I've also been thinking what I would want "heaven"
       | to be. Eternal life doesn't appeal to me much. Imagine living a
       | quadrillion years. Even as a god, that would be miserable. That
       | would be (by my rough estimate) the equivalent of 500 times the
       | cumulative lifespans of all humans who have ever lived.
       | 
       | What I would really like is to see my parents and my beloved dog
        | again, decades after they have passed (along with any living ones
       | at that time). Being able to see them and speak to them one last
       | time at the end of my life before fading into eternal darkness
       | would be how I would want to go.
       | 
       | Anyway, there's a free startup idea for anyone--recreate loved
       | ones in VR so people can see them again.
        
         | rilezg wrote:
         | This reminds me a lot of Black Mirror season 2 episode 1.
         | 
         | Always good to treasure the time we are given.
        
           | the_mar wrote:
            | and then you have to put your loved ones in the attic
        
         | olladecarne wrote:
          | There are so many things we invent with good intentions that
          | in the end go terribly wrong, and I think this is one of those
          | things. I think it's ok to mourn and remember the past, but
          | moving on and accepting reality is important to a healthy life.
          | 
          | Let's be real though: the startup that makes this and appeals
          | to our worst instincts will make bank. I can't imagine how much
          | more messed up future generations will be as we keep making
          | more dangerous technology that appeals to our primal instincts.
        
         | lowdest wrote:
         | >Imagine living a quadrillion years. Even as a god, that would
         | be miserable.
         | 
         | This seems very subjective, I don't agree at all.
        
       | luckydata wrote:
       | I'm really looking forward to this technology getting applied to
       | home improvement.
        
       ___________________________________________________________________
       (page generated 2022-03-25 23:00 UTC)