[HN Gopher] NeRF: Representing scenes as neural radiance fields ...
___________________________________________________________________

NeRF: Representing scenes as neural radiance fields for view
synthesis

Author : dfield
Score  : 148 points
Date   : 2020-03-20 14:25 UTC (8 hours ago)

(HTM) web link (www.matthewtancik.com)
(TXT) w3m dump (www.matthewtancik.com)

| kuprel wrote:
| This would be great for instant replays.

| jayd16 wrote:
| Intel already does this with their "True View" setup. They also
| had a tech demo at CES where they synthesized camera positions
| for movie sets. https://www.youtube.com/watch?v=9qd276AJg-o

| blackhaz wrote:
| Could someone ELI5, please?

| type_enthusiast wrote:
| They're modeling a scene mathematically as a "radiance field" -
| a function that takes a viewing position and direction as inputs
| and returns the color of the light arriving at that position
| from that direction. They use the input images to train a neural
| network, in order to find the radiance field function that best
| explains those images. Once they have that function, they can
| construct images from new angles by evaluating it over the
| (position, direction) inputs needed by the pixels of the new
| image.

| ur-whale wrote:
| > Could someone ELI5, please?
|
| Smart, high-dimensional interpolator.

| quadrature wrote:
| It's a very similar concept to photogrammetry, which recovers a
| 3D representation of an object from pictures taken at different
| angles.
|
| In this work they take pictures of a scene from different angles
| and are able to train a neural network to render the scene from
| new angles that aren't in any of the source pictures.
|
| The neural network takes in a location (x, y, z) and a viewing
| direction, and spits out the RGB you would see if you viewed the
| scene from that location and angle.
|
| Using this network and traditional rendering techniques, they
| are able to render the whole scene.

| wokwokwok wrote:
| Significantly, the input is a sparse dataset.
|
| i.e. few source images vs. traditional photogrammetry.
|
| ...but basically yes. tl;dr: photogrammetry using neural
| networks; this one is better than other recent attempts at the
| same thing, but takes a really long time (2 days for this vs.
| 10 minutes for a voxel-based approach in one of their
| comparisons).
|
| Why bother?
|
| Mmm... there's some speculation that you might be able to
| represent a photorealistic scene / 3D object as a neural model
| instead of as voxels or meshes.
|
| That might be useful for some things. E.g. a voxel
| representation of semi-transparent fog or of high-detail objects
| like hair is impractically huge, and as a mesh it's very
| difficult to represent.

| visarga wrote:
| > Why bother?
|
| There might be 10x speedups to be gained with a tweaked model.

| rebuilder wrote:
| A number of things this seems to do well would be pretty much
| impossible with standard photogrammetry: trees with leaves, fine
| details like rigging on a ship, reflective surfaces, even
| refraction (!)
|
| Of course the output is a new view, not a shaded mesh, but given
| that it appears to generate depth data, I think you should be
| able to generate a point cloud and mesh it. Getting the
| materials from the output light might even be possible; I'm not
| very up to date on the state of material capture nowadays.
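
To make the (position, direction) -> color description in the
comments above concrete: the network maps a 3D point and a viewing
direction to an emitted color plus a volume density, and a pixel is
rendered by sampling that function along the camera ray and
alpha-compositing the samples front to back. The Python sketch below
is illustrative only, not the paper's code; radiance_field is a
dummy stand-in for the trained network.

    import numpy as np

    def radiance_field(position, direction):
        """Stand-in for the trained network: maps a 3D point and a
        viewing direction to an emitted RGB color and a volume
        density. The real system fits an MLP to the input photos;
        this dummy just makes the sketch runnable."""
        rgb = 0.5 * (1.0 + np.sin(position))        # fake color in [0, 1]
        sigma = np.exp(-np.linalg.norm(position))   # fake density >= 0
        return rgb, sigma

    def render_pixel(ray_origin, ray_dir, near=0.0, far=4.0, n_samples=64):
        """Estimate one pixel's color by sampling the field along the
        camera ray and alpha-compositing front to back (the
        'traditional rendering technique' mentioned above)."""
        ts = np.linspace(near, far, n_samples)
        delta = ts[1] - ts[0]          # spacing between samples
        color = np.zeros(3)
        transmittance = 1.0            # fraction of light not yet absorbed
        for t in ts:
            point = ray_origin + t * ray_dir
            rgb, sigma = radiance_field(point, ray_dir)
            alpha = 1.0 - np.exp(-sigma * delta)    # opacity of this step
            color += transmittance * alpha * rgb
            transmittance *= 1.0 - alpha
        return color

    # Query a single novel ray: camera at the origin looking down +z.
    print(render_pixel(np.zeros(3), np.array([0.0, 0.0, 1.0])))

In the actual paper the sample locations are chosen hierarchically
and the inputs pass through a positional encoding before reaching
the MLP, but the compositing step is essentially the loop shown
here.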
| imposter wrote:
| Wow, great.

| mooneater wrote:
| If you give it a bunch of photos of a scene from different
| angles, this machine learning method lets you see angles that
| did not exist in the original set.
|
| Better results than other methods so far.

| notfed wrote:
| Fist bump for actually answering as ELI5 (unlike the other
| responses).

| teknopurge wrote:
| This is bad-ass, partly because it's so elegant.

| jayd16 wrote:
| Very cool. Reminds me of when I played with Google's Seurat.
|
| The paper says it's 5 MB, takes 12 hours to train the NN, and
| then 30 seconds to render novel views of the scene on an NVIDIA
| V100.
|
| Sadly not something you can use in real time, but still very
| cool.
|
| Edit: 12 hours and a 5 MB NN, not 5 minutes.

| ssivark wrote:
| Huh, what? It needs almost a million views, and takes 1-2 days
| to train on a GPU. I'm not sure where the "5 minutes" number
| comes from.
|
| EDIT: I was referring to the last paragraph of section 5.3
| (Implementation details), but maybe I'm misunderstanding how
| they use rays / sampled coordinates.
|
| Very impressive visual quality. But it seems like they need a
| LOT of data and computation for each scene. So it's still
| plausible that intelligently done photogrammetry will beat this
| approach in efficiency, but a bunch of important details need to
| be figured out to make that happen.

| jayd16 wrote:
| Excuse me, I meant 5 MB. It takes 12 hours to train.
|
| > All compared single scene methods take at least 12 hours to
| train per scene
|
| But it seems to only need sparse images.
|
| > Here, we visualize the set of 100 input views of the synthetic
| Drums scene randomly captured on a surrounding hemisphere, and
| we show two novel views rendered from our optimized NeRF
| representation

| scribu wrote:
| > It needs almost a million views
|
| Not sure what you mean by "views". The comparisons in the paper
| use at most 100 input images per scene.

| byt143 wrote:
| If you're only looking for one novel view, can it use fewer
| views that are close to the novel one?

| uoaei wrote:
| This is absolutely stunning.
|
| As they say in ML, representation first -- and this is one of
| the most natural and elegant ways to represent 3D scenes and
| subjective viewpoints. Great that it fits into a rendering
| pipeline such that it's end-to-end differentiable.
|
| This is the first leap toward true high-quality real-time
| ML-based rendering. I'm blown away.

| lifeisstillgood wrote:
| Well, that took some effort just to work out what they actually
| did. How they actually did it I have no idea. Impressive,
| however - a sort of fill-in-the-blanks for the bits that are
| missing. If our brains _don't_ do this, one would be surprised.
|
| And we are all supposed to become AI developers this decade?!
|
| Come back Visual Basic, all is forgiven :-)

| [deleted]

| raidicy wrote:
| This blows my mind. This is probably a naive thought, but this
| technique looks like it could be combined with robotics to help
| a robot navigate through its environment.
|
| I'd also like to see what it does when you give it multiple
| views of scenes in a video game - some from direct captures and
| some from pictures of the monitor.

| ssivark wrote:
| Does anyone know how they do the "virtual object insertion"
| demonstrated in the paper summary video? Can that somehow be
| done on the network itself, or is it a diagnostic for scene
| accuracy, performed via SfM on the network's output?

| theresistor wrote:
| I'm pretty sure they're rendering a depth channel and
| compositing it in.
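
theresistor's suggestion amounts to a per-pixel depth test: render
the reconstructed scene and the virtual object separately, each with
a color image and a depth map, then keep whichever surface is closer
at each pixel. A minimal sketch of that idea follows (generic
compositing, not code from the paper; the arrays are placeholder
data):

    import numpy as np

    def composite_by_depth(scene_rgb, scene_depth, obj_rgb, obj_depth):
        """Per-pixel depth test: show the virtual object only where it
        is closer to the camera than the reconstructed scene."""
        obj_in_front = obj_depth < scene_depth       # (H, W) boolean mask
        return np.where(obj_in_front[..., None], obj_rgb, scene_rgb)

    # Toy 2x2 example: the object occludes the scene only at the
    # top-left pixel.
    scene_rgb   = np.zeros((2, 2, 3))                # black background
    scene_depth = np.full((2, 2), 2.0)
    obj_rgb     = np.ones((2, 2, 3))                 # white object
    obj_depth   = np.array([[1.0, 3.0],
                            [3.0, 3.0]])
    print(composite_by_depth(scene_rgb, scene_depth, obj_rgb, obj_depth))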
| teraflop wrote:
| You could do that, but I think it's simpler to just introduce
| additional objects during the raytracing process that generates
| the images. That would produce accurate results even with
| semitransparent objects, unlike compositing with a depth buffer.
___________________________________________________________________
(page generated 2020-03-20 23:00 UTC)