[HN Gopher] SMERF: Streamable Memory Efficient Radiance Fields
       ___________________________________________________________________
        
       SMERF: Streamable Memory Efficient Radiance Fields
        
        We built SMERF, a new way to explore NeRFs in real time in your
        web browser. Try it out yourself!  Over the last few months, my
        collaborators and I have put together a new, real-time method that
        makes NeRF models accessible from smartphones, laptops, and low-
        power desktops, and we think we've done a pretty stellar job!
       SMERF, as we like to call it, distills a large, high quality NeRF
       into a real-time, streaming-ready representation that's easily
       deployed to devices as small as a smartphone via the web browser.
        On top of that, our models look great! Compared to other real-time
        methods, SMERF achieves higher accuracy than ever before. On large
       multi-room scenes, SMERF renders are nearly indistinguishable from
       state-of-the-art offline models like Zip-NeRF and a solid leap
       ahead of other approaches.  The best part: you can try it out
       yourself! Check out our project website for demos and more.  If you
       have any questions or feedback, don't hesitate to reach out by
       email (smerf@google.com) or Twitter (@duck).
        
       Author : duckworthd
       Score  : 258 points
       Date   : 2023-12-13 19:03 UTC (3 hours ago)
        
 (HTM) web link (smerf-3d.github.io)
 (TXT) w3m dump (smerf-3d.github.io)
        
       | sim7c00 wrote:
        | this looks really amazing. i have a relatively old smartphone
        | (2019) and it's really surprisingly smooth and high fidelity.
        | amazing job!
        
         | duckworthd wrote:
         | Thank you :). I'm glad to hear it! Which model are you using?
        
           | sim7c00 wrote:
            | samsung galaxy s10e
        
       | guywithabowtie wrote:
        | Any plans to release the models?
        
         | duckworthd wrote:
         | The pretrained models are already available online! Check out
         | the "demo" section of the website. Your browser is fetching the
         | model when you run the demo.
        
           | ilaksh wrote:
            | Will the code be released, or an API endpoint? Otherwise it
            | will be impossible for us to use it for anything... since
            | it's Google, I assume it will just end up in a black hole
            | like most of the research... or five years later some AI
            | researchers leave and finally create a startup.
        
       | zeusk wrote:
        | Are radiance fields related to Gaussian splatting?
        
         | duckworthd wrote:
          | Gaussian Splatting is heavily inspired by work on radiance
          | field (NeRF) models. They use much of the same technology!
        
         | corysama wrote:
         | Similar inputs, similar outputs, different representation.
        
       | aappleby wrote:
       | Very impressive demo.
        
         | duckworthd wrote:
         | Thank you!
        
       | refulgentis wrote:
        | This is __really__ stunning work, and a huge, huge deal that I'm
        | seeing this in a web browser on my phone. Congratulations!
       | 
        | When I look at the NYC scene in the highest quality on desktop,
        | I'm surprised by how low-quality e.g. the stuff on the counter
        | and shelves is. So then I load the lego model and see that it's
        | _very_ detailed, so it doesn't seem inherent to the method.
       | 
       | Is it a consequence of input photo quality, or something else?
        
         | duckworthd wrote:
         | > This is __really__ stunning work
         | 
         | Thank you :)
         | 
         | > Is it a consequence of input photo quality, or something
         | else?
         | 
         | It's more a consequence of spatial resolution: the bigger the
         | space, the more voxels you need to maintain a fixed resolution
         | (e.g. 1 mm^3). At some point, we have to give up spatial
         | resolution to represent larger scenes.
         | 
         | A second limitation is the teacher model we're distilling. Zip-
         | NeRF (https://jonbarron.info/zipnerf/) is good, but it's not
         | _perfect_. SMERF reconstruction quality is upper-bounded by its
         | Zip-NeRF teacher.
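          | 
          | To make the spatial resolution point concrete, here's a back-
          | of-the-envelope sketch (illustrative numbers only, not SMERF's
          | actual data structures):
          | 
          |   def dense_voxel_count(extent_m, voxel_mm=1.0):
          |       # voxels along one side of a cube of side `extent_m`
          |       voxels_per_side = extent_m * 1000.0 / voxel_mm
          |       return voxels_per_side ** 3
          | 
          |   # a 2 m object vs. a 20 m multi-room space, 1 mm voxels
          |   print(f"{dense_voxel_count(2.0):.0e}")    # 8e+09
          |   print(f"{dense_voxel_count(20.0):.0e}")   # 8e+12
          | 
          | A 10x larger extent means 1000x more voxels at the same voxel
          | size, which is why spatial resolution has to give as scenes
          | grow.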
        
       | jacoblambda wrote:
       | Is there a relatively easy way to apply these kinds of techniques
       | (either NeRFs or gaussian splats) to larger environments even if
       | it's lower precision? Like say small towns/a few blocks worth of
       | env.
        
         | duckworthd wrote:
          | In principle, there's no reason you can't fit multiple city
          | blocks at the same time with Instant NGP on a regular desktop.
          | The challenge is in estimating the camera and lens parameters
          | over such a large space. I expect such a reconstruction to be
          | quite fuzzy given the low spatial resolution.
        
         | ibrarmalik wrote:
         | You're under the right paper for doing this. Instead of one big
         | model, they have several smaller ones for regions in the scene.
         | This way rendering is fast for large scenes.
         | 
         | This is similar to Block-NeRF [0], in their project page they
         | show some videos of what you're asking.
         | 
         | As for an easy way of doing this, nothing out-of-the-box. You
         | can keep an eye on nerfstudio [1], and if you feel brave you
         | could implement this paper and make a PR!
         | 
         | [0] https://waymo.com/intl/es/research/block-nerf/
         | 
         | [1] https://github.com/nerfstudio-project/nerfstudio
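          | 
          | For intuition, a toy sketch of the "one submodel per region"
          | idea (grid size, bounds, and interfaces are made up here, not
          | SMERF's actual code):
          | 
          |   import numpy as np
          | 
          |   GRID = np.array([4, 4, 1])           # submodels per axis
          |   LO = np.array([-20., -20., 0.])      # scene bounds (m)
          |   HI = np.array([ 20.,  20., 5.])
          | 
          |   def submodel_index(cam_pos):
          |       # map a camera position to the region that covers it
          |       frac = (cam_pos - LO) / (HI - LO)
          |       idx = np.clip((frac * GRID).astype(int), 0, GRID - 1)
          |       return tuple(int(i) for i in idx)
          | 
          |   p = np.array([3.0, -12.5, 1.7])      # camera position
          |   print(submodel_index(p))             # -> (2, 0, 0)
          | 
          | Roughly speaking, only the submodel for the camera's current
          | region needs to be resident while rendering, which is what
          | keeps large scenes tractable.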
        
       | barrkel wrote:
       | The mirror on the wall of the bathroom in the Berlin location
       | looks through to the kitchen in the next room. I guess the depth
       | gauging algorithm uses parallax, and mirrors confuse it, seeming
        | like windows. The kitchen has a blob of blurriness as the rear of
        | the mirror intrudes into the kitchen, but you can see through the
        | blurriness to either room.
       | 
       | The effect is a bit spooky. I felt like a ghost going through
       | walls.
        
         | nightpool wrote:
          | The refrigerator in the NYC scene has a very slick specular
          | lighting effect based on the angle you're viewing it from, and
          | if you go "into" the fridge you can see it's actually
          | generating a whole 3D scene with blurry grey and white colors
          | that turn out to precisely mimic the effects of the light from
          | the windows bouncing off the metal, and you can look "out" from
          | the fridge into the rest of the room. Same as the full-length
          | mirror in the bedroom in the same scene--there's a whole
          | virtual "mirror room" that's been built out behind the mirror
          | to give the illusion of depth as you look through it. Very cool
          | and unique consequence of the technology.
        
           | pavlov wrote:
           | Wow, thanks for the tip. Fridge reflection world is so cool.
           | Feels like something David Lynch might dream up.
           | 
           | A girl is eating her morning cereal. Suddenly she looks
           | apprehensively at the fridge. Camera dollies towards the
           | appliance and seamlessly penetrates the reflective surface,
           | revealing a deep hidden space that exactly matches the
           | reflection. At the dark end of the tunnel, something stirs...
           | A wildly grinning man takes a step forward and screams.
        
           | daemonologist wrote:
           | Neat! Here are some screenshots of the same phenomenon with
           | the TV in Berlin: https://imgur.com/a/3zAA5K8
        
           | TaylorAlexander wrote:
           | Oh wow yeah. It's interesting because when I look at the
           | fridge my eye maps that to "this is a reflective surface",
           | which makes sense because that's true in the source images,
           | but then it's actually rendered as a cavity with appropriate
            | features rendered in 3D space. What a strange feeling it is
            | to enter the fridge and then turn around! I just watched
           | Hbomberguy's Patreon-only video on the video game Myst, and
           | in Myst the characters are trapped in books. If you choose
           | the wrong path at the end of the game you get trapped in a
           | book, and the view you get trapped in a book looks very
           | similar to the view from inside the NYC fridge!
        
           | deltaburnt wrote:
           | Mirror worlds are a pretty common effect you'll see in NeRFs.
            | Otherwise you would need a significantly more complex view-
            | dependent feature rendered onto a flat surface.
        
           | chpatrick wrote:
           | This happens with any 3D reconstruction. It's because any
           | mirror is indistinguishable from a window into a mirrored
            | room. The tricky thing is if there's actually something
            | behind the mirror as well.
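            | 
            | A toy way to see the equivalence (the plane point/normal and
            | camera position here are made-up numbers):
            | 
            |   import numpy as np
            | 
            |   def reflect_across_plane(p, q, n):
            |       # reflect point p across the plane through q with
            |       # unit normal n
            |       n = n / np.linalg.norm(n)
            |       return p - 2.0 * np.dot(p - q, n) * n
            | 
            |   cam = np.array([1.0, 2.0, 0.5])
            |   q = np.array([0.0, 0.0, 0.0])   # point on the mirror
            |   n = np.array([1.0, 0.0, 0.0])   # mirror normal
            |   print(reflect_across_plane(cam, q, n))  # [-1. 2. 0.5]
            | 
            | The image in the mirror is exactly the view from this
            | virtual, reflected camera, so a reconstruction is free to
            | build a "room behind the mirror" that produces the same
            | pixels.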
        
         | Zetobal wrote:
          | It has exactly the same drawbacks as photogrammetry with
          | regard to highly reflective surfaces.
        
         | rzzzt wrote:
         | You can also get inside the bookcase for the ultimate Matthew
         | McConaughey experience.
        
       | promiseofbeans wrote:
        | It runs impressively well on my 2-year-old S21 FE. It was great
        | to see how it streamed in more images as I explored the space.
        | The TV reflections in the Berlin demo were super impressive.
       | 
        | My one note is that it took a really long time to load all the
       | images - the scene wouldn't render until all ~40 initial images
       | loaded. Would it be possible to start partially rendering as the
       | images arrive, or do you need to wait for all of them before you
       | can do the first big render?
        
         | duckworthd wrote:
         | Pardon our dust: "images" is a bad name for what's being
         | loaded. Past versions of this approach (MERF) stored feature
         | vectors in PNG images. We replace them with binary arrays.
         | Unfortunately, all such arrays need to be loaded before the
         | first frame can be rendered.
         | 
          | You do, however, point out one weakness of SMERF: large
          | payload sizes. If we can figure out how to compress them by
          | 10x, it'll be a very different experience!
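          | 
          | Roughly what "binary arrays" means here, as a sketch (dtype
          | and shape are made up, not the actual asset format):
          | 
          |   import numpy as np
          | 
          |   shape = (512, 512, 8)       # hypothetical feature planes
          |   raw = np.random.randint(
          |       0, 256, size=shape, dtype=np.uint8).tobytes()
          | 
          |   # viewer side: no PNG decode, just reinterpret the buffer
          |   feats = np.frombuffer(raw, dtype=np.uint8).reshape(shape)
          |   assert feats.nbytes == len(raw)  # every byte must arrive
          |                                    # before the first frame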
        
       | VikingCoder wrote:
       | Wow. Some questions:
       | 
       | Take for instance the fulllivingroom demo. (I prefer fps mode.)
       | 
       | 1) How many images are input?
       | 
       | 2) How long does it take to compute these models?
       | 
       | 3) How long does it take to prepare these models for this
       | browser, with all levels, etc?
       | 
       | 4) Have you tried this in VR yet?
        
         | vyrotek wrote:
         | Not exactly what you asked for. But I recently came across this
         | VR example using Gaussian Splatting instead. Exciting times.
         | 
         | https://twitter.com/gracia_vr/status/1731731549886787634
         | 
         | https://www.gracia.ai
        
         | duckworthd wrote:
         | Glad you liked our work!
         | 
         | 1) Around 100-150 if memory serves. This scene is part of the
         | mip-NeRF 360 benchmark, which you can download from the
         | corresponding project website:
         | https://jonbarron.info/mipnerf360/
         | 
         | 2) Between 12 and 48 hours, depending on the scene. We train on
         | 8x V100s or 16x A100s.
         | 
         | 3) The time for preparing assets is included in 2). I don't
         | have a breakdown for you, but it's something like 50/50.
         | 
         | 4) Nope! A keen hacker might be able to do this themselves by
         | editing the JavaScript code. Open your browser's DevTools and
         | have a look -- the code is all there!
        
           | dougmwne wrote:
           | Do you need position data to go along with the photos or just
           | the photos?
           | 
            | For VR, there's going to be some very weird depth data from
            | those reflections, but maybe it wouldn't be so bad when
            | you're in the headset.
        
       | durag wrote:
       | Any plans to do this in VR? I would love to try this.
        
         | duckworthd wrote:
         | Not at the moment but an intrepid hacker could surely extend
         | our JavaScript code and put something together.
        
       | blovescoffee wrote:
        | Since you're here, @author :) Do you mind giving a quick rundown
        | on how this compares with the quality of Zip-NeRF?
        
         | duckworthd wrote:
         | Check out our explainer video for answers to this question and
         | more! https://www.youtube.com/watch?v=zhO8iUBpnCc
        
       | heliophobicdude wrote:
       | Great work!!
       | 
        | Question for the authors: are there opportunities, where they
        | exist, to not use optimization or tuning methods for
        | reconstructing a model of a scene?
        | 
        | We are refining efficient ways of rendering a view of a scene
        | from these models, but the scenes remain static. The scenes also
        | take a while to reconstruct.
       | 
       | Can we still achieve the great look and details of RF and GS
       | without paying for an expensive reconstruction per instance of
       | the scene?
       | 
       | Are there ways of greedily reconstructing a scene with
       | traditional CG methods into these new representations now that
       | they are fast to render?
       | 
        | Please forgive any misconceptions that I may have in advance! We
        | really appreciate the work y'all are advancing!
        
         | duckworthd wrote:
         | > Are there opportunities, where they exist, to not use
         | optimization or tuning methods for reconstructing a model of a
         | scene?
         | 
          | If you know a way, let me know! Every system I'm aware of
          | involves optimization in one way or another, from COLMAP to 3D
          | Gaussian Splatting to Instant NGP and more. Optimization is a
          | powerful workhorse that gives us a far wider range of models
          | than a direct solver ever could.
          | 
          | > Can we still achieve the great look and details of RF and GS
          | without paying for an expensive reconstruction per instance of
          | the scene?
         | 
         | In the future I hope so. We don't have a convincing way to
         | generate 3D scenes yet, but given the progress in 2D, I think
         | it's only a matter of time.
         | 
         | > Are there ways of greedily reconstructing a scene with
         | traditional CG methods into these new representations now that
         | they are fast to render?
         | 
         | Not that I'm aware of! If there were, I think these works
         | should be on the front page instead of SMERF.
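          | 
          | For a sense of what "optimization as the workhorse" looks like
          | in its simplest form, here's a toy per-scene fit of parameters
          | to observed pixels by gradient descent (the linear render() is
          | a stand-in for a real differentiable renderer; everything here
          | is schematic):
          | 
          |   import numpy as np
          | 
          |   rng = np.random.default_rng(0)
          |   rays = rng.normal(size=(1024, 8))         # "camera rays"
          |   gt = rays @ rng.normal(size=(8, 3))       # observed pixels
          | 
          |   def render(params, rays):
          |       return rays @ params                  # toy "renderer"
          | 
          |   params = np.zeros((8, 3))
          |   for _ in range(200):                      # gradient descent
          |       resid = render(params, rays) - gt
          |       params -= 0.1 * rays.T @ resid / len(rays)
          | 
          |   print(np.mean((render(params, rays) - gt) ** 2))  # ~0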
        
       | annoyingnoob wrote:
       | There is a market here for Realtors to upload pictures and
       | produce walk-throughs of homes for sale.
        
         | esafak wrote:
         | https://matterport.com/
        
         | ibrarmalik wrote:
         | The Luma folks made something similar:
         | https://apps.apple.com/app/luma-flythroughs/id6450376609?l=e...
        
       | SubiculumCode wrote:
        | I'm not sure why this demo runs so horribly in Firefox but not
        | other browsers... anyone else having this?
        
         | daemonologist wrote:
         | Runs pretty well (20-100 fps depending on the scene) for me on
         | both Firefox 120.1.1 on Android 14 (Pixel 7; smartphone preset)
         | and Firefox 120.0.1 on Fedora 39 (R7 5800, 64 GB memory, RX
         | 6600 XT; 1440p; desktop preset).
        
           | SubiculumCode wrote:
            | It seems that for some reason, my Firefox is stuck in the
            | software compositor. I am getting:
            | 
            |   WebRender initialization failed: Blocklisted; failure code
            |   RcANGLE(no compositor device for EGLDisplay)(Create)_FIRST
            | 
            |   D3D11_COMPOSITING runtime failed: Failed to acquire a
            |   D3D11 device. Blocklisted; failure code
            |   FEATURE_FAILURE_D3D11_DEVICE2
           | 
           | I'm running a 3060
        
       | jerpint wrote:
       | Just ran this on my phone through a browser, this is very
       | impressive
        
         | duckworthd wrote:
         | Thank you :)
        
       | catskul2 wrote:
        | When might we see this in consumer VR? I'm surprised we don't
        | already, but I suspected it was a computation constraint.
       | 
       | Does this relieve the computation constraint enough to run on
       | Quest 2/3?
       | 
       | Is there something else that would prevent binocular use?
        
         | doctoboggan wrote:
         | I recently got a new quest and I am wondering the same thing.
         | The fact that this is currently running in a browser (and can
         | run on a mobile device) gives me hope that we will see
         | something like this in VR sooner rather than later.
        
         | duckworthd wrote:
         | I can't predict the future, but I imagine soon: all of the
         | tools are there. The reason we didn't develop for VR is
         | actually simpler than you'd think: we just don't have the
         | developer time! At the end of the day, only a handful of people
         | actively wrote code for this project.
        
       | nox100 wrote:
        | Memory efficient? It downloaded 500 MB!
        
         | bongodongobob wrote:
         | A. Storage isn't memory
         | 
         | B. That's hardly anything in 2023.
        
           | duckworthd wrote:
            | Right-o. The web viewer is swapping assets in and out of
            | memory as the user explores the scene. The network and disk
            | requirements are high, but memory usage is low.
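            | 
            | As a rough sketch of that swapping (a toy LRU cache keyed by
            | region; the names and loader are hypothetical, not the
            | viewer's actual code):
            | 
            |   from collections import OrderedDict
            | 
            |   class AssetCache:
            |       def __init__(self, max_resident=3):
            |           self.max_resident = max_resident
            |           self.cache = OrderedDict()
            | 
            |       def get(self, region, load_fn):
            |           if region in self.cache:
            |               self.cache.move_to_end(region)  # recent
            |           else:
            |               self.cache[region] = load_fn(region)
            |               if len(self.cache) > self.max_resident:
            |                   self.cache.popitem(last=False)  # evict
            |           return self.cache[region]
            | 
            | Disk and network see every asset eventually, but only a few
            | regions' worth ever sit in memory at once.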
        
       | monlockandkey wrote:
        | Get this on a VR headset and you literally have a game changer.
        
       | modeless wrote:
       | How long until you can stitch Street View into a seamless
       | streaming NeRF of every street in the world? I hope that's the
       | goal you're working towards!
        
         | duckworthd wrote:
         | ;)
        
           | modeless wrote:
           | Haha, too bad the Earth VR team was disbanded because that
           | would be the Holy Grail. If someone can get the budget to
           | work on that I'd be tempted to come back to Google just to
           | help get it done! It's what I always wanted when I was
           | building the first Earth VR demo...
        
         | deelowe wrote:
          | I read another article talking about what Waymo was working on
          | and this looks oddly similar... My understanding is that the
          | goal is to use this to reconstruct 3D models from Street View
          | imagery in real time.
        
       | yarg wrote:
        | What I'm seeing from all of these things is very accurate,
        | single, navigable 3D images.
       | 
       | What I haven't seen anything of is feature and object detection,
       | blocking and extraction.
       | 
       | Hopefully a more efficient and streamable codec necessitates the
       | sort of structure that lends itself more easily to analysis.
        
       | fngjdflmdflg wrote:
        | >Google DeepMind, Google Research, Google Inc.
       | 
       | What a variety of groups! How did this come about?
        
       | tomatotomato31 wrote:
        | I'm following this through Two Minute Papers and I'm looking
        | forward to using it.
        | 
        | My grandpa died 2 years ago and, in hindsight, I took pictures
        | so I could use them as in your demo.
       | 
       | Awesome thanks:)
        
         | duckworthd wrote:
          | It would be my dream to make capturing 3D memories as easy and
          | natural as taking a 2D photo with your smartphone today.
         | Someday!
        
       | twelfthnight wrote:
        | Hope this doesn't come across as snarky, but does Google
        | pressure researchers to do PR in their papers? This really is
        | cool, but there is a lot of self-promotion in this paper and
        | very little discussion of limitations (and the discussion of
        | them is bookended by qualifications about why they really
        | aren't limitations).
       | 
       | It makes it harder for me to trust the paper if I feel like the
       | paper is trying to persuade me of something rather than describe
       | the complete findings.
        
         | tomatotomato31 wrote:
         | People are not allowed to be proud of their work anymore?
        
       | yieldcrv wrote:
       | I had read about a competing technology that was suggesting
        | NeRFs were a dead end
       | 
       | but perhaps that was biased?
        
       | rzzzt wrote:
       | What kind of modes does the viewer cycle through when I press the
       | space key?
        
       ___________________________________________________________________
       (page generated 2023-12-13 23:00 UTC)