[HN Gopher] Real-Time Coherent 3D Reconstruction from Monocular ...
___________________________________________________________________

Real-Time Coherent 3D Reconstruction from Monocular Video

Author : samber
Score  : 88 points
Date   : 2022-03-14 17:50 UTC (5 hours ago)

(HTM) web link (zju3dv.github.io)
(TXT) w3m dump (zju3dv.github.io)

| stefan_ wrote:
| Looked cool, then I read that there is some Apple ARKit magic
| black box in the middle of it all.
  | cmelbye wrote:
  | I don't think that's true. The paper says that a camera pose
  | estimated by a SLAM system is required. ARKit implements SLAM
  | and can easily provide the camera pose for each frame through
  | the ARFrame class. But there are countless other
  | implementations of SLAM, including Android ARCore, Oculus
  | Quest, Roomba, self-driving cars, and a number of GitHub
  | repos (https://github.com/tzutalin/awesome-visual-slam).
    | fxtentacle wrote:
    | Yeah, I also consider it odd to use LIDAR-based poses and
    | then call it "monocular".
| spyder wrote:
| It's not exactly the same, but Neural Radiance Fields are
| getting more impressive:
|
| The first one was this, but it was slow:
| https://www.matthewtancik.com/nerf
|
| Then it got faster:
| https://www.youtube.com/watch?v=fvXOjV7EHbk
|
| Lots of interesting papers:
|
| https://github.com/yenchenlin/awesome-NeRF
| alhirzel wrote:
| Does anyone know what the state of the art is for doing this
| type of reconstruction as a streaming input to detection and
| recognition algorithms? For instance, this could be used for
| object detection and identification on a recycling conveyor
| line.
  | beambot wrote:
  | I don't believe that either does reconstruction... but for
  | the recycling application, there are a handful of companies
  | tackling this problem -- e.g. Everest Labs & Amp Robotics.
| leobg wrote:
| So much for the folks who think Tesla is on a fool's errand
| when they're using cameras instead of LIDAR.
  | kajecounterhack wrote:
  | Companies like Waymo and Cruise use this kind of technology
  | too. Unfortunately, there are tons of corner cases of weird
  | things you haven't seen before -- for example, some special
  | vehicles self-occlude, and you never get enough coverage to
  | observe them correctly until you're too close. In general,
  | radars and lidars used in _conjunction_ with cameras can
  | handle occluded objects much better.
  |
  | Also, to measure the performance of / evaluate observations
  | generated from this tech, you would want to compare it to a
  | pretty sizable 3D ground-truth set, which Tesla does not
  | currently have. There are pretty big advantages to starting
  | with a maximal set of sensors, even if (eventually)
  | breakthroughs turn them into unnecessary crutches.
    | leobg wrote:
    | That was very insightful. Do you work in that space? It is
    | comments like yours that make HN a special place.
  | ceejayoz wrote:
  | The failure mode (for example: decapitation;
  | https://www.latimes.com/business/la-fi-tesla-florida-
  | acciden...) is pretty significant when used in a Tesla. Less
  | so in this tech demo.
| billconan wrote:
| Can ARKit return an accurate camera position?
  | upbeat_general wrote:
  | I haven't looked at any metrics, but based on using ARKit
  | applications (and various VIO SLAM implementations), it can,
  | though it depends highly on the scene, the camera motion,
  | and whether there is LIDAR / stereo depth.
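
  [Ed. note: below is a minimal Swift sketch of the pose access
  cmelbye and upbeat_general describe. ARSession, ARFrame, and
  ARCamera.transform are actual ARKit API; the PoseReader class
  and the bare world-tracking setup are illustrative
  assumptions, not code from the paper.]

      import ARKit

      // Called once per camera frame after this object is set
      // as the ARSession's delegate.
      final class PoseReader: NSObject, ARSessionDelegate {
          func session(_ session: ARSession, didUpdate frame: ARFrame) {
              // 4x4 camera-to-world transform in the session's
              // world space, estimated by ARKit's VIO/SLAM.
              let pose: simd_float4x4 = frame.camera.transform
              let t = pose.columns.3  // translation column
              print("camera position:", t.x, t.y, t.z)
              // frame.camera.trackingState reports pose quality:
              // .normal, .limited(reason:), or .notAvailable.
          }
      }

      // Usage sketch: ARSession holds its delegate weakly, so
      // keep a strong reference to the reader.
      // let session = ARSession()
      // let reader = PoseReader()
      // session.delegate = reader
      // session.run(ARWorldTrackingConfiguration())
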
| AndrewKemendo wrote:
| Honestly, this doesn't look any better than what we were doing
| back in 2016-2017. I'm not sure what's novel here.
|
| This is the only video I could find, but we were doing
| monocular reconstruction from a limited number of RGB (not
| depth) images AND doing voxel segmentation on the processing
| side. https://www.youtube.com/watch?v=nqy44VSWh3g
|
| Even as far back as 2010, people were doing reasonable
| monocular reconstruction, including software like Meshroom,
| etc. The whole of TU Munich, under Matthias Niessner, has also
| been doing this for a while.
|
| What's novel here?
  | tintor wrote:
  | Fast enough to be used for mobile robots?
  | nobbis wrote:
  | Their research doesn't just integrate depth maps into a TSDF
  | - it uses NNs to incorporate surface priors.
  |
  | I don't recall you having similar real-time meshing
  | functionality in 2016-2017, Andrew. Can you show what you
  | had?
  |
  | As far as I'm aware, Abound was the first to demo real-time
  | monocular mobile meshing: on Android in early 2017 (e.g.
  | https://www.youtube.com/watch?v=K9CpT-sy7HE) and on iOS in
  | early 2018 (e.g.
  | https://twitter.com/nobbis/status/972298968574013440).
  | pj_mukh wrote:
  | Looks like a much better response to white walls /
  | textureless surfaces.
  | fxtentacle wrote:
  | This is a paper about a new way of storing/merging 3D data.
  |
  | The actual 3D reconstruction is so-so, I agree. And they
  | kinda cheat by using ARKit (which uses LIDAR internally) to
  | get good camera poses even if there is little texture.
  |
  | So the novel part here is that they can immediately merge
  | all the images into a coherent representation of the 3D
  | space, as opposed to first doing bundle adjustment, then
  | pairwise depth matching, then streak-based depth matching,
  | and then merging the resulting point clouds.
  |
  | Also, they can use learned 3D shape priors to improve their
  | results. Basically, that means "if there is no visible gap,
  | assume the surface is flat". But AFAIK, that's not new.
  |
  | EDIT: My main criticism of this paper, after looking at the
  | source code a bit, would be that due to the TSDF, which is
  | like a 3D voxel grid, they need insane amounts of GPU
  | memory, or else the scenes either need to be very small or
  | low resolution. That is most likely also the reason why the
  | reconstruction looks so cartoon-like and is smooth on all
  | corners: they lack the memory to store more high-frequency
  | detail.
  |
  | EDIT2: Mainly, it looks like they managed to reduce the GPU
  | memory consumption of Atlas [1], which is why they can
  | reconstruct larger areas and/or at higher resolution. But
  | it's still far less detail than Colmap [2].
  |
  | [1] https://github.com/magicleap/Atlas
  |
  | [2] https://colmap.github.io/
    | closetnerd wrote:
    | Says it's real-time
  | AndrewKemendo wrote:
  | 2016, from Matthias Niessner's group:
  |
  | https://www.youtube.com/watch?v=keIirXrRb1k
  |
  | http://graphics.stanford.edu/projects/bundlefusion/
    | jonas21 wrote:
    | That requires depth input.
      | AndrewKemendo wrote:
      | Good point, I don't recall offhand which paper was the
      | mono-RT one.
      |
      | At a minimum, though, 6D.ai and a few others were
      | selling this as a service at least as far back as 2017.
| fxtentacle wrote:
| I always found ORB-SLAM2 pretty impressive; it can map 3D
| neighborhoods in real time while you drive around in a car:
|
| https://www.youtube.com/watch?v=ufvPS5wJAx0
|
| https://www.youtube.com/watch?v=3BrXWH6zRHg
| polishdude20 wrote:
| Shame there's no Android or iPhone app available.
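
  [Ed. note: below is a minimal Swift sketch of the classic
  TSDF fusion nobbis and fxtentacle describe -- KinectFusion-
  style weighted averaging of depth maps into a dense voxel
  grid; the paper layers learned surface priors on top of this
  idea. All names and the +Z-forward pinhole camera convention
  are illustrative assumptions, not the paper's code.]

      import simd

      struct TSDFVolume {
          let dim: Int          // voxels per side (memory ~ dim^3)
          let voxelSize: Float  // metres per voxel
          let trunc: Float      // truncation distance in metres
          var sdf: [Float]      // truncated signed distances
          var weight: [Float]   // per-voxel integration weights

          init(dim: Int, voxelSize: Float, trunc: Float) {
              self.dim = dim
              self.voxelSize = voxelSize
              self.trunc = trunc
              sdf = Array(repeating: 1, count: dim * dim * dim)
              weight = Array(repeating: 0, count: dim * dim * dim)
          }

          // Fuse one depth frame, given a world-to-camera pose
          // and pinhole intrinsics; depth is row-major, metres.
          mutating func integrate(depth: [Float], width: Int,
                                  height: Int,
                                  worldToCam: simd_float4x4,
                                  fx: Float, fy: Float,
                                  cx: Float, cy: Float) {
              for z in 0..<dim { for y in 0..<dim { for x in 0..<dim {
                  // Voxel centre: world, then camera coordinates.
                  let w = simd_float4(Float(x) * voxelSize,
                                      Float(y) * voxelSize,
                                      Float(z) * voxelSize, 1)
                  let c = worldToCam * w
                  guard c.z > 0 else { continue }  // behind camera
                  // Project into the depth image.
                  let u = Int((fx * c.x / c.z + cx).rounded())
                  let v = Int((fy * c.y / c.z + cy).rounded())
                  guard u >= 0, u < width, v >= 0, v < height
                  else { continue }
                  let d = depth[v * width + u]
                  guard d > 0 else { continue }    // invalid pixel
                  // Signed distance along the ray, truncated;
                  // skip voxels far behind the observed surface.
                  let dist = d - c.z
                  guard dist >= -trunc else { continue }
                  let tsdf = min(1, dist / trunc)
                  // Running weighted average over observations.
                  let i = (z * dim + y) * dim + x
                  sdf[i] = (sdf[i] * weight[i] + tsdf) / (weight[i] + 1)
                  weight[i] += 1
              }}}
          }
      }

  [A mesh is then extracted from the zero level set, e.g. with
  marching cubes. The dense grid also makes fxtentacle's memory
  point concrete: at 512^3 resolution, the two Float arrays
  above already take roughly 1 GB.]
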
| adampk wrote:
| I am surprised that the team didn't choose to append "Fusion"
| to the name.
|
| This seems to fit into the genealogy of KinectFusion,
| ElasticFusion, BundleFusion, etc.
|
| https://www.microsoft.com/en-us/research/wp-content/uploads/...
| https://www.imperial.ac.uk/dyson-robotics-lab/downloads/elas...
| https://graphics.stanford.edu/projects/bundlefusion/
|
| Very impressive work. Unfortunately, I have not seen any use
| cases for online 3D reconstruction. 6D.ai made terrific
| progress in this tech but also could not find great use cases
| for online reconstruction and ended up having to sell to
| Niantic.
|
| It seems like what people want, if they want 3D reconstruction
| at all, is extremely high-fidelity scans (a la Matterport),
| and they are willing to wait for the model. Unfortunately, the
| TSDF approach creates a "slimy" end look, which isn't usually
| what people are after if they want an accurate 3D
| reconstruction.
|
| It SEEMS like _online_ 3D reconstruction would be helpful, but
| I have yet to see a use case for "online"...
  | [deleted]
  | tintor wrote:
  | Use case: mobile robotics, LIDAR replacement in self-driving
  | vehicles.
  | tonyarkles wrote:
  | I'm very curious to see how well this would work for online
  | terrain reconstruction. I've got a drone with a pretty
  | powerful onboard computer, and it's always nice to be able
  | to solve and tune problems with software instead of
  | additional (e.g. LIDAR) hardware.
___________________________________________________________________
(page generated 2022-03-14 23:00 UTC)