[HN Gopher] S2: Computational geometry and spatial indexing on t...
       ___________________________________________________________________
        
       S2: Computational geometry and spatial indexing on the sphere
        
       Author : ofou
       Score  : 90 points
       Date   : 2022-03-13 14:27 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jwuphysics wrote:
       | Astronomers have developed an indexing strategy using
       | Hierarchical, Equal Area, and iso-Latitude Pixelisation
       | (HEALPix): https://arxiv.org/abs/astro-ph/9905275
        
       | durkie wrote:
       | i was just starting a project and considering the use of s2. i'm
       | going with geohash instead because I would get different values
       | for the s2 cell based on implementation.
       | 
       | ruby:
       | 
       | S2Cells::S2LatLon.new(34.1234, -84.1234).to_s2_id(30)
       | 
       | => -8577780315821624425
       | 
       | S2Cells::S2LatLon.new(34.1234, -84.1234).to_s2_id(1)
       | 
       | => -8358680908399640576
       | 
       | https://gojekfarm.github.io/s2-calc/ agrees with these values
       | 
       | python's s2sphere (from sidewalk labs, seems like it'd be more of
       | a reference implementation) gives very different numbers:
       | 
       | s2sphere.Cell.from_lat_lng(s2sphere.LatLng.from_degrees(34.1234,
       | -84.1234))
       | 
       | => Cell: face 4, level 30, orientation 0, id 9868963757887927191
       | 
       | This discrepancy feels like it'd be some kind of signed/unsigned
       | integer issue, but I didn't think that would be a thing in Ruby.
       | It also doesn't seem like the level=1 cell value in Ruby would be
       | such a large, specific number. I'm missing something.
        
         | kevinventullo wrote:
         | For what it's worth, at least in the case you described, it is
         | indeed a case of signed vs unsigned representations. That is,
         | the numbers
         | 
         | -8577780315821624425
         | 
         | and
         | 
         | 9868963757887927191
         | 
         | have the exact same underlying 64-bit representation, which is
         | all that matters for S2 (you can verify this by checking their
         | difference is exactly 2^64). The former is just interpreting
         | the 64-bit string as a signed int, while the latter is
         | unsigned.
         | 
         | Separately, the level 1 number you mention
         | 
         | -8358680908399640576
         | 
         | is fairly simple if you look at its binary representation;
         | almost all of its trailing digits are 0.
         | 
         | Basically, you really shouldn't think of S2 values as integers,
         | but more as their underlying length 64 bit string.
        
           | durkie wrote:
           | thank you!
        
         | random4726427 wrote:
         | S2 cell IDs are usually uint64, so the Ruby version looks
         | wrong. If you cast that int64 to uint64 you get the same value
         | as the python version.
         | 
         | Level 1 IDs look like large random numbers in decimal, but when
         | viewed in binary they have many trailing zeros, that are used
         | to truncate to make the tokens.
        
           | durkie wrote:
           | thanks!
        
         | panarky wrote:
         | "The reference implementation of the S2 library is written in
         | C++. Versions in other languages are ported from the C++ code,
         | and may not have the same robustness, performance, or features
         | as the C++ version."
         | 
         | From https://s2geometry.io/about/overview
        
           | durkie wrote:
           | yes, that's the true "reference" implementation, i just
           | figured a repo from sidewalk labs (a google company) would be
           | maybe more authoritative.
        
       | Kelteseth wrote:
       | I'm wondering where this is used at Google and how this compares
       | to GDAL?
        
         | monktastic1 wrote:
         | It's used everywhere in Maps (of course), including where you'd
         | expect there to be lat/lngs.
        
         | mistrial9 wrote:
         | GDAL binary [0] uses a traditional C/C++ linked library
         | architecture, and is extensible with plugin drivers; the memory
         | model is therefore "local". A cloud architecture uses
         | distributed resources.
         | 
         | Secondly, GDAL is tightly bound to Proj [1] for reprojecting
         | spatial coordinates data from one spatial reference system
         | (SRS) to another. The "sphere" libraries imply no generalized
         | spatial reference, instead only a single spherical one.
         | 
         | [0] https://r-spatial.org/images/gis3.png
         | 
         | [1] https://proj.org/
        
           | jofer wrote:
           | More generally, gdal is a raster I/O library. S2 is a point
           | only system. It's not meant to store or work with raster data
           | in any way. (Sure, you can represent raster as points, but
           | it's hideously inefficient to do so.)
           | 
           | Basically, you shouldn't ever be choosing between the two. If
           | you're thinking of using a representation system like S2 for
           | raster data (e.g. images or anything else on a regular grid),
           | rethink things a bit.
        
       | PLenz wrote:
       | As a geographer turned DS it is my duty to remind CS people that
       | all geo methods are compromises. No system, s2, h3, mgrs, even
       | lat lon are perfect for every project and you must think
       | carefully about what you are doing and pick your geo
       | representation based on your givens and druthers.
        
         | [deleted]
        
         | monkeybutton wrote:
         | If you had to represent arbitrary plots of land as fixed sized
         | vectors for a ML model, what would you use?
        
           | s2mcallis wrote:
           | S2 wouldn't be a bad choice, it lets you compute "coverings"
           | of arbitrary regions as S2 cells, with variable resolution.
           | Fixed size is trickier but is probably doable, especially if
           | you're allowed to null out unused cells. Check out
           | https://s2.sidewalklabs.com/regioncoverer/
        
           | PLenz wrote:
           | Depends on where the land is and size and shape of the
           | polygons are
        
             | monkeybutton wrote:
             | Let's say the land is confined to north America, the size
             | can vary from an entire state to a single zip code (I know
             | zips are logical addressing, not geological, but I have
             | what I have), and the shape is unrestricted so it could be
             | non-convex. I suppose one could convert various kinds of
             | areas (states, cities, boroughs, ...) to lists of zipcodes
             | contained within and OHE them but I feel like that would be
             | the _worst_ solution.
        
       | Jabbles wrote:
       | > Disclaimer ... This is not an official Google product.
       | 
       | What _is_ an official Google product? What does it mean to be
       | one? What should I understand from the presence or absence of
       | this disclaimer? Probably some sort of SLO, but what precisely?
       | 
       | Tensorflow doesn't have this disclaimer:
       | https://github.com/tensorflow/tensorflow
        
         | daniel-thompson wrote:
         | > What does it mean to be one?
         | 
         | That it will eventually get killed.
         | 
         | > What should I understand from the presence or absence of this
         | disclaimer?
         | 
         | That it will probably last longer. Open-sourcing helps here too
         | obviously.
        
       | rajman187 wrote:
       | I've made use of Google's S2 library across a couple different
       | jobs for various use cases. The C++ bindings are well-documented
       | and very fast, in fact I found writing my own Python bindings
       | allowed a few faster operations than using the Python library
       | directly, for example generating a spanning set of cell IDs.
       | 
       | I find it liberating to not rely on a backend implementation of
       | geospatial indexing, or rather being agnostic to backend, simply
       | indexing very efficiently the 64-bit integers.
       | 
       | The hierarchical nature of the Hilbert curves means that the
       | child cells contain the ID of the parent, and also fit perfectly
       | into the parent. This is in contrast with H3 where you
       | necessarily cannot fit the 7 child hexagons into the parent
       | without some overlap (see for example the logo on the H3 website
       | https://eng.uber.com/h3/)
        
         | rajman187 wrote:
         | As a fun experiment, I wanted to use the library in Rust. While
         | there isn't an official Rust port, I just used the library
         | along with CXX (https://github.com/dtolnay/cxx) with a simple
         | geospatial struct.
        
       | davidmurdoch wrote:
       | Pokemon Go used (uses?) S2 for its coordinate system. I used it
       | in a (now dead) location based mobile game and being able to
       | locate proximal s2 cells at varying granularities was really
       | cool.
        
       | maxpert wrote:
       | IDK if people have seen or tried H3. It has worked really well
       | for me in past https://eng.uber.com/h3/
        
         | secondcoming wrote:
         | I know of at least one company that uses it. They seem happy
         | with it.
        
         | gregcoombe wrote:
         | H3 has the significant downside that children nodes do not fit
         | within parent nodes. So one needs to be very careful working
         | within any sort of multi-resolution algorithm (the most
         | effective uses I've seen are with a fixed resolution). S2 does
         | not suffer from this.
        
       | timjulien wrote:
       | We use S2 extensively in our infrastructure, and open-sourced our
       | Node.js TypeScript bindings for Google's reference C++ lib:
       | https://github.com/radarlabs/s2. (More context about how and why
       | here: https://radar.com/blog/open-source-node-js-
       | typescript-s2-lib...)
        
       | ducktective wrote:
       | This assumes the earth model to be sphere. GeographicLib [1],
       | however, works according to WGS84 earth model which is an
       | ellipsoid.
       | 
       | Anyone knows why they used a sphere model and why they didn't
       | just use GeographicLib (MIT license)?
       | 
       | [1]: https://geographiclib.sourceforge.io/
        
         | tantalor wrote:
         | > Why not project onto an ellipsoid? (The Earth isn't quite
         | ellipsoidal either, but it is even closer to being an ellipsoid
         | than a sphere.) The answer relates to the other goals stated
         | above, namely performance and robustness. Ellipsoidal
         | operations are still orders of magnitude slower than the
         | corresponding operations on a sphere. Furthermore, robust
         | geometric algorithms require the implementation of exact
         | geometric predicates that are not subject to numerical errors.
         | While this is fairly straightforward for planar geometry, and
         | somewhat harder for spherical geometry, it is not known how to
         | implement all of the necessary predicates for ellipsoidal
         | geometry.
         | 
         | http://s2geometry.io/about/overview
        
       | rurban wrote:
       | (2019) please.
       | 
       | Last big discussion 2018:
       | https://news.ycombinator.com/item?id=17849546
        
         | hobofan wrote:
         | I don't think GitHub projects are generally given a year tag,
         | as they evolve over time.
        
       ___________________________________________________________________
       (page generated 2022-03-13 23:01 UTC)