[HN Gopher] Serving Dynamic Vector Tiles from PostGIS
       ___________________________________________________________________
        
       Serving Dynamic Vector Tiles from PostGIS
        
       Author : liotier
       Score  : 104 points
       Date   : 2020-01-02 16:36 UTC (6 hours ago)
        
 (HTM) web link (info.crunchydata.com)
 (TXT) w3m dump (info.crunchydata.com)
        
       | crmrc114 wrote:
       | I have a pal who does GIS work in the oil and gas industry, and I
       | think it's crazy how much influence ESRI has on that market. I would
       | love to learn more about interacting with map data like this.
       | 
       | For a non-GIS person this was a fun read. So thanks for the post!
        
         | bransonf wrote:
         | I have the same sentiment on ESRI. It's basically all I was
         | taught in my university courses, but it's not what I ever want
         | to use.
         | 
         | It's crazy that a privately held company holds something like a
         | third of the market.
         | 
         | And personally, I don't think their software is that good. I
         | find their documentation unhelpful and their solutions rigid.
         | 
         | Case in point, the geodatabase (gdb) standard is purposefully
         | meant to obfuscate the data within. No one has ever been able
         | to explain to me why this is, and the standard has been open
         | sourced by now.
         | 
         | Not to mention, the number of times I've had ArcMap crash
         | without any helpful information as to why it crashed...
         | 
         | That said, ArcMap is the Excel of GIS. It captured market share
         | (especially government contracts) two or three decades ago and
         | no one has since disrupted the desktop GIS platform. On the web,
         | however, I see companies like Mapbox far outpacing anything
         | ESRI has managed so far.
         | 
         | And to anyone looking to learn GIS: PostGIS, GDAL and any
         | scripting language will make you more powerful than most of the
         | people I know within the field.
        
           | trynewideas wrote:
           | oblig. mention for others outside of the ESRI sphere of
           | influence that QGIS exists, is still FOSS, and still actively
           | developed https://qgis.org/en/site/
        
           | sleavey wrote:
           | > And to anyone looking to learn GIS: PostGIS, GDAL and any
           | scripting language will make you more powerful than most of
           | the people I know within the field.
           | 
           | Funny this article was posted today, because yesterday I was
           | looking into rendering a custom map for a ~100x100 km area
           | from OpenStreetMap data for a particular application. I've
           | got basically no experience making maps but I've dabbled with
           | GDAL and Rasterio. I was thinking of using Mapnik with a dump
           | of (part of) the OpenStreetMap database into a local PostGIS
           | instance. Ideally the rendered tiles should be vector format.
           | Does this approach seem reasonable, or am I missing a simpler
           | way?
        
             | bransonf wrote:
             | Are you trying to render a static map or make something
             | interactive?
             | 
             | If this is static, Mapnik is a good call. It has some extra
             | anti-aliasing under the hood and it's exceptionally fast.
        
               | sleavey wrote:
               | Static; thanks for the info. I would ideally like to dump
               | a bunch of SVG tiles for various zoom levels so I can
               | store them in a static directory on my server rather than
               | serve them dynamically. I take it that Mapnik is capable
               | of dumps like this? Also, I would like to use the Python
               | bindings, but they look relatively poorly documented. Would
               | you suggest a newbie like me use the C++ or XML interfaces
               | instead, if they are better documented?
        
               | bransonf wrote:
               | I've only used the Python library, and I think I
               | survived. I only used it for some visualization, however.
               | 
               | And I'm still unclear. Are you trying to serve these
               | tiles to another application? Or are you trying to make a
               | digital/print map?
               | 
               | And SVG is probably excessive for static tiles. It won't
               | have the size reduction of a raster tile nor the benefit
               | of a true vector solution.
        
               | sleavey wrote:
               | > And I'm still unclear. Are you trying to serve these
               | tiles to another application? Or are you trying to make a
               | digital/print map?
               | 
               | Serve them from local storage to a viewer application.
               | 
               | > And SVG is probably excessive for static tiles. It
               | won't have the size reduction of a raster tile nor the
               | benefit of a true vector solution.
               | 
               | Ah, is there another vector format for tiles other than
               | SVG? Or are you saying that I should just generate a
               | bunch of compressed rasters?
        
               | bransonf wrote:
               | Ah, I haven't really got into that territory with Mapnik.
               | But to the second point, yes, you should just generate a
               | bunch of raster tiles. And before doing this, ask
               | yourself if you really need to.
               | 
               | If this isn't a huge project, Mapbox is an easy packaged
               | solution. Otherwise, there are dozens of really good tile
               | providers already.
        
           | aldoushuxley001 wrote:
           | QGIS does an admirable job competing with ArcGIS.
        
       | flippmoke wrote:
       | Author of Mapbox's Vector Tile specification here, and also a
       | contributor to some of the code used by PostGIS. I wanted to
       | clarify a few topics around Vector Tiles and serving them
       | dynamically, which seems to be a new trend.
       | 
       | The Vector Tiles specification was designed for map visualization
       | and has since expanded into other uses, but in general the
       | purpose is to quickly provide a complete, highly cacheable subset
       | of data for a specific area. Most of that speed and cacheability
       | comes from preprocessing all the data you will use in your map
       | into tiles.
       | 
       | The general steps for turning raw data into Vector Tiles are:
       | 
       | 1. Determine a hierarchy of your data. For example, with roads you
       | will want to see only highways or major roads at some zoom levels,
       | while at other zoom levels you will want all of your data.
       | 
       | 2. For each tile at each zoom level: select your data following
       | your hierarchy rules, simplify it based on the zoom level (for
       | example, you might need fewer points to display your road), then
       | clip it to the tile and encode it into your Vector Tile.
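       | 
       | A minimal Python sketch of steps 1 and 2, for illustration only:
       | the road classes, the MIN_ZOOM table and the "one grid cell"
       | tolerance rule are hypothetical choices, not how any particular
       | tiler works. It assumes Web Mercator (EPSG:3857) tiles encoded on
       | the spec's default 4096-unit grid; clipping and encoding are left
       | to something like PostGIS's ST_AsMVTGeom and ST_AsMVT.

```python
import math

# Hypothetical hierarchy: lowest zoom at which each road class is shown.
MIN_ZOOM = {"motorway": 0, "primary": 6, "secondary": 9, "residential": 12}

# Half the width of the Web Mercator (EPSG:3857) world, in metres.
ORIGIN_SHIFT = math.pi * 6378137.0


def visible_features(features, zoom):
    """Step 1: keep only features whose class is allowed at this zoom.

    Unknown classes default to 99, so they are effectively never shown.
    """
    return [f for f in features if MIN_ZOOM.get(f["class"], 99) <= zoom]


def simplify_tolerance(zoom, extent=4096):
    """Step 2: width of one tile-grid cell, in projected metres, at this zoom.

    A tile spans 2 * ORIGIN_SHIFT / 2**zoom metres and is encoded on an
    extent x extent integer grid, so detail smaller than one cell is lost
    when coordinates are snapped to the grid and can be simplified away first.
    """
    tile_span = 2 * ORIGIN_SHIFT / 2 ** zoom
    return tile_span / extent


roads = [{"class": "motorway", "name": "A1"},
         {"class": "residential", "name": "Elm St"}]
print([r["name"] for r in visible_features(roads, 8)])  # ['A1']
print(round(simplify_tolerance(0)))  # 9784 (metres)
```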
       | 
       | The problem is that these steps are often very complex and
       | require thought about the cartography of your final map, and they
       | can also drastically affect performance. In some cases, if you
       | are dynamically serving tiles from PostGIS, it is very hard to
       | reduce large quantities of data quickly. For example, take a very
       | precise, detailed coastline of a large lake that you want to
       | serve dynamically. If you serve this data on demand, then each
       | time you need a tile you have to simplify and clip a potentially
       | massive polygon. While this might work for single requests, at
       | scale it quickly adds a lot of load to a PostGIS server. The only
       | solutions are to cache the resulting tiles for a longer period to
       | limit load on your database, or to preprocess all your data
       | before serving.
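       | 
       | The caching option might look like this minimal sketch, which
       | keeps each rendered tile for a fixed number of seconds. The
       | render_tile callable is hypothetical (in the dynamic setup it
       | would run the PostGIS query), and the in-process dict is
       | unbounded; a real deployment would more likely lean on an HTTP
       | cache or CDN.

```python
import time


class TileCache:
    """Keep rendered tiles for `ttl` seconds, keyed by (z, x, y)."""

    def __init__(self, render_tile, ttl=3600.0, clock=time.monotonic):
        self.render_tile = render_tile  # expensive: (z, x, y) -> bytes
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # (z, x, y) -> (expiry time, tile bytes)

    def get(self, z, x, y):
        key = (z, x, y)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no database work
        tile = self.render_tile(z, x, y)  # miss or expired: render again
        self._store[key] = (now + self.ttl, tile)
        return tile


calls = []


def fake_render(z, x, y):
    """Hypothetical stand-in for an expensive PostGIS query."""
    calls.append((z, x, y))
    return b"tile-bytes"


cache = TileCache(fake_render, ttl=60)
cache.get(3, 1, 2)
cache.get(3, 1, 2)
print(len(calls))  # 1: the second request was served from the cache
```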
       | 
       | Preprocessing all the tiles is something other tiling tools such
       | as tippecanoe are already really good at, and they come with the
       | benefit of helping you determine a hierarchy for your data.
       | Preprocessing might seem excessive when it means making
       | potentially millions of tiles, but in general it makes your
       | application faster because it simply serves an already created
       | tile.
       | 
       | Therefore, if your data does not change very quickly, I would
       | almost always suggest preprocessing over dynamic rendering of
       | tiles. If you start using PostGIS to create tiles on demand
       | instead of existing tiling tools, you might spend more effort
       | maintaining it than you expect.
        
         | durkie wrote:
         | Very good comment and thanks for your work on MVT. I use
         | PostGIS's MVT tools on a daily basis.
         | 
         | I take an intermediate approach: my queries are sometimes too
         | expensive to run dynamically, and my data change
         | semi-frequently (daily/weekly basis), but when they do change I
         | have a clear idea of which tiles are affected. So any time my
         | data need updating I can mark tiles as stale, and then I have a
         | Sidekiq job that processes them and uploads them to S3. The
         | tile server itself pulls from S3.
         | 
         | This is probably not quite as fast as a dedicated tile server,
         | but it's far more reliable/responsive than dynamic rendering
         | and reduces load spikes on the database.
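         | 
         | Marking tiles as stale mostly comes down to listing every tile
         | that touches the area that changed. A minimal sketch, assuming
         | standard XYZ/Web Mercator tiling and a changed area given as a
         | WGS84 bounding box (boxes crossing the antimeridian are not
         | handled); the Sidekiq job and S3 upload are out of scope here.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the XYZ tile (x, y) containing a WGS84 point at `zoom`."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon=180 or latitudes beyond the Mercator limit stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def stale_tiles(min_lon, min_lat, max_lon, max_lat, zooms):
    """Yield every (z, x, y) tile intersecting the changed bounding box."""
    for z in zooms:
        x0, y0 = lonlat_to_tile(min_lon, max_lat, z)  # north-west corner
        x1, y1 = lonlat_to_tile(max_lon, min_lat, z)  # south-east corner
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                yield (z, x, y)


print(list(stale_tiles(-10, -10, 10, 10, zooms=[0, 1])))
# [(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
```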
        
         | hkchad wrote:
         | So I saw this post earlier today and tried it on a dataset we
         | have (fixed boundaries w/ some properties that change 4x/hr).
         | We use the value of the properties for styling of the vector
         | tiles. Currently the tiles are re-rendered every 4hrs (even
         | though the data is updated every 15 min) using tippecanoe,
         | served by tileserver-gl and cached in CloudFront. So I wanted a
         | way to get new data to users faster. But as you have noted, the
         | dynamic process Crunchy posted IS SLOW: it takes about 3
         | minutes to paint the world on my brand new MacBook Pro (about 3
         | seconds w/ pre-rendered tiles). Given the country boundaries do
         | not change very often, is there a way to change just the
         | properties that actually need updating in the already rendered
         | vector tiles? Our pipeline takes about 45 min to run completely
         | to regenerate the new tiles with updated properties. Or is there
         | a better way to present this data? We started out w/ GeoJSON
         | directly from the DB but the files were huge; the vector tiles
         | are 30% the size of the GeoJSON. We were in the MTS private beta
         | but they didn't have the 'update' process worked out yet, so it
         | was a full refresh each time.
        
       | anonymousCmntr wrote:
       | I use this project for serving vector tiles with PostGIS and
       | Django REST.
       | 
       | https://github.com/corteva/djangorestframework-mvt
        
       | jokoon wrote:
       | I wish I could learn to build my own tiles, vector or PNG. I
       | don't really understand where the data comes from, or how it is
       | gathered and assembled.
       | 
       | I'm also really curious about the choices involving the zoom
       | level: how do you decide what to render at each zoom level, and
       | when is data discarded, to balance good detail against better
       | performance and lighter tiles? I would really like to try to
       | build lighter maps so I can have my own mapping software on a
       | desktop machine.
       | 
       | The data sizes and hardware requirements involved are generally
       | pretty big. It could be interesting to see how much detail one
       | could achieve in a "portable" map browser when limiting the data
       | size to 2GB, 5GB or 10GB.
       | 
       | I would really like to ask why, on some mapping software, you
       | can't see the names of large enough cities/places/regions that
       | are obviously there. It often makes it difficult to browse maps
       | properly.
        
         | bransonf wrote:
         | I'll try my best to explain the process.
         | 
         | The data comes from places like the Census Bureau (roads, place
         | names), and then a lot of it has to be collected by the likes of
         | OpenStreetMap/Google/other providers (GIS data is big
         | business).
         | 
         | For vector-based approaches (see Mapbox), these data are stored
         | in purpose-built databases, and simplified geometries are usually
         | served to the browser. The benefit is continuous zoom, but
         | the pitfall is more server side computation and hence cost.
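         | 
         | "Simplified geometries" usually means running a line
         | simplification algorithm before serving. PostGIS's ST_Simplify,
         | for example, uses Douglas-Peucker; here is a minimal pure-Python
         | sketch of that algorithm (this variant measures distance to the
         | segment joining the endpoints):

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: plain point distance
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Drop vertices closer than `tolerance` to the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the endpoint-to-endpoint segment.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Keep that vertex and simplify the two halves independently.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # the shared vertex appears only once


line = [(0, 0), (1, 0.05), (2, -0.05), (3, 1), (4, 0)]
print(douglas_peucker(line, 0.1))  # [(0, 0), (2, -0.05), (3, 1), (4, 0)]
```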
         | 
         | Because of the cost/compute, raster tiles (PNG, JPG, any pixel
         | format) have been much more popular. These start the same way:
         | you collect all the data and put it in a database. The
         | difference is the added step of rendering tiles. This one-off
         | computation saves you work from then on. See maps.stamen.com
         | for an example of tiles made from OSM data.
         | 
         | And you're right about place names sometimes not being
         | apparent. This is a trade-off when using open data and
         | auto-generated tiles. With something like Mapbox's vector tiles,
         | you have fine-grained control (down to decimal zoom levels) over
         | things like labels. Zoom level is another computational
         | trade-off. You start at 0 and define an arbitrary maximum; each
         | additional zoom level multiplies the number of tiles, and so the
         | computation and data, by four: O(4^n).
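         | 
         | The four-fold growth is easy to check: zoom z has 2^z x 2^z
         | tiles, so the total for zooms 0..n is a geometric series. A
         | small Python illustration:

```python
def tiles_at_zoom(z):
    """Tiles in a square XYZ pyramid at zoom z: 2**z columns x 2**z rows."""
    return 4 ** z


def tiles_up_to_zoom(n):
    """Tiles for zooms 0..n: 1 + 4 + ... + 4**n = (4**(n + 1) - 1) // 3."""
    return (4 ** (n + 1) - 1) // 3


print(tiles_at_zoom(14))     # 268435456
print(tiles_up_to_zoom(14))  # 357913941
```

         | For raster tiles of 256 x 256 pixels, a full world pyramid at
         | zoom 18 alone is 4^18 x 2^16 = 2^52 pixels, about 4.5 x 10^15.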
         | 
         | And as far as why the size requirements are so big, geospatial
         | data is big. For vectors you have to record information on every
         | point, which, depending on quality, can be a ton. And for
         | rasters, we're talking trillions of pixels really. That's why
         | all of this is server side.
         | 
         | And lastly to your point about lightweight desktop software,
         | tiles don't really have a place in the data process. They're
         | only really useful for the visualization aspect. And frankly, I
         | think we're reaching the capacity of the technique, we just
         | might have some headroom in server efficiency.
        
           | kylebarron wrote:
           | > The benefit is continuous zoom, but the pitfall is more
           | server side computation and hence cost.
           | 
           | I haven't tested this with dynamic tiles served from PostGIS,
           | but with static tiles served from S3 it's quite the opposite!
           | There's an initial cost to generating tiles, but once they're
           | generated, you can host them on S3 with zero server cost.
        
           | snodnipper wrote:
           | +1
           | 
           | > And lastly to your point about lightweight desktop
           | software, tiles don't really have a place in the data
           | process. They're only really useful for the visualization
           | aspect. And frankly, I think we're reaching the capacity of
           | the technique, we just might have some headroom in server
           | efficiency.
           | 
           | Not totally sure what you mean by your last point... data can
           | be feature-centric (e.g. stored by feature ID) or area-centric
           | (stored by area location), etc. Storing data by location is
           | important far beyond visualisation and is abstracted in
           | databases such as PostGIS/Postgres (a branded data
           | structure). That said, I acknowledge that ArcGIS Pro, QGIS
           | etc. have limited support for tiled data, but of course that
           | is changing. Safe Software funded much (all?) of the OGR MVT
           | development afaik.
        
             | bransonf wrote:
             | Oh, I meant more about data analysis. Typically you don't
             | import raster tiles unless we're talking about imagery.
             | 
             | But for things like roads and boundaries, you should always
             | work with the raw vectors.
        
               | kylebarron wrote:
               | This isn't _necessarily_ true: doing data analysis on
               | vector tiles allows for high parallelization. See
               | TileReduce [0]
               | 
               | [0]: https://github.com/mapbox/tile-reduce
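               | 
               | TileReduce itself is a JavaScript library, but the idea is
               | plain map-reduce over tiles. A rough Python analogue using
               | only the standard library (not TileReduce's API); the tile
               | contents and the count_features mapper are hypothetical
               | stand-ins for decoding real vector tiles:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_reduce_tiles(tiles, mapper, reducer, initial, workers=4):
    """Run `mapper` on each tile concurrently, then fold the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # reduce consumes the map iterator before the pool shuts down.
        return reduce(reducer, pool.map(mapper, tiles), initial)


# Hypothetical decoded tiles: (z, x, y) -> list of feature types.
tiles = {
    (14, 8185, 5447): ["road", "road", "building"],
    (14, 8186, 5447): ["road"],
    (14, 8185, 5448): ["building"],
}


def count_features(item):
    """Mapper: number of features in one tile."""
    _key, features = item
    return len(features)


total = map_reduce_tiles(tiles.items(), count_features, lambda a, b: a + b, 0)
print(total)  # 5
```

               | For CPU-heavy decoding, ProcessPoolExecutor offers the same
               | map interface, provided the mapper can be pickled.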
        
         | sp332 wrote:
         | This article from 2010 goes into a lot of detail about which
         | labels to show and how to place them.
         | https://news.ycombinator.com/item?id=1963612
        
         | sleavey wrote:
         | I think many systems are backed by a PostGIS database (an
         | extension for Postgres) with features and their coordinates.
         | Map zoom levels define which features should be visible as
         | layers on the map. A rendering frontend then grabs relevant
         | data for the layer being viewed and builds the tiles.
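         | 
         | With PostGIS doing the work, the "grab relevant data and build
         | the tiles" step can be a single SQL query. A sketch assuming
         | PostGIS 3.0+ (for ST_TileEnvelope), a hypothetical roads table
         | with a name column and an EPSG:4326 geom column, and a driver
         | such as psycopg2 that uses %s placeholders:

```python
def tile_query(z, x, y, table="roads"):
    """Return (sql, params) producing one Mapbox Vector Tile for z/x/y.

    `table` and its `geom`/`name` columns are hypothetical. The table name
    is interpolated into the SQL, so it must never come from user input.
    """
    sql = f"""
        WITH bounds AS (
            SELECT ST_TileEnvelope(%s, %s, %s) AS geom
        ),
        mvtgeom AS (
            SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857),
                                bounds.geom::box2d) AS geom,
                   t.name
            FROM {table} t, bounds
            WHERE ST_Intersects(t.geom, ST_Transform(bounds.geom, 4326))
        )
        SELECT ST_AsMVT(mvtgeom.*, %s) FROM mvtgeom
    """
    return sql, (z, x, y, table)


sql, params = tile_query(14, 8185, 5447)
print(params)  # (14, 8185, 5447, 'roads')
```

         | With psycopg2 this would run as cur.execute(sql, params); the
         | single bytea value returned is the tile, typically served as
         | application/vnd.mapbox-vector-tile.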
        
         | trynewideas wrote:
         | > I wish I could learn to build my own tiles, vector or PNG. I
         | don't really understand where the data comes from, or how it is
         | gathered and assembled.
         | 
         | The tools are rapidly evolving. There's no great single entry
         | point and the best advice I can give is pretty generic: find a
         | small-scale thing you want to do and do research toward
         | accomplishing it.
         | 
         | The post you're commenting on is about how PostGIS now does
         | most of this work for vector tiles on its own, so "to
         | build your own tiles", you'd set up a PostGIS database and re-
         | read this post. A year or two ago the advice would be pretty
         | different. A year or two from now and the advice will be
         | _completely_ different.
         | 
         | That said, starting from zero, http://geojson.io is a dead
         | simple way to do basic operations with GeoJSON data.
         | You can paste in JSON and it renders on the map; you can draw
         | on the map and it generates GeoJSON. (https://tilejson.io does
         | the same for raster tile sets.)
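         | 
         | For a feel of what geojson.io reads and writes, here is a tiny
         | FeatureCollection built with Python's standard json module. The
         | two points are just examples; note that GeoJSON (RFC 7946) puts
         | longitude before latitude.

```python
import json


def point_feature(lon, lat, **properties):
    """A GeoJSON Point feature; RFC 7946 orders coordinates lon, lat."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        point_feature(2.3522, 48.8566, name="Paris"),
        point_feature(-0.1276, 51.5072, name="London"),
    ],
}

print(json.dumps(collection, indent=2))
```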
         | 
         | Real-world data is massive and overwhelming to work with --
         | just drawing your own fake maps in geojson.io and working with
         | that might make some of its concepts easier to digest.
         | 
         | Maperitive[1] is a free and relatively straightforward app
         | focused on taking geo data as input and outputting maps. Work
         | with its rendering rules and you'll understand some of the
         | challenges with rendering at different zoom levels or in
         | different contexts.
         | 
         | Then this post from 2018[2] on Tippecanoe (tile and data
         | converter), TileServer GL (tile server), and Leaflet
         | (JavaScript front end to view served tiles) covers how to
         | round-trip a package of vector tiles to GeoJSON data and back.
         | It's straightforward, works with a relatively small area of
         | data, doesn't require GIS experience, and though outdated it's
         | still relevant for understanding by practice how a data-to-
         | tiles pipeline can work.
         | 
         | Raster tiles are a little difficult to recommend learning, as
         | tooling has mostly moved on from them in favor of vector tiles,
         | which pack more information and flexibility into less data, and
         | I honestly don't know what tools still reliably do that work --
         | once upon a time I used TileMill but it was already abandoned
         | by then and has been very lightly maintained since.
         | 
         | Re: optimization, here's another more advanced post[3] using
         | real-world data that illustrates some of the challenges.
         | 
         | The end-game is to get to a point where you can open something
         | like QGIS[4], a heavyweight tool that can do all of the above
         | and way too much more, or Maputnik[5], a vector tile styling
         | tool using a CSS-ish language, and not get immediately lost.
         | 
         | > I would really like to ask why, on some mapping software, you
         | can't see the names of large enough cities/places/regions that
         | are obviously there.
         | 
         | You won't get a great answer to that question, I'm afraid.
         | It's dependent on and configured in whatever the front end is,
         | generally done algorithmically, and in some cases manually
         | edited. An art as much as a science, and as fallible as both
         | combined. (See Justin O'Beirne's incredible reviews of Apple
         | Maps updates[6] for an example.)
         | 
         | No single labeling strategy will make anyone (much less
         | everyone) happy and most end-user tools don't expose
         | customizability.
         | 
         | 1: http://maperitive.net/docs/TwoMinutesIntro.html
         | 
         | 2: https://medium.com/@kennethchambers/using-tippecanoe-
         | tileser...
         | 
         | 3: https://medium.com/@ibesora/a-data-driven-journey-through-
         | ve...
         | 
         | 4: https://qgis.org/en/site/
         | 
         | 5: https://maputnik.github.io
         | 
         | 6: https://www.justinobeirne.com/new-apple-maps
        
           | grizzles wrote:
           | > A year or two from now and the advice will be _completely_
           | different.
           | 
           | Could you elaborate on that? Sounds interesting.
        
         | snodnipper wrote:
         | Huge topic.
         | 
         | > I wish I could learn to build my own tiles, vector or PNG. I
         | don't really understand where the data comes from, or how it is
         | gathered and assembled.
         | 
         | There are many data providers out there. You might be
         | interested in OpenMapTiles, which is a pipeline from
         | OpenStreetMap (OSM) data.
         | https://github.com/openmaptiles/openmaptiles
         | 
         | Also check out Maputnik http://maputnik.github.io/editor/
         | 
         | If you want to learn about "tiling schemes" then head over to
         | https://www.maptiler.com/google-maps-coordinates-tile-bounds...
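         | 
         | Two details of tiling schemes trip people up: XYZ ("Google")
         | and TMS number tile rows from opposite ends, and a tile's extent
         | in Web Mercator metres follows directly from z/x/y. A small
         | sketch of both, assuming the usual 2^z x 2^z grid over
         | EPSG:3857:

```python
import math

# Half the width of the Web Mercator (EPSG:3857) world, in metres.
ORIGIN_SHIFT = math.pi * 6378137.0


def xyz_to_tms_y(z, y):
    """XYZ counts rows from the top (north); TMS counts from the bottom."""
    return (2 ** z - 1) - y


def tile_bounds(z, x, y):
    """(min_x, min_y, max_x, max_y) in EPSG:3857 metres for XYZ tile z/x/y."""
    size = 2 * ORIGIN_SHIFT / 2 ** z
    min_x = -ORIGIN_SHIFT + x * size
    max_y = ORIGIN_SHIFT - y * size
    return (min_x, max_y - size, min_x + size, max_y)


print(xyz_to_tms_y(3, 2))  # 5
# True: tile 1/1/0 is the north-east quarter of the world.
print(tile_bounds(1, 1, 0) == (0.0, 0.0, ORIGIN_SHIFT, ORIGIN_SHIFT))
```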
         | 
         | > I'm also really curious about the choices involving the zoom
         | level: how do you decide what to render at each zoom level, and
         | when is data discarded, to balance good detail against better
         | performance and lighter tiles? I would really like to try to
         | build lighter maps so I can have my own mapping software on a
         | desktop machine.
         | 
         | Lots of different considerations: is a human going to look at
         | the map? If so, then a cartographer will determine what is
         | shown at a given scale. There are other constraints too, such
         | as limited space to show data, and also hidden constraints,
         | such as the maximum amount of data for a region (e.g. ~500 kB
         | per tile in the case of Mapbox vector tiles).
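         | 
         | One crude way to respect a per-tile limit like that is to drop
         | the least important features until the encoded tile fits. A
         | sketch under loud assumptions: encode() is a hypothetical
         | stand-in for a real vector tile encoder (it just gzips JSON),
         | and "rank" is an assumed importance score. Tools like
         | tippecanoe have their own, more careful feature-dropping
         | options.

```python
import gzip
import json


def encode(features):
    """Hypothetical stand-in for a vector tile encoder: gzipped JSON."""
    return gzip.compress(json.dumps(features).encode("utf-8"))


def fit_to_budget(features, max_bytes=500_000):
    """Drop the lowest-ranked features until the encoded tile fits.

    max_bytes defaults to ~500 kB, the figure mentioned in the comment.
    Re-encodes after every drop, so it is simple rather than fast.
    """
    kept = sorted(features, key=lambda f: f["rank"], reverse=True)
    data = encode(kept)
    while kept and len(data) > max_bytes:
        kept.pop()  # the least important feature is last
        data = encode(kept)
    return kept, data


features = [{"rank": r, "name": f"feature-{r}"} for r in range(50)]
kept, data = fit_to_budget(features, max_bytes=200)
print(len(kept), len(data))
```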
         | 
         | > The data sizes and hardware requirements involved are
         | generally pretty big. It could be interesting to see how much
         | detail one could achieve in a "portable" map browser when
         | limiting the data size to 2GB, 5GB or 10GB.
         | 
         | Lots of projects out there are doing impressive things here.
         | Quadtree tiles get you so far...k-d trees might yield other
         | useful properties. Skobbler have some pretty impressive data
         | compression technology (~12GiB for global coverage, routable
         | and searchable...with some limitations - skobbler.com/apps). Of
         | course the trick is to discard all that you don't need.
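         | 
         | Quadtree tiling is what keeps tile addresses compact: each zoom
         | level adds one base-4 digit. Bing Maps' "quadkey" scheme is a
         | well-known encoding of this; a short sketch converting an XYZ
         | tile to its quadkey:

```python
def tile_to_quadkey(z, x, y):
    """Encode XYZ tile (z, x, y) as a Bing-style quadkey of z base-4 digits."""
    digits = []
    for i in range(z, 0, -1):
        mask = 1 << (i - 1)  # bit for this level, most significant first
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


print(tile_to_quadkey(3, 3, 5))   # 213
print(tile_to_quadkey(4, 6, 10))  # 2130, a child of tile "213"
```

         | Because every child key starts with its parent's key, sorting
         | tiles by quadkey keeps spatially nested tiles together, which is
         | handy for area-centric storage.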
         | 
         | > I would really like to ask why, on some mapping software, you
         | can't see the names of large enough cities/places/regions that
         | are obviously there. It often makes it difficult to browse maps
         | properly.
         | 
         | If the budget is limited, then so is the effort that goes into
         | creating appropriate labels. Data sources can be limited /
         | incomplete...there can be nuances between jurisdictions etc.
         | and of course label prioritisation has been a longstanding
         | problem. What happens when you rotate a map and the text labels
         | collide with one another...which ones do you keep...which do
         | you discard etc. These things are also context dependent...why
         | not include continent names? Or region names? Or province
         | names? What about the difference between physical and political
         | geography? A cartographer can help ensure that the right
         | information is available at the right time...whilst
         | acknowledging that they have to tell little white lies in every
         | map they make.
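         | 
         | A common baseline for "which ones do you keep" is greedy
         | placement: sort candidate labels by priority and keep each one
         | only if its box does not overlap a label already placed. A
         | minimal sketch with hypothetical screen-space boxes and
         | priorities; real renderers also try alternative anchor
         | positions, handle rotation, and much more.

```python
def overlaps(a, b):
    """True if boxes (min_x, min_y, max_x, max_y) share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_labels(candidates):
    """Greedily keep the highest-priority labels that do not collide.

    `candidates` is a list of (priority, text, box); higher priority wins.
    """
    placed = []
    for _priority, text, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        if all(not overlaps(box, other) for _, other in placed):
            placed.append((text, box))
    return [text for text, _ in placed]


candidates = [
    (10, "Paris", (0, 0, 50, 10)),
    (5, "Versailles", (40, 5, 100, 15)),  # collides with Paris
    (3, "Orly", (0, 20, 30, 30)),
]
print(place_labels(candidates))  # ['Paris', 'Orly']
```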
        
       ___________________________________________________________________
       (page generated 2020-01-02 23:00 UTC)