[HN Gopher] Writing a Postgres Foreign Data Wrapper for Clickhou...
       ___________________________________________________________________
        
       Writing a Postgres Foreign Data Wrapper for Clickhouse in Go
        
       Author : arunk-s
       Score  : 50 points
       Date   : 2020-11-19 20:09 UTC (2 hours ago)
        
 (HTM) web link (arunsori.me)
 (TXT) w3m dump (arunsori.me)
        
       | stareatgoats wrote:
       | I sometimes fondly remember my time as an office hero, using
       | MSAccess to attach to just about any data source imaginable, copy
       | to a temptable and run whatever cleaning was required before
       | loading into the server database.
       | 
       | Good times; almost 25 years ago now. Sometimes I wonder if we're
       | stuck.
        
       | gscho wrote:
       | This is really cool. I had not heard of Foreign Data Wrappers for
       | Postgres before! Are these used in production commonly or more of
       | a toy thing?
        
         | caseyohara wrote:
         | Here's a great list of Postgres FDWs:
         | https://wiki.postgresql.org/wiki/Foreign_data_wrappers
         | 
         | I had no idea PG has native FDWs for Twitter and S3. That's
         | pretty awesome.
        
         | kid_atticus wrote:
         | There are generally two classes for FDWs: Postgres<->Postgres
         | and Postgres->Everything else.
         | 
         | The first one is generally suitable for production and is very
         | useful for sharded Postgres when you want to communicate across
         | shards without having to go back out through the application.
         | 
         | The second one's mileage really varies. Some implementations
         | might or might not be prod ready or mig target only specific
         | version combinations. Can be very useful for data engineering
         | or analytics use cases for quick ETL into a staging database.
         | Or for data migrations between database vendors.
        
           | [deleted]
        
         | rch wrote:
         | This could be of interest to you:
         | 
         | Ville Tuulos - How to Build a SQL-based Data Warehouse for 100+
         | Billion Rows in Python
         | 
         |  _PyData SV 2014 - In this talk, we show how and why AdRoll
         | built a custom, high-performance data warehouse in Python which
         | can handle hundreds of billions of data points with sub-minute
         | latency on a small cluster of servers. This feat is made
         | possible by a non-trivial combination of compressed data
         | structures, meta-programming, and just-in-time compilation
         | using Numba, a compiler for numerical Python. To enable smooth
         | interoperability with existing tools, the system provides a
         | standard SQL-interface using Multicorn and Foreign Data
         | Wrappers in PostgreSQL._
         | 
         | https://www.youtube.com/watch?v=xnfnv6WT1Ng
        
         | arunk-s wrote:
         | I definitely think they are used in production though I haven't
         | tried it myself.
         | 
         | But I can find ancedots of people using it production on the
         | web[1].
         | 
         | 1: https://carto.com/blog/postgres-fdw/
        
           | noefingway wrote:
           | Currently using this one https://github.com/pramsey/pgsql-
           | ogr-fdw for an image migration project. Using it to pull
           | image metadata thru ODBC. While it's primary design is for
           | GIS data, it works well for any ODBC database.
        
         | mildbyte wrote:
         | We use them in production at Splitgraph [0] to power our DDN
         | (like a CDN, but for data). We make a PostgreSQL-compatible
         | endpoint available to the public to query any of the tens of
         | thousands of open datasets by referencing them as virtual
         | tables: they're not hosted by us but we proxy to them using
         | Postgres FDWs. When a query comes in, we intercept it and
         | redirect it to a FDW instance that handles query translation
         | and planning from the PG dialect to that of the backend data
         | source.
         | 
         | We wrote an FDW for Socrata-powered [1] government open data
         | portals to query the public datasets that we index in the
         | Splitgraph catalog as a proof-of-concept. However, there are
         | plenty of other FDWs that we're working on integrating to let
         | people add their own backend data sources (RDS, Snowflake etc).
         | 
         | FDW plugin quality varies (some of them can't push down all
         | predicates or JOINs) but it's definitely an interesting way to
         | think about accessing data. We also added a lot of scaffolding
         | around foreign data wrappers in our open-source tool [2] that
         | makes it easy to add a FDW-managed data source to a PostgreSQL
         | instance.
         | 
         | [0] https://www.splitgraph.com/blog/data-delivery-network-
         | launch
         | 
         | [1] https://www.tylertech.com/products/socrata
         | 
         | [2] https://www.splitgraph.com/blog/foreign-data-wrappers
        
       ___________________________________________________________________
       (page generated 2020-11-19 23:00 UTC)