[HN Gopher] PgSync: Sync Postgres data between databases
       ___________________________________________________________________
        
       PgSync: Sync Postgres data between databases
        
       Author : aeontech
       Score  : 95 points
       Date   : 2020-03-24 16:39 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | arcticfox wrote:
       | How does this compare to dumping and loading for full databases?
       | I have a number of cloning processes that do that, should I look
       | into this instead?
        
       | pritambaral wrote:
       | I think pgloader does it better:
       | https://pgloader.readthedocs.io/en/latest/ref/pgsql.html
        
         | durkie wrote:
         | what makes it better?
        
           | dfsegoat wrote:
           | A few things I really like about pgloader (which to be clear,
           | uses the COPY protocol):
           | 
           | - Handles MySQL -> pg; MSSQL -> pg; And pg -> pg copy
           | 
           | - Very fast (lisp/"real threading")
           | 
           | - Syncs schemas effectively (this says data only)
           | 
           | - Has it's own DSL for batch jobs: specify tables to
           | include/exclude, renaming them on the fly in dest, and cast
           | dtypes between src and dest if needed. etc
        
             | massaman_yams wrote:
             | This uses COPY for export & import:
             | 
             | https://github.com/ankane/pgsync/blob/master/lib/pgsync/tab
             | l...
        
               | dfsegoat wrote:
               | That was noted in the first line of my comment FWIW.
        
             | pampa wrote:
             | I tried it once, could not make it work.
             | 
             | I tried migrating a smallish mysql database (~10Gb) to
             | postgres and it always crashed with a weird runtime memory
             | error. Reducing the number of threads or doing it table by
             | table didnt help.
        
               | folkhack wrote:
               | Just offering an anecdotal experience to counter, not
               | saying you or your experience is wrong...
               | 
               | I've used pgloader multiple times because I'm a huge
               | Postgres evangelist for that exact use case multiple
               | times without issue. Honestly - it's a favorite tool in
               | my toolbox.
        
               | phoe-krk wrote:
               | > weird runtime memory error.
               | 
               | If it's "heap exhausted, game over", there's a solution
               | for that - you need to tell pgloader to allocate more
               | memory for itself.
        
         | sharadov wrote:
         | Looks good, am going to give it a test drive. Does it use
         | foreign data wrappers to create connections between servers?
        
       | metreo wrote:
       | Purely in Ruby as well... nice.
        
         | why-el wrote:
         | Ruby is just for plumbing, i.e. preparing queries and whatnot,
         | it's still pg_dump under the hood.
        
       | ahachete wrote:
       | How does it work in the event of real time changes happening on
       | the database(s)?
       | 
       | It appears not to use logical decoding/replication. If so, how it
       | does sync the data? It sounds like a hard problem, and not very
       | efficient, to do it without logical decoding/replication.
       | 
       | I didn't see documentation about it. Unless.. it is intended to
       | be used only with offline databases.
        
         | massaman_yams wrote:
         | It's batch sync, not streaming or replication. By default,
         | appears to dedupe based on PK.
        
           | toomim wrote:
           | Is this two-way or one-way sync? It's sounding like it's one-
           | way. I don't see anything about how to deal with clobbers,
           | for instance, if the same data is changed in two different
           | databases at the same time.
           | 
           | And "batch one-way sync" is better described as "copy".
           | 
           | So I think this is a "postgres database copy" tool. Also
           | known as clone, or backup. And as such, it's competing with
           | the existing postgres database cloning tools, like pg_dump.
           | So how is this different than that?
        
             | sharadov wrote:
             | Kind of like MSSQL SSIS, but with only data export-import
             | functionality.
        
             | massaman_yams wrote:
             | One-way, yes. pg's COPY command can't do upserts, this can.
             | 
             | The primary use case for this appears to be ETL. Pg_dump is
             | more backup-oriented and optimized for larger operations,
             | vs. a bit more fine-grained in most ETL processes.
        
       | tomnipotent wrote:
       | Embulk has been around for a long time, and supports considerably
       | more options than just Postgres (especially since pgsync looks
       | like it's just doing statement-based data copying).
       | 
       | https://github.com/embulk/embulk
        
       | loganfrederick wrote:
       | Is there something specific in Postgres (perhaps in the binlog)
       | that makes it easier to do this than MySQL? Seems like there's
       | more tools for this kind of data loading than I've seen for
       | MySQL.
       | 
       | BTW: Awesome work ankane! Cool dude too :)
        
       ___________________________________________________________________
       (page generated 2020-03-24 23:00 UTC)