[HN Gopher] Migrating Large Heroku Postgres Instances to AWS Aur...
       ___________________________________________________________________
        
       Migrating Large Heroku Postgres Instances to AWS Aurora Without
       Downtime
        
       Author : sciguymcq
       Score  : 81 points
       Date   : 2020-11-12 16:27 UTC (6 hours ago)
        
 (HTM) web link (thecodinginterface.com)
 (TXT) w3m dump (thecodinginterface.com)
        
       | shangxiao wrote:
       | Nice!
       | 
        | Slight nitpick: Django should automatically create the hstore
        | extension for you [1] during migrate whenever you use an
        | HStoreField
       | 
       | [1]
       | https://github.com/django/django/blob/f5e07601b233a50e6bcca4...
       | 
       | Edit: I might've been mistaken here in thinking the above
       | migration operation was automatically applied as the docs seem to
       | suggest you need to add it yourself [2] but at least it's as easy
       | as adding an operation ;)
       | 
       | [2]
       | https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/f...
        
       | davidverhasselt wrote:
       | In a similar vein, I recently purchased this ebook
       | https://pawelurbanek.com/heroku-migrate-postgres-rds - I haven't
       | finished it yet but I enjoy the small, edited format of it.
        
       | tenaciousDaniel wrote:
       | I'm considering moving from a self-hosted Mongo setup to Aurora,
       | either with PG or MySQL. Anyone here have opinions on Aurora?
        
         | mbesto wrote:
          | I've heard mixed reviews on Aurora MySQL. Some people move to
          | it and see huge performance gains; others see no improvement
          | or run into limitations.
        
         | bastawhiz wrote:
         | It does the job, but aurora postgres lags behind mysql when it
         | comes to features. Failovers are surprisingly slow, too. You're
         | pretty much stuck with cloudwatch for monitoring, which has
         | significant lag.
        
           | mayank wrote:
           | > aurora postgres lags behind mysql when it comes to features
           | 
           | And performance, on some workloads we tested. I would
           | benchmark write performance very carefully before using
           | Aurora Postgres (Aurora MySQL does not seem to suffer from
           | the same issue on the same workload).
           | 
           | > Failovers are surprisingly slow, too.
           | 
           | The recommendation now is to use RDS Proxy for faster
           | failovers.
        
         | dageshi wrote:
          | If you're thinking of going for the version that auto-sleeps
          | when not in use, be aware there's a 30-40 second startup time
          | from the first incoming request. If it's a database that only
          | needs to be accessible in bursts, this can be really handy
          | for cutting costs, since you're not paying for any read/write
          | resources you're not using. Of course, if you do need it
          | generally accessible all the time, then you'd want to
          | provision it accordingly.
        
         | mnutt wrote:
         | Aurora (Postgres) looked like it would be great, and the
         | performance for our use case was much better than RDS. But
         | there was a showstopper for us with the replication: If the
         | cluster leader fails, _every read replica will force a restart_
         | to elect a new leader, resulting in minimum ~30s complete
         | cluster downtime. In our situation we have no problem with a
         | short downtime in write availability, but the fact that we
         | can't maintain read availability via replicas is a deal
         | breaker.
        
           | hyperdimension wrote:
            | I'm no DBA, but is that not the _entire_ point of a
            | cluster/replicas?
           | 
           | Sure, maybe writes won't be accepted or replicated until a
           | new master is elected, but you can't even read from the
           | cluster?
           | 
           | I'm curious at this point: what is it exactly that you gain
           | from a cluster? Is it only having to wait 30s vs. however
           | long it takes for the master to come back online?
        
             | mnutt wrote:
             | That was the mind-boggling part to me. After testing
             | failover and noticing the behavior we had a number of
             | discussions with AWS reps and support engineers that
             | confirmed it. And Aurora's failover documentation hints at
             | it but downplays the severity.
        
         | atanasovskib wrote:
         | I would suggest checking out other smaller cloud providers for
         | managed databases over Aurora. Check out Aiven.io
        
       | 0xCMP wrote:
       | Very thorough and nice article! It's really nice to have this all
       | in one place for reference later.
       | 
       | In skimming this page I wish the following had been more
       | prominent:
       | 
       | > In the Real World for this process to work you need to request
       | for the Heroku Data Support team to establish continuous WAL log
       | shipping to an AWS S3 bucket along with a base physical backup
       | using the WAL-E Python based library.
       | 
       | I kept searching in vain for _where exactly you were able to get
       | a WAL-E backup into the bucket_ and finally found that line.
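        | 
        | For anyone reconstructing the consuming side on EC2, it's
        | standard point-in-time-recovery plumbing. A sketch of the
        | follower's setup (the envdir path and data dir are assumptions;
        | backup-fetch and wal-fetch are WAL-E's real subcommands):

```python
# Sketch of the EC2 follower's recovery setup once Heroku ships the
# base backup + WAL to S3 via WAL-E. The envdir path and data dir are
# assumptions; the wal-e subcommands themselves are real.
WALE = "envdir /etc/wal-e.d/env wal-e"  # env holds AWS creds + WALE_S3_PREFIX

# One-time restore of the base physical backup into an empty data dir:
#   envdir /etc/wal-e.d/env wal-e backup-fetch /var/lib/postgresql/data LATEST

# recovery.conf (pre-PG12) to keep replaying the shipped WAL:
recovery_conf = (
    "standby_mode = 'on'\n"
    f"restore_command = '{WALE} wal-fetch \"%f\" \"%p\"'\n"
)
print(recovery_conf)
```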
        
       | pedrokost wrote:
        | The author has gained a significant amount of Postgres-related
        | knowledge while performing this migration.
        | 
        | Taking this into consideration, is it still worth using managed
        | Aurora instead of a self-managed instance on EC2?
        
         | jrochkind1 wrote:
         | RDS Postgres (actual postgres, not the Aurora product which can
         | be "mostly postgres compatible") is another option, not just
         | "EC2 self-managed instance".
         | 
         | I guess we'd have to look at price comparisons for the size you
         | actually need. I think depending on load, the price differences
         | may be nominal.
        
           | mijoharas wrote:
           | Thought I'd weigh in here, since we moved from heroku to ec2
           | to aurora to RDS postgres, so I can probably speak to this a
           | little more.
           | 
            | * EC2 self-managed is easily the cheapest. We had a solid
            | setup with continuous backups and a read replica; if cost
            | is a factor, it's easily the winner. However, there is a
            | _lot_ of knowledge that goes with it, and when it comes
            | down to it, you can pay someone else to handle that. This
            | isn't just the setup cost: you need to factor in ongoing
            | maintenance (the number of people at my company who could
            | have done complicated things with the instance was probably
            | me and one other engineer, and we didn't want to be
            | permanently on call for this) and general risk.
           | 
            | * EC2 probably worked out to between 1/2 and 3/4 (probably
            | 2/3) the price of the equivalent RDS (tough to say exactly,
            | as I'd need to factor in all the ancillary costs that are
            | more "bundled together" in RDS).
           | 
           | * RDS was much cheaper than aurora for our workload. (almost
           | 1/2 the price).
           | 
            | I think the main thing to remember is that EC2 is much
            | cheaper than RDS, but RDS is cheaper than engineers. With
            | that in mind, while we didn't need to do anything
            | complicated other than the migration, the risk, possible
            | engineering time, and bus factor didn't feel worth it to
            | stay on EC2.
        
             | rohansahai wrote:
              | Been on RDS for a while now but have recently had some
              | issues that are pushing us to heavily consider EC2,
              | especially since we've heard about considerable
              | performance gains running on some of the bare metal EC2
              | options out there vs RDS.
        
               | mijoharas wrote:
               | We had equivalent performance (and we'd benchmarked both)
               | for equivalent RDS and EC2 instances. (EDIT: to note, we
               | didn't dig in heavily, just looked and saw it seemed the
               | same, maybe there's something smart you can do to make
               | EC2 a lot faster)
               | 
                | The first thing to think about for RDS (or any Postgres
                | instance, really) is figuring out where the bottlenecks
                | are. If your cache hit ratio is slightly low, or
                | queries generally feel a little sluggish while CPU and
                | RAM aren't too high, I'd recommend looking into
                | increasing the provisioned IOPS of the machine (this
                | requires no downtime; basically you can just make the
                | storage faster).
                | 
                | If your issues are RAM/CPU being high, you might want
                | to consider moving to a larger instance.
                | 
                | It can also be worth looking into the number of temp
                | files being created by your queries and tweaking your
                | work_mem setting (you can use log_temp_files to see if
                | this is likely to help a bit).
               | 
               | Other than that, remember that every connection to the db
               | is a separate process, so you don't want too many. It
               | could be worth looking at RDS proxy or pg_bouncer if you
               | think you have a lot of connections to your db.
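                | 
                | The checks above boil down to a couple of queries
                | against standard statistics views (sketch; nothing
                | RDS-specific assumed):

```python
# SQL for the diagnostics described above; pg_stat_database is a
# standard Postgres statistics view, so this works on RDS or anywhere.
CACHE_HIT_SQL = """
SELECT sum(blks_hit) * 100.0
       / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_pct
FROM pg_stat_database;
"""

TEMP_FILE_SQL = """
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
ORDER BY temp_bytes DESC;
"""
# log_temp_files = 0 logs every temp file created; on RDS you set it
# through the parameter group rather than ALTER SYSTEM.
```

                | (A cache hit ratio persistently below ~99% is the
                | usual rule-of-thumb hint that more IOPS or RAM would
                | pay off.)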
        
         | sciguymcq wrote:
          | I currently manage, or am "lead dev" for, around a half
          | dozen to 10 apps using Postgres as the database. For two of
          | those I
         | use a managed database service (AWS RDS / AWS Aurora Postgres)
         | because they drive mission critical high value products /
         | services and I wouldn't consider using a standalone install of
         | Postgres. For other apps that are not so mission critical I am
         | perfectly happy saving considerable money and running them on a
         | VPS on AWS EC2 or Digital Ocean or Linode. So ultimately it
         | depends on the app and the risk tolerance for Recovery Time
         | Objective and Recovery Point Objective.
        
           | rohansahai wrote:
            | That's super interesting! I've heard people do the opposite
            | (run on bare EC2) for performance and storage cost reasons.
            | Do you notice a big performance difference between the EC2
            | apps and RDS apps? Also, have you had any EBS volume
            | related issues on RDS, or unexplained latency/downtime?
        
           | fastball wrote:
           | Worth noting that DO also has a hosted Postgres service.
        
       | patman81 wrote:
       | That's impressive. We struggle to find a way to "migrate" from
       | AWS Aurora 1 to 2 (that is from MySQL 5.6 to 5.7). There seems to
       | be no (easy) way to upgrade without downtime for large databases.
        
       | mijoharas wrote:
       | Might have been nice to see this post a year ago :).
       | 
       | My company migrated our ~2TB postgres db from heroku to AWS
       | (originally aurora, we switched back to RDS postgres). Haven't
       | had time for a blog post, but figure I might post our learnings
       | here, and the differences in our approach.
       | 
       | * We did the second step (ec2 -> aurora) via AWS DMS (database
       | migration service). This service may be a little more of a pain
       | than it's worth, but it's what we settled on.
       | 
        | * We used wal-g rather than wal-e for the initial phase; it
        | has slightly better performance, and we were happy with the
        | tool.
       | 
        | * We wrote ansible playbooks to perform the heroku -> ec2
        | migration. It's definitely nice, and makes you feel a little
        | more secure, to have run the db migration multiple times over;
        | it would also be a giant PITA to do all that manually. (I
        | hadn't used ansible much before and I have to say I'm a fan!
        | I'd used chef/puppet in bits and pieces before and was never
        | that keen on them.)
       | 
        | * Aurora wasn't the right fit for us. Be careful when reading
        | the pricing. The AWS pricing calculator asks how many million
        | requests you have per month, but this is _not_ how it's
        | charged. If you read the docs, the IO charges apply to
        | AuroraReadIO and AuroraWriteIO, which are billed not per db
        | request but per db _storage_ request: the number of requests *
        | the number of db blocks each request fetches.
       | 
        | * Our Aurora costs were through the roof (and variable and
        | spiky, never something you want!). We migrated again to RDS
        | postgres, and after bumping the provisioned IOPs up we had
        | equivalent performance (possibly better?) for half the price
        | (and the price was stable). We benchmarked everything else but
        | the cost with the db migration, instead relying on the AWS
        | simple pricing calculator. The difference between predicted
        | and actual storage IO costs was over 10x out, if my memory
        | serves (making the entire DB double the price per month).
        | BEWARE!
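        | 
        | To make the billing model concrete, a back-of-envelope sketch
        | (the $0.20/million rate is the commonly cited us-east-1 price;
        | the workload numbers are invented):

```python
# Illustrative only: Aurora bills per *storage* I/O (pages touched),
# so pages-per-query is the multiplier the calculator hides.
PRICE_PER_MILLION_IOS = 0.20       # USD; check current regional pricing
queries_per_month = 500_000_000    # invented workload
pages_per_query = 10               # even modest scans multiply fast

storage_ios = queries_per_month * pages_per_query
monthly_io_cost = storage_ios / 1_000_000 * PRICE_PER_MILLION_IOS
print(f"${monthly_io_cost:,.2f}/month")  # 10x a naive per-query estimate
```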
       | 
        | * AWS DMS is kinda buggy; last I checked, the table selection
        | editor UI didn't work when you added/modified multiple fields.
        | You can switch to a JSON editor, so generate the table
        | selection once, then check it in the JSON editor in case the
        | UI got borked.
       | 
        | * Speed wasn't great with AWS DMS either; we had to flip
        | BatchApplyEnabled on, and take a lot of care with it.
       | 
       | * There were definitely some shenanigans to do with LOBs, but I
       | can't fully remember them.
       | 
       | * DMS does not update id sequences, so if you have any tables
       | with incrementing ids, make sure that you set the sequences
       | manually before switching dbs.
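        | 
        | A sketch of that fix-up (table and pk names are whatever your
        | schema uses; pg_get_serial_sequence and setval are standard
        | Postgres functions):

```python
# Builds the standard setval() statement that realigns a table's id
# sequence with the rows DMS copied over; table/pk are placeholders.
def reset_sequence_sql(table: str, pk: str = "id") -> str:
    # The third setval argument (false) means "nextval() returns this
    # value next", so an empty table safely starts at 1.
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{pk}'), "
        f"COALESCE((SELECT MAX({pk}) + 1 FROM {table}), 1), false);"
    )
```

        | Run the generated statement per table after the final sync,
        | before cutting the application over.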
       | 
        | * We used debian as our intermediate ec2 instance. We've
        | historically used ubuntu for many ec2 instances, but the
        | default ubuntu AMI image uses MBR for its partition table (!!!
        | in 2020!!!) and while we had some code to swap it to GPT while
        | running, that felt a little too spicy for a db migration, so
        | we used base debian, whose AMI has a sane GPT-partitioned
        | disk. (Note: we needed to create partitions larger than what
        | MBR can support, so this was a hard requirement.)
       | 
        | * The last learning I should add is that heroku support were
        | actually really great during the whole process, very
        | knowledgeable, and gave us some setup example files for the
        | ec2 instance. After that they basically leave it to you, but I
        | can't speak highly enough about them (thanks jessie!)
       | 
        | * One more for you: be careful about the version of libc that
        | you have. Even if the Postgres versions are identical,
        | different libc versions can have slightly different string
        | sorting, which can corrupt indexes carried over when copying
        | the basebackup. (We needed to rebuild an index or two due to
        | this; not a huge deal, but an interesting gotcha. I swear I
        | checked our locale settings 5 times when it went wrong before
        | I clocked that it was the libc version.)
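        | 
        | For anyone hitting the same thing: the contrib amcheck
        | extension can detect affected btree indexes, and a reindex
        | rebuilds them (sketch; the index name is a placeholder):

```python
# SQL sketch for finding/repairing indexes broken by a libc collation
# change; amcheck ships as a standard Postgres contrib extension.
AMCHECK_SQL = """
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT c.relname, bt_index_check(c.oid)
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE c.relkind = 'i' AND am.amname = 'btree';
"""

# CONCURRENTLY needs PG 12+; on older versions drop it (and accept the
# lock). "my_text_idx" is a placeholder name.
REINDEX_SQL = "REINDEX INDEX CONCURRENTLY my_text_idx;"
```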
       | 
       | There were probably more learnings than that, but that's what
       | comes to mind.
       | 
       | (EDIT: added final bullet point). (EDIT 2: added one more bullet)
        
         | sciguymcq wrote:
         | I originally wanted to use DMS but our DB failed the pre-
         | migration assessment so that forced us to Logical Replication.
         | 
          | We also worked with Jesse. You are right, he is amazingly
          | helpful. Unfortunately we got some pretty questionable help
          | from others before he was assigned to our support ticket,
          | but once that happened things were infinitely improved.
        
           | mijoharas wrote:
           | Ahhhh... well you might've dodged a bullet to be honest! If I
           | were to do it all again, there's a good chance I might stick
           | with the standard tools rather than DMS.
           | 
           | Thanks for the in depth article by the way, very well
           | written!
           | 
           | Keep an eye on your AuroraStorageIO costs going forward, as
           | we said, we halved our costs and increased performance by
           | switching back to RDS and increasing the PIOPs available. I'm
           | sure that'll be workload sensitive though, so YMMV. We did a
           | bunch of benchmarking, and aurora seemed nice apart from
           | this.
           | 
           | We must've lucked out, as we got Jesse at the start.
        
         | yowlingcat wrote:
         | Fantastic comment, thank you so much. I've been following
         | Aurora for a while but a little curious (and skeptical) about
         | sharp edges vs vanilla PG. If I may ask you a follow-on
         | question, who do you think Aurora would be the right fit for?
        
           | mijoharas wrote:
           | Hmmmm... That's a good question.
           | 
           | The performance was great, I can't complain about it, but the
           | unpredictable (and high) price was the issue.
           | 
            | In terms of aurora, I think it can probably scale better
            | to much larger workloads. If we had put the time in and
            | optimised any queries that touch a large number of blocks
            | (we do well, but I'm sure there are queries with some seq
            | scans somewhere in our codebase), then the price wouldn't
            | have been an issue and we'd happily have stayed with it.
            | 
            | In addition, we have a single large master and one replica.
            | With my understanding of aurora's architecture, and how
            | the storage is separated from the "postgres" instance, it
            | could possibly do a lot better with drastically different
            | architectures using lots of smaller "postgres" instances.
            | We didn't have time to investigate other architectures,
            | and in the end moving to RDS felt safest (we didn't want
            | to spend too much time when we could be building core
            | features).
           | 
           | Hope that answers your question! Some of the above is more an
           | educated guess with a smattering of experience so please
           | treat it as such.
        
             | mijoharas wrote:
             | On the other hand, vanilla postgres is very powerful, and
             | you shouldn't underestimate how much you can win by
             | cranking up the PIOPs for an RDS postgres instance.
             | 
             | Remember, always benchmark representative workloads.
        
       | ekump wrote:
       | This is a great write-up! I wrote a far less thorough blog post
       | on the same topic a few months ago:
       | https://vericred.github.io/2020/08/10/migrating-to-aurora.ht....
       | 
       | Overall, we're happy with our decision to migrate to Aurora. We
       | feel it offers better performance, observability and scalability
       | compared to heroku postgres. To echo some other comments:
       | 
        | 1. Jesse at Heroku is great. I had never set up my own
        | postgres instance before, and his guidance helped me a lot.
       | 
        | 2. We used DMS, and it was ok, not great. Dealing with LOBs
        | was a bit of a pain. In order to keep the migration time to a
        | minimum we set up multiple DMS tasks and grouped tables into
        | those tasks based on their max LOB size. Most of our tables
        | didn't have any LOB columns and they migrated quickly.
       | 
       | 3. The ability to fast clone a database is nice. We can create a
       | copy of our production database in about five minutes for use in
        | non-prod environments. This would take a couple of hours on
        | Heroku.
       | 
       | 4. The Aurora IO costs are ridiculous. Are there any good tools
       | to see what is exactly causing so much I/O?
        
         | aeyes wrote:
         | > 4. The Aurora IO costs are ridiculous. Are there any good
         | tools to see what is exactly causing so much I/O?
         | 
          | I don't use Aurora so I don't know what's available there,
          | but pg_stat_statements does have I/O statistics, so you can
          | review it manually or use a tool like pgAnalyze (not sure if
          | it supports Aurora) to see which queries are I/O intensive.
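          | 
          | A starting-point query for that (pg_stat_statements is
          | available on Aurora Postgres too; the timing columns need
          | track_io_timing enabled):

```python
# Top I/O consumers from pg_stat_statements; the blk_*_time columns
# are only populated when track_io_timing is on.
TOP_IO_SQL = """
SELECT query,
       calls,
       shared_blks_read + shared_blks_written AS shared_io_blks,
       blk_read_time + blk_write_time         AS io_time_ms
FROM pg_stat_statements
ORDER BY shared_io_blks DESC
LIMIT 20;
"""
```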
        
         | mijoharas wrote:
         | Hey, regarding 4. we came across this but didn't find anything
         | from the article particularly helpful for us:
         | 
         | https://aws.amazon.com/blogs/database/reducing-aurora-postgr...
         | 
         | (we bounced to normal RDS postgres and reduced our costs a lot
         | while keeping performance similar, check out my other comments
         | on this thread for details).
        
       ___________________________________________________________________
       (page generated 2020-11-12 23:02 UTC)