[HN Gopher] Migrating Large Heroku Postgres Instances to AWS Aur... ___________________________________________________________________ Migrating Large Heroku Postgres Instances to AWS Aurora Without Downtime Author : sciguymcq Score : 81 points Date : 2020-11-12 16:27 UTC (6 hours ago) (HTM) web link (thecodinginterface.com) (TXT) w3m dump (thecodinginterface.com) | shangxiao wrote: | Nice! | | Slight nitpick: Django should automatically create the hstore | extension for you [1] during migrate whenever you use an | HStoreField | | [1] | https://github.com/django/django/blob/f5e07601b233a50e6bcca4... | | Edit: I might've been mistaken here in thinking the above | migration operation was automatically applied, as the docs seem to | suggest you need to add it yourself [2], but at least it's as easy | as adding an operation ;) | | [2] | https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/f... | davidverhasselt wrote: | In a similar vein, I recently purchased this ebook | https://pawelurbanek.com/heroku-migrate-postgres-rds - I haven't | finished it yet but I enjoy the small, edited format of it. | tenaciousDaniel wrote: | I'm considering moving from a self-hosted Mongo setup to Aurora, | either with PG or MySQL. Anyone here have opinions on Aurora? | mbesto wrote: | I've heard mixed reviews on Aurora MySQL. Some people move to | it and see huge performance gains; others simply don't, or run | into limitations. | bastawhiz wrote: | It does the job, but aurora postgres lags behind mysql when it | comes to features. Failovers are surprisingly slow, too. You're | pretty much stuck with cloudwatch for monitoring, which has | significant lag. | mayank wrote: | > aurora postgres lags behind mysql when it comes to features | | And performance, on some workloads we tested. I would | benchmark write performance very carefully before using | Aurora Postgres (Aurora MySQL does not seem to suffer from | the same issue on the same workload).
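The "benchmark write performance very carefully" advice above can be sketched as a small timing harness. Everything here (the function name, the percentile choices, and the psycopg2 wiring suggested in the note) is illustrative, not from the article or thread:

```python
import statistics
import time


def bench_writes(execute_write, n=1000):
    """Time n single-row writes and report latency percentiles in ms.

    execute_write: a callable taking the iteration index -- e.g. a
    psycopg2 INSERT wrapped in a lambda (hypothetical wiring).
    """
    samples = []
    for i in range(n):
        t0 = time.perf_counter()
        execute_write(i)
        samples.append((time.perf_counter() - t0) * 1000.0)
    return {
        "p50_ms": statistics.median(samples),
        # quantiles(n=100) yields 99 cut points; index 94 is the p95
        "p95_ms": statistics.quantiles(samples, n=100)[94],
    }
```

Running the same harness with identical schemas against an Aurora Postgres endpoint and an Aurora MySQL (or plain RDS) endpoint is one way to surface the workload-dependent write gap described above before committing to either.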
| | > Failovers are surprisingly slow, too. | | The recommendation now is to use RDS Proxy for faster | failovers. | dageshi wrote: | If you're thinking of going for the version that auto-sleeps | when not in use, be aware there's a 30-40 second startup time | from the first incoming request. If it's a database that only needs | to be accessible in bursts, this can be really handy for | cutting costs, as you're not paying for any read/write resources | you're not using. Of course, if you do need it generally | accessible all the time then you'd want to provision it | accordingly. | mnutt wrote: | Aurora (Postgres) looked like it would be great, and the | performance for our use case was much better than RDS. But | there was a showstopper for us with the replication: If the | cluster leader fails, _every read replica will force a restart_ | to elect a new leader, resulting in a minimum of ~30s of complete | cluster downtime. In our situation we have no problem with a | short downtime in write availability, but the fact that we | can't maintain read availability via replicas is a deal | breaker. | hyperdimension wrote: | I'm no DBA, but is that not the _entire_ point of a | cluster/replicas? | | Sure, maybe writes won't be accepted or replicated until a | new master is elected, but you can't even read from the | cluster? | | I'm curious at this point: what exactly do you gain | from a cluster? Is it only having to wait 30s vs. however | long it takes for the master to come back online? | mnutt wrote: | That was the mind-boggling part to me. After testing | failover and noticing the behavior, we had a number of | discussions with AWS reps and support engineers that | confirmed it. And Aurora's failover documentation hints at | it but downplays the severity. | atanasovskib wrote: | I would suggest checking out other, smaller cloud providers for | managed databases over Aurora. Check out Aiven.io | 0xCMP wrote: | Very thorough and nice article!
It's really nice to have this all | in one place for reference later. | | In skimming this page I wish the following had been more | prominent: | | > In the Real World for this process to work you need to request | for the Heroku Data Support team to establish continuous WAL log | shipping to an AWS S3 bucket along with a base physical backup | using the WAL-E Python based library. | | I kept searching in vain for _where exactly you were able to get | a WAL-E backup into the bucket_ and finally found that line. | pedrokost wrote: | The author gained a significant amount of Postgres-related | knowledge while performing this migration. | | Taking this into consideration, is it still worth using the | managed Aurora, instead of a self-managed EC2 instance? | jrochkind1 wrote: | RDS Postgres (actual postgres, not the Aurora product, which is | only "mostly postgres compatible") is another option, not just | "EC2 self-managed instance". | | I guess we'd have to look at price comparisons for the size you | actually need. I think depending on load, the price differences | may be nominal. | mijoharas wrote: | Thought I'd weigh in here, since we moved from heroku to ec2 | to aurora to RDS postgres, so I can probably speak to this a | little more. | | * ec2 self-managed is easily the cheapest; we had a solid | setup with continuous backups and a read replica, and if cost is | a factor, it's easily the winner. However, there is a _lot_ of | knowledge that goes with it. When it comes down to it, you | can pay someone else to handle that. This isn't just the | setup cost: you need to factor in ongoing maintenance (the | number of people at my company that could have done | complicated things with the instance was probably me and | another engineer, and we didn't want to be permanently on | call for this) and general risk.
| | * EC2 probably worked out to between 1/2 and 3/4 (probably | 2/3) the price of the equivalent RDS (tough to say exactly, | as I'd need to factor in all the ancillary costs that are | more "bundled together" in RDS). | | * RDS was much cheaper than aurora for our workload (almost | 1/2 the price). | | I think the main thing to remember is that ec2 is much | cheaper than RDS, but RDS is cheaper than engineers. With | that in mind, while we didn't need to do anything complicated other | than the migration, the risk, the possible engineering time, | and the bus factor didn't make staying on ec2 feel worth it. | rohansahai wrote: | Been on RDS for a while now but have recently had some | issues that are pushing us to heavily consider EC2 - | especially since we've heard about considerable performance | gains running on some of the bare-metal EC2 options out | there vs RDS. | mijoharas wrote: | We had equivalent performance (and we'd benchmarked both) | for equivalent RDS and EC2 instances. (EDIT: to note, we | didn't dig in heavily; we just looked and saw it seemed the | same, so maybe there's something smart you can do to make | EC2 a lot faster.) | | The first thing to think about for RDS (or any postgres | instance, really) is to figure out what the bottlenecks | are. If your cache hit ratio is slightly low, or queries | generally feel a little sluggish but cpu and ram | aren't too high, I'd recommend looking into increasing | the provisioned IOPS of the machine (requires no | downtime; basically you can just make the storage | faster). | | If your issues are ram/cpu being high, you might want to | consider moving to a larger instance. | | It can also be worth looking into the number of temp | files being created by your queries and tweaking your | work_mem setting (you can use log_temp_files to see if | this is likely to help a bit). | | Other than that, remember that every connection to the db | is a separate process, so you don't want too many.
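The cache-hit-ratio check mentioned above can be scripted. The SQL targets the standard pg_statio_user_tables statistics view; the helper function and the threshold in the note are illustrative assumptions, not from the thread:

```python
# Standard Postgres statistics view; counters are cumulative since the
# last stats reset, so sample over a representative period.
CACHE_HIT_SQL = """
SELECT coalesce(sum(heap_blks_hit), 0)  AS blks_hit,
       coalesce(sum(heap_blks_read), 0) AS blks_read
FROM pg_statio_user_tables;
"""


def cache_hit_ratio(blks_hit, blks_read):
    """Fraction of heap block reads served from shared buffers
    (1.0 = everything came from cache)."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 1.0
```

Run the query via psql or psycopg2 against the instance (wiring assumed); as a rough rule of thumb, a ratio persistently well below ~0.99 on an OLTP workload suggests more RAM or more provisioned IOPS could help.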
It | could be worth looking at RDS Proxy or pgbouncer if you | think you have a lot of connections to your db. | sciguymcq wrote: | I currently manage, or am "lead dev" for, around a half dozen | to 10 apps using Postgres as the database. For two of those I | use a managed database service (AWS RDS / AWS Aurora Postgres) | because they drive mission-critical, high-value products / | services and I wouldn't consider using a standalone install of | Postgres. For other apps that are not so mission-critical I am | perfectly happy saving considerable money and running them on a | VPS on AWS EC2 or Digital Ocean or Linode. So ultimately it | depends on the app and the risk tolerance for Recovery Time | Objective and Recovery Point Objective. | rohansahai wrote: | That's super interesting! Have heard people do the opposite | for performance reasons (run on bare EC2) and storage cost | reasons. Do you notice a big performance difference between | the EC2 apps and RDS apps? Also, have you had any EBS-volume- | related issues on RDS, or unexplained latency/downtime? | fastball wrote: | Worth noting that DO also has a hosted Postgres service. | patman81 wrote: | That's impressive. We struggle to find a way to "migrate" from | AWS Aurora 1 to 2 (that is, from MySQL 5.6 to 5.7). There seems to | be no (easy) way to upgrade without downtime for large databases. | mijoharas wrote: | Might have been nice to see this post a year ago :). | | My company migrated our ~2TB postgres db from heroku to AWS | (originally aurora; we switched back to RDS postgres). Haven't | had time for a blog post, but figured I might post our learnings | here, and the differences in our approach. | | * We did the second step (ec2 -> aurora) via AWS DMS (Database | Migration Service). This service may be a little more of a pain | than it's worth, but it's what we settled on. | | * We used wal-g rather than wal-e for the initial phase; it has | slightly better performance, and we were happy with the tool.
| | * We wrote ansible playbooks to perform the heroku -> ec2 | migration. It's definitely nice, and makes you feel a little more | secure, to have run the db migration multiple times over, and it | would be a giant PITA to do all that manually. (I hadn't used | ansible much before; I have to say, I'm a fan! I'd used | chef/puppet in bits and pieces before and was never that keen on | them.) | | * Aurora wasn't the right fit for us. Be careful about reading | the pricing. The AWS pricing calculator asks you how | many million requests you have per month. This is _not_ how it's | charged. If you read the docs, the IO charges apply to | AuroraReadIO and AuroraWriteIO, which is not per db request; it's | per db storage request, which is the number of requests * the number of | db blocks that each request fetches. | | * Our Aurora costs were through the roof (and variable and | spiky, never something you want!). We migrated again to RDS | postgres, and bumping the provisioned IOPs up we had equivalent | performance (possibly better?) for half the price (and the price | was stable). We benchmarked everything but the cost with the | db migration, relying instead on the AWS simple pricing | calculator. The difference between predicted and actual storage | IO costs was over 10x, if my memory serves (making the entire | DB double the price per month). BEWARE! | | * AWS DMS is kinda buggy; last I checked, the table selection | editor UI didn't work when you added/modified multiple fields. | You can switch to a JSON editor, so generate the table selection | once, and check it in the JSON editor in case the UI got borked. | | * Speed wasn't great with AWS DMS either; we had to flip | BatchApplyEnabled on and take a lot of care with it. | | * There were definitely some shenanigans to do with LOBs, but I | can't fully remember them.
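The Aurora I/O billing model described above (charged per storage request, i.e. per 8 KB block fetched, not per database request) can be turned into a rough estimator. The function and the default price are illustrative assumptions; check your region's current Aurora pricing:

```python
def aurora_io_cost(db_requests_per_month, avg_blocks_per_request,
                   price_per_million_ios=0.20):
    """Rough monthly Aurora I/O cost estimate.

    Billing is per *storage* I/O, not per database request: a query
    touching many 8 KB blocks is billed once per block fetched.
    price_per_million_ios is an assumed list price (USD per million
    I/Os around 2020), not a quoted figure from the thread.
    """
    storage_ios = db_requests_per_month * avg_blocks_per_request
    return storage_ios / 1_000_000 * price_per_million_ios
```

This is exactly the trap described above: at an average of 10 blocks per request, the real I/O bill is 10x what a per-request estimate in the pricing calculator would suggest.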
| | * DMS does not update id sequences, so if you have any tables | with incrementing ids, make sure that you set the sequences | manually before switching dbs. | | * We used debian as our intermediate ec2 instance. We've | historically used ubuntu for many ec2 instances, but the default | ubuntu AMI image uses MBR for its partitioning (!!! in 2020!!!) | and while we had some code to swap it to GPT while running, that felt | a little too spicy for a db migration, so we used base debian, | which has a sane GPT-partitioned disk for its AMI. (Note: we | needed to create partitions larger than what MBR can support, so | this was a hard requirement.) | | * The last learning I should add is that heroku support were | actually really great during the whole process: very | knowledgeable, and they gave us some example setup files for the ec2 | instance. After that they basically leave it to you, but I can't | speak highly enough about them (thanks Jessie!) | | * One more for you: be careful about the version of libc that you | have. Even if the postgres versions are identical, subsequent | libc versions can have slightly different string sorting, which | can corrupt indexes that are migrated after copying the | base backup. (We needed to rebuild an index or two due to this; | not a huge deal, but an interesting gotcha. I swear I checked our | locale settings 5 times when it went wrong until I cottoned on to | it being the libc version.) | | There were probably more learnings than that, but that's what | comes to mind. | | (EDIT: added final bullet point). (EDIT 2: added one more bullet) | sciguymcq wrote: | I originally wanted to use DMS, but our DB failed the pre- | migration assessment, so that forced us to Logical Replication. | | We also worked with Jesse. You are right, he is amazingly | helpful. Unfortunately we got some pretty questionable help | from others before he was assigned to our support ticket, but | once that happened things were infinitely improved.
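The sequence gotcha in the list above (DMS copies rows but leaves target id sequences untouched) can be scripted. setval and pg_get_serial_sequence are standard Postgres functions; the helper and the table names in the test are hypothetical:

```python
def sequence_fix_statements(tables, id_col="id"):
    """Generate SQL to advance each table's id sequence past the rows
    DMS copied, so new inserts don't collide with existing ids.

    pg_get_serial_sequence resolves the sequence backing a serial /
    identity column; coalesce handles empty tables.
    """
    return [
        f"SELECT setval(pg_get_serial_sequence('{table}', '{id_col}'), "
        f"coalesce(max({id_col}), 1)) FROM {table};"
        for table in tables
    ]
```

Run the generated statements (via psql or psycopg2, wiring assumed) against the target database before cutting traffic over to it.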
| mijoharas wrote: | Ahhhh... well, you might've dodged a bullet, to be honest! If I | were to do it all again, there's a good chance I'd stick | with the standard tools rather than DMS. | | Thanks for the in-depth article, by the way; very well | written! | | Keep an eye on your AuroraStorageIO costs going forward: as | we said, we halved our costs and increased performance by | switching back to RDS and increasing the PIOPs available. I'm | sure that'll be workload-sensitive though, so YMMV. We did a | bunch of benchmarking, and aurora seemed nice apart from | this. | | We must've lucked out, as we got Jesse at the start. | yowlingcat wrote: | Fantastic comment, thank you so much. I've been following | Aurora for a while but am a little curious (and skeptical) about | its sharp edges vs vanilla PG. If I may ask you a follow-on | question, who do you think Aurora would be the right fit for? | mijoharas wrote: | Hmmmm... that's a good question. | | The performance was great, I can't complain about it, but the | unpredictable (and high) price was the issue. | | In terms of aurora, I think it can probably scale better to | much larger workloads. If we had put the time in and | optimised any queries that touch a large number of | blocks (we do well, but I'm sure there are queries that have | some seq scans somewhere in our codebase), then the price | wouldn't have been an issue, and we'd happily have stayed | with it. | | In addition, we have a single large master and one replica. | With my understanding of aurora's architecture, and how the | storage is separated from the "postgres" instance, it could | possibly do a lot better with a drastically different | architecture using a lot of smaller "postgres" | instances. We didn't have time to investigate many other | architectures, and in the end moving to RDS felt safest | (we didn't want to spend too much time when we could be building | core features). | | Hope that answers your question!
Some of the above is more an | educated guess with a smattering of experience, so please | treat it as such. | mijoharas wrote: | On the other hand, vanilla postgres is very powerful, and | you shouldn't underestimate how much you can win by | cranking up the PIOPs for an RDS postgres instance. | | Remember: always benchmark representative workloads. | ekump wrote: | This is a great write-up! I wrote a far less thorough blog post | on the same topic a few months ago: | https://vericred.github.io/2020/08/10/migrating-to-aurora.ht.... | | Overall, we're happy with our decision to migrate to Aurora. We | feel it offers better performance, observability and scalability | compared to heroku postgres. To echo some other comments: | | 1. Jesse at Heroku is great. I had never set up my own postgres | instance before, and his guidance helped me a lot. | | 2. We used DMS, and it was ok, not great. Dealing with LOBs was a | bit of a pain. In order to keep the migration time to a minimum, we | set up multiple DMS tasks and grouped tables into those tasks | based on their max LOB size. Most of our tables didn't have any | LOB columns and they migrated quickly. | | 3. The ability to fast clone a database is nice. We can create a | copy of our production database in about five minutes for use in | non-prod environments. This would take a couple of hours in | heroku. | | 4. The Aurora IO costs are ridiculous. Are there any good tools | to see what exactly is causing so much I/O? | aeyes wrote: | > 4. The Aurora IO costs are ridiculous. Are there any good | tools to see what exactly is causing so much I/O? | | I don't use Aurora so I don't know what is available, but | pg_stat_statements does have I/O statistics, so you can review | it manually or use a tool like pganalyze (not sure if it | supports Aurora) to see which queries are I/O intensive. | mijoharas wrote: | Hey, regarding 4:
we came across this but didn't find anything | in the article particularly helpful for us: | | https://aws.amazon.com/blogs/database/reducing-aurora-postgr... | | (We bounced to normal RDS postgres and reduced our costs a lot | while keeping performance similar; check out my other comments | on this thread for details.) ___________________________________________________________________ (page generated 2020-11-12 23:02 UTC)