[HN Gopher] Terraform vs. AWS CloudFormation
       ___________________________________________________________________
        
       Terraform vs. AWS CloudFormation
        
       Author : historynops
       Score  : 80 points
       Date   : 2021-10-06 20:25 UTC (2 hours ago)
        
 (HTM) web link (gswallow.medium.com)
 (TXT) w3m dump (gswallow.medium.com)
        
       | johnl1479 wrote:
       | I can appreciate the author's criticisms of the shortcomings of
       | Cloudformation, but this is really just a "Why you should use
       | Terraform" post.
        
       | mylons wrote:
       | "But CDK transpiles into CloudFormation templates. For that
       | reason alone I can't recommend it."
       | 
       | CDK is superior to terraform for a glaring reason: it's a first
       | class citizen in AWS' eyes and terraform is not.
        
       | thecopy wrote:
       | > With Terraform, your local executable makes rest calls to each
       | service's REST API for you, meaning no intermediary sits between
       | you and the service you're controlling. Want an RDS instance?
       | Terraform will make calls directly to the RDS API.
       | 
       | How is this different than CloudFormation making the same calls?
        
         | lykr0n wrote:
         | You give CloudFormation a list of instructions. It accepts it
         | and gives you an ID to watch for updates, then it goes off and
         | executes them.
         | 
         | Terraform executes a list of instructions. It executes them in
         | front of you while you wait.
         | 
         | Both are fine until you run into something like this:
         | 
         | I'm pushing an Elastic Container Service Task Definition change
         | via CDK. A CloudFormation change is submitted, and I wait for
         | it to finish. In the background, it's trying to do the update
         | but the update fails due to some misconfiguration with the new
         | container.
         | 
         | CloudFormation doesn't fail or return an error. It times out
         | after an hour and reverts the change. I have to know to dig
         | into the AWS console to find my failed tasks to view the error.
         | 
         | If I did this update via Terraform, I would get the error back
         | in my console quickly as Terraform is directly telling ECS to
         | make the change. With CDK, the CloudFormation changeset is
         | generated, it is submitted to CloudFormation, then the tool
         | polls the AWS API for progress updates. Sometimes you get
         | specific messages back, sometimes it fails and you need to go
         | in and see what it failed on.
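
The "go in and see what it failed on" step can be partly scripted. What follows is a minimal sketch, not anyone's actual tooling: it assumes the stack events have already been fetched (for example with boto3's `describe_stack_events`) as dicts carrying `LogicalResourceId`, `ResourceStatus` and `ResourceStatusReason`. The helper name `failed_events` and the sample data are hypothetical.

```python
def failed_events(events):
    """Return (resource, status, reason) for every event whose status ends in _FAILED."""
    failures = []
    for event in events:
        status = event.get("ResourceStatus", "")
        if status.endswith("_FAILED"):
            failures.append((
                event.get("LogicalResourceId", "?"),
                status,
                event.get("ResourceStatusReason", "(no reason given)"),
            ))
    return failures


# Hypothetical sample, shaped like CloudFormation stack events.
sample_events = [
    {"LogicalResourceId": "Service", "ResourceStatus": "UPDATE_IN_PROGRESS"},
    {
        "LogicalResourceId": "Service",
        "ResourceStatus": "UPDATE_FAILED",
        "ResourceStatusReason": "hypothetical reason text",
    },
    {"LogicalResourceId": "MyStack", "ResourceStatus": "UPDATE_ROLLBACK_IN_PROGRESS"},
]

for resource, status, reason in failed_events(sample_events):
    print(f"{resource}: {status}: {reason}")
```

This only surfaces what CloudFormation itself recorded. In the ECS case described above the stack may simply time out, and the underlying error would still have to be read from the stopped tasks in ECS.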
        
       | kennu wrote:
       | That's right - use AWS CDK instead. You don't have to worry about
       | the low-level CloudFormation syntax and details. I switched a few
       | years ago and haven't looked back. CDK keeps getting better and
       | better, also handling things like asset deployments (Docker
       | images, S3 content, bundling), quick Lambda updates with
       | --hotswap, quick stack debugging with --no-rollback, etc.
        
         | fdgsdfogijq wrote:
         | I'm always surprised that more people aren't aware of CDK. It's
         | an extremely powerful way to write software. Especially once
         | you get good at it. CFN pales in comparison, CDK to me feels
         | like the future of software development.
        
         | k__ wrote:
         | Pulumi is also nice for non-AWS related stuff.
        
         | nagyf wrote:
         | I agree, and have the same experience. CDK is so much easier,
         | much less verbose, and unit testable (at least to some degree).
         | 
         | Since resource importing is possible in CDK (not nice, but
         | possible) you can even start using it if you already have
         | resources that you do not want to recreate.
        
       | zenux wrote:
       | Fun fact: in the leak of the Twitch (Amazon) repositories of this
       | morning, I saw that the developers use Terraform!
        
       | cube2222 wrote:
       | You can use more than one tool.
       | 
       | CloudFormation is great because of its transactionality, so it
       | lends itself nicely to deploying multiple services which are
       | versioned together. You either succeed fully, or all services
       | will be rolled back.
       | 
       | This way you can deploy your whole infra with Terraform, and then
       | deploy to, e.g., your ECS cluster using CloudFormation. Works great
       | in practice.
        
         | zapt02 wrote:
         | The rollback functionality of CF is a blessing. We use both CF
         | and Terraform at my company and I vividly recall multiple times
         | where my connection had cut out during "terraform apply" and
         | left the Terraform infrastructure in a half-finished state.
        
           | acdha wrote:
           | > The rollback functionality of CF is a blessing
           | 
           | When it works, which is a big caveat: we had far more cases
           | where it failed in a way which required manual remediation
           | and the gaps in validation meant that you'd be in an "apply /
           | error / rollback" loop requiring 20+ minutes before you could
           | try again. Terraform was always considerably faster but it
           | was especially the orders of magnitude improvement in retry
           | time which convinced most of us to switch.
           | 
           | The CloudFormation team has been working on this so it's
           | possible that experience has improved but the scar tissue
           | will take time to fade.
        
           | nickjj wrote:
           | Rollback doesn't always work with CF. I've noticed so many
           | times that it would mostly delete everything but not certain
           | things once in a while. Then you're left having to play
           | detective to manually figure out what you need to delete
           | while having to delete dependencies by hand in a specific
           | order.
           | 
           | I've spent hours just waiting for CF to fail deleting EKS or
           | RDS related resources then I end up getting billed for $30+ a
           | month sometimes because I forgot to manually delete a NAT
           | gateway.
        
           | vageli wrote:
           | > I vividly recall multiple times where my connection had cut
           | out during "terraform apply"
           | 
           | The issue could be at least partially resolved by using
           | automation (like atlantis for example) to apply your plans.
        
       | l0b0 wrote:
       | Unless things have changed in the meantime, the killer feature of
       | CloudFormation for me is that I don't have to keep track of the
       | state locally. Having to set up tracking of the infra state in
       | Terraform is a huge pain, since it should be stored independently
       | of both the infra code (to allow deploying anything but HEAD) and
       | the infra itself (duh). As long as Terraform doesn't query the
       | existing infra to work out what needs doing I don't want to go
       | back to it.
        
       | Pensacola wrote:
       | While the read was interesting and informative, something about
       | the tone made me search for a disclaimer/disclosure of interest.
       | Are you an "influencer?"
        
       | draklor40 wrote:
       | CloudFormation, with its HORRIBLE YAML templating (whatever
       | dsl/language) and arcane error messages is a horror story. I hate
       | it so much that I'd rather quit my job than debug why
       | CloudFormation decided for no reason to update my RDS instance
       | for a PR that was just a README file update.
        
         | emmanueloga_ wrote:
         | How about Pulumi? [1] Seems compatible with CF and supports
         | TypeScript as configuration language. Any fans?
         | 
         | 1: https://www.pulumi.com/docs/guides/adopting/from_aws/
        
       | orf wrote:
       | I spent a bit of time trying to deploy a lambda app with
       | Cloudformation. I wanted to use a relational database, so I
       | needed to handle migrations.
       | 
       | Ok, so apparently I need to write a custom Cloudformation
       | resource to execute a lambda function that will run the
       | migrations prior to deploying the new version of the lambda. Kind
       | of neat that you can do that.
       | 
       | Except I messed up the output of the custom resource lambda and
       | Cloudformation completely locked my deployment up for _3 hours_.
       | 3 hours. I couldn't do _anything_ - rollback, update, whatever.
       | 
       | Cloudformation via a CDK is interesting, and I don't hate it, but
       | oh boy if it gets into a weird state it can completely kill your
       | iteration loop. And the docs say something along the lines of "if
       | it's stuck for too long contact support". No thanks.
        
         | zapt02 wrote:
         | CF does have a lot of quirks (especially stacks locking up for
         | various reasons, or rollbacks taking hours).
         | 
         | I find it easiest to run migrations when an application is
         | first starting up (with an appropriate transaction lock so
         | other instances won't cause the migration to run more than
         | once), this way you don't have to do a lot of devops magic for
         | it to work.
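
A minimal sketch of the "migrate on startup under a lock" idea above, not the commenter's actual setup. It uses SQLite so it runs anywhere: `BEGIN IMMEDIATE` takes the database write lock, so a second instance starting at the same moment waits and then finds nothing left to apply. On a server database a different locking mechanism would play that role (Postgres advisory locks, for example; that is an assumption, not something from the thread). The `schema_migrations` table, the `MIGRATIONS` list and the `migrate` helper are all hypothetical names.

```python
import sqlite3

# Hypothetical migrations, applied in version order.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate(conn):
    """Apply pending migrations inside one locked transaction.

    `conn` must be opened with isolation_level=None so that the
    explicit BEGIN/COMMIT statements below are the only transaction control.
    Returns the list of versions applied by this call.
    """
    try:
        # Take the write lock up front; other instances block here.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_migrations "
            "(version INTEGER PRIMARY KEY)"
        )
        applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
        ran = []
        for version, statement in MIGRATIONS:
            if version in applied:
                continue  # another instance (or an earlier start) already ran it
            conn.execute(statement)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
            ran.append(version)
        conn.execute("COMMIT")
        return ran
    except Exception:
        # Undo partial work, but only if the transaction actually started.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:", isolation_level=None)
    print(migrate(connection))  # versions applied on first start
    print(migrate(connection))  # nothing left to do on the next start
```

The point of the design is that every instance can call `migrate` unconditionally at boot; the lock plus the version table make repeat calls harmless.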
        
       | singlewind wrote:
       | To be honest, I don't agree with this. Managing infrastructure
       | needs evidence and a trace of how things got created. I've been
       | in this situation a few times: I've been through projects where
       | the Terraform code doesn't match the AWS infrastructure, and we
       | don't know when and how the drift happened. At least
       | CloudFormation has some features to detect the difference and
       | help me trace back which commit has actually been deployed. CDK
       | makes the job easier for developers because it delivers some
       | convenience and offers more patterns for writing code. I like
       | both.
        
       | Hikikomori wrote:
       | Vanilla cloudformation is bad, but so is terraform (for my use
       | case anyway). We wrap our cloudformation with python, you need
       | something similar for terraform to make it less terrible (cdktf,
       | terragrunt, terrascript).
        
       | flurie wrote:
       | One of the most amazing things I saw at AWS re:Invent was an
       | advanced talk on IaC that provided the code of a lambda function
       | inline in a CloudFormation template. I realize that this is just
       | one talk, and there are plenty of ways to structure things well,
       | but this practice is directly encouraged by the design of
       | CloudFormation[1]. AWS has attempted redefining the lambda
       | deployment story multiple times, there are multiple companies
       | whose primary offering is providing a better way to deploy code
       | to serverless offerings, but this still stands out to me as one
       | of the most terrible ways to do things, and I blame the design of
       | CloudFormation.
       | 
       | [1]
       | https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGui...
        
         | xyzzy123 wrote:
         | I'm going off track here but Pulumi have a totally mind-bending
         | feature where you can write the code of a lambda function not
         | only inline, but such that it captures the value of variables
         | from the surrounding infra code at the time the function is
         | serialized.
         | 
         | See:
         | https://www.pulumi.com/docs/intro/concepts/function-serializ...
         | 
         | Seeing the specific examples they use it for (AWS infra glue)
         | makes me think that there is room for infrastructure related
         | lambdas to be defined right in cfn or infra code, with very low
         | ceremony, even if you wouldn't want to deploy "applications"
         | like that.
        
       | nzoschke wrote:
       | Counterpoint... Use CloudFormation!
       | 
       | Managed services offer big benefits over software. With CF, new
       | stacks, change sets, updates, rollbacks and drift detection are
       | an API call away.
       | 
       | Managed service providers offer big benefits over software. With
       | CF and AWS support, help with problems are a support ticket away.
       | 
       | Using a single cloud provider has a big benefit over a multi-
       | cloud tooling. I only run workloads on AWS, so the CF syntax,
       | specs and docs unlocks endless first party features. A portable
       | Terraform + Kubernetes contraption is a lowest common denominator
       | approach.
       | 
       | Of course everything depends.
       | 
       | I've configured literally 1000s of systems with CloudFormation
       | with very few problems.
       | 
       | I have seen Terraform turn into a tire-fire of migrations from
       | state files to Terraform enterprise to Atlantis that took an
       | entire DevOps team to care for.
        
         | acdha wrote:
         | > Managed services offer big benefits over software. With CF,
         | new stacks, change sets, updates, rollbacks and drift detection
         | are an API call away.
         | 
         | > Managed service providers offer big benefits over software.
         | With CF and AWS support, help with problems are a support
         | ticket away.
         | 
         | The problem is when those help tickets get responses like "try
         | deleting everything by hand and see if it recreates without an
         | error next time". They've worked on CloudFormation over the
         | last year or so but everyone I've known who's switched to tools
         | like Terraform did so after getting tired of unpredictable
         | deployment times or hitting the many cases where CloudFormation
         | gets itself into an irrecoverable state. I can count on no
         | fingers the number of development teams who used CF and didn't
         | ask for help recovering from an error state in CF which
         | required out-of-band remediation.
         | 
         | I believe they've also gotten better at tracking new AWS
         | features but there were multiple cases where using Terraform
         | got you the ability to use a feature 6+ months ahead of CF.
         | 
         | > A portable Terraform + Kubernetes contraption is a lowest
         | common denominator approach.
         | 
         | Terraform is much, much richer than CloudFormation so I'd
         | compare it to CDK (with the usual aesthetic debate over
         | declarative vs. procedural models) and it doesn't really make
         | sense to call it LCD in the same way that you might use that to
         | describe Kubernetes because it's not trying to build an
         | abstraction which covers up the underlying platform details.
         | Most of the Terraform I've written controls AWS but there's a
         | significant value to also being able to use the same tool to
         | control GCP, GitLab, Cloudflare, Docker, various enterprise
         | tools, etc. with full access to native functionality.
        
         | dolni wrote:
         | > I've configured literally 1000s of systems with
         | CloudFormation with very few problems.
         | 
         | This is a great way of saying "I've never used CloudFormation"
         | without stating it directly.
        
         | void_mint wrote:
         | > Managed services offer big benefits over software.
         | 
         | TF can be used as a managed service.
         | 
         | > Managed service providers offer big benefits over software.
         | With CF and AWS support, help with problems are a support
         | ticket away.
         | 
         | The same is true with TF, except 100000% better unless you're
         | paying boatloads of money for higher tiered support.
         | 
         | > I only run workloads on AWS, so the CF syntax, specs and docs
         | unlocks endless first party features.
         | 
         | CF syntax is an abomination. Lots of the bounds of CF are
         | dogmatic and unhelpful.
         | 
         | > I have seen Terraform turn into a tire-fire of migrations
         | from state files to Terraform enterprise to Atlantis that took
         | an entire DevOps team to care for.
         | 
         | CF generally takes an entire DevOps team to care for, for any
         | substantial project.
        
         | ldoughty wrote:
         | Agree. CF is not a magic bullet, but neither is ansible or
         | terraform.
         | 
         | We used ansible heavily with AWS for 2 years. Then we decided
         | to gut it out and do CF directly. Why? If we want to switch
         | clouds, it's not like the ansible or terraform modules are
         | transferable ... So might as well go the native supported
         | route.
         | 
         | I agree with the article, messages can be cryptic, but at the
         | end of the day, I have a CF stack that represents an entity. I
         | can blow away the stack, and if there's any failure or issue, I
         | can escalate my permissions and kill it again. Still a problem?
         | Then it's AWS's fault and a ticket away (though I've only had
         | to do this once in 5 years and > 150,000 CF stacks).
         | 
         | I also would argue, if a stack deletion stalls development, you
         | are probably using hard-coded stack names, which isn't wise.
         | Throw in a "random" value like a commit or pipeline identifier.
         | 
         | I've had far less issues with CF than terraform or ansible. I
         | have yet to see CF break backward compatibility, while I had a
         | nightmare day when I couldn't run any playbooks in ansible
         | because the module had a new required parameter on a minor or
         | patch version bump (which was when I called it quits on
         | ansible, I then relooked at terraform, and decided to go
         | native)
         | 
         | I will caveat that our use case for AWS involves LOTS of
         | creation and deletion, so I find it super helpful to manage my
         | infrastructure in "stacks" that are created and deleted as a
         | unit.. I don't need to worry about partial creations or
         | deletions.. like ever... It basically never fails redoing
         | known-working stuff... Only "first time" and usually because we
         | follow least-privilege heavily
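
The "throw in a commit or pipeline identifier" advice above could look roughly like this. It is a sketch under stated assumptions: the helper name `stack_name` is hypothetical, and the naming constraints it enforces (letters, digits and hyphens only, starting with a letter, at most 128 characters) are CloudFormation's stack name rules as I understand them.

```python
import re

MAX_STACK_NAME_LENGTH = 128  # assumed CloudFormation limit


def stack_name(base, build_id):
    """Build a per-deployment stack name such as 'review-app-3f9c2ab'."""
    # Replace anything outside [A-Za-z0-9-] with a hyphen.
    prefix = re.sub(r"[^A-Za-z0-9-]", "-", base)
    suffix = re.sub(r"[^A-Za-z0-9-]", "-", build_id)
    if not prefix or not prefix[0].isalpha():
        raise ValueError("base must start with a letter")
    if not suffix:
        raise ValueError("build_id must not be empty")
    # Trim the base, never the identifier, so names stay unique.
    room = MAX_STACK_NAME_LENGTH - len(suffix) - 1
    if room < 1:
        raise ValueError("build_id is too long")
    return f"{prefix[:room]}-{suffix}"


if __name__ == "__main__":
    print(stack_name("review_app", "3f9c2ab"))
```

Because the name changes with every commit or pipeline run, a stack that is stuck deleting does not block the next deployment; it only has to be cleaned up eventually.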
        
           | HatchedLake721 wrote:
           | I'm confused. Aren't Ansible and CloudFormation apples and
           | oranges, with completely different use cases and purposes?
           | 
           | One is a configuration management and deployment tool.
           | 
           | The other one is cloud resource provisioning service.
           | 
           | They're meant to work in tandem, not one to replace another.
        
             | mooreds wrote:
             | I think Ansible has extensions which allow for managing
             | infra such as AWS. See
             | https://docs.ansible.com/ansible/latest/collections/amazon/a...
             | for example.
        
       | booleanbetrayal wrote:
       | Yeah, importing existing resources into CloudFormation is an
       | "Am I going to break everything? _Fingers crossed_" nightmare.
       | 
       | It is also very possible to get into very bad situations if your
       | settings drift and you attempt to reconcile those changes.
        
       | easton wrote:
       | Something funny (well, kind of sad) about CloudFormation I
       | noticed this summer was that if you deploy a CloudFormation stack
       | which updates an ECS service and deploys tasks which then fail
       | health checks, CloudFormation will do nothing about this and just
       | let ECS keep killing and restarting tasks for.. well, at least
       | several hours. You have to know to go into ECS and drain the
       | tasks manually and then initiate a rollback from CF to get your
       | service back into a good state. The bug reports about this I
       | found were going back years.
       | 
       | The upside is that I got really well acquainted with how ECS
       | worked.
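
Since CloudFormation reportedly does nothing in this situation, one could at least detect it from the outside. The following is a rough sketch only: it assumes a service description shaped like the ECS `describe_services` response (a `deployments` list whose entries carry `id`, `status`, `desiredCount`, `runningCount` and a timezone-aware `createdAt`). The helper name `stalled_deployments`, the 30-minute threshold and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stalled_deployments(service, now, max_age=timedelta(minutes=30)):
    """Return ids of PRIMARY deployments that are old but still short of tasks."""
    stalled = []
    for deployment in service.get("deployments", []):
        if deployment.get("status") != "PRIMARY":
            continue  # only the deployment currently being rolled out matters
        short_of_tasks = deployment["runningCount"] < deployment["desiredCount"]
        too_old = now - deployment["createdAt"] > max_age
        if short_of_tasks and too_old:
            stalled.append(deployment["id"])
    return stalled


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical service whose new tasks keep failing health checks.
    service = {
        "deployments": [
            {
                "id": "ecs-svc/123",
                "status": "PRIMARY",
                "desiredCount": 2,
                "runningCount": 0,
                "createdAt": now - timedelta(hours=2),
            }
        ]
    }
    print(stalled_deployments(service, now))
```

A check like this could run in the deploy pipeline and fail fast, instead of waiting hours for the stack to notice.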
        
         | fictionfuture wrote:
         | I had this same bug!! Cost us like $1000 before we fixed it.
        
         | tkahnoski wrote:
         | 100x this. Prior company committed to doing Infrastructure as
         | Code and CloudFormation worked well except for this hiccup. We
         | didn't even have that many services on ECS but we probably had
         | 1 ticket a week asking support to help us with a 'stuck' stack.
         | 
         | We doubled down on our commitment to CloudFormation because
         | with it we could do containers, Lambda, and 95% of any other
         | AWS services....
         | 
         | However, in hindsight using SAM and the ECS CLI probably would
         | have resulted in a more predictable CI/CD process as we weren't
         | fighting deploy semantics through CloudFormation abstraction.
        
       | cmaggiulli wrote:
       | Writing Terraform scripts for AWS is 70% of my job. I do have
       | some issues with the AWS provider in Terraform. Firstly, there
       | are bugs. I ran into a bug a few days ago where the ARN attribute
       | on a Lambda alias was resolving to the ARN of the Lambda, not its
       | alias. I only figured it out because I found a GitHub Issue.
       | Additionally, Hashicorp is often playing catch-up with Amazon. A
       | few days ago AWS released a new instruction set architecture for
       | Lambdas that would save my org a lot of money. However, after I
       | saw the announcement from AWS, I saw tons of different GitHub
       | issues created to add this functionality. So I started editing my
       | files based on the documentation, only for that issue to be
       | closed and pointed to a new one with different syntax. So I
       | started working off the new syntax, only for that issue to be
       | closed and pointed to a different one.
        
       | robohoe wrote:
       | That's right, don't use CloudFormation. Use CDK which will
       | generate and obfuscate CF for you and you won't have to worry
       | about it.
        
         | Arelius wrote:
         | I'm not sure I understand... Is obfuscating the CF a good
         | thing?
        
           | yjftsjthsd-h wrote:
           | I'm pretty sure that was sarcasm. I disagree with said
           | sarcasm, because CDK takes you one layer away from the actual
           | thing that gets run but gives you a much nicer thing to work
           | with so it can still be a good trade off; writing rust (or
           | whatever) "obfuscates" the underlying CPU instructions but it
           | still turns out to be a good idea.
        
       | robohoe wrote:
       | I will admit that troubleshooting permissions-related deployment
       | issues in StackSets is a super nightmare-inducing event.
        
       ___________________________________________________________________
       (page generated 2021-10-06 23:00 UTC)