Cloud Security Automation: Code vs. CloudFormation or Terraform Templates

Right now I’m working on updating many of my little command line tools into releasable versions. It’s a mixed bag of things I’ve written for demos, training classes, clients, or Trinity (our mothballed product). A few of these are security automation tools I’m working on for clients to give them a skeleton framework to build out their own automation programs. Basically, what we created Trinity for, that isn’t releasable.

One question that comes up a lot when I’m handing this off is why write custom Ruby/Python/whatever code instead of using CloudFormation or Terraform scripts. If you are responsible for cloud automation at all this is a super important question to ask yourself.

The correct answer is there isn’t one single answer. It depends as much on your experience and preferences as anything else. Each option can handle much of the job, at least for configuration settings and implementing a known-good state. Here are my personal thoughts from the security pro perspective.

CloudFormation and Terraform are extremely good for creating known good states and immutable infrastructure and, in some cases, updating and restoring to those states. I use CloudFormation a lot and am starting to also leverage Terraform more (because it is cross-cloud capable). They both do a great job of handling a lot of the heavy lifting and configuring pieces in the proper order (managing dependencies) which can be tough if you script programmatically. Both have a few limits:

They don’t always support all the cloud provider features you need, which forces you to bounce outside of them.
They can be difficult to write and manage at scale, which is why many organizations that make heavy use of them use other languages to actually create the scripts. This makes it easier to update specific pieces without editing the entire file and introducing typos or other errors.
They can push updates to stacks, but if you made any manual changes I’ve found these frequently break. Thus they are better for locked-down production environments that are totally immutable and not for dev/test or manually altered setups.
They aren’t meant for other kinds of automation, like assessing or modifying in-use resources. For example, you can’t use them for incident response or to check specific security controls.

I’m not trying to be negative here – they are awesome awesome tools, which are totally essential to cloud and DevOps. But there are times you want to attack the problem in a different way.

Let me give you a specific use case. I’m currently writing a “new account provisioning” tool for a client. Basically, when a team at the client starts up a new Amazon account, this shovels in all the required security controls. IAM, monitoring, etc. Nearly all of it could be done with CloudFormation or Terraform but I’m instead writing it as a Ruby app. Here’s why:

I’m using Ruby to abstract complexity from the security team and make security easy. For example, to create new Identity and Access Management policies, users, and roles, the team can point the tool towards a library of files and the tool iterates through and builds them in the right order. The security team only needs to focus on that library of policies and not the other code to build things out. This, for them, will be easier than adding it to a large provisioning template. I could take that same library and actually build a CloudFormation template dynamically the same way, but…
… I can also use the same code base to fix existing accounts or (eventually) assess and modify an account that’s been changed in the future. For example, I can (and will) be able to asses an account, and if the policies don’t match, enable the user to repair it with flexibility and precision. Again, this can be done without the security pro needing to understand a lot of the underlying complexity.

Those are the two key reasons I sometimes drop from templates to code. I can make things simpler and also use the same ‘base’ for more complex scenarios that the infrastructure as code tools aren’t meant to address, such as ‘fixing’ existing setups and allowing more granular decisions on what to configure or overwrite. Plus, I’m not limited to waiting for the templates to support new cloud provider features; I can add capabilities any time there is an API, and with modern cloud providers, it there’s a feature it has an API.

In practice you can mix and match these approaches. I have my biases, and maybe some of it is just that I like to learn the APIs and features directly. I do find that having all these code pieces gives me a lot more options for various use cases, including using them to actually generate the templates when I need them and they might be the better choice. For example, one of the features of my framework is installing a library of approved CloudFormation templates into a new account to create pre-approved architecture stacks for common needs.

It all plays together. Pick what makes sense for you, and hopefully this will give you a bit of insight into how I make the decision.