As awesome as those AMIs from AWS Marketplace are, it’s often not possible to use them as-is at many organisations due to policies that enforce:
- A mandated operating system, with a specific distro and version
- Blacklisted operating system packages that must be removed for security reasons
- Whitelisted versions of various tech stacks, tools, libraries, etc.
I’ve never found a marketplace AMI that fulfils all of the above requirements without further provisioning, which can sometimes defeat the purpose of using an AMI in the first place. Another challenge with marketplace AMIs is the reliance on providers to actively update their AMIs on a schedule that suits yours.
So this is where such organisations turn to custom AMIs to manage the above requirements, and one pattern I’ve seen emerge in the projects I’ve been on is a dependency tree of custom AMIs. Here’s an example:
- Start by picking a public AMI that contains the mandated operating system
- Create a base custom AMI which uses the above AMI as the source
- Provision the base custom AMI with the common setup that you want to apply across all custom AMIs within the dependency tree - this is a good place to remove blacklisted packages
- Using that base AMI as the source, create the other custom AMIs - there are many ways to structure them; the example in the diagram above involves creating tech stack AMIs that serve as the source of the application/project AMIs, plus tool AMIs which are not based on any specific tech stack
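The steps above can be modelled as a simple parent-to-children map. Here’s a minimal sketch in Python; all the AMI names are hypothetical placeholders, and the traversal just prints the order in which the nodes would need to be built (every parent before its children):

```python
# Hypothetical AMI dependency tree: public OS AMI -> base AMI -> tech stack,
# tool, and application AMIs. A parent maps to the list of its child AMIs.
AMI_TREE = {
    "public-os-ami": ["base-ami"],
    "base-ami": ["jdk-ami", "nodejs-ami", "tools-ami"],
    "jdk-ami": ["billing-app-ami"],
    "nodejs-ami": ["web-app-ami"],
    "tools-ami": [],
    "billing-app-ami": [],
    "web-app-ami": [],
}

def build_order(tree, root):
    """Return AMIs in a valid build order: each parent before its children."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # reversed() keeps children in their declared order on a LIFO stack
        stack.extend(reversed(tree.get(node, [])))
    return order

print(build_order(AMI_TREE, "public-os-ami"))
```

Running this yields a depth-first ordering starting from the public OS AMI, which is exactly the guarantee a build pipeline over the tree needs to provide.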
The main benefit of this dependency tree is that it allows you to update an AMI at any node and have the change propagate down to all of its descendant nodes.
If you want to make a common change across all AMIs, like removing another blacklisted package, you can update the base AMI, which should then trigger builds of all descendant AMIs. If you want to upgrade the version of the JDK, PHP, Node.js, etc., then you can update the corresponding tech stack AMI, which should then trigger builds of only its descendant AMIs.
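Computing the rebuild scope for a given change is a straightforward walk of the subtree rooted at the changed node. A small sketch, again with hypothetical AMI names:

```python
# Hypothetical AMI dependency tree (parent -> children), as before.
AMI_TREE = {
    "public-os-ami": ["base-ami"],
    "base-ami": ["jdk-ami", "nodejs-ami", "tools-ami"],
    "jdk-ami": ["billing-app-ami"],
    "nodejs-ami": ["web-app-ami"],
    "tools-ami": [],
    "billing-app-ami": [],
    "web-app-ami": [],
}

def rebuild_scope(tree, changed_node):
    """The changed AMI plus every descendant that must be rebuilt after it."""
    scope = [changed_node]
    i = 0
    while i < len(scope):
        scope.extend(tree.get(scope[i], []))  # enqueue children breadth-first
        i += 1
    return scope

# Upgrading the JDK touches only that subtree:
print(rebuild_scope(AMI_TREE, "jdk-ami"))    # the JDK AMI and its app AMI
# Changing the base AMI rebuilds everything below it:
print(rebuild_scope(AMI_TREE, "base-ami"))
```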
A common criticism of such a dependency tree is the time it can take to build all nodes in the tree, given the provisioning overhead that surrounds each node.
This can be mitigated by optimising the tree implementation. Make sure that you have enough parallel build agents to allow multiple branches of the tree to be traversed concurrently, and balance what you provision inside an AMI against what you defer to cloud-init user data scripts at instance launch.
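One way to see how much parallelism the tree offers is to group nodes into "waves": every AMI in a wave has its parent already built, so the whole wave can build in parallel. A sketch under the same hypothetical tree as above:

```python
# Hypothetical AMI dependency tree (parent -> children).
AMI_TREE = {
    "public-os-ami": ["base-ami"],
    "base-ami": ["jdk-ami", "nodejs-ami", "tools-ami"],
    "jdk-ami": ["billing-app-ami"],
    "nodejs-ami": ["web-app-ami"],
    "tools-ami": [],
    "billing-app-ami": [],
    "web-app-ami": [],
}

def build_waves(tree, root):
    """Group AMIs into waves; AMIs within a wave can be built in parallel."""
    waves, current = [], [root]
    while current:
        waves.append(current)
        # the next wave is every child of the current wave's nodes
        current = [child for node in current for child in tree.get(node, [])]
    return waves

for i, wave in enumerate(build_waves(AMI_TREE, "public-os-ami")):
    print(f"wave {i}: {wave}")
```

With enough build agents, total wall-clock time approaches the depth of the tree (four waves here) rather than the total number of nodes (seven).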
I won’t go into too much detail about the implementation in this post beyond pointing out two important points:
- First, set up a build pipeline that represents the dependency tree, where building one node triggers its descendant nodes. Most popular tools these days provide such a pipeline implementation, e.g. Jenkins Workflow, Jenkins Build Pipeline, Bamboo Pipeline, and TeamCity Pipeline.
- Next, set up parent-child dependency edges between nodes in the tree, where the AMI ID created by a parent node becomes the source AMI ID of its child node. To implement this, I would highly recommend Packer with JSON Updater Post-Processor (shameless plug).
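The edge-wiring itself boils down to: after the parent build finishes, extract the AMI ID it produced and feed it into the child build as its source AMI variable. Here’s a hedged sketch of that hand-off; the manifest shape below mirrors what Packer’s manifest post-processor writes (`builds[].artifact_id` as `region:ami-id`), but the file layout, AMI IDs, and variable names are all illustrative assumptions:

```python
import json

# Hypothetical manifest produced by the parent AMI build. Packer's manifest
# post-processor records each artifact_id in the form "region:ami-id".
parent_manifest = {
    "builds": [
        {"name": "base-ami", "artifact_id": "ap-southeast-2:ami-0abc1234"}
    ]
}

def source_ami_from_manifest(manifest):
    """Extract the AMI ID of the most recent parent build."""
    artifact = manifest["builds"][-1]["artifact_id"]
    region, ami_id = artifact.split(":", 1)  # split "region:ami-id"
    return ami_id

# Variables to pass into the child build, e.g. via a -var-file argument.
child_vars = {"source_ami": source_ami_from_manifest(parent_manifest)}
print(json.dumps(child_vars))
```

A pipeline step between the two builds would write `child_vars` to a var file (which is essentially what a post-processor that updates a JSON file automates for you).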
This custom AWS AMI dependency tree pattern has worked really well in my past projects, even prior to Packer’s popularity (good ol’ shell scripts, those were the days). It’s easy to implement and it simply works.