IaC framework selection guideline: Practical recommendations
Aug 07, 2024 • 10 min read
This is the second part of a two-part guideline for Infrastructure as Code (IaC) framework selection. The first part describes the recommended approach to selecting an IaC framework. This second part reviews specific IaC frameworks and makes recommendations for relevant use cases.
The field of cloud configuration management automation is evolving quickly, so this part of the guideline may become outdated soon. If you are reading it long after the publication date, treat these recommendations with caution and conduct your own analysis. You can refer to the first part of the guideline for additional information.
IaC framework options
The list of IaC frameworks in the table below isn’t exhaustive. However, it includes all the options recommended for the use cases in the scope of this guideline.
Concern | AWS CF | AWS CDK | ARM | Bicep | Terraform | OpenTofu | Pulumi | KRM |
Target platforms | AWS | AWS | Azure | Azure | Any | Any | Multiple | Kubernetes+ |
Available as | SaaS | Software | SaaS | Software | Software+SaaS | Software | Software+SaaS | Software+API |
Availability terms | Propr. | OSS | Propr. | OSS | Propr. | OSS | OSS/open core | OSS |
Owner | Platform | Platform | Platform | Platform | Medium company | Foundation | Small company | Foundation |
Vendors, support | Platform | Platform | Platform | Platform | Single | Many | Single | Many |
Community | Large | Large | Large | Growing | Large | Growing | Medium | Large |
Risk: Availability | Small | Small | Small | Medium | Small | Medium | Medium | Small |
Risk: Cost | Small | Small | Small | Small | Medium | Small | Medium | Small |
Risk: Legal | Small | Small | Small | Small | Medium | Small | Small | Small |
Risk: Vendor lock | N/A | N/A | N/A | N/A | High | Small | High | Small |
The concerns outlined in the left column of the above table are discussed in the first part of the guideline. The following sections explore the listed IaC frameworks.
Proprietary frameworks from major public clouds
Most of the major public cloud providers offer their own proprietary IaC frameworks, which are designed to work seamlessly with their respective cloud services but may have limitations when it comes to multi-cloud or hybrid cloud scenarios.
AWS: CloudFormation and CDK
AWS CloudFormation (AWS CF in the table above) is a very well-supported low-level desired state configuration service for AWS resources. Considering the general Amazon policy regarding its cloud products and its attention to CloudFormation in particular, CloudFormation is a safe, low-risk choice for AWS-only systems.
CloudFormation offers limited functionality for managing resources from external SaaS providers. While the CloudFormation Registry includes some SaaS platforms, the variety is currently small and none of the other major clouds are supported. Additionally, custom resource providers are an option, but they require significant effort to develop and maintain custom software gateways to third-party resource management APIs. This complexity makes custom resources a less viable solution for most scenarios.
The Cloud Development Kit (AWS CDK) is a higher-level IaC framework on top of CloudFormation. Like CloudFormation, CDK has robust support from Amazon. Instead of introducing a new infrastructure description language, CDK provides software libraries for various general-purpose programming languages. CDK is generally a safe choice for AWS-based systems, although the level of support varies between programming languages.
Due to its reliance on general-purpose imperative programming languages, CDK may not be ideal for automating infrastructure automation itself (meta-automation). However, it’s convenient for developing cloud applications that are native to the AWS environment.
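To make the relationship between the two AWS options more concrete, here is a minimal sketch in plain Python. It does not use the actual CDK libraries. It only mimics the general idea of a general-purpose language generating a CloudFormation template. The function names, the environment list, and the rule that only the production bucket is versioned are assumptions made for this illustration.

```python
import json


def bucket_resource(versioned: bool) -> dict:
    """Return one CloudFormation resource entry describing an S3 bucket."""
    resource = {"Type": "AWS::S3::Bucket"}
    if versioned:
        resource["Properties"] = {
            "VersioningConfiguration": {"Status": "Enabled"}
        }
    return resource


def build_template(environments: list) -> dict:
    """Generate a CloudFormation-style template with one bucket per environment."""
    resources = {}
    for env in environments:
        # Logical IDs must be alphanumeric, hence the capitalized suffix.
        logical_id = f"ArtifactBucket{env.capitalize()}"
        # Hypothetical policy: only the production bucket keeps object versions.
        resources[logical_id] = bucket_resource(versioned=(env == "prod"))
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = build_template(["dev", "prod"])
print(json.dumps(template, indent=2))
```

The printed JSON is the declarative, low-level artifact. The loop and the condition exist only in the generating program, which is convenient for application developers but harder for other automation to analyze or produce.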
Azure: ARM and Bicep
Azure Resource Manager (ARM) is the Microsoft proprietary low-level desired state configuration service for Azure resources, similar in concept to AWS CloudFormation. However, it doesn’t have the same level of coverage and robustness as CloudFormation on AWS. Still, it’s the best-supported low-level Azure resource configuration option. Notably, the primary Terraform provider for Azure uses ARM internally.
Azure Bicep is a relatively new, higher-level declarative IaC framework developed by Microsoft on top of ARM. As of 2024, Bicep is still rough around the edges. Currently, it doesn’t offer a clear benefit over existing cloud configuration management tools for Azure. However, with active development and Microsoft’s commitment, Bicep may become the preferred choice for IaC deployments on Azure in the mid-term.
Google Cloud
While Google Cloud offers its own Deployment Manager service, this service suffers from limited support for many Google Cloud services and a lack of maintenance (as of early 2024, the last update occurred in 2020). Historically, Google Cloud relied on HashiCorp Terraform as the preferred IaC framework. So, unlike the other major public cloud providers, Google Cloud doesn’t have viable proprietary IaC tooling.
Kubernetes resource model
The Kubernetes API object model, also known as KRM (Kubernetes Resource Model), is designed to represent arbitrary cloud resources. The Kubernetes API server’s extensibility through Custom Resource Definitions (CRDs) allows it to serve as a platform for representing not only native Kubernetes objects, but also generic cloud resources.
Each major cloud provider offers a Kubernetes API extension for its infrastructure: AWS Controllers for Kubernetes, Azure Service Operator for Kubernetes, and GCP Config Connector for Kubernetes. Additionally, Crossplane, a CNCF project, provides a generic Kubernetes API extension that supports major public clouds and other resources managed through APIs. These extensions are deployed as regular Kubernetes applications. This approach enables managing cloud and SaaS resources directly through the Kubernetes API.
The Kubernetes ecosystem offers various tools to manage Kubernetes resources. With cloud provider extensions integrating with the Kubernetes API, these tools can also manage generic cloud resources. This means that if a system already has basic Kubernetes infrastructure, it can be extended to manage application-facing cloud infrastructure in general. For scenarios where the focus is solely on application infrastructure (refer to the Use cases section), the Kubernetes resource model is a reasonable and secure option for IaC tooling.
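As an illustration of how a cloud resource can be expressed in the Kubernetes resource model, the sketch below builds an object with the standard top-level fields (apiVersion, kind, metadata, spec). The API group storage.example.org, the kind ObjectBucket, and the spec fields are hypothetical. They do not belong to any of the extensions named above, whose real schemas differ.

```python
import json


def krm_object(api_version: str, kind: str, name: str, spec: dict,
               namespace: str = "default") -> dict:
    """Build a resource in the standard Kubernetes object layout."""
    return {
        "apiVersion": api_version,
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


# Hypothetical custom resource: group, kind, and spec fields are made up
# for illustration and are not taken from a real CRD.
bucket = krm_object(
    api_version="storage.example.org/v1alpha1",
    kind="ObjectBucket",
    name="reports",
    spec={"region": "eu-west-1", "versioning": True},
)
print(json.dumps(bucket, indent=2))
```

Because the object has the same shape as any other Kubernetes manifest, the tools that already template, validate, and apply application manifests can handle it without special treatment.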
Terraform and clones
Since its release 10 years ago, HashiCorp Terraform has become a de facto standard, both as a cloud infrastructure management approach and as a configuration language. It has shaped the landscape of cloud-independent IaC tools.
Terraform
As of 2024, HashiCorp Terraform remains the most popular cloud configuration management framework outside of AWS. It has a vast ecosystem of first-party and third-party extensions, supporting a wide range of API-controlled SaaS resources. Notably, all three major cloud providers support Terraform, either as a secondary or even primary IaC framework option.
However, Terraform’s transition to a proprietary product in 2023, when HashiCorp relicensed it under the Business Source License (BSL), has diminished its third-party support options. Users now face potential vendor lock-in to HashiCorp. Unlike major cloud providers with diversified revenue streams, HashiCorp might use pricing changes or product updates to increase users’ reliance on paid features.
OpenTofu
OpenTofu is a Cloud Native Computing Foundation (CNCF)-governed, open-source fork of HashiCorp Terraform offering high backward compatibility with it. While it lacks the SaaS features of HashiCorp Terraform Cloud, other companies provide similar managed services. OpenTofu benefits from Terraform’s extensive ecosystem of cloud and SaaS resource extensions. Despite its recent introduction, the project exhibits active development backed by engineering resources comparable in size to HashiCorp’s Terraform team.
CNCF governance protects users and third-party vendors from potential licensing or terms of use changes, mitigating some long-term risk. It doesn’t guarantee OpenTofu’s indefinite survival, but many commercial companies rely on OpenTofu and contribute to its development, and this collaborative environment fosters confidence in its sustainability.
Compared to Terraform, OpenTofu presents adoption risks with a different structure but potentially comparable impact. However, the OpenTofu team’s commitment to Terraform compatibility makes migration to Terraform a viable risk mitigation strategy. OpenTofu promises a lower cost of ownership for both self-hosted solutions (due to vendor neutrality and fewer dependencies) and managed services (due to potential competition in the managed service space).
Other cloud IaC frameworks
The dominance of AWS CloudFormation and HashiCorp Terraform has limited the market share of most other cloud IaC frameworks. For the purposes of this document, only Pulumi warrants specific consideration due to its relatively widespread adoption.
Pulumi
Pulumi, like AWS CDK, is an IaC framework for general-purpose programming languages like TypeScript or Python. It relies on its own “providers” to interact with specific cloud services. Pulumi doesn’t directly leverage the existing Terraform providers, nor does it use cloud-native, low-level IaC services such as AWS CloudFormation. This limits Pulumi’s applicability, as very few third-party Pulumi providers are available.
However, Pulumi’s direct interaction with cloud APIs, bypassing intermediate representations like CloudFormation, simplifies integration with native cloud applications. Pulumi allows application code to control its interaction with the cloud infrastructure. In this sense, Pulumi resembles the Serverless framework for Lambda functions but caters to a broader spectrum of cloud-based application designs.
Pulumi and its ecosystem are products of a single, small private company of the same name. It follows an open-core model, where the IaC framework itself is open-source, but essential features reside in proprietary cloud services. Third-party involvement, in both the Pulumi ecosystem and support, is minimal. Due to Pulumi’s originality as a framework, a Pulumi-based IaC solution can’t be easily migrated to other IaC platforms.
In summary, adopting Pulumi necessitates mid-term preparation for either in-house maintenance of the Pulumi framework or migration of the IaC solution to another toolset. While this approach might be risky for infrastructure management use cases with longer lifespans, Pulumi’s tight integration capabilities could be suitable for shorter-lived application development scenarios.
Decision making
When the decision-maker has autonomy in their choices, the selection of tools is primarily driven by the application domain. The following sections provide IaC framework recommendations for specific domains. These recommendations, however, should be assessed against the concerns and interests of the system owner and developer outlined in the first part of the guideline.
This table summarizes the recommendations for greenfield projects:
Domain | AWS | Azure | GCP | Multi-cloud | Kubernetes | Users |
Application composition | CDK, SAM, etc. | Pulumi | Pulumi | Pulumi | KRM | Application developers |
Application composition | CDK/CF | Bicep | OpenTofu | OpenTofu | KRM | System engineers |
Deployment environments | CF (with CDK) | Bicep | OpenTofu | OpenTofu | KRM | System engineers |
Foundational infrastructure | CF | Terraform Cloud | Terraform Cloud | Terraform Cloud | | System engineers |
Where:
- Platforms:
  - AWS, Azure, GCP: The system is fully on the specific cloud platform.
  - Kubernetes: Kubernetes (Kubernetes API) is, or can be used as, the resource abstraction layer for cloud infrastructure management.
  - Multi-cloud: The system consists of infrastructure components from multiple cloud platforms.
- IaC frameworks:
  - CDK: AWS CDK.
  - CF: AWS CloudFormation.
  - CF (with CDK): The configuration management artifact is an AWS CloudFormation manifest. However, a system engineer or a build automation facility can use AWS CDK as a development tool for making CloudFormation manifests.
  - SAM, etc.: AWS Serverless Application Model (SAM), a framework for Lambda-based applications; other software development frameworks for cloud applications, such as the Serverless framework.
  - Pulumi: Pulumi framework.
  - Bicep: Azure Bicep.
  - OpenTofu: OpenTofu toolkit and auxiliary tools.
  - Terraform Cloud: HashiCorp Terraform with HashiCorp Terraform Cloud SaaS.
  - KRM: IaC tooling based on the Kubernetes resource model (KRM). This includes command-line automation tools like Helm or Kustomize, which suit “release pipeline” style automation, as well as GitOps-style operations tools.
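The summary table above can also be read as a simple lookup keyed by domain, user group, and platform. The sketch below encodes it that way. The names PLATFORMS, RECOMMENDATIONS, and recommend are made up for this example, and the user group for the last two rows is taken from the per-domain tables later in this guideline.

```python
from typing import Optional

PLATFORMS = ("AWS", "Azure", "GCP", "Multi-cloud", "Kubernetes")

# (domain, users) -> recommendations in PLATFORMS order.
# None marks a cell that the table leaves empty.
RECOMMENDATIONS = {
    ("Application composition", "Application developers"):
        ("CDK, SAM, etc.", "Pulumi", "Pulumi", "Pulumi", "KRM"),
    ("Application composition", "System engineers"):
        ("CDK/CF", "Bicep", "OpenTofu", "OpenTofu", "KRM"),
    ("Deployment environments", "System engineers"):
        ("CF (with CDK)", "Bicep", "OpenTofu", "OpenTofu", "KRM"),
    ("Foundational infrastructure", "System engineers"):
        ("CF", "Terraform Cloud", "Terraform Cloud", "Terraform Cloud", None),
}


def recommend(domain: str, platform: str,
              users: str = "System engineers") -> Optional[str]:
    """Return the greenfield recommendation, or None if the table has none."""
    row = RECOMMENDATIONS.get((domain, users))
    if row is None or platform not in PLATFORMS:
        return None
    return row[PLATFORMS.index(platform)]


print(recommend("Deployment environments", "GCP"))  # OpenTofu
print(recommend("Application composition", "Azure",
                users="Application developers"))  # Pulumi
```

A lookup like this only captures the greenfield defaults. The concerns of the system owner and developer described in the first part still have to be weighed separately.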
Established tooling
The first question the decision maker should answer is: Does the operational domain have established tooling for cloud infrastructure configuration management automation? This could be an established history of prior IaC implementations or an organizational policy mandating the use of a specific tool stack, supported by tooling procurement and support contracts covering new and projected use cases. For example, the organization uses HashiCorp Terraform, has a Terraform Cloud subscription, and is willing to pay for its production and non-production use in the new information system.
If the organization has established a tool stack for this domain:
- Avoid changing working tools.
- Only consider re-tooling if replacing the existing tools is an explicit objective of the project.
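The rule above takes precedence over any greenfield recommendation, which can be sketched as a small guard function. The function and parameter names are hypothetical and exist only for this illustration.

```python
from typing import Optional


def select_framework(established: Optional[str],
                     retooling_is_objective: bool,
                     greenfield_choice: str) -> str:
    """Prefer the established tool stack unless re-tooling is a project goal."""
    if established and not retooling_is_objective:
        # Avoid changing working tools.
        return established
    # No established stack, or replacing it is an explicit objective.
    return greenfield_choice


print(select_framework("Terraform Cloud", False, "OpenTofu"))  # Terraform Cloud
print(select_framework(None, False, "OpenTofu"))               # OpenTofu
```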
Domain: Application composition
Application composition involves assembling application components from cloud and custom building blocks.
- This use case primarily focuses on SDLC, with operational concerns being less relevant.
- Cloud application components are typically immutable. Best practices dictate that instances of these components be created, run, and then destroyed to be replaced with instances of a new version.
- Cloud application delivery processes usually take the form of “release pipelines”—imperative sequences of steps that control the life cycle of an application component instance.
- Application components tend to be numerous but have a limited lifespan. It’s reasonable to expect that an application component will be rewritten within five years. Once created, application components are often left alone with minimal maintenance.
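The immutable, pipeline-driven life cycle described in the list above can be sketched as an imperative sequence of steps. This is a toy model, not a real deployment tool. The release function, the running dictionary, and the step wording are assumptions made for illustration.

```python
def release(running: dict, component: str, new_version: str) -> list:
    """Replace the running instance of a component and return the steps taken."""
    steps = []
    old_version = running.get(component)
    # Step 1: create a fresh instance of the new version next to the old one.
    steps.append(f"create {component}:{new_version}")
    # Step 2: make the new instance the active one.
    running[component] = new_version
    steps.append(f"switch traffic to {component}:{new_version}")
    # Step 3: destroy the old instance instead of modifying it in place.
    if old_version is not None:
        steps.append(f"destroy {component}:{old_version}")
    return steps


running = {"checkout": "1.4.0"}
for step in release(running, "checkout", "1.5.0"):
    print(step)
```

Nothing in this sequence updates an existing instance, which is why IaC frameworks built on general-purpose programming languages fit this domain reasonably well.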
Because application components are numerous, short-lived, and rarely revisited, it is hard to control how an IaC framework is adopted and used for application composition. If the toolkit poses legal or financial risks, those risks will be just as hard to control. Therefore, the IaC framework should be chosen to avoid such risks by being either:
- Open-source software managed by a respected vendor-neutral foundation like the CNCF, Apache Foundation, or Eclipse Foundation.
- A well-established platform-provided toolkit, assuming the platform does not directly benefit from the toolkit but rather from the increased platform usage.
Application development involves two types of developers:
- Application developers: They focus on creating user-facing application components. They leverage general-purpose software development tools and follow SDLC processes, such as build, test, and deployment automation. For them, the cloud serves as another application platform API utilized during runtime.
- System engineers: They create the system’s internal components. They are familiar with cloud operations methods and tools. This group may also be responsible for deployment environment management. Hence, there is a desire for tooling consolidation with that usage domain.
For greenfield implementations, the table lists recommended IaC framework options for these groups:
Option | AWS | Azure | GCP | Multi-cloud | Kubernetes |
For application developers | CDK, SAM, etc. | Pulumi | Pulumi | Pulumi | KRM |
For system engineers | CDK/CF | Bicep | OpenTofu | OpenTofu | KRM |
A few notes regarding the “For application developers” option:
- This option takes more risk by leveraging niche tools. If that risk isn’t acceptable, the “For system engineers” option should be chosen instead.
- Higher-level frameworks (such as AWS SAM or Kubernetes API) are preferred when available and feasible.
Domain: Deployment environment management
Deployment environment management deals with the runtime environments for application workloads deployed on top of the generic cloud infrastructure. These environments range from simple setups to complex application platforms. This domain operates at the same access level and within the scope of the same IT processes as the application workloads.
These aspects should be considered for IaC implementation for deployment environment management:
- This domain is equally concerned with operations and SDLC.
- Managed deployment environments, while fungible, are often mutable. This means that, unlike application components, they are frequently modified in place.
- Deployment environments often contain persistent data.
- Deployment environments are less numerous than application components, but they still exist in numbers large enough to demand lifecycle automation.
When selecting tooling for deployment environment management, it’s crucial to opt for tools that mitigate the legal and financial uncertainties arising from the diversity of use cases and the scale of usage. Organic adoption can be challenging to control, so it’s advisable to prioritize vendor-neutral, open-source software or well-established platform-specific tools. This approach mirrors the selection criteria for IaC frameworks used in application composition.
Because operations and in-place mutations are the primary use cases, declarative desired state configuration frameworks are preferred. These frameworks naturally handle in-place resource updates, state differences, and drift detection. IaC frameworks for general-purpose programming languages aren’t designed for these scenarios.
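The core mechanism of a desired state configuration framework can be sketched as a comparison between the declared state and the observed state. The example below is a simplified model with made-up resource names. Real frameworks also track dependencies, ordering, and properties that can only change through replacement.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compare desired and actual resource maps and list the required actions."""
    return {
        # Declared but not present yet.
        "create": sorted(name for name in desired if name not in actual),
        # Present, but configured differently: update in place.
        "update": sorted(name for name in desired
                         if name in actual and desired[name] != actual[name]),
        # Present but no longer declared.
        "delete": sorted(name for name in actual if name not in desired),
    }


desired = {"db": {"size": "large", "backups": True}, "cache": {"nodes": 2}}
actual = {"db": {"size": "small", "backups": True}, "queue": {"fifo": False}}

print(plan(desired, actual))
# {'create': ['cache'], 'update': ['db'], 'delete': ['queue']}
```

Running the same comparison against an environment that should already match its declaration is a simple form of drift detection: any non-empty result means the environment was changed outside the IaC workflow.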
For greenfield implementations, the recommended IaC frameworks for this domain are:
Option | AWS | Azure | GCP | Multi-cloud | Kubernetes |
For system engineers | CF (with CDK) | Bicep | OpenTofu | OpenTofu | KRM |
Domain: Foundational infrastructure
Before application teams of an organization can use cloud resources, the organization must establish its footprint in the cloud. This activity is akin to traditional IT operations, highly privileged, and typically carried out by a dedicated team.
- This domain primarily focuses on operations, policy compliance, and security.
- It involves bootstrapping the organization’s cloud infrastructure from the ground up, without any pre-existing foundation on the target cloud platform.
- It deals with persistent and valuable cloud entities that are updated in place, similar to configuration management.
- It handles a few higher-level resources (such as cloud accounts, projects, or subscriptions) that this team provides to other teams consuming cloud resources. Therefore, its scale of operations is small and limited.
- Often, the same cloud operations team manages provisioning across multiple cloud platforms in use at the organization.
Due to its limited size and lack of dedicated engineering resources for tool development or maintenance, the cloud infrastructure team requires a well-established and feature-rich cloud infrastructure management toolkit. The toolkit should be functional from the outset, without requiring a pre-existing infrastructure. Comprehensive commercial support is also a requirement. In this scenario, legal and cost-related concerns are less significant, as the team’s scope of use is limited and controlled within the organization, and the customers are confined to internal stakeholders.
Cloud infrastructure bootstrapping from the ground up favors cloud-provided services and mature SaaS offerings. Change management and compliance use cases need declarative desired state configuration frameworks that can control cloud resource configuration without destroying and re-creating them.
For greenfield implementations, the recommended IaC frameworks for this domain are:
Option | AWS | Azure | GCP | Multi-cloud |
For system engineers | CF | Terraform Cloud | Terraform Cloud | Terraform Cloud |
Summary
This second part of the series reviewed the options available when selecting an IaC framework for application composition, deployment environment management, and foundational infrastructure management. By considering the target platforms and the specific use cases within each domain, engineers can determine the most suitable IaC framework for a specific scenario.
The first part of the series emphasizes the importance of understanding the IaC application domain and the target cloud platform, the factors that drive the selection process and help narrow down the choices. We recommend reading it to gain insight into the decision-making process and key considerations for choosing an IaC framework. Together, the two parts provide a playbook for engineers involved in selecting IaC tools for new cloud-based projects.
Credits
Special thanks to Sergey Plastinkin, Alexander Danilov and Dmitry Mezhensky for their invaluable feedback and support.