How to Avoid Cloud Vendor Lock-In

Organizations are increasingly moving from static to dynamic cloud infrastructures. In this new dynamic world, security, operations, networking, and development all face different challenges related to their specific business requirements. But at their root, these challenges can be addressed by meeting a few shared core requirements, which we’ll cover in this blog post.

One of the biggest challenges when moving to a dynamic cloud environment is to avoid choosing vendors that will lock you into using their products or inhibit your ability to move towards a modern, collaborative, flexible, and secure multi-cloud environment. While it’s probably not possible to avoid cloud vendor lock-in risks altogether, we’ll try to offer recommendations here that can help reduce those risks.

Core Requirements

Codification

We strongly recommend that you choose vendors that enable codification of infrastructure so that your routines, processes, and configurations are captured in code. This is essential to enabling rapid and error-free infrastructure deployments as you scale up your cloud environments.

Immutability

Another critical requirement is immutability, which ensures that you can tear down and redeploy your environment without fear of data loss.

We suggest looking for Infrastructure as Code (IaC) solutions that leverage the concept of immutable infrastructure to ensure that once an infrastructure image has been provisioned in your cloud environment, it cannot be modified. When infrastructure changes are needed, a new infrastructure image will be created and used to fully replace the existing image. Automated tools enable new images to be spun up and deployed quickly, making this approach feasible.

Immutability eliminates configuration drift, ensures consistency between development, test, staging, and production environments, and makes it easier to track infrastructure versions and roll back to previous versions if needed.
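
This minimal Python sketch makes the "replace, don't modify" idea concrete. The `Image` and `Deployment` classes are hypothetical stand-ins for what an IaC or image-building tool would manage. A real pipeline would build and deploy machine images, not Python objects. The point is that the live image is never edited in place: changes produce a new image, and rollback returns to an earlier, untouched one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """A provisioned infrastructure image; frozen, so fields cannot be reassigned."""
    version: int
    packages: tuple[str, ...]


@dataclass
class Deployment:
    """Tracks the live image and keeps earlier images available for rollback."""
    history: list[Image] = field(default_factory=list)
    next_version: int = 1

    @property
    def live(self) -> Image:
        return self.history[-1]

    def replace(self, packages: tuple[str, ...]) -> Image:
        # Never patch the running image: build a new one and make it live.
        new_image = Image(version=self.next_version, packages=packages)
        self.next_version += 1
        self.history.append(new_image)
        return new_image

    def rollback(self) -> Image:
        # Retire the current image and return to the previous, unmodified one.
        if len(self.history) < 2:
            raise RuntimeError("no earlier image to roll back to")
        self.history.pop()
        return self.live
```

Calling `replace()` twice and then `rollback()` returns the first image exactly as it was built. Any attempt to assign to a field of the live image raises `FrozenInstanceError`.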

Declarative Code

The final core requirement is to use solutions that employ declarative code. This means that you can specify the desired state in a way that is generally human-readable. Declarative code is a great way to future-proof your environment and reduce your dependence on expert coders to programmatically define the desired end state of your cloud environment.
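
The core idea of declarative code is that you state what should exist and let tooling work out how to get there. The hypothetical `plan()` function below compares a desired state with the actual state and lists the actions needed to converge. This is the general model behind the plan step of declarative IaC tools such as Terraform. The resource names and attributes are made up for illustration.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Return the (action, resource) pairs needed to move `actual` to `desired`."""
    actions: list[tuple[str, str]] = []
    # Resources declared but not yet present must be created.
    for name in sorted(desired.keys() - actual.keys()):
        actions.append(("create", name))
    # Resources present in both but with different settings must be updated.
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            actions.append(("update", name))
    # Resources present but no longer declared must be removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions
```

For example, `plan({"web": {"count": 3}, "db": {"count": 1}}, {"web": {"count": 2}, "cache": {"count": 1}})` returns `[("create", "db"), ("update", "web"), ("delete", "cache")]`. The human-readable part is the desired-state dictionary; the diffing logic is written once.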

As an example, a couple of years ago, when we were performing PCI security audits, we were required to use an industry-standard sampling methodology. In this case, our client had roughly 100 systems. Some of these methodologies required grouping systems into common sets and sampling 10% of them. For this client, we had two sets of 10 systems that we needed to sample against. Sampling 20 systems took real effort but was manageable.

But if we were asked to perform the same effort today for this client, which now has thousands of cloud systems, uses load balancing, and routinely spins up containers to meet demand, the sampling project would be orders of magnitude more time-consuming without declarative code and would require an army of coders.
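
When systems are declared in code, the inventory becomes data you can query rather than something compiled by hand. This minimal sketch assumes each system record carries a declared `role` tag, which is a hypothetical field. The function groups systems by that tag and draws a reproducible 10% sample from each group. Rounding up to a whole system is our assumption; check the sampling rules your assessor actually requires.

```python
import random
from collections import defaultdict
from typing import Optional


def sample_by_group(
    inventory: list[dict],
    group_key: str = "role",
    percent: int = 10,
    seed: Optional[int] = None,
) -> dict[str, list[str]]:
    """Draw `percent`% of the systems in each group, rounding up to a whole system."""
    groups: dict[str, list[str]] = defaultdict(list)
    for system in inventory:
        groups[system[group_key]].append(system["id"])

    rng = random.Random(seed)  # a fixed seed makes the sample reproducible for auditors
    sample = {}
    for group, ids in sorted(groups.items()):
        size = (len(ids) * percent + 99) // 100  # integer ceiling, avoids float rounding
        sample[group] = sorted(rng.sample(sorted(ids), size))
    return sample
```

With 30 `web` systems and 12 `db` systems, the function picks 3 and 2 systems respectively. The same code works unchanged whether the inventory holds 100 systems or several thousand.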

Bottom line: using declarative code is a must for today’s complex and rapidly expanding cloud environments.

Vendor Lock-In Risks

Now that we’ve covered the core requirements to look for when choosing cloud vendors, let’s talk about some specific steps you can take to minimize vendor lock-in risks.

Roles & Responsibilities

In the Infrastructure as a Service (IaaS) world, AWS, Azure, and Google Cloud are the gold standards. They all do a good job of defining the roles and responsibilities they will perform and those your organization is expected to perform.

For example, AWS has a comprehensive shared responsibility model, which it breaks down into two high-level categories:

  • Of the Cloud. This category describes the roles and responsibilities that AWS performs, such as securing the physical infrastructure and the hypervisor used to create and run virtual machines. Many of the cloud security features are turned on by default.
  • In the Cloud. This category describes the customer’s roles and responsibilities, such as hardening your security groups to permit only the traffic each workload needs. If you have a PCI workload and use virtual machines, you will be expected to run host intrusion detection systems on your Amazon Elastic Compute Cloud (EC2) instances.
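
As one small example of an In the Cloud task, the sketch below flags security group ingress rules that are open to the whole internet. It assumes rules are shaped like the `SecurityGroups` list returned by the EC2 `DescribeSecurityGroups` API (for instance, through boto3's `describe_security_groups()`). The allow-list of public ports is a hypothetical choice. What counts as necessary traffic depends on your workload.

```python
OPEN_TO_WORLD = {"0.0.0.0/0", "::/0"}


def overly_permissive_rules(
    security_groups: list[dict],
    allowed_public_ports: frozenset = frozenset({443}),  # hypothetical allow-list
) -> list[tuple]:
    """Flag ingress rules open to the internet on anything but an allow-listed port."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            if not OPEN_TO_WORLD.intersection(cidrs):
                continue  # rule is limited to specific sources
            protocol = perm.get("IpProtocol")
            from_port, to_port = perm.get("FromPort"), perm.get("ToPort")
            # "-1" means all protocols and ports; port ranges are never treated as allowed.
            single_allowed_port = from_port == to_port and from_port in allowed_public_ports
            if protocol == "-1" or not single_allowed_port:
                findings.append((group["GroupId"], protocol, from_port, to_port))
    return findings
```

A rule allowing TCP 443 from `0.0.0.0/0` passes. SSH open to the internet, or an all-traffic rule open to `::/0`, is reported for review.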

Before you pick any technology component or solution for your cloud environment, be sure that it can fulfill the “In the Cloud” roles and responsibilities required by your Cloud Service Provider (CSP).

Often, when you ask a cloud technology or solution provider for a roles and responsibilities matrix, they will either not have one or will have one limited to what is covered in a SOC 2 audit. Both should be red flags for you. If you do select one of these vendors, perform significant due diligence to ensure they can meet the In the Cloud roles and responsibilities required by your CSP. If they cannot, it’s probably best to consider other providers.

Virtualization

During the initial phase of cloud migration, we find that organizations tend to virtualize technology by “lifting and shifting” existing legacy applications to the cloud. While this approach can be a quick way to move workloads to the cloud, the resulting applications usually lack many of the capabilities that cloud-native applications provide.

Lift-and-shift applications often duplicate functionality that your CSP already provides under its “Of the Cloud” responsibilities. This increases complexity and makes it harder to decide which component should perform the overlapping functions.

For example, in the AWS environment, the shared responsibility model calls for AWS to provide stateful inspection of security groups in PCI environments. If you virtualize a firewall in this environment, stateful inspection may be performed both by the lift-and-shift appliance and by the AWS “Of the Cloud” functions, which increases complexity. Instead of just reviewing the inbound and outbound security group rules, you now also need to inspect the firewall’s running config and verify that the virtual appliance is healthy. You must also confirm that traffic in those security groups is properly restricted based on job function. This expands your audit scope and, with it, your compliance effort.

While it can be tempting to accelerate your cloud migration by using virtualized applications, we recommend selecting cloud-native applications whenever possible to avoid the complexity and cost associated with “lift and shift” applications.

Networking

Over the last couple of years working with clients, we’ve seen first-hand how cloud operations are evolving. Many organizations have gone from a single account with many Virtual Private Clouds (VPCs) to segmenting their environments at the account level. This account-level segmentation is generally referred to as a “landing zone” approach, and the account is now the strongest form of segmentation in the cloud. Many of our clients have grown from one account to 2,000 accounts in a year.

Now that you are scaling exponentially, how do you monitor for compliance? Have you adapted your model to include service discovery and routing? For example, in PCI, generating an accurate inventory in the cloud has become a hot topic, especially for organizations that treat their systems as disposable and cycle them weekly or daily.

As clients scale up their cloud environments in these ways, it becomes increasingly apparent that using declarative code for interrogating and making changes to their environment is critical.
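
When systems are rebuilt daily, an inventory is only meaningful relative to a point in time. The functions below are a hedged sketch of that idea. They flatten per-account instance lists into one inventory and compare two snapshots to show what was launched and terminated in between. Collecting each account's instance list (for example, through your CSP's APIs) is outside this sketch, and the account and instance IDs used below are placeholders.

```python
def build_inventory(per_account: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Flatten {account_id: [instance_id, ...]} into (account_id, instance_id) pairs."""
    return {(account, instance) for account, ids in per_account.items() for instance in ids}


def diff_inventory(before: set[tuple[str, str]], after: set[tuple[str, str]]) -> dict:
    """Summarize what changed between two point-in-time inventory snapshots."""
    return {
        "launched": sorted(after - before),
        "terminated": sorted(before - after),
        "unchanged": len(before & after),
    }
```

Consider `before = build_inventory({"111": ["i-a", "i-b"]})` and `after = build_inventory({"111": ["i-b", "i-c"]})`. Here `diff_inventory(before, after)` reports `("111", "i-c")` as launched, `("111", "i-a")` as terminated, and one unchanged instance. Keying on the account as well as the instance keeps results unambiguous across thousands of accounts.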

In multi-cloud environments, we want to consume services from more than one provider. Maybe you want to use a VM in Azure for compute and Amazon S3 for storage in AWS. Setting up private connections between these environments can be costly. If you are thinking about moving to a multi-cloud environment, which is being talked about a lot lately, you may well ask why anybody would do this. It’s a valid question.

This is where we recommend looking at services and solutions that provide service mesh capabilities. Generally, this means leveraging a common foundation and technology that provides service-to-service connectivity, authorization, and encryption using mutual TLS (mTLS). This helps ensure parity between your environments and can lower the cost of moving to a multi-cloud environment. It also helps minimize vendor lock-in risks.
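
This minimal sketch uses Python's standard `ssl` module to show what mutual TLS involves at the connection level. Both sides present certificates signed by a shared CA, and a small allow-list decides which service may call which. Two simplifying assumptions apply. First, the certificate's common name serves as the service identity. Second, the `intentions` table is loosely modeled on the allow/deny intentions in meshes such as HashiCorp Consul. Production meshes typically automate certificate issuance and rotation and usually carry identities in the subject alternative name instead.

```python
import ssl
from typing import Optional


def mtls_server_context(cert_file: Optional[str] = None, key_file: Optional[str] = None,
                        ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side of mutual TLS: present our cert and require one from the caller."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a CA-signed certificate
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(ca_file)  # the shared CA that signs service certificates
    return ctx


def mtls_client_context(cert_file: Optional[str] = None, key_file: Optional[str] = None,
                        ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server (on by default) and present our own cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # CERT_REQUIRED and hostname checks by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(ca_file)
    return ctx


def is_authorized(peer_cert: dict, destination: str, intentions: dict) -> bool:
    """Allow the call only if an intention permits the peer's service name to reach `destination`."""
    # SSLSocket.getpeercert() returns the subject as a tuple of RDNs of (key, value) pairs.
    names = [value for rdn in peer_cert.get("subject", ()) for key, value in rdn if key == "commonName"]
    return any(intentions.get((name, destination), False) for name in names)
```

Encryption and authentication come from the TLS handshake. Authorization is a separate, explicit check, which is what lets the same policy apply no matter which cloud a service runs in.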

Secret Management

Native key management solutions such as Azure Key Vault and AWS KMS are great. They remove the need to hardcode sensitive variables and configuration values in your code repositories. With these tools, you store only the parameters used to call the native key management service when a secret is needed.
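
The pattern described above keeps references in the repository rather than values, and can be sketched as follows. The `secret://` prefix and the `fetch` callable are hypothetical. In practice, `fetch` would wrap your provider's SDK call, which this sketch leaves out.

```python
from typing import Callable, Mapping

SECRET_PREFIX = "secret://"  # hypothetical reference scheme, not a real provider format


def resolve_config(config: Mapping[str, str], fetch: Callable[[str], str]) -> dict[str, str]:
    """Return a copy of `config` with secret references replaced by fetched values."""
    resolved = {}
    for key, value in config.items():
        if value.startswith(SECRET_PREFIX):
            # Only the reference lives in the repo; the value is fetched at runtime.
            resolved[key] = fetch(value[len(SECRET_PREFIX):])
        else:
            resolved[key] = value
    return resolved
```

For local testing, a plain dictionary's `__getitem__` can stand in for `fetch`. For example, `resolve_config({"db_password": "secret://prod/db-password"}, {"prod/db-password": "example"}.__getitem__)` returns `{"db_password": "example"}`.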

Although this technology has been a big step forward, some core risks apply to these native solutions. For example, suppose you currently use AWS and your organization wants to diversify risk by adding Azure to create a multi-cloud environment. Figuring out how to manage secrets and keys across both platforms can be a challenge. In this situation, you will still want to authenticate each requestor’s identity against a trusted source before granting access to system resources. Ideally, you’ll do this dynamically, in a way that works with any identity model you choose. Selecting solutions that provide this level of flexibility is important for avoiding vendor lock-in.
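
One way to keep secret access portable is to put a thin, provider-neutral layer in front of each cloud's native store and verify the caller's identity before anything is read. The sketch below is a hypothetical illustration. `SecretBroker`, the `verify` callable, and the path-based policy are all assumptions. In a real deployment, `verify` would check a credential against your trusted identity source, and real provider SDKs would replace the in-memory backends.

```python
from typing import Callable, Optional, Protocol


class SecretBackend(Protocol):
    def get(self, name: str) -> str: ...


class InMemoryBackend:
    """Hypothetical stand-in for a cloud-native key or secret store."""

    def __init__(self, secrets: dict[str, str]) -> None:
        self._secrets = dict(secrets)

    def get(self, name: str) -> str:
        return self._secrets[name]


class SecretBroker:
    """One access path for secrets in any cloud, gated on a verified identity."""

    def __init__(self, backends: dict[str, SecretBackend],
                 verify: Callable[[str], Optional[str]],
                 policy: dict[str, set[str]]) -> None:
        self._backends = backends  # e.g. {"aws": ..., "azure": ...}
        self._verify = verify      # maps a credential to an identity, or None if invalid
        self._policy = policy      # identity -> set of "provider/name" paths it may read

    def read(self, credential: str, path: str) -> str:
        identity = self._verify(credential)
        if identity is None:
            raise PermissionError("credential could not be verified")
        if path not in self._policy.get(identity, set()):
            raise PermissionError(f"{identity} is not allowed to read {path}")
        provider, _, name = path.partition("/")
        return self._backends[provider].get(name)
```

Because callers only see `read(credential, "provider/name")`, adding or replacing a cloud means registering another backend, not rewriting every application that needs a secret.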

Webinar

For a deeper dive on this topic, check out our webinar on Avoiding Vendor Lock-In.

We Can Help

If you have questions about avoiding vendor lock-in, just give us a call at (833) 292-1609 or email us at sales@tevora.com.

About the Author

Christopher Callas is the Senior Manager of Cloud Security at Tevora.