Amazon Web Services is a powerful and mature cloud platform that keeps growing rapidly. New features are often added at such a pace that many developers and architects barely have a chance to experiment with all of them on real projects. However, as with any cloud platform, the devil is in the details. If you want to make the most of your cloud services, you must know how to use them wisely and, more importantly, how NOT to use them.
Based on my experience in training, consulting, designing and implementing cloud solutions, I've compiled a list of common 'gotchas' that might help you avoid many pitfalls in your own projects.
Update: Shortly after writing this blog post, I was excited to discover a very recent HighScalability article by Chris Fregly, a former Netflix Streaming Platform Engineer, which describes similar interesting facts you might encounter with AWS. Make sure to check it out as well!
1. Assuming the identity of Availability Zones across different AWS accounts.
When you launch AWS resources in a specific AZ, you refer to this AZ by name, such as us-east-1a or us-east-1b. However, these names are purely virtual: when you create a new AWS account and use the us-east region (which has 5 different AZs at the time of this writing), AWS randomly selects 3 physical AZs in this region and assigns the letters a, b, and c to them. The primary purpose of this design decision is to balance the load across multiple AZs in the same region. Otherwise, users would be more likely to select the first option, us-east-1a, causing an uneven load distribution across the AWS infrastructure. As a result, us-east-1a in one AWS account is probably a completely different AZ than us-east-1a in another account.
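To make the per-account mapping more tangible, here is a toy simulation of the behavior described above. It is only an illustration: the physical zone labels, the `assign_zone_names` function and the use of a seeded random generator are assumptions made up for this sketch, not how AWS actually implements the assignment.

```python
import random

# Hypothetical labels for the physical zones of one region.
# AWS does not publish names like these; they exist only for this sketch.
PHYSICAL_ZONES = ["zone-1", "zone-2", "zone-3", "zone-4", "zone-5"]


def assign_zone_names(account_seed, region="us-east-1", visible=3):
    """Toy model: pick `visible` physical zones at random and label them a, b, c."""
    # Seeding with the account identifier keeps the mapping stable per account.
    rng = random.Random(account_seed)
    chosen = rng.sample(PHYSICAL_ZONES, visible)
    letters = "abcde"[:visible]
    return {f"{region}{letter}": zone for letter, zone in zip(letters, chosen)}


if __name__ == "__main__":
    for account in ("account-A", "account-B"):
        print(account, assign_zone_names(account))
```

In this model, two accounts can easily end up with the same letter pointing at different physical zones, which is why comparing AZ names across accounts tells you nothing about physical placement.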
2. Referring to Elastic Load Balancers (ELB) using IP addresses.
The golden rule of designing highly available and fault-tolerant systems in the cloud is: do not assume fixed location, health or availability of individual components. The same applies to ELBs: as they scale to accommodate the traffic of your application, AWS may need to relocate or replace them, causing their IP addresses to change. Therefore, when dealing with ELBs, always refer to them by their DNS names. If you use a custom public domain for your application, create a CNAME record pointing to the DNS name of the ELB.
If you use Route 53, there is a special kind of A record available for you: the Alias record. It lets you create an A record for your domain that points to the DNS name of the ELB instead of its IP address. AWS then resolves the alias behind the scenes and answers with the current IP address of the ELB.
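On the client side, the same rule means resolving the name each time you connect rather than storing an IP address. A minimal sketch using only the Python standard library follows; the `resolve_endpoint` helper is a name invented for this example, and `localhost` merely stands in for the DNS name of your load balancer.

```python
import socket


def resolve_endpoint(dns_name, port=80):
    """Return the IP addresses the name currently resolves to.

    Call this whenever a connection is opened instead of caching the result,
    because the addresses behind a load balancer may change over time.
    """
    infos = socket.getaddrinfo(dns_name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Substitute the DNS name shown for your own load balancer.
    print(resolve_endpoint("localhost"))
```

Most HTTP clients already resolve names for you; the point of the sketch is simply that the DNS name, not the resolved address, is what belongs in your configuration.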
3. Performing load testing without pre-warming the ELB.
Behind the scenes, ELB is load balancing software that most likely runs on instances similar to EC2. And, like any other instance in your infrastructure, it takes time to scale up and down to accommodate changes in traffic. Therefore, if you plan to perform load testing on your application, or if you expect sudden significant load spikes, you should contact AWS support to "pre-warm" your ELBs ahead of time.
4. Hard-coding the API keys inside AMIs or instances.
When you run workloads on EC2 instances, at some point they will inevitably need to use other AWS services (S3, SQS, DynamoDB, CloudWatch, you name it). To authenticate with these services, you'll need an IAM key pair: an Access Key ID and a Secret Key. Naturally, the problem is: how does an EC2 instance acquire these keys? An obvious (but very suboptimal) solution would be to store the keys in a configuration file somewhere on the storage volume. It would work, but you'd have a lot of trouble securing and rotating these keys.
So there's a better way: use the EC2 Metadata Service. Using this HTTP-based service, you can query many interesting details from within the instance: its public and private IP, name of its security group(s), region and availability zone, etc. The same applies to API keys: if you assign an IAM role to your EC2 instance, you can get the API credentials in your bootstrap script using a simple query:
$ curl http://169.254.169.254/latest/meta-data/iam/security-credentials/role-name
Note that 169.254.169.254 is the metadata service address, which is the same for all EC2 instances.
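If your bootstrap logic is written in Python rather than shell, the same query might look like the sketch below. The helper names and the `my-role` placeholder are assumptions made for this example. The sketch also assumes the instance answers plain GET requests; instances that enforce the newer token-based version of the metadata service require fetching a session token first, which is not shown here.

```python
import json
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest/meta-data"


def credentials_url(role_name):
    """Build the metadata URL for the temporary credentials of an IAM role."""
    return f"{METADATA_BASE}/iam/security-credentials/{role_name}"


def parse_credentials(body):
    """Extract the fields of interest from the JSON document the service returns."""
    doc = json.loads(body)
    return {
        "access_key_id": doc["AccessKeyId"],
        "secret_access_key": doc["SecretAccessKey"],
        "session_token": doc["Token"],
        "expiration": doc["Expiration"],
    }


def fetch_credentials(role_name, timeout=2.0):
    """Query the metadata service. Only works from inside an EC2 instance
    that has the given role attached."""
    with urllib.request.urlopen(credentials_url(role_name), timeout=timeout) as resp:
        return parse_credentials(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # "my-role" is a placeholder; use the name of the role attached to the instance.
    print(credentials_url("my-role"))
```

The credentials are temporary, so read them again when the expiration time approaches instead of saving them to disk. In practice, the AWS SDKs perform this lookup for you when a role is attached.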
5. Launching EC2 instances outside VPCs.
According to the release notes, new AWS accounts are now VPC-enabled by default, which means that every EC2 instance should be associated with a particular VPC at launch time. While VPC is one of the more complicated AWS features, launching EC2 instances inside a VPC provides numerous benefits, particularly:
- You can assign persistent static IP addresses that 'survive' stop and reboot operations. You can also assign multiple IPs and network interfaces to a single instance.
- You get more fine-grained control over the security configuration. In VPC, security groups support both egress and ingress filtering, whereas classic security groups only let you configure ingress filtering.
- You can run your instance on single-tenant hardware.
- You can launch resources in private subnets, which is particularly useful for database servers and private ELBs.
So, if you're designing a new solution on an old AWS account, consider using VPC for all your instances. Probably the only case when you wouldn't want to do so is when you have a large legacy system built on top of EC2-Classic instances that would involve a lot of migration effort.
6. Performing port scanning or penetration testing without being authorized.
Doing so is a violation of the AWS Terms of Service. If you need to test your cloud applications for vulnerabilities, you should contact AWS support beforehand so that they can temporarily disable their intrusion detection systems for certain machines and services.
7. Using S3 ACL for managing access to S3 buckets.
Amazon S3 is one of the oldest AWS services. When it appeared on the market, there was no IAM yet. As a result, S3 implemented its own security mechanism for managing access to buckets, known as S3 ACLs. Nowadays, the preferred way of managing access to S3 is using IAM policies. Not only do they offer more fine-grained control over access to S3 operations, but they also provide a unified way of managing access to other AWS services.
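For illustration, here is a minimal sketch of an IAM policy document that grants read-only access to a single bucket, built as a Python dictionary and printed as JSON. The bucket name `example-bucket` and the helper name `read_only_bucket_policy` are placeholders invented for this example; adjust the actions to whatever your application really needs.

```python
import json


def read_only_bucket_policy(bucket):
    """Build an IAM policy document allowing listing and reading one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_bucket_policy("example-bucket"), indent=2))
```

The resulting JSON can be attached to an IAM user, group or role, which is exactly the kind of per-operation control that ACLs alone do not give you.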
8. Involving your application servers for transferring data from browser to S3.
S3 supports direct browser-to-bucket uploads. All you need to do is generate a couple of hidden fields in the HTML form to authenticate the client properly. Define the form as <form method="post" enctype="multipart/form-data">, which is the encoding S3 expects for form-based uploads. Note that this form encoding is not the same thing as the S3 Multipart Upload API used for splitting large objects into parts.
Also, if you develop mobile applications that upload data to S3, you can use the AWS Security Token Service (STS) to generate temporary access keys for S3 and upload the data directly to buckets without proxying it through your servers.
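As a rough sketch of what generating those hidden fields can involve on the server, the code below builds a short-lived upload policy and signs it using the older HMAC-SHA1 form-signing scheme. Treat it as an assumption-laden illustration: the function name, the 10 MB limit, the 15-minute lifetime and the `uploads/` prefix are all invented for this example, and newer regions require the different Signature Version 4 scheme, for which the official SDKs are the safer choice.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone


def build_upload_fields(access_key_id, secret_key, bucket, key_prefix,
                        max_bytes=10 * 1024 * 1024, valid_minutes=15):
    """Return the hidden form fields for a browser-based S3 upload
    (legacy HMAC-SHA1 signing)."""
    expires = datetime.now(timezone.utc) + timedelta(minutes=valid_minutes)
    policy = {
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],     # restrict where files may land
            ["content-length-range", 0, max_bytes],  # restrict upload size
        ],
    }
    # The policy is sent Base64-encoded, and the signature is computed
    # over that encoded form.
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8"))
    digest = hmac.new(secret_key.encode("utf-8"), policy_b64, hashlib.sha1).digest()
    return {
        "AWSAccessKeyId": access_key_id,
        "policy": policy_b64.decode("ascii"),
        "signature": base64.b64encode(digest).decode("ascii"),
    }


if __name__ == "__main__":
    # Placeholder credentials and bucket name, for demonstration only.
    fields = build_upload_fields("AKIDEXAMPLE", "not-a-real-secret",
                                 "example-bucket", "uploads/")
    for name, value in fields.items():
        print(f'<input type="hidden" name="{name}" value="{value}">')
```

The secret key never leaves the server: the browser only receives the signed policy, which limits what can be uploaded and for how long.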
9. Waiting for the EBS snapshot operation to complete before unfreezing the filesystem.
If you need to perform hot backups of your production environment, you must ensure that the backup captures the filesystem in a consistent state. To do this, you should freeze the filesystem (e.g., with xfs_freeze) and then issue the snapshot creation command. However, you don't need to wait for the entire copying operation to finish before unfreezing the filesystem. Since the backup operation "captures a point in time", it only takes a matter of seconds for EBS to record which blocks need to be persisted. Therefore, the right way is to freeze the filesystem, issue the snapshot command, unfreeze the filesystem right away, and let the copy operation complete asynchronously. This keeps the downtime of your environment to a minimum.
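The freeze, snapshot, unfreeze sequence can be sketched in Python as follows. This is a minimal illustration under a few assumptions: `start_snapshot` is a hypothetical callable that you supply to issue the snapshot request and return as soon as the request is accepted, and the `run` parameter exists only so the command runner can be swapped out for testing.

```python
import subprocess


def snapshot_with_freeze(mount_point, start_snapshot, run=subprocess.check_call):
    """Freeze the filesystem, kick off a snapshot, and unfreeze immediately.

    `start_snapshot` is a caller-supplied function that only *starts* the
    snapshot; the copy itself continues in the background.
    """
    run(["xfs_freeze", "-f", mount_point])      # suspend writes
    try:
        return start_snapshot()                 # returns quickly, copy is asynchronous
    finally:
        # Always resume writes, even if the snapshot request failed.
        run(["xfs_freeze", "-u", mount_point])


if __name__ == "__main__":
    # Dry run with a fake runner so nothing is actually frozen.
    result = snapshot_with_freeze(
        "/data",
        start_snapshot=lambda: "snapshot-requested",
        run=lambda cmd: print("would run:", " ".join(cmd)),
    )
    print(result)
```

Putting the unfreeze step in a `finally` clause matters: a filesystem that stays frozen because the snapshot call raised an error is a much worse outage than a missed backup.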
10. Using On-Demand EC2 instances for every case.
Designing solutions for the cloud is not only about availability, scalability, security and fault tolerance, but also about cost-effectiveness. AWS provides you with a variety of cost-saving options for particular use cases. For example, if you expect to use a fixed number of EC2 instances for a long period of time, consider using Reserved Instances. If you need a fleet of short-lived instances whose outages can be tolerated, consider using Spot Instances (check out the blog post on Spot Instances by our engineer, Taras Kushnir). Choosing the right type of instance can dramatically improve the TCO of your infrastructure and give you a much bigger bang for your buck.
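To decide whether a Reserved Instance is worth it, a quick break-even calculation helps. The sketch below assumes a simple pricing model with a one-time upfront fee plus a discounted hourly rate; all prices in it are made-up numbers for illustration, not actual AWS rates.

```python
def total_cost(hours, hourly_rate, upfront_fee=0.0):
    """Total cost of running one instance for the given number of hours."""
    return upfront_fee + hours * hourly_rate


def break_even_hours(upfront_fee, reserved_hourly, on_demand_hourly):
    """Hours of usage after which the reserved option becomes cheaper."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be lower than the on-demand rate")
    return upfront_fee / saving_per_hour


if __name__ == "__main__":
    # Hypothetical prices: $300 upfront, $0.05/hour reserved, $0.10/hour on-demand.
    hours = break_even_hours(300.0, 0.05, 0.10)
    print(f"break-even after {hours:.0f} hours ({hours / 24:.0f} days of continuous use)")
    print("one year on-demand:", total_cost(24 * 365, 0.10))
    print("one year reserved: ", total_cost(24 * 365, 0.05, upfront_fee=300.0))
```

With these invented numbers, the break-even point is 300 / (0.10 - 0.05) = 6000 hours, or about 250 days of continuous use. An instance that runs around the clock for a year clears that bar; one that runs a few hours a day does not.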
About the author: Yuriy Guts is an AWS Certified Solutions Architect with several years of experience with architecting cloud solutions for various companies across the globe. He is responsible for the cloud solutions stream at ELEKS R&D and specializes in AWS and Windows Azure.