
11/06/2013

AWS: 10 Things You're Probably Doing Wrong as an Architect

Amazon Web Services is a powerful and mature cloud platform that keeps growing rapidly. New features are often added at such a pace that many developers and architects barely have a chance to experiment with all of them on real projects. However, as with any cloud platform, the devil is in the details. If you want to make the most of your cloud services, you must know how to use them wisely, and, more importantly, how NOT to use them.
Based on my experience training, consulting, and designing and implementing cloud solutions, I've compiled a list of common 'gotchas' that might help you avoid pitfalls on your own projects.
Update: Shortly after writing this blog post, I was excited to discover a very recent HighScalability article by Chris Fregly, a former Netflix Streaming Platform Engineer, which describes similar interesting facts you might encounter with AWS. Make sure to check it out as well!

1. Assuming Availability Zone names refer to the same physical zones across different AWS accounts. 

When you launch AWS resources in a specific AZ, you refer to that AZ by name, such as us-east-1a or us-east-1b. However, these names are purely virtual: when you create a new AWS account and use the us-east region (which has 5 different AZs at the time of this writing), AWS randomly selects 3 physical AZs in this region and assigns the letters a, b, and c to them. The primary purpose of this design decision is to balance the load across multiple AZs in the same region. Otherwise, users would be more likely to select the first option, us-east-1a, causing an uneven load distribution across the AWS infrastructure. As a result, us-east-1a in one AWS account is probably a completely different AZ than us-east-1a in another account.

2. Referring to Elastic Load Balancers (ELB) using IP addresses.

The golden rule of designing highly available and fault tolerant systems in the cloud is: do not assume fixed location, health or availability of individual components. The same applies to ELBs: as they scale to accommodate the traffic of your application, AWS may need to relocate or replace them, causing the IP address to change. Therefore, when dealing with ELBs, always refer to them by their DNS names. If you use a custom public domain for your application, create a CNAME record pointing to the DNS of the ELB.
If you use Route 53, a special kind of A record is available to you: the Alias record. It lets you create an A record for your domain that points to the name of the ELB instead of its IP address. AWS then performs a double DNS resolution behind the scenes and substitutes the alias with the current IP address of the ELB.
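For illustration, here is a rough sketch of creating such an Alias record with the AWS SDK for .NET. The Amazon.Route53 class and property names below (ChangeResourceRecordSetsRequest, AliasTarget, etc.) follow the SDK, but treat the exact shapes as assumptions, and the zone IDs and names as placeholders:

var route53Client = new AmazonRoute53Client(accessKey, secretKey);

var aliasRecord = new ResourceRecordSet
{
    Name = "example.com.",      // the zone apex of your domain (placeholder)
    Type = "A",
    AliasTarget = new AliasTarget
    {
        HostedZoneId = elbHostedZoneId, // the ELB's canonical hosted zone ID
        DNSName = elbDnsName            // the ELB's DNS name, not its IP
    }
};

var changeRequest = new ChangeResourceRecordSetsRequest
{
    HostedZoneId = myHostedZoneId,
    ChangeBatch = new ChangeBatch
    {
        Changes = new List<Change>
        {
            new Change { Action = "CREATE", ResourceRecordSet = aliasRecord }
        }
    }
};

route53Client.ChangeResourceRecordSets(changeRequest);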

3. Performing load testing without pre-warming the ELB.

Behind the scenes, an ELB is load-balancing software, most likely running on instances similar to EC2 ones. And, like any other instance in your infrastructure, it takes time to scale it up and down to accommodate changes in traffic. Therefore, if you plan to perform load testing on your application, or if you expect sudden significant load spikes, you should contact AWS support to "pre-warm" your ELBs ahead of time.
Update: Chris Fregly describes the DNS side of this phenomenon in his recent post on HighScalability.

4. Hard-coding the API keys inside AMIs or instances.

When you deploy computations on EC2 instances, at some point they will inevitably need to use other AWS services (S3, SQS, DynamoDB, CloudWatch, you name it). In order to authenticate to these services, you'll need an IAM key pair: an Access Key ID and a Secret Key. Naturally, the problem is: how does an EC2 instance acquire these keys? An obvious (but very suboptimal) solution would be to store the keys in a configuration file somewhere on the storage volume. It would work, but you'd have a lot of trouble securing and rotating these keys.
So there's a better way: use the EC2 Metadata Service. Using this HTTP-based service, you can query many interesting details from within the instance: its public and private IP, name of its security group(s), region and availability zone, etc. The same applies to API keys: if you assign an IAM role to your EC2 instance, you can get the API credentials in your bootstrap script using a simple query:

$ curl http://169.254.169.254/latest/meta-data/iam/security-credentials/role-name

Note that 169.254.169.254 is the metadata service address that is the same for all EC2 instances.
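The same query can be made from application code. A minimal sketch in C# (assuming the instance has an IAM role named, say, MyInstanceRole; the JSON response contains an AccessKeyId, a SecretAccessKey, a session Token and an Expiration timestamp):

// requires System and System.Net
var metadataUrl = "http://169.254.169.254/latest/meta-data/iam/security-credentials/MyInstanceRole";

using (var webClient = new WebClient())
{
    // AWS rotates these temporary credentials automatically;
    // re-query the URL before the Expiration timestamp passes
    string credentialsJson = webClient.DownloadString(metadataUrl);
    Console.WriteLine(credentialsJson);
}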

5. Launching EC2 instances outside VPCs.

According to the release notes, new AWS accounts are now VPC-enabled by default, which means that every EC2 instance is associated with a particular VPC at launch time. While VPC is one of the more complicated AWS features, launching EC2 instances inside a VPC provides numerous benefits, in particular:
  • You can assign persistent static IP addresses that 'survive' stop and reboot operations. You can also assign multiple IPs and network interfaces to a single instance.
  • You get a more fine-grained control over the security configuration. In VPC, the security groups involve both egress and ingress filtering, compared to classic security groups where you can only configure ingress filtering.
  • You can run your instances on single-tenant hardware.
  • You can launch resources in private subnets, which is particularly useful for database servers and private ELBs.
So, if you're designing a new solution on an old AWS account, consider using VPC for all your instances. Probably the only case where you wouldn't want to do so is a large legacy system built on top of EC2-Classic instances that would require a lot of migration effort.
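Launching into a VPC requires almost no extra code. A sketch, assuming you already have a subnet created (the subnetId below is a placeholder):

var runRequest = new RunInstancesRequest();
runRequest.ImageId = imageID;
runRequest.MinCount = 1;
runRequest.MaxCount = 1;
runRequest.SubnetId = subnetId; // e.g. "subnet-0123abcd"; places the instance into that VPC subnet

var runResponse = ec2Client.RunInstances(runRequest);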

6. Performing port scanning or penetration testing without being authorized.

Doing so is a violation of the AWS Terms of Service. If you need to test your cloud applications for vulnerabilities, contact AWS support beforehand so that they can temporarily disable their intrusion detection systems for the machines and services in question.

7. Using S3 ACLs for managing access to S3 buckets.

Amazon S3 is one of the oldest AWS services. When it appeared on the market, there was no IAM yet, so to manage access to S3 buckets, S3 implemented its own security mechanism, known as S3 ACLs. Nowadays, the preferred way of managing access to S3 is IAM policies. Not only do they offer more fine-grained control over S3 operations, but they also provide a unified way of managing access to other AWS services.
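As an illustration, here is a minimal IAM policy document (the bucket name is a placeholder) that allows a user, group, or role to read and write objects in a single bucket, something you cannot express nearly as cleanly with ACL grants:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-example-bucket/*"
    }
  ]
}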

8. Routing browser-to-S3 data transfers through your application servers.

S3 supports direct browser-to-bucket uploads. All you need to do is generate a couple of hidden fields in the HTML to authenticate the client properly. If you define your form as <form enctype="multipart/form-data">, the browser will even perform a multipart upload for large files.
Also, if you develop mobile applications that upload data to S3, you can use the AWS Security Token Service (STS) to generate temporary access keys for S3 and upload the data directly to buckets without proxying it through your servers.
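A hedged sketch of the STS flow using the Amazon.SecurityToken namespace of the SDK (the names are illustrative; s3UploadPolicyJson would be an IAM policy like the one shown in the previous section, scoped to the upload bucket):

var stsClient = new AmazonSecurityTokenServiceClient(accessKey, secretKey);

var tokenRequest = new GetFederationTokenRequest
{
    Name = "mobile-uploader",     // an identifier for the temporary user
    Policy = s3UploadPolicyJson,  // restricts what the temporary keys may do
    DurationSeconds = 3600        // the credentials expire after one hour
};

var tokenResponse = stsClient.GetFederationToken(tokenRequest);
var credentials = tokenResponse.GetFederationTokenResult.Credentials;

// Ship credentials.AccessKeyId, credentials.SecretAccessKey and
// credentials.SessionToken to the device; it can now call S3 directly.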

9. Waiting for the EBS snapshot operation to complete before unfreezing the filesystem.

If you need to perform hot backups of your production environment, you must ensure that the backup captures the filesystem in a consistent state. To do this, you should freeze the filesystem (e.g., with xfs_freeze) and then issue the snapshot creation command. However, you don't need to wait for the entire copying operation to finish before unfreezing the FS. Since the backup operation captures a point in time, it only takes a matter of seconds for the EBS service to capture the blocks that need to be persisted. Therefore, the right way is to freeze the FS, issue the snapshot command, then unfreeze the FS and let the copy operation complete asynchronously. This ensures minimal downtime for your environment.
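A sketch of the whole sequence (the EC2 call uses the AWS SDK for .NET; the freeze/unfreeze steps are OS commands executed on the instance and are shown here as comments; volumeId is a placeholder):

// 1. On the instance: xfs_freeze -f /data  (flush and freeze the filesystem)

var snapshotRequest = new CreateSnapshotRequest
{
    VolumeId = volumeId,  // the EBS volume backing /data
    Description = "Consistent hot backup"
};

// 2. The call returns as soon as the point-in-time state is captured,
//    long before the blocks finish copying in the background
var snapshotResponse = ec2Client.CreateSnapshot(snapshotRequest);

// 3. On the instance: xfs_freeze -u /data  (unfreeze immediately)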

10. Using On-Demand EC2 instances for every case.

An important aspect of designing solutions for the cloud is not only availability, scalability, security and fault tolerance, but also cost-effectiveness. AWS provides a variety of cost-saving options for particular use cases. For example, if you expect to use a fixed number of EC2 instances for a long period of time, consider Reserved Instances. If you need a fleet of short-lived instances whose outages can be tolerated, consider Spot Instances (check out the blog post on Spot Instances by our engineer, Taras Kushnir). Choosing the right type of instance can dramatically improve the TCO of your infrastructure and give you a much bigger bang for your buck.

About the author: Yuriy Guts is an AWS Certified Solutions Architect with several years of experience architecting cloud solutions for companies across the globe. He leads the cloud solutions stream at ELEKS R&D and specializes in AWS and Windows Azure.

11/04/2013

Getting more from AWS: Spot Instances (via the SDK for .NET)

Intro

We discussed the basic principles of working with On-Demand AWS instances in the previous post. On-Demand instances are quite straightforward and behave predictably. But... what about something more stochastic and thus more interesting?

Amazon EC2 instance types

Amazon offers several types of compute instances: On-Demand, Reserved and Spot instances. You can read the full article on the AWS website, but to make a long story short:
  • On-Demand instances are billed at a fixed hourly rate and are almost always available; they are often used for applications that need basic guarantees about instance start time
  • Reserved instances can (as the name implies) be reserved for some period of time and are always available within that period
  • Spot instances are the most interesting kind: we act like auction players and set the price we are willing to pay to run an instance. If our bid beats those of other customers at the moment, we can launch the requested instances for some period of time. If the current spot price moves above ours, the Amazon EC2 service shuts our instance down

More about Spot instances

Spot instances are tricky: they are not guaranteed to run when we request them. We have to place a bid and wait until it wins (if it ever does) in order to run a spot instance. To do this, we send a spot instance request with our suggested price to the Amazon EC2 service. Each request has a state which indicates whether our bid won. Initially the request is in the "open" state. It becomes "active" once our bid wins, and we can then see new running instances linked to our request through the SpotInstanceRequestId property.

Describing spot requests

A DescribeSpotInstanceRequestsRequest is used to query the state of our spot requests. We can filter open and active spot requests using the Filter property. For example, this is how one can query open spot requests:

var describeSpotRequestsRequest = new DescribeSpotInstanceRequestsRequest();

// the "state" filter matches the request state: open, active, cancelled, etc.
describeSpotRequestsRequest.Filter.Add(new Filter { Name = "state", Value = new List<string> { "open" } });

var describeSpotRequestsResponse = ec2Client.DescribeSpotInstanceRequests(describeSpotRequestsRequest);
var openedRequests = describeSpotRequestsResponse.DescribeSpotInstanceRequestsResult.SpotInstanceRequest;


Running spot instances

To launch Spot instances we send a RequestSpotInstances request, passing our price, the maximum desired number of spot instances, and the usual launch parameters such as image ID, instance type, key pair name, security groups, etc. If our price wins, AWS will try to launch as many instances as it can, up to our desired maximum.

var ec2Client = new AmazonEC2Client(accessKey, secretKey);
var spotRequest = new RequestSpotInstancesRequest();

spotRequest.SpotPrice = "2.0";
spotRequest.InstanceCount = maxInstancesCount;
spotRequest.LaunchSpecification = new LaunchSpecification() { ImageId = imageID };

var spotResponse = ec2Client.RequestSpotInstances(spotRequest);
var spotResult = spotResponse.RequestSpotInstancesResult;
var placedSpotRequests = spotResult.SpotInstanceRequest.Select(rq => rq.SpotInstanceRequestId);

A spot request can throw several exceptions, one of which is AmazonEC2Exception with the ErrorCode "MaxSpotInstanceCountExceeded". This exception means we requested more instances than our account allows us to launch.

Requesting spot instances can also fail through no fault of our own, when Amazon is unable to provide the requested number of EC2 instances in our region at the moment. In this case, an AmazonEC2Exception with the "InsufficientInstanceCapacity" error code is thrown.

Requesting current spot price

It is useful to know the current price of spot instances when placing your own bid. The AWS SDK provides DescribeSpotPriceHistoryRequest for this purpose.

var ec2Client = new AmazonEC2Client(accessKey, secretKey);

var spotPriceHistoryRequest = new DescribeSpotPriceHistoryRequest();
spotPriceHistoryRequest.StartTime = DateTime.UtcNow.ToString("yyyy-MM-ddTHH:mm:ss.000Z"); // UTC, to match the trailing Z
spotPriceHistoryRequest.EndTime = spotPriceHistoryRequest.StartTime;

var spotPriceHistoryResponse = ec2Client.DescribeSpotPriceHistory(spotPriceHistoryRequest);
var priceHistory = spotPriceHistoryResponse.DescribeSpotPriceHistoryResult.SpotPriceHistory;

var currentPrice = priceHistory.First().SpotPrice;


A note from the official documentation: "You can view the Spot Price history over a period from one to 90 days based on the instance type, the operating system you want the instance to run on, the time period, and the Availability Zone in which it will be launched."

Tagging spot instances

There is a problem with tagging spot instances: we cannot tag them right after start, because we don't know when they will start. But there is a simple workaround: when we place spot requests, we can tag the requests themselves; once the actual spot instances are launched, we can match each running instance to its spot request by the SpotInstanceRequestId property and apply the corresponding tags.
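A sketch of this workaround (the tag names are illustrative): tag the spot request right after placing it, and copy the tags to the instance once the request becomes active:

// Tag the spot request itself as soon as we have its ID
var tagRequest = new CreateTagsRequest();
tagRequest.ResourceId.Add(spotRequestId);
tagRequest.Tag.Add(new Tag { Key = "NodeRole", Value = "Worker" });
ec2Client.CreateTags(tagRequest);

// Later, when the request is active, find its instance through the
// SpotInstanceRequestId property of the running instance and re-apply
// the same tags to the instance with another CreateTagsRequest.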

Stopping spot instances

Just as with launching, to stop Spot instances we work with the SpotInstanceRequest rather than with the running spot instance itself. CancelSpotInstanceRequestsRequest does the job, taking the IDs of the SpotInstanceRequests to cancel as parameters.

var ec2Client = new AmazonEC2Client(accessKey, secretKey);

var cancelSpotRequestsRequest = new CancelSpotInstanceRequestsRequest();
cancelSpotRequestsRequest.SpotInstanceRequestId.Add(spotRequestId);

var cancelSpotInstancesResponse = ec2Client.CancelSpotInstanceRequests(cancelSpotRequestsRequest);
var cancelledSpotRequests = cancelSpotInstancesResponse.CancelSpotInstanceRequestsResult.CancelledSpotInstanceRequest.Select(csr => csr.SpotInstanceRequestId);


Note that cancelling a spot request does not terminate an instance that is already running; shut it down separately with a TerminateInstances request.

Conclusion

Amazon's Spot instances service is interesting, tricky, yet powerful. It lets you use Amazon's unused capacity at a lower price. You purchase these instances by placing a bid you are willing to pay, and for as long as your bid stays above the spot price, you keep them. Spot instances are useful in situations where losing partial work is acceptable, such as cost-driven workloads or application testing. Used wisely, they can save you a lot of money.

10/15/2013

The not so short introduction to EC2 instances in AWS SDK for .NET

Intro

Today is the era of cloud computing: never-ending computing resources available on demand. Amazon is one of the biggest players in the cloud computing market. It provides a range of cloud services: Elastic Compute Cloud, Elastic Block Store, Simple Email Service, Cloud Drive and others. Amazon Elastic Compute Cloud (EC2) allows people to launch virtual servers in the Amazon Web Services (AWS) cloud. It provides various types of virtual computing environments, storage and virtual isolated networks.

In this post we'll learn how to work with EC2 using the AWS SDK for .NET. You can download the SDK from the official website. We'll assume you've already created a sample project in your favorite C#/.NET development environment and referenced AWSSDK.dll in that project. We'll mostly use the Amazon.EC2, Amazon.EC2.Model and Amazon.Runtime namespaces. So let's take a look at some common operations with the AWS cloud: launching, tagging and stopping EC2 instances, describing the environment and others.

Instantiation

Every operation with AWS is executed through the AmazonEC2 interface. We can create an Amazon EC2 client using a simple constructor:

AmazonEC2 amazonClient = new AmazonEC2Client(accessKey, secretKey);

where accessKey and secretKey are the credentials we can get in our Amazon account after registration.

Another simple constructor takes accessKey and secretKey from the application configuration file, so we only have to pass a RegionEndpoint value:

public AmazonEC2Client(RegionEndpoint region);

In case we need something more sophisticated, there are tons of constructor overloads. One of the most useful is a constructor with AWSCredentials and AmazonEC2Config parameters:

public AmazonEC2Client(AWSCredentials credentials, AmazonEC2Config config)

For example, we can pass BasicAWSCredentials instance with accessKey and secretKey for the first parameter and setup proxy settings with AmazonEC2Config class and ProxyPort/ProxyHost properties for the second parameter.

Validating credentials

We can issue any simple request to AWS in order to validate credentials. Let's choose a request that won't transfer much data between our application and AWS, because credentials verification can happen quite frequently. For example, we'll use the DescribeAvailabilityZones request, but DescribeRegions or any other would do. If the call throws an AmazonEC2Exception, we can check its string property ErrorCode: if it equals "AuthFailure", our credentials are invalid. The source code can look like this:

try
{
    var ec2Client = new AmazonEC2Client(accessKey, secretKey);
    var response = ec2Client.DescribeAvailabilityZones();

    return true;
}
catch (AmazonEC2Exception e)
{
    // "AuthFailure" means the credentials are invalid;
    // any other error code indicates a different problem
    if (e.ErrorCode == "AuthFailure")
        return false;

    throw;
}


Describing environment

If our application lets the user configure API credentials, or we just want to show some useful information about the environment, we will have to describe our Amazon environment: Key Pair names, Security Groups, Placement Groups, Availability Zones, VPC subnets, etc.

The general request template looks like this:


var ec2Client = new AmazonEC2Client(accessKey, secretKey);
var describeGroupsRequest = new DescribePlacementGroupsRequest();

try
{
    var response = ec2Client.DescribePlacementGroups(describeGroupsRequest);
    var placementGroupsResult = response.DescribePlacementGroupsResult;
    var placementGroupsInfo = placementGroupsResult.PlacementGroupInfo;
    placementGroupNames = placementGroupsInfo.Select(group => group.GroupName);
}
catch (Exception)
{
    placementGroupNames = Enumerable.Empty<string>();
}


If we're going to describe many environment items and copy-paste the code above into each method, it will do redundant work: creating and destroying an AmazonEC2Client instance every time. Instead, we can create the client once, execute all the needed requests, accumulate the results in some storage and then return them.


Describing instances

A DescribeInstances request is used to track all the instances we own. We can request useful information about all instances or only certain ones. To choose those instances, we can fill in the Filter parameter of DescribeInstancesRequest to match specific instance IDs, key pair names, availability zones, instance types, current instance states and many other attributes.

var ec2Client = new AmazonEC2Client(accessKey, secretKey);

var describeRequest = new DescribeInstancesRequest();

describeRequest.Filter.Add(new Filter { Name = "instance-state-name", Value = new List<string> { "running" } });


var runningInstancesResponse = ec2Client.DescribeInstances(describeRequest);
var runningInstances = runningInstancesResponse.DescribeInstancesResult.Reservation
    .SelectMany(reservation => reservation.RunningInstance)
    .Select(instance => instance.InstanceId);


The runningInstances variable will contain the IDs of all running instances as a result of this DescribeInstancesRequest with the instance-state-name filter. Notice an interesting code convention of the Amazon SDK: a list of objects (like the list of RunningInstance items) is named in the singular (RunningInstance). This reflects the real nature of these classes: they are just an object model of the XML response from AWS.

Running instances

Running instances is definitely one of the main purposes of using AWS EC2. It’s a bit different for On-Demand and Spot Instances.

To launch an On-Demand instance, we create a RunInstances request and fill in its properties wisely. First, we set the ID of the image to launch and the preferred number of instances to start from it. This number is given as a minimum and a maximum: if Amazon's capacity allows launching the maximum number of instances, it does so; if not, it tries its best to satisfy us. The request fails if Amazon cannot launch even the minimum number we requested. Second, we can specify the key pair name, instance type, security group and many other parameters in our RunInstancesRequest.

var ec2Client = new AmazonEC2Client(accessKey, secretKey);
 
var runRequest = new RunInstancesRequest();
runRequest.ImageId = imageID;
runRequest.MinCount = minimumValue;
runRequest.MaxCount = maximumValue;
runRequest.InstanceType = "t1.micro";
// some other configurations

var runInstancesResponse = ec2Client.RunInstances(runRequest);
var runInstancesResult = runInstancesResponse.RunInstancesResult;
var runningIDs = runInstancesResult.Reservation.RunningInstance.Select(i => i.InstanceId);


The response contains the instances that actually started. Such a request can throw an AmazonEC2Exception with ErrorCode "InstanceLimitExceeded" if we are not allowed to run as many instances as requested (a limitation of the current plan in the Amazon account).

Requesting On-Demand instances can also fail through no fault of our own, when Amazon cannot provide the requested number of EC2 instances in our region at the moment. In this case, an AmazonEC2Exception with the "InsufficientInstanceCapacity" error code is thrown.

Tagging instances

An EC2 tag is just a key-value pair which we can assign to each running instance (and to spot requests). Tags are useful if we want to supply our instances with additional, application-specific information. Keys and values are simple strings, but nothing prohibits base64-encoding anything we like into a tag.
We can tag already running instances with a CreateTagsRequest: we specify the list of instance IDs and the tags we want to assign to them. See the sample code below:

var ec2Client = new AmazonEC2Client(accessKey, secretKey);
 
var createTagRequest = new CreateTagsRequest();
createTagRequest.ResourceId.Add(someInstanceId);
createTagRequest.Tag.Add(new Tag { Key = "NodeRole", Value = "LogCollector" });

ec2Client.CreateTags(createTagRequest);


Stopping instances

Just as with launching, the AWS SDK provides an API to stop running instances, and the procedure is again slightly different for On-Demand and Spot instances.

To stop an On-Demand instance, we create a TerminateInstances request and pass the ID of the instance we want to terminate. Simple as that:

var ec2Client = new AmazonEC2Client(accessKey, secretKey);
 
var terminateRequest = new TerminateInstancesRequest();
terminateRequest.InstanceId.Add(instancesId);

var terminateResponse = ec2Client.TerminateInstances(terminateRequest);
var terminatingInstances = terminateResponse.TerminateInstancesResult.TerminatingInstance.Select(ti => ti.InstanceId);


Conclusion

The AWS SDK allows us to manage EC2 instances easily, and it is consistent in terms of code conventions. The SDK lets us manage everything from code as if we were working in the AWS web dashboard, and it is worth looking into as cloud computing becomes the new trend. This post covers some basic operations with the SDK; to continue learning, take a look at the official documentation.

1/29/2013

Cloud compute instances: it's not always about the horsepower

A short time ago we were consulting for one of our customers who considered migrating their application to the cloud. The system embodied a variety of computer vision algorithms, and one of the primary purposes of the back end services was detecting features in images and matching them against the feature database. The algorithms were both CPU- and memory-intensive, therefore one of our first steps involved benchmarking the recognition services on different Amazon EC2 instance types to find an optimal hardware configuration that could be efficiently utilized by the application.


So we launched a bunch of instances with varying computing capacity and gathered the initial results. To ensure complete utilization of hardware resources, we tried running 2, 4, 8, and even 16 benchmarks simultaneously on the same virtual machine.


So far, no surprises here. Obviously, low-cost single-core workers were no match for Cluster Compute instances. Also, we can clearly observe that performance degraded significantly when the machine was running low on memory and the number of page faults grew (check out c1.xlarge with 4 and 8 concurrently running benchmarks).

On the other hand, to make the most out of the Cluster Compute instances, we have to load them heavily, otherwise we would be paying for idle time as a result of overprovisioned capacity. In many cases, providing enough load is a real issue: after all, not many tasks can make a dual-socket, 32-virtual-CPU machine cry. In our case, the only option was to launch more and more benchmarks simultaneously because running one benchmark wasn’t even close to reaching 100% utilization.

That got us thinking: what is the optimal configuration with respect to cloud infrastructure cost? In other words, how can we get the best bang for the buck in this situation? Taking the EC2 hourly rates into account, we built one more chart, and this time the results were much more interesting:


For our particular case, c1.medium and m3.xlarge instances, despite not having shown the best running times, suddenly made it into the top 3 most cost-effective instance types, whereas powerful machines such as cc1.4xlarge and cc2.8xlarge displayed cost effectiveness only under significant load.
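For the curious, the metric behind this chart boils down to a one-line calculation; the numbers below are made up for illustration:

double hourlyRate = 0.58;        // the instance type's hourly price, USD
double benchmarkSeconds = 540;   // measured running time of one benchmark
int concurrentBenchmarks = 4;    // benchmarks running simultaneously

// cost of pushing a single benchmark through this instance type
double costPerBenchmark = hourlyRate * (benchmarkSeconds / 3600.0) / concurrentBenchmarks;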

Based on this simple case, three lessons can be learned:
  • Measurability matters. If you possess concrete figures about the performance results of your application, you can choose a better deployment strategy to minimize operational expenses.
  • Avoiding idle time on powerful machines with many logical CPUs can be difficult. Not all algorithms and implementations provide the necessary degree of parallelism to ensure efficient utilization of hardware.
  • If fast processing time is not critical for your product, consider using multiple nodes operating on commodity hardware as an alternative to a single high-end server.

11/14/2012

NVIDIA Tesla K20 benchmark: facts, figures and some conclusions


NVIDIA's newest GPGPU flagship, the Tesla K20, was announced at the Supercomputing conference in Salt Lake City yesterday (BTW, you can meet Roman Pavlyuk, ELEKS' CTO, and Oleh Khoma, Head of the HPC Unit, there). Thanks to our partnership with NVIDIA, we got access to the K20 a couple of months ago and ran lots of performance tests. Today we're going to tell you more about its performance in comparison with several other NVIDIA accelerators that we have here at ELEKS.

Test environment

We implemented a set of synthetic micro-benchmarks that measure the performance of the following basic GPGPU operations:
  • Host/Device kernel operations latency
  • Reduction time (SUM)
  • Dependent/Independent FLOPs
  • Memory management
  • Memory transfer speed
  • Device memory access speed
  • Pinned memory access speed


You can find more information and benchmark results below. Our set of tests is available on GitHub, so you can run them on your own hardware if you want. We ran these tests on seven different configurations:
  • GeForce GTX 580 (PCIe-2, OS Windows, physical box)
  • GeForce GTX 680 (PCIe-2, OS Windows, physical box)
  • GeForce GTX 680 (PCIe-3, OS Windows, physical box)
  • Tesla K20Xm (PCIe-3, ECC ON, OS Linux, NVIDIA EAP server)
  • Tesla K20Xm (PCIe-3, ECC OFF, OS Linux, NVIDIA EAP server)
  • Tesla M2050 (PCIe-2, ECC ON, OS Linux, Amazon EC2)
  • Tesla M2050 (PCIe-2, ECC ON, OS Linux, PEER1 HPC Cloud)

One of the goals was to determine the difference between the K20 and older hardware in terms of overall system performance. Another goal was to understand the difference between virtualized and non-virtualized environments. Here is what we got:

Host/Device kernel operations latency

One of the new features of the K20 is Dynamic Parallelism (DP), which allows kernels to launch other kernels. We wrote a benchmark that measures the latency of kernel scheduling and execution with and without DP. The results without DP look like this:

Surprisingly, the new Tesla is slower than the old one and the GTX 680, probably because of the driver, which was in beta at the time we measured performance. It is also obvious that AWS GPU instances are much slower than the closer-to-hardware PEER1 ones, because of virtualization.
Then we tried to run similar benchmark with DP on:

Obviously we couldn't run these tests on older hardware because it doesn't support DP. Surprisingly, DP scheduling is slower than traditional scheduling, while DP execution time is pretty much the same with ECC ON, and traditional execution is faster with ECC OFF. We expected DP latency to be lower than traditional. It is hard to say what causes this slowness; we suppose it could be the driver, but that is just our assumption.

Reduction time (SUM)

The next thing we measured was reduce execution time: basically, we calculated an array sum. We did it with different array and grid sizes (Blocks x Threads x Array size):



Here we got expected results. The new Tesla K20 is slower on small data sets, probably because of its lower clock frequency and immature drivers. It becomes faster when we work with big arrays and use as many cores as possible.
Regarding virtualization, we found that the virtualized M2050 is comparable with the non-virtualized one on small data sets, but much slower on large data sets.

Dependent/Independent FLOPs

Peak theoretical performance is one of the most misunderstood properties of computing hardware. Some people say it means nothing; some say it is critical. The truth is somewhere in between. We measured performance in FLOPs using several basic operations, of two kinds, dependent and independent, in order to determine whether the GPU automatically parallelizes independent operations. Here's what we got:





Surprisingly, we didn't get better results with independent operations. Perhaps our tests have an issue, or we misunderstood how automatic parallelization works on the GPU, but we couldn't produce a test where independent operations were automatically parallelized.
Regarding the overall results, Teslas are much faster than GeForces when you work with double-precision floating-point numbers, which is expected: consumer accelerators are optimized for single precision, because double precision is not required in computer games, the primary software they were designed for. FLOPs also depend heavily on clock speed and the number of cores, so newer cards with more cores are usually faster, except for one case with the GTX 580/680 and double precision: the 580 is faster because of its higher clock frequency.
Virtualization doesn't affect FLOPs performance at all.

Memory management

Another critical thing for HPC is basic memory management speed. As there are several memory models available in CUDA, it is also important to understand the implications of using each of them. We wrote a test that allocates and releases 16 B, 10 MB and 100 MB blocks of memory in different models. Please note: the results in this benchmark differ widely, so it makes sense to show them on charts with a logarithmic scale. Here they are:


Device memory is obviously the fastest option if you allocate a big chunk of memory, and the GTX 680 with PCIe-3 is our champion in device memory management. Teslas are slower than GeForces in all the tests. Virtualization seriously affects Host Write Combined memory management. PCIe-3 is better than PCIe-2, which is also expected.

Memory transfer speed

Another important characteristic of an accelerator is the speed of data transfer between memory models. We measured it by copying 100 MB blocks of data between host and GPU memory in both directions using the regular, page-locked and write-combined memory access models. Here's what we got:

Obviously, PCIe-3 configurations are much faster than PCIe-2 ones. Kepler devices (GTX 680 and K20) are faster than the others. Using the Page Locked and Write Combined models increases transfer speed. Virtualization slightly affects regular memory transfer speed and doesn't affect the others at all. We also tested internal memory transfer speed (please note, we haven't multiplied it by 2 as NVIDIA usually does in their tests):
Tesla K20s are faster than GeForces, but the difference is not that big. The M2050s are almost two times slower than their successors.

Device memory access speed

We also measured device memory access speed for each configuration. Here are the results:

Aligned memory access is way faster than non-aligned (almost a 10x difference). Newer accelerators are better than older ones. Double-precision read/write is faster than single-precision for all configurations. Virtualization doesn't affect memory access speed at all.

Pinned memory access speed

The last metric we measured was pinned memory access speed, where the device interacts with host memory. Unfortunately, we weren't able to run these tests on the GTX 680 with PCIe-3 due to an issue with allocating big memory blocks in Windows. 

The new Tesla is faster than the old one. PCIe-3 is obviously faster. Aligned access is almost ten times faster, and reading double-precision floats gives roughly twice the memory access speed of single-precision floats. The virtualized environment is slower than the non-virtualized one.

Conclusions

All in all, the new Tesla K20 performs slightly faster than its predecessors. There is no revolution, but there is evolution: we got better performance and new tools that make a programmer's life easier. There are also several things not covered by this benchmark, like better support for virtualization and, as a result, the cloud-readiness of the K20. Some results were surprising; we expect better numbers from the K20 in several months, when a new, optimized version of the drivers becomes available (NVIDIA always has some issues with new drivers right after release, but usually fixes them after several updates).

You can find a spreadsheet with the complete results on Google Docs. Benchmark sources are available on our GitHub.

10/08/2012

DevTalks #4 presentations

Materials from our internal DevTalks event (October 4, 2012).
1. Tiny Google projects (by Ostap Andrusiv)


2. Amazon Web Services crash course: exploring capabilities of the Cloud (by Yuriy Guts)


8/28/2012

Amazon Glacier: why and how might it work?


Recently Amazon announced Glacier, a new service in their AWS suite. It allows you to store large amounts of data very cheaply: just $0.01/GB per month, plus some expenses for traffic and API calls. That is an incredibly low price; compare it with S3's $0.125/GB for the first TB of data. So where is the trick? Well, there is one important detail: the retrieval time for your data can be up to several hours.

Why do we need it?


Such a long retrieval time means you can use it only for data that should be stored for a long time without the need for quick access. Consider historical financial data: in some countries, government regulations require financial institutions to store every single transaction for several years after it occurs. Most of these transactions will never be accessed; it could happen only in case of an investigation or a system audit, which is rare. Nowadays most of this data is stored on hard drives or even magnetic tapes, usually not connected to the network, so retrieval time is also up to several hours. And that is the target market for Glacier.

Perito Moreno Glacier. Patagonia, Argentina (photo taken by Luca Galuzzi)

Amazon targets customers who want to store lots of data for a very long time and do not need to access it often or quickly, but require very reliable storage. Glacier offers 99.999999999% durability. That's right, eleven nines - impressive reliability! It is very expensive to build such reliable storage in-house, so in the past only really big corporations had access to it. There are several services that address the same problem, but to be honest, they don't look serious enough to be enterprise vendors. Amazon is the first enterprise-level vendor in this market.

How might it work?


As a disclaimer: I am not an Amazon employee, and there is no information about Glacier's architecture available in any public sources. So I can only imagine and speculate how it may actually work.

Let's imagine that we want to build a service like Glacier. First of all, we would need lots of storage hardware. And it must be pretty cheap (in terms of cost per gigabyte), because we want to sell it for such a small amount of money. There are only two types of hardware that fit these requirements: hard disk drives and magnetic tape. The latter is much cheaper but less reliable because of magnetic layer degradation, which means one should periodically refresh the data to prevent loss. They may use special custom hard drives with big capacity and slow access time, simply because speed is not critical for them; that makes the overall solution even cheaper. I don't know what kind of storage hardware Amazon uses, but I think hard drives are the slightly more likely option.

The second component of a big data warehouse is the infrastructure that connects users with their data and makes it available within the timeframe described in the SLA: the network, power supplies, cooling and lots of other things you can find in modern datacenters. If you were building a service like S3, the infrastructure cost would be even bigger than the storage cost. But there is one important difference between S3 and Glacier: you don't have to provide access to the data quickly. It means you don't have to keep the hard drive turned on, which means reduced power consumption. It means you don't even have to keep the hard drive plugged into a server case! It could be stored in a simple locker, and all you need is an employee responsible for finding the drive and plugging it into a server when a user requests access to the data. Several hours are definitely enough to do that, even for a human being. Or a cute little orange robot:



Sounds crazy? Well, let's look at this solution from the other side. What is Amazon, first of all? A cloud vendor? Nope. It is a retail company, one of the biggest in the world, with probably the best logistics and warehouse infrastructure anywhere. Let's imagine you order a hard drive on the Amazon website. How much time does it usually take for Amazon to find it in their warehouse, pack it and send it to you? Several hours? Now imagine that instead of shipping the drive, they plug it into a server and turn it on. Sounds like a pretty similar task, doesn't it?

It is amazing how Amazon integrates its businesses with each other. AWS was a side product of their main retail business, a product they started to sell simply because they realized it had value not only for their own business but also for other people. And now we can see how AWS uses Amazon's offline infrastructure to provide an absolutely new kind of service. A fantastic fusion of online and offline infrastructures working together to create something new!

7/24/2012

AWS Simple Icons

Have you ever tried to draw a cloud solution architecture in Visio? It is always hard to choose a template for this task. Fortunately, Amazon Web Services provides a simple icon set (including Visio stencils) for exactly this kind of diagram. Check it out: