1/29/2013

Cloud compute instances: it's not always about the horsepower

A short time ago we were consulting for one of our customers who considered migrating their application to the cloud. The system embodied a variety of computer vision algorithms, and one of the primary purposes of the back end services was detecting features in images and matching them against the feature database. The algorithms were both CPU- and memory-intensive, therefore one of our first steps involved benchmarking the recognition services on different Amazon EC2 instance types to find an optimal hardware configuration that could be efficiently utilized by the application.


So we launched a bunch of instances with varying computing capacity and gathered the initial results. To ensure complete utilization of hardware resources, we tried running 2, 4, 8, and even 16 benchmarks simultaneously on the same virtual machine.


So far, no surprises here. Obviously, Cluster Compute instances were no match for low-cost single-core workers. Also, we can clearly observe that performance was degrading significantly when the machine was running low on memory and the number of page faults was increasing (check out c1.xlarge with 4 and 8 concurrently running benchmarks).

On the other hand, to make the most out of the Cluster Compute instances, we have to load them heavily, otherwise we would be paying for idle time as a result of overprovisioned capacity. In many cases, providing enough load is a real issue: after all, not many tasks can make a dual-socket, 32-virtual-CPU machine cry. In our case, the only option was to launch more and more benchmarks simultaneously because running one benchmark wasn’t even close to reaching 100% utilization.

That got us thinking: what is the optimal configuration with respect to cloud infrastructure cost? In other words, how can we get the best bang for the buck in this situation? Taking the EC2 hourly rates into account, we built one more chart, and this time the results were much more interesting:


For our particular case, c1.medium and m3.xlarge instances, despite not having shown the best results with respect to running time, suddenly made it to the top 3 cost effective instance types, whereas powerful machines such as cc1.4xlarge and cc2.8xlarge displayed cost effectiveness only under a significant load.

Based on this simple case, three lessons can be learned:
  • Measurability matters. If you possess concrete figures about the performance results of your application, you can choose a better deployment strategy to minimize operational expenses.
  • Avoiding idle time on powerful machines with many logical CPUs can be difficult. Not all algorithms and implementations provide the necessary degree of parallelism to ensure efficient utilization of hardware.
  • If fast processing time is not critical for your product, consider using multiple nodes operating on commodity hardware as an alternative to operating on single high-end servers.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.