11/08/2012

Cloud Solution for Global Team Engagement at Your Fingertips

Effective management of global teams is one of the most common challenges for localization service providers (LSPs). Many LSPs are looking for ways to reduce their costs and turnaround time through automation. LSPs generally deliver localization services using both their in-house staff and subcontractors, who are usually located all over the world.

To address this challenge, ELEKS has developed a cloud-based system that can be used by both subcontractors and in-house teams. The system fully automates the installation of the products the teams need, and this automation is synchronized with the product localization process.
 
 
 
The cloud solution has already shown its numerous benefits:
· measurability by projects, resources, hours
· increased efficiency
· decreased deployment and support effort (in man-hours)
· centralized storage and management system
· easier support and daily backups
· server uptime – 99.8%

The solution was showcased by Taras Tovstyak at the Localization World Conference in Seattle in 2012. The presentation included a case study and the tastiest features of the cloud system, as well as a glimpse into the future of localization: a video of Dynamic Localization in action.



11/07/2012

HTML5 Canvas: performance and optimization

There is no doubt that HTML5 is going to be the next big platform for software development. Some people say it could even kill traditional operating systems, and that in the future all applications will be written with HTML5 and JavaScript. Others say HTML5 apps will have their market share but will never replace native applications completely - and one of the main reasons, they say, is poor JavaScript performance. But wait, browser vendors say they have done lots of optimizations and JavaScript is faster than ever before! Isn't that true?
Well, the simple answer is yes... and no. Modern JavaScript engines such as Google's V8 have impressive performance if you compare them with their predecessors from five or ten years ago. However, their results are not so impressive if you compare them with statically typed languages such as Java or C#. And of course it would be an absolutely unfair competition to compare JavaScript with native code written in C++.
But how can one determine whether an application can be written in JavaScript or whether native tools should be chosen?
Recently we had a chance to make that kind of decision. We were working on a proposal for a tablet application that should include a Paint-like control where the user can draw images using standard drawing tools such as Pencil and Fill. The target platforms were Android, Windows 8 and iOS, so cross-platform development tools had to be taken into consideration. From the very beginning there was a concern that the HTML5 canvas could be too slow for such a task. We implemented a simple demo application to test canvas performance and check whether it is applicable in this case.

Leaping ahead, let us point out that we have mixed feelings about the gathered results. On the one hand, the canvas was fast enough for simple functions like pencil drawing, thanks to the native implementation of the basic drawing methods. On the other hand, when we implemented the classic Flood Fill algorithm using the Pixel Manipulation API, we found that it is too slow for that class of algorithms. During that research we applied a set of performance optimizations to our Flood Fill implementation. We measured their effect in several browsers and want to share them with you.

Initial implementation 

Our very first Flood Fill implementation was very simple:
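The original code listing is not reproduced here, so below is a minimal sketch of what such a straightforward, queue-based implementation could look like, built around the getPixelColor and setPixelColor helpers discussed later in this post (the exact code from our demo may differ; see the GitHub link at the end of the post):

function getPixelColor(img, x, y) {
    var r = img.data[(y * (img.width * 4)) + (x * 4)];
    var g = img.data[(y * (img.width * 4)) + (x * 4) + 1];
    var b = img.data[(y * (img.width * 4)) + (x * 4) + 2];
    return (r << 16) | (g << 8) | b; // pack the components into a single RGB value
}

function setPixelColor(img, x, y, color) {
    img.data[(y * (img.width * 4)) + (x * 4)]     = (color >> 16) & 0xFF; // red
    img.data[(y * (img.width * 4)) + (x * 4) + 1] = (color >> 8) & 0xFF;  // green
    img.data[(y * (img.width * 4)) + (x * 4) + 2] = color & 0xFF;         // blue
    img.data[(y * (img.width * 4)) + (x * 4) + 3] = 255;                  // alpha
}

function floodFill(ctx, startX, startY, fillColor) {
    var img = ctx.getImageData(0, 0, ctx.canvas.width, ctx.canvas.height);
    var hitColor = getPixelColor(img, startX, startY);
    if (hitColor === fillColor) return;

    var dx = [0, 1, 0, -1];
    var dy = [-1, 0, 1, 0];
    var queue = [{ x: startX, y: startY }];
    setPixelColor(img, startX, startY, fillColor);

    while (queue.length > 0) {
        var cur = queue.shift();
        for (var i = 0; i < 4; i++) {
            // skip neighbors outside the canvas
            if (cur.x + dx[i] < 0 || cur.y + dy[i] < 0 ||
                cur.x + dx[i] >= img.width || cur.y + dy[i] >= img.height) continue;
            // visit only pixels that still have the original (hit) color
            if (getPixelColor(img, cur.x + dx[i], cur.y + dy[i]) != hitColor) continue;
            setPixelColor(img, cur.x + dx[i], cur.y + dy[i], fillColor);
            queue.push({ x: cur.x + dx[i], y: cur.y + dy[i] });
        }
    }
    ctx.putImageData(img, 0, 0);
}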

We tested it with three desktop browsers running on a Core i5 (3.2 GHz) and a 3rd generation iPad with iOS 6. We got the following results with that implementation:

Surprisingly, IE 10 is even slower than Safari on the iPad. Chrome proved that it is still the fastest browser in the world.

Optimize pixel manipulation 

Let's take a look at the getPixelColor function from the initial sketch above. The code looks a little bit ugly, so let's cache the result of the ((y * (img.width * 4)) + (x * 4)) expression (the pixel offset) in a variable. It also makes sense to cache the img.data reference in another variable. We applied similar optimizations to the setPixelColor function:
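A sketch of the optimized helpers (again illustrative, not the verbatim demo code):

function getPixelColor(img, x, y) {
    var data = img.data;                          // cache the data reference
    var offset = (y * (img.width * 4)) + (x * 4); // cache the pixel offset
    return (data[offset] << 16) | (data[offset + 1] << 8) | data[offset + 2];
}

function setPixelColor(img, x, y, color) {
    var data = img.data;
    var offset = (y * (img.width * 4)) + (x * 4);
    data[offset]     = (color >> 16) & 0xFF; // red
    data[offset + 1] = (color >> 8) & 0xFF;  // green
    data[offset + 2] = color & 0xFF;         // blue
    data[offset + 3] = 255;                  // alpha
}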

At least the code looks more readable now. And what about performance?


Impressive: we got a 40-50% performance gain in desktop browsers and about 30% in Safari for iOS. IE 10 now has performance comparable to mobile Safari. It seems that Safari's JavaScript compiler already applies some of the optimizations we did by hand, so the effect was less dramatic for it.

Optimize color comparison 

Let's take a look at the getPixelColor function again. We mostly use it in an if statement, to determine whether a pixel has already been filled with the new color: getPixelColor(img, cur.x + dx[i], cur.y + dy[i]) != hitColor. As you probably know, the HTML5 canvas API provides access to the individual color components of each pixel. We use these components to compose the whole color in RGB format, but here we don't actually need to do that. Let's implement a special function that compares a pixel's color with a given color:
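Something along these lines (a sketch; the pixelColorDiffers name and exact signature are our illustration, not the original code):

function pixelColorDiffers(img, x, y, r, g, b) {
    var data = img.data;
    var offset = (y * (img.width * 4)) + (x * 4);
    // thanks to || short-circuiting, the remaining components are not read
    // once an earlier one has already been found to differ
    return data[offset] !== r || data[offset + 1] !== g || data[offset + 2] !== b;
}
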
Here we rely on the standard behavior of the || operator: it doesn't evaluate the right part of the expression if the left part returns true. This optimization minimizes the number of array reads and arithmetic operations. Let's take a look at its effect:

Almost no effect: 5-6% faster in Chrome and IE, and 2-3% slower in Firefox and Safari. So the problem must be somewhere else. We left this fix in our code because, on average, the code is a little bit faster with it than without it.

Temp object for inner loop

As you probably noticed, the code in the main flood fill loop looks a little bit ugly because of the duplicated arithmetic operations (the repeated cur.x + dx[i] and cur.y + dy[i] expressions in the sketch above).

Let's rewrite it using a temp object for the new point we are working with:
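The rewritten inner loop could look roughly like this (the next variable name is our illustration):

for (var i = 0; i < 4; i++) {
    var next = { x: cur.x + dx[i], y: cur.y + dy[i] }; // temp object, allocated on every iteration
    if (next.x < 0 || next.y < 0 || next.x >= img.width || next.y >= img.height) continue;
    if (getPixelColor(img, next.x, next.y) != hitColor) continue;
    setPixelColor(img, next.x, next.y, fillColor);
    queue.push(next);
}
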
And let's test the effect:
The results are discouraging. It seems that a side effect of this fix is a higher garbage collector load and, as a result, overall slowness of the application. We tried to replace the object with two coordinate variables defined in the outer scope, but it didn't help at all. The logical decision was to revert that change, which is what we actually did.

Visited pixels cache

Let's think again about pixel visiting in the Flood Fill algorithm. It is obvious that we should visit each pixel only once. We guarantee such behavior by comparing the colors of neighbor pixels with the hit pixel color, which must be a slow operation. Instead, we can mark pixels as visited and compare colors only if a pixel has not been visited yet. Let's do it:
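A sketch of the visited-pixels cache (we use a typed array here for illustration; the actual implementation may have used a plain array):

// allocated once, before the main flood fill loop:
var visited = new Uint8Array(img.width * img.height); // 0 = not visited yet

// inside the loop over neighbors:
for (var i = 0; i < 4; i++) {
    var x = cur.x + dx[i];
    var y = cur.y + dy[i];
    if (x < 0 || y < 0 || x >= img.width || y >= img.height) continue;
    var index = y * img.width + x;
    if (visited[index]) continue; // cheap check first
    visited[index] = 1;
    if (getPixelColor(img, x, y) != hitColor) continue; // expensive check only for unvisited pixels
    setPixelColor(img, x, y, fillColor);
    queue.push({ x: x, y: y });
}
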
So, what are the results? Well, here they are:

Again, absolutely unexpected results: IE 10 is 10% faster with this fix, but the other browsers are dramatically slower! Safari is even slower than with the initial implementation. It is hard to tell what the main reason for this behavior is, but we suppose it could be the garbage collector. It may still make sense to apply this fix if you don't target mobile Safari and want maximum performance in the worst case (sorry IE, that's you. As usual).

Conclusions

We tried some more optimizations, but they didn't help. The worst thing about JavaScript optimizations is that it is hard to predict their effect, mainly because of differences between engine implementations. Remember, there are two basic rules when you optimize JavaScript code:

  • benchmark the results after each optimization step (see the timing snippet below) 
  • test in every browser you want your application to work with 
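
For example, a crude but sufficient way to time a single optimization step (floodFill here refers to the sketch above):

var start = Date.now();
floodFill(ctx, startX, startY, 0x0000FF);
console.log('flood fill: ' + (Date.now() - start) + ' ms');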

HTML5 is cool, but it is still much slower than native platforms. You should think twice before choosing it as a platform for any compute-intensive application. In other words, there will be no pure HTML5 Photoshop for a long time. You can probably move some calculations to the server side, but sometimes that is not an option.
You can check out our demo code on GitHub: https://github.com/eleks/canvasPaint 
You can play with the app, deployed on S3: https://s3.amazonaws.com/rnd-demo/canvasPaint/index.html 
Stay tuned!

UPD: Part 2: going deeper!

11/05/2012

What does a good web application look like? Part 2: the dev/ops point of view.

Last time we wrote about web applications, our main focus was on the user perspective. It is time to discuss the other dimensions. So, what does a good web application look like for developers and the operations team?

We have prepared a short list of the most important properties for those guys:

  1. Availability - the ability to operate within a declared proportion of time. It is usually defined in a Service Level Agreement (SLA) as a specific number of "nines" (e.g. four nines = 0.9999, so the system may be unavailable for at most 0.0001 of the time, i.e. a little less than one hour per year); see the quick downtime calculation after this list. The availability of your application is not only a matter of well-written code; it also depends on hardware, network configuration, deployment strategy, operations team proficiency and many other things.
  2. Scalability - the ability to serve an increasing number of requests without the need for architectural changes. If you have a scalable application, you can simply add more hardware to your cluster and serve more and more clients. If you host your application in a cloud, you can even scale it up and down dynamically, making your application incredibly cost-efficient. 
  3. Fault tolerance - the ability to operate in case of some unpreventable failure (usually hardware). Usually it means that the system may lose some part of its functionality in case of failure, but the other parts keep working. Fault tolerance is related to availability, and some people consider it one of the properties of highly available applications. 
  4. Extensibility - system functionality can be extended without the need for core and/or architectural changes. Usually the system is extended by adding plug-ins and extension modules. Sometimes extensibility can be quite tricky to implement, especially for SaaS. 
  5. Multitenancy - the ability to isolate logical user spaces (e.g. individuals or organizations) so that each tenant feels like they are the only user/organization in the system. It sounds easy, but can be a challenge at a large scale.
  6. Interoperability - the ability to integrate with other systems, usually by providing or consuming some kind of API. With a comprehensive API your service can leverage the full power of the developer community. By consuming other services' APIs you can extend your application's functionality in the easiest way.
  7. Flexibility - an architectural property that describes the ability of the system to evolve and change without the need for significant changes in its architecture. The Holy Grail of software architecture - it is almost impossible to achieve for fast-growing web applications, but you should always do your best.
  8. Security - the ability to protect information from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording or destruction. Another critical property for both enterprise and consumer markets. Nobody wants their data to be exposed to unauthorized access.
  9. Maintainability - ease of maintenance, error isolation and correction, upgrading and dealing with a changing environment. Of course it would be better to have a system that doesn't require maintenance at all, but in the real world even the best systems do require it. You have to provide a comprehensive maintenance toolset to your operations team in order to keep your system up and running most of the time.
  10. Measurability - the ability to track and record system usage. Usually it is required for analytics purposes and in pay-per-use scenarios. Even if you don't have pay-per-use scenarios, it is always better to understand your hardware utilization rate in order to optimize costs.
  11. Configurability - the ability to change the system's look and behavior without changing anything in its code. Being critical for web products that are installed on premise, it is also important for the software-as-a-service model. 
  12. Disaster Recovery - the ability to recover after significant failures (usually hardware). This usually includes a disaster recovery plan that lists possible failure scenarios and the steps the operations team should perform to recover the system from failure.
  13. Cost Efficiency/Predictability - the ability to operate efficiently in relation to the cost of operation. Closely related to measurability, this property concentrates on the financial effectiveness of the web application. 
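
As a quick illustration of the availability arithmetic from item 1, here is a tiny snippet (purely illustrative, not part of any real SLA tooling):

function maxDowntimeHoursPerYear(nines) {
    var availability = 1 - Math.pow(10, -nines); // e.g. 4 nines -> 0.9999
    return (1 - availability) * 365 * 24;        // allowed downtime in hours per year
}

console.log(maxDowntimeHoursPerYear(4)); // ~0.88 hours, i.e. a bit less than one hour per year
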
You have to account for lots of things when you are developing your web application. We hope this list will be helpful for you. Stay tuned!

11/02/2012

New ELEKS web-site

Several facts about our new web-site:

Making of:

10/08/2012

DevTalks #4 presentations

Materials from our internal DevTalks event (October 4, 2012).
1. Tiny Google projects (by Ostap Andrusiv)


2. Amazon Web Services crash course: exploring capabilities of the Cloud (by Yuriy Guts)


9/21/2012

What does a good web application look like? Part 1: the user's point of view.


People often say that software is "good" when they are satisfied with it. But it is tricky to describe the objective criteria for "good" software. Moreover, it may mean absolutely different things to different people. Recently we asked ourselves how we see the ideal web application. What characteristics should it have? What is important and what isn't?
We prepared a short list of characteristics that we think should be there for almost any web application that wants to be called "good". We divided this list into three parts. Today we would like to share the first part - how the application looks from the user's point of view.

So, what is important for users?

  1. Usability. By usability people usually mean ease of use and learnability. If your enterprise system has poor usability, you have to spend lots of time and money to train your users, and even that may not help. For the consumer market, poor usability usually means you lose lots of users simply because they don't understand how to use your system. 
  2. Globalization. The ability to adapt to the language and cultural specifics of various target markets. You don't have to globalize your application if you target a local market. But if you have big plans for total world domination, you have to take care of this. 
  3. Multiplatform Optimization. The ability to operate efficiently on various platforms and browsers (including mobile). Ten years ago it was pretty easy to make your application available to almost any user, simply because almost all of them were using Internet Explorer on desktop PCs running Windows. Nowadays this is no longer true - you have to deal with various browsers on various operating systems running on various hardware. And don't forget about native mobile clients for your service!
  4. Customizability. The ability to customize the system's UI and behavior according to particular user needs. A slightly controversial characteristic for a web application: some people say you shouldn't let users customize your application, because users don't actually know how it should look and behave. But sometimes it is important to provide this sort of freedom to them. 
  5. Responsiveness. The ability to respond to user interaction within a given time period. Nobody likes an application that freezes after some action and doesn't show any progress to the user. The user should be able to interact with the application even while it is performing some complex, long-running task. Sometimes it makes sense to completely change the way your application interacts with the user in order to make it more responsive. 
  6. Bandwidth Awareness. Efficient usage of the network channel between the system and the user. It is critical for big content-delivery services like YouTube: for users who have a fast connection it should show video in HD resolution, but for users connected through a slow and expensive 3G link it should consider showing a version that uses less traffic. 
  7. Search Engine Friendliness. Ease of indexing by search engines, conformance with SEO best practices. If you want your service to be indexed by search engines, you have to think about it from the very beginning. Correct usage of things like human-readable URLs, semantic markup and permalinks can be the differentiator that puts your service in first place on the Google search results page. 
  8. Social Media Integration. Integration with social networks: sharing, friends' recommendations, etc. Social networks have changed the Web dramatically over the last couple of years. If you deal with the consumer market, you should integrate at least with the biggest social networks.
  9. Native web stack. The ability to run in a browser without the need to install any additional plug-ins. Users don't want to install any additional software on their devices to run your application. Sometimes they even can't, because of platform limitations (think iOS, for instance). So you have to rely only on the native web stack. Thankfully, HTML5 adoption is growing fast.
  10. Accessibility. Ease of use for users with disabilities or specific needs. You should think about accessibility if you want to make your service available to every single person. You must do it if you are building, for instance, an e-government solution where accessibility is a strict requirement for your system.

Next time we will tell you about the things application developers and operations teams care about. Stay tuned!

9/18/2012

Event Store was launched yesterday


We are proud to present Event Store - an awesome, rock-solid, super-fast persistence engine for event-sourced applications. It was launched yesterday in London by Greg Young and his team from ELEKS. You can watch the video of the presentation on the Skills Matter web-site.
Check out more details on the Event Store web-site and GitHub.

8/28/2012

Amazon Glacier: why and how it may work


Recently Amazon announced Glacier - a new service in their AWS suite. It allows you to store large amounts of data very cheaply - just $0.01/GB per month plus some expenses for traffic and API calls. That is an incredibly cheap price; compare it with S3: $0.125/GB for the first TB of data. But where is the trick? Well, actually, there is one important detail: the retrieval time for your data can be up to several hours.
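
To put those prices into perspective, here is a rough storage-only estimate based on the per-GB rates quoted above (traffic and API-call costs excluded):

var gbStored = 1024;                    // 1 TB of archived data
var glacierPerMonth = gbStored * 0.01;  // ~$10 per month on Glacier
var s3PerMonth      = gbStored * 0.125; // ~$128 per month on S3 (first TB tier)
console.log(glacierPerMonth, s3PerMonth);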

Why do we need it?


Such a long retrieval time means that you can use it only for data that should be stored for a long time but doesn't need to be accessed quickly. Consider historical financial data: in some countries there are government regulations that require financial institutions to store every single transaction for several years after it occurs. It turns out that most of these transactions will never be accessed - only in case of some investigation or system audit, which doesn't happen very often. Nowadays most of this data is stored on hard drives or even magnetic tapes that are usually not connected to the network, so the retrieval time is also up to several hours. And that is the target market for Glacier.

Perito Moreno Glacier. Patagonia, Argentina (photo taken by Luca Galuzzi)

Amazon targets customers who want to store lots of data for a very long time and do not need to access it often or quickly, but need the data stored in a very reliable way. Glacier offers you 99.999999999% durability. That's right, eleven nines - impressive reliability! It is very expensive to build such reliable storage in-house, so in the past only really big corporations had access to storage that reliable. There are several services that address the same issue, but to be honest they don't look serious enough to be considered enterprise vendors. Amazon is the first enterprise-level vendor in this market.

How might it work?


As a disclaimer: I am not an Amazon employee, and there is no information about Glacier's architecture available in any public sources. So I can only imagine and speculate about how it may actually work.

Let's imagine that we want to build a service like Glacier. First of all, we would need lots of storage hardware. And it must be pretty cheap hardware (in terms of cost per gigabyte), because we want to sell the service for such a small amount of money. There are only two types of hardware that fit these requirements: hard disk drives and magnetic tape. The latter is much cheaper, but less reliable because the magnetic layer degrades, which means one has to perform periodic data refreshes to prevent data loss. They may use special custom hard drives with large capacity and slow access time, simply because speed is not critical for them; that makes the overall solution even cheaper. I don't know what kind of storage hardware Amazon uses, but I think hard drives are the slightly more likely option.

The second component of a big data warehouse is the infrastructure that connects users with their data and makes it available within the timeframe described in the SLA. That includes the network, power supplies, cooling and lots and lots of other things you can find in a modern datacenter. If you were building a service like S3, the infrastructure cost would be even bigger than the storage cost. But here is one important difference between S3 and Glacier: you don't have to provide access to the data quickly. It means you don't have to keep the hard drive turned on, which means reduced power consumption. It means you don't even have to keep the hard drive plugged into a server case! It could be stored in a simple locker. All you need is an employee who is responsible for finding your hard drive and plugging it into a server when a user asks for access to the data. And several hours are definitely enough to do that, even for a human being. Or a cute little orange robot:



Sounds crazy? Well, let's look at this solution from the other side. What is Amazon, first of all? A cloud vendor? Nope. They are a retail company - one of the biggest in the world. And they have probably the best logistics and warehouse infrastructure in the world. Let's imagine you order a hard drive on the Amazon web site. How much time does it usually take Amazon to find it in their warehouse, pack it and ship it to you? Several hours? Now just imagine that instead of shipping it, they plug it into a server and turn it on. Sounds like a pretty similar task, doesn't it?

It is amazing how Amazon integrates their businesses with each other. AWS was a side product of their main retail business - a product they started to sell just because they realized it has value not only for their own business, but also for other people. And now we can see how AWS uses Amazon's offline infrastructure to provide an absolutely new kind of service. A fantastic fusion of online and offline infrastructures working together to create something new!

8/23/2012

EventStore Launch by Greg Young

We're proud to announce that one of the projects we have been working on for the last couple of months is going to be presented to the public. Greg Young will launch EventStore, a distributed open-source storage for events, at SkillsMatter eXchange London on September 17. Join Greg and his team from ELEKS there.

8/01/2012

MPI or not MPI - that is the question

One of the most popular questions we were asked at the latest GPU Technology Conference was "Why wouldn't you use MPI instead of writing your own middleware?". The short answer is "Because home-made middleware was better for the projects we did". If you are not satisfied with that explanation, the longer answer follows.

MPI vs. custom communication library


First of all, one should always keep in mind that MPI is a universal framework built for general-purpose HPC. It is really good for, let's say, academic HPC, where you have some calculation that you need to run only once, get the results and forget about your program. But if you have a commercial HPC cluster designed to solve one particular problem many times (say, run some kind of simulation using the Monte Carlo method), you should be able to optimize every single component of your system - just to make sure that your hardware utilization rate is high enough to make the system cost-efficient. With your own code base you can make network communications as fast as possible, without any limitations. And, what is very important, you can keep this code simple and easy to understand - which is not always possible with general-purpose frameworks like MPI. 

But what about the complexity of your own network library? Well, it is not as complex as you might imagine. Some tasks (like Monte Carlo simulations) are embarrassingly parallel, so you don't need complex interactions between your nodes. You have a coordinator that sends tasks to workers and then aggregates the results from them (see our GTC presentation for more details about that architecture). It is relatively easy to implement a lightweight messaging library with raw sockets; you just need a good enough software engineer for the task. 

And last, but definitely not least: a lightweight solution written to solve one particular problem is much faster and more predictable than a universal tool like MPI.

Benchmark


Our engineers compared the performance of our network code with Open MPI on Ubuntu and Intel MPI on CentOS (for some reason Intel MPI refused to work on Ubuntu). They tested multicast performance, because it is critical for the architecture we use in our solutions. There were three benchmarks (described in a kind of pseudo-code): 
1. MPI point-to-point
if rank == 0:
 #master
 for j in 0..packets_count:
  for i in 1..processes_count:
   MPI_Isend() #async send to slave processes
  for i in 1..processes_count:
   MPI_Irecv() #async recv from slave processes
  for i in 1..processes_count:
   MPI_Wait() #wait for send/recv complete 
else:
 #slave
 for j in 0..packets_count:
  MPI_Recv() #recv from master processes
  MPI_Send() #send to master processes
2. MPI broadcast 
if rank == 0:
 #master
 for j in 0..packets_count:
  MPI_Bcast() #broadcast to all slave processes
  for i in 1..processes_count:
   MPI_Irecv() #async recv from slave processes
  for i in 1..processes_count:
   MPI_Wait() #wait for recv
else:
 #slave
 for j in 0..packets_count:
  MPI_Bcast() #recv broadcast message from master processes
  MPI_Send() #send to master processes
3. TCP point-to-point
#master 
controllers = [] 
for i in 1..processes_count: #waiting for all slaves
 socket = tcp_accept_as_blob_socket()
 controllers.append(controller_t(socket))

for j in 0..packets_count:
 for i in 1..processes_count:
  controllers[i].send() #async send to slave processes
 for i in 1..processes_count:
  controllers[i].recv() #wait for recv from slave processes

#slave
socket = tcp_connect_as_blob_socket() #connecting to master
for j in 0..packets_count:
 socket.read() #recv packet from master
 socket.write() #send packet to master

We ran the benchmarks with 10, 20, 40, 50, 100, 150 and 200 processes, sending packets of 8, 32, 64, 256, 1024 and 2048 bytes. Each test included 1000 packets.

Results


First of all, let's look at the open-source MPI implementation results:
Open MPI @ Ubuntu, cluster of 3 nodes, 10 workers:

Open MPI @ Ubuntu, cluster of 3 nodes, 50  workers:

Open MPI @ Ubuntu, cluster of 3 nodes, 200  workers:
So, Open MPI is slower than our custom TCP messaging library in all tests. Another interesting thing: Open MPI broadcast is sometimes even slower than iterative point-to-point messaging with Open MPI. 

Let's look at the proprietary MPI implementation by Intel. For some reason it didn't work on the Ubuntu 11.04 we use on our test cluster, so we decided to run the benchmark on another cluster with CentOS. Please keep that fact in mind - you can't directly compare the results of Open MPI and Intel MPI, since we tested them on different hardware. Our main goal was to compare MPI with our TCP messaging library, so these results work for us. Another thing: Intel MPI broadcast didn't work for us, so we tested only point-to-point communication performance.
Intel MPI @ CentOS, cluster of 2 nodes, 10 workers:

Intel MPI @ CentOS, cluster of 2 nodes, 50 workers:

Intel MPI @ CentOS, cluster of 2 nodes, 200 workers:

Intel MPI is a much more serious opponent for our library than Open MPI. It is 20-40% faster in the 10-worker configuration and has comparable performance with 50 workers (sometimes faster). But with 200 workers it is 50% slower than our messaging library. 
You can also download the Excel spreadsheet with the complete results.

Conclusions


In general, Open MPI doesn't fit the middleware requirements of our projects. It is slower than our custom library and (what is even more important) it is quite unstable and unpredictable.
Intel MPI point-to-point messaging looks much more interesting on small clusters, but on large ones it becomes slow compared to our custom library. We experienced problems running it on Ubuntu, which might be an issue if you want to use Intel MPI with that Linux distribution. Broadcast was unstable and hung.
So, sometimes the decision to write your own communication library doesn't look so bad, right?