NVIDIA announced its newest GPGPU flagship, the Tesla K20, yesterday at the Supercomputing conference in Salt Lake City (by the way,
you can meet Roman Pavlyuk, ELEKS' CTO, and Oleh Khoma, Head of HPC Unit, there). Thanks to our partnership with NVIDIA, we got access to the K20 a couple of months ago and ran a lot of performance tests. Today we're going to tell you more about its performance in comparison with several other NVIDIA accelerators that we have here at ELEKS.
Test environment
We implemented a set of synthetic micro-benchmarks that measure the performance of the following basic GPGPU operations:
- Host/Device kernel operations latency
- Reduction time (SUM)
- Dependent/Independent FLOPs
- Memory management
- Memory transfer speed
- Device memory access speed
- Pinned memory access speed
You can find more information and benchmark results below. Our set of tests is available on
GitHub, so you can run them on your own hardware if you want. We ran these tests on seven different configurations:
- GeForce GTX 580 (PCIe-2, OS Windows, physical box)
- GeForce GTX 680 (PCIe-2, OS Windows, physical box)
- GeForce GTX 680 (PCIe-3, OS Windows, physical box)
- Tesla K20Xm (PCIe-3, ECC ON, OS Linux, NVIDIA EAP server)
- Tesla K20Xm (PCIe-3, ECC OFF, OS Linux, NVIDIA EAP server)
- Tesla M2050 (PCIe-2, ECC ON, OS Linux, Amazon EC2)
- Tesla M2050 (PCIe-2, ECC ON, OS Linux, PEER1 HPC Cloud)
One of the goals was to determine the difference between the K20 and older hardware configurations in terms of overall system performance. Another goal was to understand the difference between virtualized and non-virtualized environments. Here is what we got:
Host/Device kernel operations latency
One of the new features of the K20 is Dynamic Parallelism (DP), which allows kernels to launch other kernels. We wrote a benchmark that measures the latency of kernel scheduling and execution with and without DP. The results without DP look like this:
Surprisingly, the new Tesla is slower than the old one and the GTX 680, probably because of the driver, which was still in beta when we measured performance. It is also obvious that AWS GPU instances are much slower than the closer-to-hardware PEER1 ones, because of virtualization.
Then we ran a similar benchmark with DP enabled:
Obviously, we couldn't run these tests on the older hardware because it doesn't support DP. Surprisingly, DP scheduling is slower than traditional scheduling; DP execution time is about the same with ECC on, and traditional execution is faster with ECC off. We expected DP latency to be lower than traditional launch latency. It is hard to say what causes this slowness. We suspect the driver, but that is just our assumption.
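To make the two launch paths concrete, here is a minimal sketch (not our benchmark code, which is on GitHub) of a traditional host-side launch next to a Dynamic Parallelism launch, where a parent kernel starts a child kernel from the device. It assumes a device of compute capability 3.5+ and compilation with `nvcc -arch=sm_35 -rdc=true -lcudadevrt`:

```cpp
#include <cstdio>

// Empty kernel: we only care about how long it takes to schedule and run it.
__global__ void child_kernel() {}

// Parent kernel: with Dynamic Parallelism the launch syntax on the device
// is the same as on the host. A parent grid is not considered finished
// until all of its child grids have finished.
__global__ void parent_kernel() {
    child_kernel<<<1, 1>>>();
}

int main() {
    // Traditional path: the host schedules the kernel directly.
    child_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    // DP path: the host launches only the parent, which launches the child.
    parent_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    printf("last CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```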
Reduction time (SUM)
The next thing we measured was reduction execution time: essentially, we calculated the sum of an array. We did it with different array and grid sizes (blocks × threads × array size):
Here we got the expected results. The new Tesla K20 is slower on small data sets, probably because of its lower clock frequency and the not yet fully fledged drivers. It becomes faster when we work with big arrays and use as many cores as possible.
Regarding virtualization, we found that the virtualized M2050 is comparable to the non-virtualized one on small data sets, but much slower on large data sets.
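For reference, a minimal reduction kernel in the spirit of this test (the actual benchmark on GitHub varies the block, thread and array sizes) looks like this: each block folds its slice of the array into one partial sum in shared memory, and the host adds up the partial sums.

```cpp
#include <cstdio>
#include <vector>

// Each block reduces blockDim.x * 2 input elements into one partial sum.
__global__ void reduce_sum(const float* in, float* partial, int n) {
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x * 2 + threadIdx.x;

    float v = 0.0f;
    if (i < n)              v += in[i];
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    sdata[tid] = v;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = sdata[0];
}

int main() {
    const int n = 1 << 20, threads = 256;
    const int blocks = (n + threads * 2 - 1) / (threads * 2);

    std::vector<float> h(n, 1.0f);
    float *d_in, *d_partial;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, &h[0], n * sizeof(float), cudaMemcpyHostToDevice);

    reduce_sum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_partial, n);

    std::vector<float> p(blocks);
    cudaMemcpy(&p[0], d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    float sum = 0.0f;
    for (int b = 0; b < blocks; ++b) sum += p[b];   // final pass on the host
    printf("sum = %.0f (expected %d)\n", sum, n);

    cudaFree(d_in);
    cudaFree(d_partial);
    return 0;
}
```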
Dependent/Independent FLOPs
Peak theoretical performance is one of the most misunderstood properties of computing hardware. Some people say it means nothing; others say it is critical. The truth lies somewhere in between. We tried to measure performance in FLOPs using several basic operations. We measured two types of operations, dependent and independent, in order to determine whether the GPU automatically parallelizes independent operations. Here's what we got:
Surprisingly, we didn't get better results with independent operations. Perhaps there is an issue with our tests, or we misunderstand how automatic parallelization works on the GPU, but we couldn't produce a test in which independent operations were automatically parallelized.
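To make the distinction concrete, here is a sketch of the two kinds of loops we have in mind (illustrative, not our exact benchmark code): in the dependent kernel every multiply-add consumes the previous result, while in the independent kernel four accumulators form separate chains that the hardware could, in principle, overlap.

```cpp
#include <cstdio>

// Dependent chain: each multiply-add needs the previous result, so the
// achievable rate is bounded by instruction latency.
__global__ void dependent_flops(float* out, float seed, int iters) {
    float a = seed;
    for (int i = 0; i < iters; ++i)
        a = a * 0.9999f + 0.0001f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;   // keep the result alive
}

// Independent chains: four accumulators with no dependencies between them,
// giving the scheduler instruction-level parallelism to hide latency.
__global__ void independent_flops(float* out, float seed, int iters) {
    float a = seed, b = seed + 1.0f, c = seed + 2.0f, d = seed + 3.0f;
    for (int i = 0; i < iters; ++i) {
        a = a * 0.9999f + 0.0001f;
        b = b * 0.9998f + 0.0002f;
        c = c * 0.9997f + 0.0003f;
        d = d * 0.9996f + 0.0004f;
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + b + c + d;
}

int main() {
    const int blocks = 64, threads = 256, iters = 1 << 20;
    float* d_out;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    dependent_flops<<<blocks, threads>>>(d_out, 1.0f, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms_dep = 0.0f;
    cudaEventElapsedTime(&ms_dep, start, stop);

    cudaEventRecord(start);
    independent_flops<<<blocks, threads>>>(d_out, 1.0f, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms_ind = 0.0f;
    cudaEventElapsedTime(&ms_ind, start, stop);

    printf("dependent: %.2f ms, independent (4x the work): %.2f ms\n",
           ms_dep, ms_ind);
    cudaFree(d_out);
    return 0;
}
```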
Regarding the overall results, the Teslas are much faster than the GeForces when you work with double-precision floating-point numbers, which is expected: consumer accelerators are optimized for single precision because double precision is not needed in computer games, the primary software they were designed for. FLOPs also depend heavily on clock speed and the number of cores, so newer cards with more cores are usually faster, with one exception: with double precision the GTX 580 beats the GTX 680 because of its higher clock frequency.
Virtualization doesn't affect FLOPs performance at all.
Memory management
Another critical thing for HPC is basic memory management speed. Since several memory models are available in CUDA, it is also important to understand the implications of using each of them. We wrote a test that allocates and releases 16 B, 10 MB and 100 MB blocks of memory in the different models. Please note: the results in this benchmark differ by orders of magnitude, so it makes sense to show them on charts with a logarithmic scale. Here they go:
Device memory is obviously the fastest option when you allocate a big chunk of memory, and the GTX 680 with PCIe-3 is our champion in device memory management. The Teslas are slower than the GeForces in all of these tests. Virtualization seriously affects Host Write Combined memory management. PCIe-3 is better than PCIe-2, which is no surprise.
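For context, these are the allocation calls behind the memory models we timed; a bare sketch with error handling stripped, using the 100 MB case:

```cpp
#include <cstdio>
#include <cstdlib>

int main() {
    const size_t bytes = (size_t)100 << 20;   // 100 MB, the largest block we test

    // Device memory: allocated in GPU global memory.
    void* d_ptr = NULL;
    cudaMalloc(&d_ptr, bytes);
    cudaFree(d_ptr);

    // Regular (pageable) host memory: plain malloc; the driver has to
    // page-lock it behind the scenes for every transfer.
    void* h_regular = malloc(bytes);
    free(h_regular);

    // Page-locked (pinned) host memory: stays resident, enables faster DMA.
    void* h_pinned = NULL;
    cudaMallocHost(&h_pinned, bytes);
    cudaFreeHost(h_pinned);

    // Write-combined pinned memory: fast for the GPU to read over PCIe,
    // slow for the CPU to read back.
    void* h_wc = NULL;
    cudaHostAlloc(&h_wc, bytes, cudaHostAllocWriteCombined);
    cudaFreeHost(h_wc);

    printf("last CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```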
Memory transfer speed
Another important characteristic of an accelerator is the speed of data transfer between memory models. We measured it by copying 100 MB blocks of data between host and GPU memory in both directions using the regular, page-locked and write-combined memory models. Here's what we got:
Obviously, the PCIe-3 configurations are much faster than the PCIe-2 ones. Kepler devices (GTX 680 and K20) are faster than the others. Using the page-locked and write-combined models makes transfers faster. Virtualization slightly affects regular memory transfer speed and doesn't affect the others at all. We also tested internal memory transfer speed (please note that we haven't multiplied it by 2, as NVIDIA usually does in their tests):
Tesla K20s are faster than the GeForces, but the difference is not that big. The M2050s are almost two times slower than their successors.
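For reference, host-to-device transfer speed is typically measured by wrapping a `cudaMemcpy` in CUDA events; here is a rough sketch of the pageable vs. page-locked comparison (the numbers above come from the benchmark on GitHub, not from this snippet):

```cpp
#include <cstdio>
#include <cstdlib>

// Time one host-to-device copy of `bytes` from `src` and return GB/s.
static float copy_rate(void* dst, const void* src, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (bytes / 1e9f) / (ms / 1e3f);
}

int main() {
    const size_t bytes = (size_t)100 << 20;   // 100 MB, as in the benchmark
    void* d_buf;
    cudaMalloc(&d_buf, bytes);

    void* h_pageable = malloc(bytes);     // regular host memory
    void* h_pinned;
    cudaMallocHost(&h_pinned, bytes);     // page-locked host memory

    printf("pageable H->D: %.2f GB/s\n", copy_rate(d_buf, h_pageable, bytes));
    printf("pinned   H->D: %.2f GB/s\n", copy_rate(d_buf, h_pinned, bytes));

    free(h_pageable);
    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
```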
Device memory access speed
We also measured device memory access speed for each configuration. Here are the results:
Aligned memory access is way faster than non-aligned (almost a 10x difference). Newer accelerators are better than older ones. Double-precision reads/writes are faster than single-precision ones for all configurations. Virtualization doesn't affect memory access speed at all.
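As a rough illustration of what aligned vs. non-aligned means here, consider two copy kernels (a sketch, not the benchmark itself): the first makes coalesced, naturally aligned accesses, while the second shifts every access by one element so that warp accesses straddle memory-segment boundaries. Timed with CUDA events, as in the FLOPs sketch above, the second pattern is what produces the large gap.

```cpp
// Coalesced, aligned access: consecutive threads read consecutive floats,
// so each warp touches the minimum number of memory segments.
__global__ void copy_aligned(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Non-aligned access: the same pattern shifted by one element, so warp
// accesses straddle segment boundaries and require extra transactions.
__global__ void copy_misaligned(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + 1 < n) out[i] = in[i + 1];
}
```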
Pinned memory access speed
The last metric we measured was pinned memory access speed, i.e. the speed at which the device accesses host memory directly. Unfortunately, we weren't able to run these tests on the GTX 680 with PCIe-3 due to an issue with allocating big memory blocks on Windows.
The new Tesla is faster than the old one. PCIe-3 is obviously faster. Aligned access is almost ten times faster, and if you read double-precision floats your memory access speed is about two times higher than with single-precision floats. The virtualized environment is slower than the non-virtualized one.
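For clarity, this kind of test relies on mapped (zero-copy) pinned host memory, which a kernel reads directly over PCIe. A minimal sketch of that setup (the buffer size and the single-threaded kernel are illustrative only):

```cpp
#include <cstdio>

// Reads host memory directly over PCIe through a mapped (zero-copy) pointer.
__global__ void sum_mapped(const float* host_data, float* result, int n) {
    float acc = 0.0f;
    // Single thread for simplicity; a real benchmark uses many threads.
    for (int i = 0; i < n; ++i) acc += host_data[i];
    *result = acc;
}

int main() {
    const int n = 1 << 20;

    // Allow mapping pinned host allocations into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    float* h_data;
    cudaHostAlloc(&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Device-visible alias of the pinned host buffer.
    float* d_alias;
    cudaHostGetDevicePointer(&d_alias, h_data, 0);

    float* d_result;
    cudaMalloc(&d_result, sizeof(float));

    sum_mapped<<<1, 1>>>(d_alias, d_result, n);

    float result = 0.0f;
    cudaMemcpy(&result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum over mapped host memory = %.0f\n", result);

    cudaFreeHost(h_data);
    cudaFree(d_result);
    return 0;
}
```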
Conclusions
All in all, the new Tesla K20 performs slightly faster than its predecessors. There is no revolution here, but there is evolution: we got better performance and new tools that make a programmer's life easier. There are also several things not covered by this benchmark, like better virtualization support and, as a result, the cloud-readiness of the K20. Some results were surprising. We expect better K20 results in a few months, when a new, optimized version of the drivers becomes available (NVIDIA always has some issues with new drivers right after a release, but usually fixes them after several updates).
You can find a spreadsheet with the complete results on
Google Docs. The benchmark sources are available on our
GitHub.