We discussed results internally and came out with several low-level fixes. Here they go:
Minimize created objects count
First of all we thought about enormous number of objects we created during execution of our algorithm. As you probably remember, we’d applied fix named ‘Temp object creation’, when we tried to minimize number of performed arithmetic operations. It had negative effect on performance because of increased memory allocation and garbage collector overhead. So, the less objects you created – the better performance you get. It is not hard to notice that most of objects in our code are created here:What if we don’t create new objects here at all? Let’s store in stack individual coordinates instead of creating wrapper objects. Sure, it makes code more complicated and unreadable, but performance is our main goal today. So, we came out with this:
Results follows:
Please note, we removed two bad fixes from the previous article. We’ve got nice results in all browsers, but in Safari results were really amazing: about 45% performance boost. If we remember bad “Temp object” fix from the previous article we notice that Safari was dramatically slower than other browsers after that fix, so such result is logical consequence of some issues Safari has with objects allocation and/or garbage collection.
Inline functions
Let’s go deeper. Most modern compilers do function inlining automatically or provide you with ability to mark function with some kind of inline attribute (think C’s inline keyword or AggressiveInliningAttribute from .NET 4.5). JavaScript don’t allow you to do that. Although function inlining might have dramatic performance effect in case you call function very often. We call isSameColor about 2 million times and setPixelColor about 700K times. Let’s try to inline them manually:Again, it makes our code less readable and understandable, but we really want better performance, so we don’t care about code readability here.
Isn’t it amazing? Absolutely incredible results: we’ve got 70% boost on Firefox, 56% on IE and 36% on Safari. And what about Chrome? Surprisingly it is 9% slower. It seems that Google already implemented automatic function inlining in V8 optimizer and our manual fix is worse than theirs. Another interesting thing: with that fix Firefox is almost two times faster than Chrome, previous leader.
Optimize CPU cache utilization
There are one thing we never thought before in context of such a high-level language as JavaScript: CPU cache utilization. It is quite an interesting topic and it deserves separate article. As for now you can read more about it here, for example.ImageData array has two dimensions that are packed into one dimensional array line by line. So, 3x3 pixel matrix with coordinates (x;y), is basically stored in memory like that:
Let’s look at the following lines in our code:
Let’s think about the order we access neighbor pixels if we are in (1;1):
So, we’re going left, then once again left, then right, then again right. According to best practices of cache utilization optimization it is better to access memory sequentially, because it minimizes a chance to have cache miss. So, what we need is something like that:
Let’s rewrite our dx,dy arrays:
Here is what we’ve got:
Well, the only browser reacted significantly was Chrome: we’ve got about 10% better performance with it. Other browsers were up to 3% faster which is in the area of statistical error. Anyway, this is interesting result that means we should pay attention even to such low-level optimizations when we write code on JavaScript – they are still important.
Fixing own bug – reorder if statements
Those of you who read inlined code carefully might already noticed that we actually did a mistake there:We check pixel color and only then make sure that we don’t go outside array bounds. In statically typed language we’d got some kind of IndexOutOfBoundsException or Access Violation error here. But in JavaScript arrays are basically hash tables, so noting prevents you from accessing negative index here. But due to the cost of array element access operation it makes sense to check array bounds before checking colors:
Results are surprising:
Most of the browsers results were in the area of statistical error, but Chrome was more than two times faster and got its crown of fastest browser back in our benchmark! It is hard to tell what the reason of such a dramatic difference is. Maybe other browsers use instruction reordering and already applied that optimization by themselves, but it looks strange Chrome don’t do it. There also may be some hidden optimization heuristics in Chrome that help it understand that stack is used as simple array, not hash-table and it makes some significant optimization based on that fact.
Conclusions
Low-level optimizations matter even if you write your code in high-level language such as JavaScript. Code is still executed on same hardware as if you write it on C++.Each JavaScript engine is different. You can have significant performance boost in one browser, but at the same time your fix may make your code slower in another browser. Test your code in all browsers your application must work with.
Keep the balance between code readability and performance considerations. Sometimes it makes sense to inline function even if it makes your code less readable. But always make sure that it brings you desired value: it doesn’t make sense to sacrifice code readability for 2 ms performance boost for code that is already fast enough.
Think about object allocation, memory usage and cache utilization, especially if you work with memory intensive algorithms such as Flood Fill.
You can find all the results with exact numbers at Google Spreadsheets: https://docs.google.com/open?id=0B1Umejl6sE1raW9iRkpDSXNyckU
You can check our demo code at GitHub: https://github.com/eleks/canvasPaint
You can play with app, deployed on S3: https://s3.amazonaws.com/rnd-demo/canvasPaint/index.html
Thanks to Yuriy Guts for proposed low-level fixes.
Stay tuned!
V8 does not like out-of-bounds array accesses. If you code constantly accesses arrays out-of-bounds then this function will end up running in the non-optimized code.
ReplyDeleteThe reason for that is mostly due to complicated semantics of JavaScript: out of bounds access has to traverse prototype chain etc.
The reodering of conditions that you did actually changes semantics of the code so I doubt any engine actually does it.
Erf, that's slow...
DeleteI get ~5ms on every browser for floodfilling a bitmap of 940x720 px using
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/display/BitmapData.html#floodFill()
And it's a real paintbucket not a fillrect()
I can get even better performance with C++, but the problem is that it doesn't work in browser (unless you use ActiveX, which is ridiculous).
DeleteFlash is fast, but it doesn't work on modern tablets and smartphones by default.
Browser vendors should think about this problem more if they want to replace native apps completely.
The array pop,/push approach well probably make the browser hang in some cases, i'd use slice once I've reached the maximum number of elements I want in the array
ReplyDeleteCould maybe hitcolor and newcolor 3 calcs be precalculated outside the loop?
ReplyDeleteNextpointoffset can be autoincremented instead of calculated inside the loop
ReplyDelete