7/24/2013

Boosting Android Apps Performance

While mobile vendor giants struggle to produce the most intelligent device, it appears that a huge number of users want to buy a device as cheap as possible, yet still prefer smartphones over old-school feature phones. This trend is mostly driven by huge demand from emerging markets such as China and other Asian countries. Sub-$100 Android devices are becoming a significant part of the market. There are rumors that the next version of Android will be aimed at the low-end phone market, which means this trend will continue.
This trend creates a great opportunity for app developers. However, with every great change come new challenges. Low-cost devices usually have limited memory, small screens, slow CPUs and other constraints. App developers should care about the performance of their code like never before.
There are three basic rules of writing applications with a great performance:

  1. Write efficient code.
  2. If the code is not efficient - profile and fix it.
  3. Continuously ensure that code is still efficient (ideally, after every single commit to your source control system). 

While items 1 and 2 are obvious, item 3 requires additional explanation. Our experience shows that most software developers start to care about the performance of their app only after users have posted angry comments on Google Play. Tracking performance metrics becomes even harder when your app is big and complex and you ship new versions frequently. The problem is that it is much easier to fix a performance issue right after it was introduced than after it has lived in the codebase for a few weeks and tons of code depends on it. Manual regression testing is inefficient here - you simply can't run it frequently enough. Fortunately, continuous integration makes it possible to automate this routine and thus detect and fix performance issues as early as possible.
In this document we are going to focus on items 2 and 3, as the art of writing efficient code is too broad a topic for a single publication.

Performance Optimization

Performance tuning differs from one Android application to another. However, there is a general flow. It looks like this:


This is the so-called measure-evaluate-improve-learn cycle. This process can be applied to a wide variety of activities: when tuning a Formula 1 car, the same technique would be used - you would just need different tools.
To start the optimization cycle, we need to learn about the problem. This can happen at the beginning of development, after manual testing shows unsatisfactory performance results. It can be a report from a CI tool triggered by an unsafe commit, or a review from users on Google Play complaining about low performance on a specific device. Optimization may also be needed before implementing a new feature that depends on existing, not-fast-enough functionality.
Once we have learned about the problem, we start the optimization. In order to improve performance, we need to localize the problem, and this is where measuring comes in handy. The Android SDK has built-in capabilities for profiling: with the help of Traceview and DDMS, one can profile and analyze the performance of the application. Apart from that, measuring helps automate the testing process by providing deterministic data for comparison.
After the measurements are done, detecting the bottleneck is pretty straightforward. When the problem is found, it's time to fix it by optimizing the code. That's where you need all your creativity and knowledge of the domain. If you know some hardcore techniques and tricks that still keep the code maintainable, try them! In the case of Android, think about using the NDK and RenderScript. It is also a good idea to check how often the GC is called inside your app: on low-cost devices the heap size can be surprisingly small, which leads to frequent garbage collection and therefore low responsiveness. A good practice is to reuse memory instead of allocating new blocks. In the case of intensive IO operations, asynchronous reads/writes are a must, and probabilistic variations of algorithms together with intelligent caching techniques can save the day.
After each round of improvements we should test the application again, i.e. learn whether the problem is fixed. If the results are not satisfactory, the process starts all over again until the problem is gone.
Performance optimization usually involves a decent amount of refactoring. To make refactoring safe, it's good practice to cover the code with tests. Here, Continuous Integration tools can greatly simplify the testing and maintenance job.

Continuous Application Support

Rome wasn't built in a day. Even though an alpha mockup of an application can be made in several days, a good application needs a lot more time to mature.
When you own one application, you can choose a couple of target devices and test your app on them. If you have two apps, you need twice as much time to test everything by hand. If you are an established mobile development company with a big portfolio, the cost of manual application testing can blow up your budget. Moreover, it's hard to track how all the apps evolve over time.
Continuous Integration tools automate the process of application testing, deployment and maintenance. They help find bugs early and save time and money over the lifespan of the project.
In the case of Android projects, CI tools have proven their effectiveness, especially when applications need to be tested on a variety of configurations. Currently, the most widely used CI server for Android is Jenkins: it has an active and vibrant community, lots of useful plugins, and is easily extensible.
Below is a proposed architecture for a private continuous integration testing environment based on Jenkins CI:

This environment would allow:

  • testing multiple device configurations
  • tracking multiple projects
  • measuring test execution performance over time
  • analysing source code on every commit
  • finding Android Lint issues
  • automating deployment tasks

Actually, there is almost no limit to Jenkins CI improvements and customizations. With the help of custom plugins, this system can be enriched with various kinds of actions that automate daily DevOps routines.
Of course, there is no silver bullet, and implementing a CI environment also requires a lot of resources, intelligence and, of course, people. Tests have to be implemented by people with programming skills, the testing infrastructure has to be set up and maintained by an administrator, and the whole environment incurs additional hardware and software expenses. The effectiveness of automation starts to show after a certain amount of repetitive manual work, which is often the case with Android development.

Conclusion

Keeping up with fast-changing industry standards is far from trivial. Supporting thousands of different devices, each with different OS versions, is indeed not easy. Supporting several apps for this market is even harder. Keeping it all fast is close to impossible, and maintaining this with every new release or bugfix is a pure nightmare - without the right tools.
Implementing functionality changes is very similar to producing goods in a factory. Every product, like every update, is designed with accuracy and intelligence. But before reaching end users, each of them passes the same strict quality control, keeping production massive, continuous and reliable.

by Ostap Andrusiv, Victor Haydin and Markiyan Matsekh

7/11/2013

A short note on automatic differentiation

Do you remember your undergraduate calculus course? Frankly speaking, I don't. But I do remember a big blackboard completely covered in chalk scribbles. That was a nightmare! Thankfully, those times are gone. I mean, why would we care about derivatives now?

Here at ELEKS, we do care about derivatives. We create and deliver precise models of various dynamic processes. The application field doesn't matter: it may be physics, finance, or biology. What really matters is that once you need to simulate something, most likely you'll need to compute derivatives. And you need to compute them fast and with great precision. Thus, we created ADEL - a template-based C++ library that fits our needs. But let's start with some background information.

The requirements listed above rule out solutions such as symbolic derivation or finite difference schemes. The first approach generates exact formulae for derivatives of all orders and directions; it is very memory- and time-consuming and produces enormous expressions. The second computes numerical approximations of the derivative's value, which suffer from round-off, cancellation and discretization errors.


Fortunately, there is a middle ground - automatic differentiation (AD). It computes derivatives essentially through the chain rule, generating evaluations (but not formulae) of the derivatives. As a result, there is none of the accuracy loss of numerical differentiation, where the step size needs to be carefully chosen to avoid errors. All intermediate expressions are evaluated as soon as possible; this saves memory and removes the need for later simplification. Moreover, AD is not as complicated as symbolic manipulation.

There are two modes of automatic differentiation - the forward mode and the reverse mode; they propagate derivative information along the chain rule in opposite directions. There are also two main ways to implement automatic differentiation:

Operator overloading is straightforward and easy to understand – one overloads the operators not only to compute function values but also to compute derivative information or build up the computational graph. 

Source transformation works like a compiler: it reads in the code that computes the function, analyzes it, and emits source code that computes the derivatives. This approach normally generates more optimized programs but requires much more implementation effort.

ADEL contains operator-overloading implementations of both the forward and reverse modes. It is an open-source project; one may find all the required materials at: https://github.com/eleks/ADEL.

The main feature of ADEL's forward mode is careful handmade optimization. The template-based implementation with loop unrolling compiles into very efficient sequential code, which suits the GPU computing architecture very well. The support of CUDA makes the solution truly unique.

The reverse mode of the ADEL library is special in the way it stores the computational graph: it uses the application's stack for this purpose. Each overloaded operation creates a data structure that is placed on top of the stack; the lifetime of these structures is just enough to compute the derivative data. This makes the algorithm extremely efficient, since most of the operations can be inlined.

We compared our implementation with some existing solutions to see how it stands up against the others. We implemented Newton's method to test the accuracy of our tools. To test performance, a random expression generator was used. It is capable of generating some monstrous things (your professor would never dream of such things). Here is an example of a test function:

template<typename ADType>
ADType TestFunction(const ADType (&x)[6]) {
  ADType y[3];
  y[0] = x[1]; y[1] = -2.730; y[2] = 4.555;
  for (int i = 0; i < 1000; i++) {
    y[0] = x[1]+x[5]+4.624/x[0]+(x[3]-y[0])/x[4]+(atan(-y[2])-x[2])*y[1];
    y[1] = (sin(y[1])-y[2])/x[2]/x[4]*(x[3]+log(x[0])/(y[0]-x[5]))+x[1];
    y[2] = atan(x[5])*(-y[2])/x[3]/(x[4]/(x[2]/x[1]+acos(y[1])))*y[0];
  }
  return 2.658/log(x[0])*(log(y[2])/x[4])*(x[1]+x[5])*(y[0]/y[1])*x[3];
}


The main characteristics of a test case are the number of independent (x) and dependent (y) variables, as well as the total number of functions/operators. The task was to compute the gradient and Hessian of the return value with respect to all independent variables. In the example above, we have 6 independent and 3 temporary (dependent) variables that produce a single output.
We tested the following third-party AD implementations:

FADBAD (http://www.fadbad.com/fadbad.html) - a lightweight, user-friendly library. We used a combined mode: the gradient was computed via the reverse mode, while the Hessian was computed via the forward mode.

ADOL-C (http://www.coin-or.org/projects/ADOL-C.xml) – a huge math package that is capable of computing literally everything. The reverse mode was used in the experiments.

There were four test cases of differing sizes, with 5/10/20/30 input and 1/5/10/15 intermediate variables. The results were normalized so that the slowest execution scored 100 time units; therefore, less is better.
Looking at the results, one can see that we have not created the ultimate AD tool - not yet. In the smaller tests, both ADEL modes greatly outperform their rivals. When the number of variables increases, the performance of ADEL [Forward] degrades because of the large per-variable memory overhead. ADEL [Reverse] shows the best average performance; FADBAD comes second; ADOL-C received no prize at all.

Automatic differentiation is a very promising but extremely underused tool; the majority of the community still does not recognize it. This article is just a small but solid step into the bright future of AD tools. As a final word, we encourage you to download the source code and experiment with ADEL on your own. If there is something we can improve, feel free to contact us.

7/09/2013

Games Localization – Mind the pitfalls


When working on the adaptation of a game, we keep seeing the same issues, which mostly result from poor game internationalization. Open-source projects are the most notable example.

Internationalization bugs are usually caused by the fact that localization was never even considered when the game was developed. This is where the trouble for localization engineers starts. It does not necessarily mean that it is impossible to adapt these games for other locales, but it makes localization much more difficult.

As it often happens, if the game was developed without localization in mind, the localization process is no longer a mere translation of text strings, fixing of geometry bugs or adaptation of graphics. It becomes an incredible search for workarounds at different program levels.

As a rule, internationalization problems require the attention of developers; otherwise, overly complicated fixes get multiplied by the number of target languages.
In particular, I would like to draw your attention to the following types of problems:
  •          Hardcoded strings. These are user-visible texts coming from the functional part of the game. Even with access to the source code, but without knowledge of the whole project structure, it is difficult to get hardcoded strings translated without the risk of introducing new bugs.
  •          Regional settings support. This is the most painful issue: support of fonts and the correct display of regional formats for dates, units, etc.
  •          Separation of functionality and UI parts. Without it, engineering the localized files becomes a headache for localization engineers.


Please keep in mind: even if you are working on the realization of a fantastic idea and want to implement it as quickly as possible, do not forget that this great game may become so popular that users from other locales will want to play it too… in their native language.


by Ihor Kuznietsov
Senior Localization Engineer