Lies, damned lies and non production-like performance testing

Chasing cost efficiency, businesses often cut back on the money spent on UAT boxes used for performance testing. More often than not, this is a bad decision, because the only thing worse than not having a UAT environment is having a UAT environment that is nothing like production. It gives a false sense of security while exposing your application to all sorts of nasty surprises. In this post I try to summarize a few typical configuration differences between UAT and production that can affect performance test results in a major way.

CPU

It may seem such a waste to have as many CPUs on a test system as on production. CPUs are expensive. Production really needs them. And with UAT — well, you can always scale your workload down during testing! Or can you?

Not really: not everything scales linearly. And even if you could, it would not solve every problem. For instance, Oracle sets many parameters based on the number of CPUs on the system, so if your production server has more CPUs, this can make a big difference. Log parallelism, discussed in my previous post, is a good example: if you don’t have enough CPUs on UAT, the chances of catching the redo I/O serialization problem before going live are virtually zero.
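One cheap way to see how far CPU-derived settings have drifted is to export the parameter list from both systems and diff it. The sketch below is only an illustration: the two dictionaries stand in for snapshots you would take yourself (for example from v$parameter), the values in them are made up, and `diff_parameters` is a hypothetical helper, not part of any Oracle tooling.

```python
# Hypothetical parameter snapshots, e.g. exported from v$parameter on each system.
# All values below are invented for illustration.
production = {
    "cpu_count": "64",
    "parallel_max_servers": "1280",
    "db_writer_processes": "8",
}
uat = {
    "cpu_count": "8",
    "parallel_max_servers": "160",
    "db_writer_processes": "1",
}


def diff_parameters(prod, test):
    """Return {name: (prod_value, test_value)} for every parameter that differs.

    A parameter present on only one side is reported with None for the other.
    """
    names = sorted(set(prod) | set(test))
    return {
        name: (prod.get(name), test.get(name))
        for name in names
        if prod.get(name) != test.get(name)
    }


for name, (prod_value, uat_value) in diff_parameters(production, uat).items():
    print(f"{name}: production={prod_value} uat={uat_value}")
```

Anything that shows up in the diff and that you did not set explicitly is a candidate for a setting derived from the hardware, and worth a closer look before you trust a test result.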

Memory

Imagine that your production server has so much memory that regular day-to-day UAT testing works fine with half of it. But then one day you need to make a schema change that involves backpopulating some big tables, and your release window is really tight. Will you be able to finish on time? How can you tell if your buffer cache is half the size of the production one? You cannot simply rescale your UAT results: if the buffer hit ratio for that process is 90% on UAT, that doesn’t mean it will be 180% on production. You simply won’t have an answer you can trust.
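To see why rescaling doesn’t work, consider a back-of-the-envelope model in which only physical reads cost time. Everything here is an assumption made for illustration: the number of logical reads, the 5 ms read latency, and the model itself, which ignores CPU time, writes and concurrency.

```python
def estimated_io_seconds(logical_reads, hit_ratio, disk_read_ms):
    """Rough time spent on physical reads, assuming cache hits cost nothing."""
    physical_reads = logical_reads * (1.0 - hit_ratio)
    return physical_reads * disk_read_ms / 1000.0


# Assumed workload: 100 million logical reads, 5 ms per physical read.
LOGICAL_READS = 100_000_000
DISK_READ_MS = 5.0

for hit_ratio in (0.90, 0.95, 0.99):
    seconds = estimated_io_seconds(LOGICAL_READS, hit_ratio, DISK_READ_MS)
    print(f"hit ratio {hit_ratio:.0%}: about {seconds / 3600:.1f} hours of read I/O")
```

In this model, going from a 90% to a 95% hit ratio halves the number of physical reads, so a difference of a few percentage points decides whether the job fits into the release window. And nothing in the UAT numbers tells you which hit ratio a cache twice the size will actually deliver.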

Another scenario: your production box has more memory because it also serves as a disaster-recovery solution for another database or application. Surely it would be a waste to over-allocate memory like that on UAT, too? No, it wouldn’t. Memory not claimed by applications is used by the operating system to cache I/O, and without that extra memory UAT gets far less OS-level caching, which makes its results incomparable to production for non-functional testing.

Replication

As part of a disaster-recovery solution, your production system may use synchronous storage-level replication to a remote site. UAT storage is rarely replicated, which means that I/O latencies on UAT and production can differ by up to an order of magnitude. So don’t be surprised if one day you deploy a new write-intensive application component and within a few hours ‘free buffer waits’ bring production performance to its knees, even though the performance tests were fine.
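‘Free buffer waits’ show up when sessions need a free buffer and the database writer has not been able to write dirty ones out fast enough. The toy model below is a sketch of that effect and nothing more: the function name, the idea of a fixed number of concurrent writes, and all the numbers are assumptions chosen for illustration, not measurements.

```python
def seconds_until_no_free_buffers(free_buffers, dirty_per_sec,
                                  write_latency_ms, concurrent_writes):
    """Toy model: time until dirty buffers use up all free buffers.

    Returns None when the writer keeps up with the rate of change.
    """
    writes_per_sec = concurrent_writes * 1000.0 / write_latency_ms
    backlog_growth = dirty_per_sec - writes_per_sec
    if backlog_growth <= 0:
        return None
    return free_buffers / backlog_growth


# Assumed workload: 1,000,000 free buffers, 20,000 buffers dirtied per second,
# 32 writes in flight at any time.
for label, latency_ms in (("UAT, local storage", 1.0),
                          ("production, replicated", 10.0)):
    result = seconds_until_no_free_buffers(1_000_000, 20_000, latency_ms, 32)
    if result is None:
        print(f"{label}: writer keeps up")
    else:
        print(f"{label}: free buffers exhausted after about {result:.0f} s")
```

With these made-up numbers the same workload is sustainable at 1 ms per write and runs out of free buffers at 10 ms per write. The point is not the specific figures but that a tenfold latency difference can move a system from one regime to the other, and only the replicated configuration will show it.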

ASM vs filesystem

There might be a myriad of reasons why you’d want to put production on a filesystem and UAT on ASM or vice versa, but none of them is good enough to outweigh all the trouble you’ll be getting yourself into. Such a dramatic difference in storage would cause problems testing patches and upgrades, but more importantly, it would mislead you about things that behave differently on ASM and filesystems (once again, we can use log parallelism as an example).

Archivelog

Because production databases need to be able to perform point-in-time recovery, they typically run in ARCHIVELOG mode. Most UATs, on the other hand, are set up in NOARCHIVELOG mode, because it’s simpler that way. When a database is put under stress, ARCHIVELOG mode can make a lot of difference (for instance, it can hurt log writer performance, because the archiver’s reads and the log writer’s writes against the same I/O devices compete for the same I/O bandwidth).

Workload

It is common practice to disable heavy jobs and processes on UAT databases because “they impact performance”. Doing so defeats the very purpose of such databases. They don’t exist so that you can enjoy the best possible performance. They exist so that you get an early warning when you’re about to deploy something that would cause a problem on production.

To sum up: setting up a production-like database for performance testing costs time and money, but it is time and money well spent.

4 thoughts on “Lies, damned lies and non production-like performance testing”

  1. I don’t know how you know that most UAT/PT environments are set up in non-archivelog mode. In my experience, perhaps because I recommend it in the first place, both archive logging and flashback logging are enabled to allow guaranteed restore points for repeat testing.

    1. John, maybe my sample of UAT/PT databases was biased. I don’t think it really matters that much if it’s “most” or just “many”. My goal was not to conduct a sociological study to see what % of databases have certain kinds of configuration differences compared to production. My goal was to list certain things that often affect performance test results.
