2015/04/02

Random Data in Test Automation

Using random data can be pretty tricky. As always, there are two types of people: those who use random data a lot, and those who don't.

Let's compare these two ways.

Random:
  • Every run gets a new set of data
  • Coverage increases with each test run
  • You don't need to think about duplicates in data, assuming the random generator works well

No Random:
  • Every run uses the same (or nearly identical) set of data
  • The Pesticide Paradox applies: running the same tests on the same data stops finding new bugs
  • You need to handle duplicates, for example by deleting data from previous runs

From the comparison above, random data seems to be the better choice, but there are some pitfalls.


Tests are not repeatable

One of the main ideas of automated testing is that tests should be repeatable. This means that no matter how many times you run the same test against the same system, it should produce the same outcome.

With random values this is not the case. Every run is a slightly different test case, and if some value exposes a bug, you will see it only in the logs and won't be able to rerun the automated test with the same values.

Tips:
  • Even though your tests use random data, leave an option to provide specific data for a test run.
  • Write separate test cases with specific data for the bugs you find. This will help you catch regressions.
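
One possible way to keep that option open, sketched in Python: draw random values from a seeded generator, print the seed on every run, and let the run accept either a fixed seed or an explicit value. The TEST_SEED environment variable and the function names are made up for this illustration; they are not part of any framework.

```python
import os
import random


def make_rng(env_var="TEST_SEED"):
    """Return (seed, rng); reuse the seed from the environment if it is set."""
    raw = os.environ.get(env_var)
    # Hypothetical convention: TEST_SEED=12345 replays an earlier run.
    seed = int(raw) if raw else random.SystemRandom().randrange(2**32)
    print(f"random seed for this run: {seed}")  # keep the seed in the logs
    return seed, random.Random(seed)


def pick_value(rng, low, high, override=None):
    """Use the caller's specific value when given, otherwise a random one."""
    if override is not None:
        return override
    return rng.randint(low, high)
```

Rerunning with the seed taken from a failed run's log should replay the same values, as long as the test asks for them in the same order.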


Randomly generated values need to stay in the same equivalence class

If you don't know about partitions, please read about Equivalence Partitioning. There is a good article about it on Software Testing Class: "Boundary Value Analysis and Equivalence Class Partitioning With Simple Example".

Let's assume that we need to test checkout in an online shop: we select a product and enter the quantity of items we want to buy. The requirement is that nobody can order more than 10 pcs.

So valid values are 1 to 10 items, and invalid values are anything less than or equal to 0, or greater than 10.

As you can see, we can't just put any random number in here. We need to limit the range of randomness.

Your testing solution needs to be able to generate two kinds of random quantity values: valid and invalid.

For such a simple case this sounds easy and obvious. But when it comes to something more complex (for example a person's full name, an address, or a filename), preparing random data can be troublesome.

Tips:
  • Randomly generate only valid values for positive cases. You probably have some rules for valid values, and you can stick to them.
  • Use non-random values for negative cases. The range of invalid values is usually far larger than the valid range, while the gain in test coverage from randomizing them is pretty small.
  • Separate the generation of random data from other code. If all the rules live in one separate class or generator, they will be much easier to modify.
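
For the quantity example, such a generator might look like the sketch below. The class name and the fixed list of negative values are assumptions made for illustration; only the 1 to 10 range comes from the requirement above.

```python
import random


class QuantityGenerator:
    """Keeps all rules for order quantities in one place (illustrative)."""

    MIN_VALID = 1
    MAX_VALID = 10
    # Fixed, non-random negative cases: below zero, zero, just above the limit.
    INVALID_CASES = (-1, 0, 11)

    def __init__(self, rng=None):
        self._rng = rng if rng is not None else random.Random()

    def valid(self):
        """Random value from the valid partition, boundaries included."""
        return self._rng.randint(self.MIN_VALID, self.MAX_VALID)

    def invalid_cases(self):
        """Hand-picked values from the invalid partitions."""
        return list(self.INVALID_CASES)
```

If the limit ever changes, only the constants in this class need to be touched, not every test that orders something.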


DB overload with randomized test data

If you don't clean up your test data, the database will soon be overloaded.

So even though random values let you bypass duplicate validation, you still need to keep that validation in mind and clear the data you create.
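
One way to make the cleanup hard to forget is to tie it to the creation of the record. The sketch below uses Python's built-in sqlite3 module; the customers table and the temporary_customer helper are hypothetical, and a real project would do the same against its own database or API.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def temporary_customer(conn, last_name):
    """Insert a customer for one test and delete it afterwards."""
    cur = conn.execute(
        "INSERT INTO customers (last_name) VALUES (?)", (last_name,)
    )
    customer_id = cur.lastrowid
    conn.commit()
    try:
        yield customer_id
    finally:
        # Runs even when the test body raises, so nothing is left behind.
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        conn.commit()


# Usage against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT)")
with temporary_customer(conn, "Smith") as customer_id:
    pass  # the test steps that need the customer would go here
```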


Timestamp is not effectively random/unique

Let's say you need a last name: just add a UNIX timestamp to the end and it will be different each time. Easy, and it seems unique. But in fact it's not.

A UNIX timestamp is unique only on a single machine that runs tests one at a time, and only as long as its clock is never adjusted. It counts seconds since the epoch in UTC, so timezone and DST settings do not change it, which also means that two machines generating a value in the same second get the same number. In automation we usually run tests in parallel across different machines, so our "unique" values can collide.

Tips:
  • Use a GUID or some similar mechanism to maintain uniqueness.
  • If your random value contains illegal characters (for example the dashes in a GUID), just replace them with some specific character. The value will still be unique.
  • If you are still using timestamps, don't truncate or process them in any way, because any processing will limit their uniqueness even more.
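
In Python, for example, the standard uuid module covers the first two tips. The unique_last_name helper and its "Test" prefix are made up for this sketch, and it assumes the field under test accepts letters and digits.

```python
import uuid


def unique_last_name(prefix="Test", dash_replacement=""):
    """Build a last name that stays unique across machines and runs."""
    # str(uuid.uuid4()) looks like 8-4-4-4-12 hex digits joined by dashes.
    token = str(uuid.uuid4()).replace("-", dash_replacement)
    return f"{prefix}{token}"


print(unique_last_name())
print(unique_last_name(dash_replacement="x"))
```

Replacing the dashes keeps all 32 hexadecimal digits of the GUID, so nothing that makes the value unique is thrown away.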


In summary

If you are not concerned about the Pesticide Paradox, I recommend not using random data in your tests, or using it as little as possible. This way you won't need to handle all the things mentioned above, but you will need to implement some mechanism for clearing created data.