Tests that hit the database are slow, fragile, and difficult to automate. All the cool kids are using mocking and stubbing and in-memory databases to keep their unit tests isolated and fast, and that's awesome. I do that too (though my "cool kid" status is debatable).
However, there are times when talking to a real database is necessary. Maybe you're testing actual data access logic, or maybe you're writing high-level integration/acceptance tests, or maybe you're just working in an architecture that doesn't let you mock/stub/inject your way to isolated bliss. If any of that sounds familiar, then this post is for you!
Below is a list of strategies and suggestions for effective data testing that I've collected from years of experience testing large, "enterprisey", data-driven applications. Data tests will never be painless, but following these rules makes them suck less.
Rules for good data tests
- Tests should create their own scenario data; never assume it already exists. Magic row IDs kill kittens!
- Make liberal use of data helper and scenario setup classes.
- Don't use your data access layer to test your data access layer.
- Tests should make no permanent changes to the database - leave no data behind!
Rule 1: Tests should create their own data
One of the worst things you can do in a data test is to assume that some record (a customer, an order, etc.) already exists that fulfills your scenario requirements. This is a cardinal sin for many reasons:
- It's extremely fragile; databases change over time, and tests that rely on pre-existing data often break, producing failures that have nothing to do with the code under test.
- It obscures the test's purpose. A test's setup communicates the data context in which our assertions are valid. If you omit that setup logic, you make it hard for other programmers to understand the scenario that you are testing.
- It's not maintainable; other programmers won't know what makes customer ID 5 appropriate for one test and customer ID 7 appropriate for another. Once a test like this breaks, it tends to stay broken or get deleted.
In other words: relying on pre-existing data means your tests will break often, are painful to maintain when they do break, and don't clearly justify why another programmer should spend time fixing them.
The solution is simple: each test should create every record it relies on. If that sounds like a lot of work, it can be... but keep reading to see how to keep it manageable.
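To make that concrete, here's a minimal sketch of such a test, assuming NUnit; CustomerHelper and the order service are hypothetical stand-ins for the helpers described in Rule 2 and for your code under test.

```csharp
[Test]
public void InactiveCustomer_CannotPlaceOrder()
{
    // Create the exact record this scenario needs, instead of assuming
    // some magic row (customer ID 5, anyone?) is already in the database.
    var customer = CustomerHelper.CreateCustomer(isActive: false);

    var result = _orderService.PlaceOrder(customer.Id);

    Assert.IsFalse(result.Succeeded);
}
```

The test now documents its own data context: anyone reading it can see exactly what makes this customer appropriate for this scenario.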
Rule 2: Liberal use of data helper and scenario setup classes
Setting up the supporting data for a test sucks. It's time-consuming and generally results in a lot of duplication, which in turn reduces test maintainability and readability. Test code is real code and should be kept DRY like anything else!
I've found it useful to create two different types of helper classes to attack this problem:
- Data helpers are utility classes that expose methods for quickly creating entities in the system. These classes:
- Are generally static, for convenience.
- Exist in the test project, not the main data access project.
- Create a single object (or object graph), such as a Customer or an Order with its OrderItem children.
- Create data with meaningful defaults, but allow the important fields to be explicitly specified where needed. (Optional parameters in .NET 4 FTW!)
- Scenario objects (aka "fixtures") represent specific data scenarios that might apply to multiple tests, such as the scenario in which a Customer has placed an Order and one of the items is backordered. These classes:
- Exist in the test project.
- Have public properties that identify key data in the scenario (e.g. the Customer ID, Order ID, and backordered Item ID).
- Are instantiated by a test, at which time the scenario data is created.
In short, data helpers are low-level utilities for creating a specific data record in a specific state, while scenario classes represent larger contexts composed of multiple entities. I've found that while these classes take nontrivial time to build, the investment quickly pays off: each new test becomes easier to write than the last.
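Here's a minimal sketch of both styles, leaning on .NET 4 optional parameters; every type and member name below is illustrative, and the actual insert mechanics are elided.

```csharp
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

// Data helper: creates one Customer with meaningful defaults; optional
// parameters let each test override only the fields it cares about.
public static class CustomerHelper
{
    public static Customer CreateCustomer(string name = "Test Customer",
                                          bool isActive = true)
    {
        var customer = new Customer { Name = name, IsActive = isActive };
        // Insert the row here using whatever low-level mechanism you
        // prefer (see Rule 3), populating Id from the new row's key.
        return customer;
    }
}

// Scenario object: builds a multi-entity context in its constructor and
// exposes the key IDs that tests will need.
public class BackorderedOrderScenario
{
    public int CustomerId { get; private set; }
    public int OrderId { get; private set; }
    public int BackorderedItemId { get; private set; }

    public BackorderedOrderScenario()
    {
        var customer = CustomerHelper.CreateCustomer();
        CustomerId = customer.Id;
        // The Order and its backordered OrderItem would be created the
        // same way, via their own helpers.
    }
}
```

A test that needs the backordered-order context just news up the scenario and reads the IDs off its properties.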
Rule 3: Don't use your DAL to test your DAL
Tests for DAL code generally set up some data in the database, invoke the DAL, and then verify that the database was properly modified. I've found it difficult to use the primary DAL to write those verifications quickly and concisely.
In some cases, the primary DAL may not expose a suitable API for doing record-level verification. For example, when there are significant differences between the logical schema exposed through the domain layer and the physical schema of the database, it may be impossible (or at least difficult) to write low-level data assertions.
In other cases, especially early in development, using the DAL to test the DAL creates dependency issues. For instance, many tests involve a sequence like "get entity by ID, save changes to it, then verify it was changed". If both the GetById() and Save() methods are still under development, your test will give inconclusive results until both are implemented.
In all of these cases I've found it valuable to verify data assertions using a LINQ to SQL data context. It provides a convenient, object-based representation of the data schema that is perfectly suited for verifying that row-level operations were performed properly. This data context lives in the test project and is automatically regenerated (using SQLMetal.exe) whenever the schema changes, so it's a close-to-zero-effort solution.
You could also use a micro ORM like Massive, or anything else that makes it quick and easy to interact directly with the database.
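As a sketch of what that verification looks like, assume TestDataContext is the context SQLMetal.exe generated into the test project; the table, column, and variable names here are made up:

```csharp
using System.Linq;
using NUnit.Framework;

[Test]
public void Save_WritesTheCustomerRow()
{
    // ... arrange scenario data and invoke the DAL under test ...

    // Verify with the test-only LINQ to SQL context, not the primary DAL.
    using (var db = new TestDataContext(connectionString))
    {
        var row = db.Customers.Single(c => c.CustomerId == newCustomerId);

        // Row- and field-level assertions against the physical schema.
        Assert.AreEqual("Bob", row.FirstName);
        Assert.IsTrue(row.IsActive);
    }
}
```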
Rule 4: Tests make no permanent changes to the database
Tests should be run often, and if you follow Rule #1 they create a lot of new data every time they run. If you don't clean up after them, your test database will quickly grow in size. And if you point your test suite at the same database you use to run your app, you'll soon get tired of seeing test data accumulate in your views.
The easiest way to prevent this is to wrap each test in a database transaction and roll that transaction back at the end of the test. This performs the desired cleanup, and it also prevents tests running in parallel from interfering with each other's data.
There are a few different ways to approach this. Depending on your needs, check out this or this.
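One approach, as a minimal sketch: an NUnit base class built on System.Transactions (the class and method names are mine, not from either link):

```csharp
using System.Transactions;
using NUnit.Framework;

public abstract class AutoRollbackTestBase
{
    private TransactionScope _scope;

    [SetUp]
    public void BeginTransaction()
    {
        // Connections opened during the test enlist in this ambient
        // transaction automatically.
        _scope = new TransactionScope();
    }

    [TearDown]
    public void RollbackTransaction()
    {
        // Disposing the scope without calling Complete() rolls back
        // everything the test wrote - no data left behind.
        _scope.Dispose();
    }
}
```

Any data test that inherits from this base class gets the auto-rollback behavior for free.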
Conclusion
None of these techniques are particularly clever or game changing, but when used together they can significantly improve your data tests:
- When tests create their own scenario data, you don't need to run them against a particular known state. This reduces maintenance costs significantly.
- Investing in data helpers and scenario classes makes it easy to add new tests. The easier it is to write tests, the more likely that developers will actually do it.
- "Close to the metal" abstractions like LINQ to SQL make it easy to write row- and field-level assertions against the database.
- Adding some "auto rollback" behavior to your data tests keeps your database trim and tidy, no matter how many times you run your test suite.
Happy data testing!