Why the database is often overlooked in the ALM
So you have a great ALM process. Your developers are checking in regularly into Subversion. TeamCity triggers a build for you. The build succeeds and a ton of test run. You feel pretty good about your test coverage because code analysis tells you that 50 percent of your code gets tested. Once the tests pass, the code automatically deploys to a test server. You like peeking in there and watching the progress your developers are making, except every once in a while, a deployment to the test server fails. And sometimes, a deployment to production completely fails. You hear about issues with schema differences from time to time. This makes your spider sense tingle, so you roll up your sleeves and start investigating. Eventually, you discover that your database objects are not part of this infrastructure at all. Stored procedures don’t go into source control. Tables are never tested. Database objects are manually moved into test and then production.
And you were feeling pretty good about this process up until a few minutes ago. What happened? How could your fantastic coders let you down like this?
It turns out that putting a database into an ALM process is challenging and takes a lot of work. Here are the main reasons:
1) Source control is typically file based. An RDBMS’s files are always under management and impossible to get at. Even if you could put them in source control, they are in binary. This means that a developer needs to manually write out the objects to a file and then check them into source control. What is a one step process for Java files becomes a three or four step process for RDBMS files. Developers will avoid it without accountability.
2) Writing tests in stored procedures is difficult. Stored procedures can return tables and tables are notoriously difficult to test. Data can change, row counts can change, and multiple values are difficult to navigate. This means that extra effort is required to make this portion testable.
3) Where Eclipse and Visual Studio let you navigate between test and procedure, RDBMS’s files are notoriously bad for code navigation. This makes developer weary of writing tests.
4) Branching and merging requires multiple copies of the database. Forget that, just giving every developer their own copy of the database is a challenge. If the database is 500GB, this is a daunting task. It’s difficult to backup and restore that much data, particularly if your developers are addicted to branching.
5) RDBMS code is super easy to change in production. If developers were forced to put database objects in an ALM process, they would be cut-off from a quick and easy way of deploying hotfixes. They sometimes balk at that.
6) Databases are often shared between applications and processes. This means that changes to a database can affect work by multiple teams, which makes folding the database into an ALM process a political challenge.
But the database should be in your ALM process. Overlooking it has the following major drawbacks:
1) Poor schema changes will make the same processes break over and over again.
2) Data problems can lead to lost data. RDBMS restore processes that are complicated and lengthy.
3) Poor schema changes can cause a loss of service to multiple apps, leading to expensive downtime.
4) A poorly groomed database is a complicated database, filled with poorly named fields, hard to discover data, and reports that don’t align with each other. This is a symptom of a database the developers don’t want to change because they don’t have it under test.
Now that the depressing stuff is out of the way, what’s the solution?
Folding the database into your ALM process is going to be challenging, it’s going to require effort and elbow grease, and it’s going to be a long process. I recommend introducing change gradually and with precision. It’s difficult for me to cover all the ways to do this, considering the RDBMS products are so diverse. I’ll use Microsoft SQL Server for example, but there are similar products and processes for PostGres, Oracle, and MySQL.
There are three main ways to put a SQL Server under source control:
► Manually–The developer scripts out every table/stored procedure/trigger/function/view by hand. Any changes to these objects require the developer to script out the object and then check it like in any other file. You can use any source control system you want.
► Visual Studio Database Project–This visual studio type will script all of your database objects out for you and check them into any source control system you want. This is an offline method of managing database objects and it requires you to publish your code changes to a development server by hand, but it will still save a ton of time over the manual method.
► Red Gate–This is an online method for source control. Red Gate connects directly to the SQL Server and does not allow database changes unless the objects are checked out and then checked back in. Red Gate handles conflicts and merging.
There is a free and open-source testing framework for SQL Server called tSQLt. It works like JUnit, NUnit, and others, but is meant for the SQL language. It has all the popular mechanisms you expect in a unit testing framework, like Assert, Failure, and Mocking. In addition, it rolls back every completed test, so tests are repeatable. It also allows you to inspect tables.
In addition, RedGate and Microsoft each have respective tools that make code navigation much easier.
Automatic Deployment/ Continuous Integration
There are two products that can help here. Microsoft Team Foundation Server (TFS) has hooks into SQL Server. If you’re using a different build server, like TeamCity, CruiseControl, Jenkins, or Hudson, then RedGate has a plugin that will work with SQL Server.
The point is, with a little research and a lot of effort, you can find products to help you with all of the above challenges. Well, not all. If your database is 500GB, you’ll have to pair that down manually to a manageable level in order to have multiple copies for developers. You have to do that anyway in order to be HIPAA and/or PCI compliant.
If multiple applications share databases, this is a symptom of not having a wide data management strategy or architecture. You’ll hit change lock soon enough and will require slowing down the weeding of your overgrown data garden.
But things can get better, as long as you commit to keeping the database a major component of your ALM process, change things slowly, and don’t take no for an answer.