Is Your SQL Server Environment Stable? Check These 5 Key Areas

  • Manufacturing
  • 10/9/2024

Five key areas for maintaining a stable SQL Server environment, emphasizing backups, security, capacity, performance, and future proofing.

The human body can survive weeks without food, days without water, but only a few minutes without air. These are our bodies’ basic needs.

In 1943, psychologist Abraham Maslow proposed what is now referred to as “Maslow’s hierarchy of needs.” From the bottom (most fundamental) level of the pyramid up, each level building on the one below, they are: physiological, safety and security, love and belonging, self-esteem, and self-actualization.

Databases, regardless of system, have a hierarchy of needs as well, but many professionals don’t always view it in that light. Take SQL Server, for instance. SQL Server is a relational database management system designed to store, retrieve, and manage data in a structured format using SQL (Structured Query Language). Overall, SQL Server is a comprehensive and versatile platform used by organizations of all sizes for managing their data efficiently and securely.

Brent Ozar, a SQL Server Master, came up with a hierarchy of database needs: Backups, Security, Capacity, Performance, and Future Proofing. If your team spends effort addressing these needs out of order, you run the risk of having an unstable database environment.

In this blog, I’ll unpack each of these needs, paying special attention to backups and performance, to help you better understand the importance of a stable SQL Server environment in your manufacturing organization.

1. Backups

When I am asked to look at a client’s SQL Server environment, the first thing I do is review the database backup strategy. If anything doesn’t look right, I start asking questions about the setup. Some clients use third-party tools to manage database backups, and that’s fine. But I always want to make sure the databases are properly backed up before attempting any maintenance or configuration changes.

There are two key concepts in database backups that support a stable SQL Server environment: Recovery Point Objective (RPO) and Recovery Time Objective (RTO). They are defined as follows:

  • Recovery Point is the latest available point in time to which a database can be restored.
  • Recovery Time is a measure of how long it takes to restore a database.
  • RPO and RTO are the stated target values (objectives) for these two measures.

Suppose you are running FULL database backups every night at midnight. At noon the following day, it is discovered that something bad happened sometime that morning. The RECOVERY POINT is midnight the night before, or 12 hours ago. If it takes 4 hours to restore the database, that is the RECOVERY TIME. Your business has just lost 12 hours’ worth of data and spent another 4 hours offline.
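One common way to narrow the recovery point is to supplement nightly FULL backups with frequent transaction log backups. A minimal T-SQL sketch; the database name and file paths are placeholders, and log backups require the database to be in the FULL recovery model:

```sql
-- Nightly full backup (scheduled, for example, via SQL Server Agent at midnight)
BACKUP DATABASE SalesDB
TO DISK = N'E:\Backups\SalesDB_Full.bak'
WITH INIT, CHECKSUM, COMPRESSION;

-- A transaction log backup every 15 minutes narrows the recovery point
-- to at most 15 minutes of lost work
BACKUP LOG SalesDB
TO DISK = N'E:\Backups\SalesDB_Log.trn'
WITH CHECKSUM, COMPRESSION;
```

With this schedule, a restore is the last full backup plus the chain of log backups up to the most recent one, bringing the RPO from 24 hours down to minutes.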

The decision on the Recovery Point Objective and Recovery Time Objective needs to be made jointly by operations and finance leads, not by IT managers. The latter need to ask some probing questions of the former:

  • “How much time (data) are you willing to lose in the event of a database corruption event that requires a full restore?” This is the data (business, sales, orders, invoices, etc.) that has already been generated and recorded but is now lost.
  • “How long are you willing to wait after a corruption event to get the database back online?” Data workers are going to be sitting idle while the database is restored.
  • “How much are you willing to budget to get these times to an acceptable level?”

This last question is always the kicker. In response to the first two questions, the answer will probably be something like, “As little as possible, obviously: one minute of lost data and one minute of restore time.” When you explain that supporting such objectives would require multiple data centers, clustered servers, Always On availability groups, and so on, with a price tag around $1 million, managers and finance people start to think about a compromise.

The takeaway here is that the RPO and RTO need to be achievable within the assigned budget, and that the business, not IT, decides what is acceptable. IT just implements the solution. If a business does not allocate enough resources to establish decent RPO and RTO, they can’t complain when unacceptable amounts of data are lost. On the other hand, if IT does not communicate back to the business what RPO and RTO they have implemented within the given budget, data-loss catastrophes may be on them.

To maintain a stable SQL Server environment, these recovery objectives should be documented in writing for management to see and review as necessary.

Another aspect that many leave out is testing of backups. Many years ago I adopted a mantra: “Database backups are only as good as the last time you tried to restore them.” In other words, don’t blindly assume that your backups are good. Regularly test by restoring a database, and measure how long the restore takes. Does it meet the RTO?
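A restore test can be sketched in two steps: a quick verification of the backup file, then an actual timed restore to a throwaway copy. The database name, logical file names, and paths below are placeholders; your actual logical names can be found with RESTORE FILELISTONLY:

```sql
-- Quick integrity check: confirms the backup file is readable and complete
RESTORE VERIFYONLY
FROM DISK = N'E:\Backups\SalesDB_Full.bak'
WITH CHECKSUM;

-- The real test: restore to a throwaway copy and time it against the RTO
RESTORE DATABASE SalesDB_RestoreTest
FROM DISK = N'E:\Backups\SalesDB_Full.bak'
WITH MOVE 'SalesDB'     TO N'F:\RestoreTest\SalesDB_Test.mdf',
     MOVE 'SalesDB_log' TO N'F:\RestoreTest\SalesDB_Test.ldf',
     STATS = 10;  -- progress messages every 10 percent make timing easy
```

VERIFYONLY alone is not sufficient; only a full restore proves the backup is usable and tells you how long recovery actually takes.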

2. Security

A stable SQL Server environment includes regular review of security. Server security is quite complex, but not that much different from other systems’ security. It boils down to keeping out those who have no business being in, and letting in those who need access. There are obvious complexities and nuances around server certificates, credentials, and object-level (table, row, view, stored procedure) security. When you let someone in, what are they allowed to do once connected?

In some cases, there is also the requirement to keep track of who viewed or changed what, and when they did it.
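As a starting point for such a review, SQL Server’s catalog views can show who has been granted what in a given database. A sketch:

```sql
-- Who can do what in the current database?
SELECT pr.name                  AS principal_name,
       pr.type_desc             AS principal_type,
       pe.permission_name,
       pe.state_desc,                          -- GRANT, DENY, etc.
       OBJECT_NAME(pe.major_id) AS object_name -- NULL for database-level grants
FROM sys.database_permissions AS pe
JOIN sys.database_principals  AS pr
  ON pe.grantee_principal_id = pr.principal_id
ORDER BY pr.name;
```

Running a query like this periodically and comparing the output against the previous run is a simple way to spot unexpected permission changes.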

3. Capacity

Does your database system have the disk space it needs to do its job? Backups should be stored on drives separate from the databases themselves for two reasons:

  1. If the database drive becomes corrupted, you don’t want it to take the backups down along with the databases.
  2. If the backup drive runs out of space, you don’t want that to stop the server and databases from doing their work.

To establish a stable SQL Server environment, you should have sufficient space for the database to grow for the next 6 to 12 months without concern. This should be reevaluated every year as part of a standard IT systems review. Likewise, as databases grow, the backup files grow as well. Further, the most recent backup should not be the ONLY backup. A standard practice is to keep at least a few full database backups readily available: going back, for example, one, two, and three days, one week, and one month.
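A quick way to check how much room each file of the current database has left is the sys.database_files catalog view. Sizes are reported in 8 KB pages, hence the division by 128 to get megabytes:

```sql
-- Data and log file sizes, plus free space inside each file
SELECT DB_NAME()  AS database_name,
       name       AS logical_file,
       type_desc,                     -- ROWS (data) or LOG
       size / 128 AS size_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files;
```

Trending these numbers month over month gives you the growth rate needed to plan 6 to 12 months of capacity ahead.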

4. Future proofing

Observant readers will realize this should be the last subject covered, but I am saving that spot for “Performance,” as it is quite extensive.

Future proofing involves making sure the server and databases will continue to operate smoothly over the next five to ten years. It is a key component of a stable SQL Server environment.

This involves, among other things, OS and server patching, eventual upgrades, addressing evolving security threat vectors, not getting locked into an antiquated technology or third-party add-on, and regular review of the drive capacities. Almost as important, it involves financial and time budgeting for each aspect.

5. Performance

Performance tuning involves getting the most out of the physical resources allocated to SQL Server. To “make it go faster,” many times people simply add more resources to the server. While no database administrator ever complained about having too much RAM, too many processors, or a storage system that was too fast, all those things cost money. The goal of performance tuning is to make it go faster with what you have on hand already.

To that end, there are a lot of aspects of SQL Server that can affect performance:

Server and database settings

SQL Server is a complex animal. Some default settings were fine when SQL Server first hit the mainstream 25 to 30 years ago and were geared to the hardware capabilities of the time. But improvements in both hardware and SQL Server’s internal processes have made those settings obsolete, sometimes even detrimental to good performance.
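Two frequently revisited examples are the cost threshold for parallelism and the maximum degree of parallelism, both of which default to values chosen for much older hardware. The numbers below are illustrative, not recommendations; appropriate settings depend on your hardware and workload:

```sql
-- Make advanced options visible before changing them
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Defaults often revisited on modern hardware (example values only;
-- test against your own workload before changing production settings)
EXEC sp_configure 'cost threshold for parallelism', 50;  -- default is 5
EXEC sp_configure 'max degree of parallelism', 8;        -- default 0 = all cores
RECONFIGURE;
```

The point is not these specific numbers but the habit: know which instance-level settings you have changed from the defaults, and why.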

Database model and table structure

A database normalized to the proper extent is going to be the most efficient. Typically, transactional databases, like line-of-business databases for taking orders or processing invoices, will have more tables, relationships, and foreign keys than analytic databases.

This aspect is usually one that cannot be adjusted when attempting to improve database performance.

Indexing strategy

“Indexes speed up data reading, we need more of them!” “No! Indexes slow down data writing, we need fewer!” Where is the balance point? Do we have the right number of indexes? Do we have the right types of indexes?
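One way to look for that balance is the index usage statistics DMV, which contrasts how often each index is read versus how often it must be maintained. A sketch (these statistics reset when the instance restarts, so interpret them with care):

```sql
-- Nonclustered indexes with many writes but few reads are review candidates
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_seeks + s.user_scans + s.user_lookups AS reads,
       s.user_updates           AS writes
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND i.type_desc = 'NONCLUSTERED'
ORDER BY writes DESC;
```

An index with thousands of writes and zero reads is pure maintenance overhead; one with heavy reads and modest writes is earning its keep.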

Efficient queries

Reading data from a database usually takes one of three forms: reading data to support a front-end application, reading data to support an analytic system or report, and one-off or ad-hoc queries.

Any one of these has the potential to be written in such a way that the query consumes enough resources (CPU, RAM, data reads, data writes, etc.) that it drags down overall performance.

Table and column structure

This involves using the most efficient data type and size to store data. For example, dates with a time element should be stored in a DATETIME data type field; a field to store U.S. state codes need be only two characters in size, city maybe 100, and ZIP code no more than 10.
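A small table definition along those lines; the table and column names are hypothetical, and the sizes are illustrative:

```sql
-- Right-sized columns for a U.S. mailing address
CREATE TABLE dbo.CustomerAddress (
    CustomerID INT          NOT NULL,
    City       VARCHAR(100) NOT NULL,
    StateCode  CHAR(2)      NOT NULL,  -- 'MN', 'WI', ...
    ZipCode    VARCHAR(10)  NOT NULL,  -- allows ZIP+4, e.g., '55402-1234'
    CreatedAt  DATETIME     NOT NULL   -- a date with a time element
);
```

Narrower columns mean more rows per data page, which translates directly into fewer reads and less memory for the same query.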

Usage patterns

Usage patterns are another aspect over which a performance tuner has no control. Telling people simply to wait until the server is more responsive before beginning their task is not an acceptable solution.

If enough users or processes are competing for the same piece of data at the same time, the result is going to be locking, blocking, and in the worst case, deadlocks. The first two may manifest themselves in slow performance, possibly noticed by the end users, but not always. Deadlocks, on the other hand, result in at least one winner and one loser. The winner(s) gets to keep their data operation(s), while the loser(s) lose what they thought they were changing. In extreme cases, they can get kicked out, or lose their session and need to log in again.
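When slowness strikes, a quick check of current blocking can be done through the dynamic management views. A sketch:

```sql
-- Sessions currently blocked, who is blocking them, and what they are running
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time AS wait_time_ms,
       t.text      AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

If the same statement shows up at the head of blocking chains day after day, that query, or the indexing behind it, is the tuning target.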

Key takeaways for SQL Server stability

  • If your team addresses the five major database needs out of order, from Backups through Future Proofing, or neglects any of them, your system may be unstable.
  • With a moderate investment in your SQL Server environment and proper backup strategies in place, recovery objectives can be quite narrow, measured in minutes instead of hours or days.
  • Security reports should be developed to show who has access to what. In addition, changes to security need to be monitored.
  • Storage use and capacity should be reviewed at least annually, and adjustments made if needed.
  • Performance tuning can be an ongoing task. Changes in data volume and usage patterns mean that what was once an acceptable indexing or query strategy may now be a performance bottleneck.
  • Plan to stay on top of upgrades, patches, leading practices, and general system improvements. Make sure your database systems can survive the next 5 to 10 years.

How we can help

Investing in stability can yield significant long-term benefits for your organization.

CLA’s digital team can help you rethink your business and turn challenges into opportunities with modernized systems and processes adapted for the manufacturing industry.

This blog contains general information and does not constitute the rendering of legal, accounting, investment, tax, or other professional services. Consult with your advisors regarding the applicability of this content to your specific circumstances.
